Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/tut.tex | 598
-rw-r--r-- | Doc/tut/tut.tex | 598
2 files changed, 1196 insertions, 0 deletions
diff --git a/Doc/tut.tex b/Doc/tut.tex index 83a0d8b..ac6c5f5 100644 --- a/Doc/tut.tex +++ b/Doc/tut.tex @@ -57,6 +57,7 @@ a more formal definition of the language. \pagenumbering{arabic} + \chapter{Whetting Your Appetite} If you ever wrote a large shell script, you probably know this @@ -141,6 +142,7 @@ should read the Library Reference, which gives complete (though terse) reference material about built-in and standard types, functions and modules that can save you a lot of time when writing Python programs. + \chapter{Using the Python Interpreter} \section{Invoking the Interpreter} @@ -380,6 +382,7 @@ completion mechanism might use the interpreter's symbol table. A command to check (or even suggest) matching parentheses, quotes etc. would also be useful. + \chapter{An Informal Introduction to Python} In the following examples, input and output are distinguished by the @@ -786,6 +789,7 @@ prompt if the last line was not completed. \end{itemize} + \chapter{More Control Flow Tools} Besides the {\tt while} statement just introduced, Python knows the @@ -1065,6 +1069,7 @@ it is equivalent to {\tt result = result + [b]}, but more efficient. \end{itemize} + \chapter{Odds and Ends} This chapter describes some things you've learned about already in @@ -1359,6 +1364,7 @@ to their numeric value, so 0 equals 0.0, etc.% the language. } + \chapter{Modules} If you quit from the Python interpreter and enter it again, the @@ -1581,6 +1587,7 @@ meError', 'SystemError', 'TypeError', 'abs', 'chr', 'dir', 'divmod', 'eval', >>> \end{verbatim}\ecode + \chapter{Output Formatting} So far we've encountered two ways of writing values: {\em expression @@ -1675,6 +1682,7 @@ signs:% >>> \end{verbatim}\ecode + \chapter{Errors and Exceptions} Until now error messages haven't been more than mentioned, but if you @@ -1963,4 +1971,594 @@ handler (and even if another exception occurred in the handler). 
It is also executed when the {\tt try} statement is left via a {\tt break} or {\tt return} statement. + +\chapter{Classes} + +Python's class mechanism adds classes to the language with a minimum +of new syntax and semantics. It is a mixture of the class mechanisms +found in C++ and Modula-3. As is true for modules, classes in Python +do not put an absolute barrier between definition and user, but rather +rely on the politeness of the user not to ``break into the +definition.'' The most important features of classes are retained +with full power, however: the class inheritance mechanism allows +multiple base classes, a derived class can override any methods of its +base class(es), and a method can call the method of a base class with the +same name. Objects can contain an arbitrary amount of private data. + +In C++ terminology, all class members (including the data members) are +{\em public}, and all member functions are {\em virtual}. There are +no special constructors or destructors. As in Modula-3, there are no +shorthands for referencing the object's members from its methods: the +method function is declared with an explicit first argument +representing the object, which is provided implicitly by the call. As +in Smalltalk, classes themselves are objects, albeit in the wider +sense of the word: in Python, all data types are objects. This +provides semantics for importing and renaming. But, just like in C++ +or Modula-3, built-in types cannot be used as base classes for +extension by the user. Also, like in Modula-3 but unlike in C++, the +built-in operators with special syntax (arithmetic operators, +subscripting, etc.) cannot be redefined for class instances. + + +\section{A word about terminology} + +Lacking universally accepted terminology to talk about classes, I'll +make occasional use of Smalltalk and C++ terms.
(I'd use Modula-3 +terms, since its object-oriented semantics are closer to those of +Python than C++, but I expect that few readers have heard of it...) + +I also have to warn you that there's a terminological pitfall for +object-oriented readers: the word ``object'' in Python does not +necessarily mean a class instance. Like C++ and Modula-3, and unlike +Smalltalk, not all types in Python are classes: the basic built-in +types like integers and lists aren't, and even somewhat more exotic +types like files aren't. However, {\em all} Python types share a little +bit of common semantics that is best described by using the word +``object''. + +Objects have individuality, and multiple names (in multiple scopes) +can be bound to the same object. This is known as aliasing in other +languages. This is usually not appreciated at first glance at +Python, and can be safely ignored when dealing with immutable basic +types (numbers, strings, tuples). However, aliasing has an +(intended!) effect on the semantics of Python code involving mutable +objects such as lists, dictionaries, and most types representing +entities outside the program (files, windows, etc.). This is usually +used to the benefit of the program, since aliases behave like pointers +in some respects. For example, passing an object is cheap since only +a pointer is passed by the implementation; and if a function modifies +an object passed as an argument, the caller will see the change --- this +obviates the need for two different argument passing mechanisms as in +Pascal. + + +\section{Python scopes and name spaces} + +Before introducing classes, I first have to tell you something about +Python's scope rules. Class definitions play some neat tricks with +name spaces, and you need to know how scopes and name spaces work to +fully understand what's going on. Incidentally, knowledge about this +subject is useful for any advanced Python programmer. + +Let's begin with some definitions.
+ +A {\em name space} is a mapping from names to objects. Most name +spaces are currently implemented as Python dictionaries, but that's +normally not noticeable in any way (except for performance), and it +may change in the future. Examples of name spaces are: the set of +built-in names (functions such as \verb\abs()\, and built-in exception +names); the global names in a module; and the local names in a +function invocation. In a sense the set of attributes of an object +also forms a name space. The important thing to know about name +spaces is that there is absolutely no relation between names in +different name spaces; for instance, two different modules may both +define a function ``maximize'' without confusion --- users of the +modules must prefix it with the module name. + +By the way, I use the word {\em attribute} for any name following a +dot --- for example, in the expression \verb\z.real\, \verb\real\ is +an attribute of the object \verb\z\. Strictly speaking, references to +names in modules are attribute references: in the expression +\verb\modname.funcname\, \verb\modname\ is a module object and +\verb\funcname\ is an attribute of it. In this case there happens to +be a straightforward mapping between the module's attributes and the +global names defined in the module: they share the same name space!% +\footnote{ + Except for one thing. Module objects have a secret read-only + attribute called {\tt __dict__} which returns the dictionary + used to implement the module's name space; the name + {\tt __dict__} is an attribute but not a global name. + Obviously, using this violates the abstraction of name space + implementation, and should be restricted to things like + post-mortem debuggers... +} + +Attributes may be read-only or writable. In the latter case, +assignment to attributes is possible. Module attributes are writable: +you can write \verb\modname.the_answer = 42\. Writable attributes may +also be deleted with the \verb\del\ statement, e.g.
+\verb\del modname.the_answer\. + +Name spaces are created at different moments and have different +lifetimes. The name space containing the built-in names is created +when the Python interpreter starts up, and is never deleted. The +global name space for a module is created when the module definition +is read in; normally, module name spaces also last until the +interpreter quits. The statements executed by the top-level +invocation of the interpreter, either read from a script file or +interactively, are considered part of a module called \verb\__main__\, +so they have their own global name space. (The built-in names +actually also live in a module; this is called \verb\builtin\, +although it should really have been called \verb\__builtin__\.) + +The local name space for a function is created when the function is +called, and deleted when the function returns or raises an exception +that is not handled within the function. (Actually, ``forgetting'' would +be a better way to describe what happens.) Of course, +recursive invocations each have their own local name space. + +A {\em scope} is a textual region of a Python program where a name space +is directly accessible. ``Directly accessible'' here means that an +unqualified reference to a name attempts to find the name in the name +space. + +Although scopes are determined statically, they are used dynamically. +At any time during execution, exactly three nested scopes are in use +(i.e., exactly three name spaces are directly accessible): the +innermost scope, which is searched first, contains the local names; +the middle scope, searched next, contains the current module's global +names; and the outermost scope (searched last) is the name space +containing built-in names. + +Usually, the local scope references the local names of the (textually) +current function. Outside functions, the local scope references +the same name space as the global scope: the module's name space.
+Class definitions place yet another name space in the local scope. + +It is important to realize that scopes are determined textually: the +global scope of a function defined in a module is that module's name +space, no matter from where or by what alias the function is called. +On the other hand, the actual search for names is done dynamically, at +run time --- however, the language definition is evolving towards +static name resolution, at ``compile'' time, so don't rely on dynamic +name resolution! (In fact, local variables are already determined +statically.) + +A special quirk of Python is that assignments always go into the +innermost scope. Assignments do not copy data --- they just +bind names to objects. The same is true for deletions: the statement +\verb\del x\ removes the binding of \verb\x\ from the name space referenced by the +local scope. In fact, all operations that introduce new names use the +local scope: in particular, import statements and function definitions +bind the module or function name in the local scope. (The +\verb\global\ statement can be used to indicate that particular +variables live in the global scope.) + + +\section{A first look at classes} + +Classes introduce a little bit of new syntax, three new object types, +and some new semantics. + + +\subsection{Class definition syntax} + +The simplest form of class definition looks like this: + +\begin{verbatim} + class ClassName: + <statement-1> + . + . + . + <statement-N> +\end{verbatim} + +Class definitions, like function definitions (\verb\def\ statements), +must be executed before they have any effect. (You could conceivably +place a class definition in a branch of an \verb\if\ statement, or +inside a function.) + +In practice, the statements inside a class definition will usually be +function definitions, but other statements are allowed, and sometimes +useful --- we'll come back to this later.
The function definitions +inside a class normally have a peculiar form of argument list, +dictated by the calling conventions for methods --- again, this is +explained later. + +When a class definition is entered, a new name space is created, and +used as the local scope --- thus, all assignments to local variables +go into this new name space. In particular, function definitions bind +the name of the new function here. + +When a class definition is left normally (via the end), a {\em class +object} is created. This is basically a wrapper around the contents +of the name space created by the class definition; we'll learn more +about class objects in the next section. The original local scope +(the one in effect just before the class definition was entered) is +reinstated, and the class object is bound here to the class name given in +the class definition header (\verb\ClassName\ in the example). + + +\subsection{Class objects} + +Class objects support two kinds of operations: attribute references +and instantiation. + +{\em Attribute references} use the standard syntax used for all +attribute references in Python: \verb\obj.name\. Valid attribute +names are all the names that were in the class's name space when the +class object was created. So, if the class definition looked like +this: + +\begin{verbatim} + class MyClass: + i = 12345 + def f(x): + return 'hello world' +\end{verbatim} + +then \verb\MyClass.i\ and \verb\MyClass.f\ are valid attribute +references, returning an integer and a function object, respectively. +Class attributes can also be assigned to, so you can change the +value of \verb\MyClass.i\ by assignment. + +Class {\em instantiation} uses function notation. Just pretend that +the class object is a parameterless function that returns a new +instance of the class. For example (assuming the above class): + +\begin{verbatim} + x = MyClass() +\end{verbatim} + +creates a new {\em instance} of the class and assigns this object to +the local variable \verb\x\.
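As a minimal sketch of the two class-object operations just described, here is the \verb\MyClass\ example rewritten in present-day Python 3 syntax. That is an assumption: the tutorial targets a much older interpreter, and the first parameter is renamed from \verb\x\ to \verb\self\ here, following the naming convention the tutorial discusses later. The sketch shows attribute references on the class object, assignment to a class attribute, and instantiation.

```python
# Python 3 rendering of the tutorial's MyClass example.
class MyClass:
    i = 12345

    def f(self):
        return 'hello world'


# Attribute references on the class object: an integer and a function object.
print(MyClass.i)                        # 12345
print(callable(MyClass.f))              # True

# Class attributes can be rebound by assignment.
MyClass.i = 54321
print(MyClass.i)                        # 54321

# Instantiation uses function notation; each call returns a new instance.
x = MyClass()
y = MyClass()
print(isinstance(x, MyClass), x is y)   # True False
```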
+ + +\subsection{Instance objects} + +Now what can we do with instance objects? The only operations +understood by instance objects are attribute references. There are +two kinds of valid attribute names. + +The first I'll call {\em data attributes}. These correspond to +``instance variables'' in Smalltalk, and to ``data members'' in C++. +Data attributes need not be declared; like local variables, they +spring into existence when they are first assigned to. For example, +if \verb\x\ is the instance of \verb\MyClass\ created above, the +following piece of code will print the value 16, without leaving a +trace: + +\begin{verbatim} + x.counter = 1 + while x.counter < 10: + x.counter = x.counter * 2 + print x.counter + del x.counter +\end{verbatim} + +The second kind of attribute reference understood by instance objects +is the {\em method}. A method is a function that ``belongs to'' an +object. (In Python, the term method is not unique to class instances: +other object types can have methods as well, e.g., list objects have +methods called append, insert, remove, sort, and so on. However, +below, we'll use the term method exclusively to mean methods of class +instance objects, unless explicitly stated otherwise.) + +Valid method names of an instance object depend on its class. By +definition, all attributes of a class that are (user-defined) function +objects define corresponding methods of its instances. So in our +example, \verb\x.f\ is a valid method reference, since +\verb\MyClass.f\ is a function, but \verb\x.i\ is not, since +\verb\MyClass.i\ is not. But \verb\x.f\ is not the +same thing as \verb\MyClass.f\ --- it is a {\em method object}, not a +function object. + + +\subsection{Method objects} + +Usually, a method is called immediately, e.g.: + +\begin{verbatim} + x.f() +\end{verbatim} + +In our example, this will return the string \verb\'hello world'\.
+However, it is not necessary to call a method right away: \verb\x.f\ +is a method object, and can be stored away and called at a later +moment, for example: + +\begin{verbatim} + xf = x.f + while 1: + print xf() +\end{verbatim} + +will continue to print \verb\hello world\ until the end of time. + +What exactly happens when a method is called? You may have noticed +that \verb\x.f()\ was called without an argument above, even though +the function definition for \verb\f\ specified an argument. What +happened to the argument? Surely Python raises an exception when a +function that requires an argument is called without any --- even if +the argument isn't actually used... + +Actually, you may have guessed the answer: the special thing about +methods is that the object is passed as the first argument of the +function. In our example, the call \verb\x.f()\ is exactly equivalent +to \verb\MyClass.f(x)\. In general, calling a method with a list of +{\em n} arguments is equivalent to calling the corresponding function +with an argument list that is created by inserting the method's object +before the first argument. + +If you still don't understand how methods work, a look at the +implementation can perhaps clarify matters. When an instance +attribute is referenced that isn't a data attribute, its class is +searched. If the name denotes a valid class attribute that is a +function object, a method object is created by packing (pointers to) +the instance object and the function object just found together in an +abstract object: this is the method object. When the method object is +called with an argument list, it is unpacked again, a new argument +list is constructed from the instance object and the original argument +list, and the function object is called with this new argument list. + + +\section{Random remarks} + + +[These should perhaps be placed more carefully...] 
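The method-call mechanics described at the end of the previous section can be checked with a short sketch. It is written in present-day Python 3, which is an assumption worth flagging: in Python 3 the attributes the tutorial calls \verb\im_self\ and \verb\im_func\ are named \verb\__self__\ and \verb\__func__\. The \verb\greet\ method is a hypothetical addition used only to show the case of a method called with several arguments.

```python
# Python 3 sketch; greet() is a hypothetical method, not from the tutorial.
class MyClass:
    i = 12345

    def f(self):
        return 'hello world'

    def greet(self, name, punct='!'):
        return 'hello ' + name + punct


x = MyClass()

# x.f() is equivalent to calling the class's function with x inserted first.
print(x.f() == MyClass.f(x))                                    # True

# The same holds with several arguments: x goes before the first one.
print(x.greet('world', '?') == MyClass.greet(x, 'world', '?'))  # True

# A method object can be stored away and called later.
xf = x.f
print(xf())                                                     # hello world

# The method object packs the instance and the function together.
print(xf.__self__ is x)                                         # True
print(xf.__func__ is MyClass.f)                                 # True
```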
+ + +Data attributes override method attributes with the same name; to +avoid accidental name conflicts, which may cause hard-to-find bugs in +large programs, it is wise to use some kind of convention that +minimizes the chance of conflicts, e.g., capitalize method names, +prefix data attribute names with a small unique string (perhaps just +an underscore), or use verbs for methods and nouns for data attributes. + + +Data attributes may be referenced by methods as well as by ordinary +users (``clients'') of an object. In other words, classes are not +usable to implement pure abstract data types. In fact, nothing in +Python makes it possible to enforce data hiding --- it is all based +upon convention. (On the other hand, the Python implementation, +written in C, can completely hide implementation details and control +access to an object if necessary; this can be used by extensions to +Python written in C.) + + +Clients should use data attributes with care --- clients may mess up +invariants maintained by the methods by stamping on their data +attributes. Note that clients may add data attributes of their own to +an instance object without affecting the validity of the methods, as +long as name conflicts are avoided --- again, a naming convention can +save a lot of headaches here. + + +There is no shorthand for referencing data attributes (or other +methods!) from within methods. I find that this actually increases +the readability of methods: there is no chance of confusing local +variables and instance variables when glancing through a method. + + +By convention, the first argument of a method is called +\verb\self\. This is nothing more than a convention: the name +\verb\self\ has absolutely no special meaning to Python. (Note, +however, that by not following the convention your code may be less +readable by other Python programmers, and it is also conceivable that +a {\em class browser} program be written which relies upon such a +convention.)
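The remarks on name conflicts and on \verb\self\ can be seen together in a small sketch, again in Python 3 syntax. The class \verb\Counter\ and its attribute names are hypothetical: the method deliberately names its first parameter \verb\this\ instead of \verb\self\, the data attribute uses a leading underscore as its naming convention, and an instance data attribute named \verb\count\ then hides the method of the same name on that one instance.

```python
# Python 3 sketch; Counter, count and _n are hypothetical names.
class Counter:
    def count(this):  # first parameter deliberately not called self
        # The leading underscore marks _n as a data attribute by convention.
        this._n = getattr(this, '_n', 0) + 1
        return this._n


c = Counter()
print(c.count())          # 1
print(c.count())          # 2

# An instance data attribute with the same name hides the method,
# but only on this one instance.
c.count = 99
print(c.count)            # 99
print(Counter().count())  # 1, other instances still see the method
```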
+ + +Any function object that is a class attribute defines a method for +instances of that class. It is not necessary for the function +definition to be textually enclosed in the class definition: assigning a +function object to a local variable in the class also works. For +example: + +\begin{verbatim} + # Function defined outside the class + def f1(self, x, y): + return min(x, x+y) + + class C: + f = f1 + def g(self): + return 'hello world' + h = g +\end{verbatim} + +Now \verb\f\, \verb\g\ and \verb\h\ are all attributes of class +\verb\C\ that refer to function objects, and consequently they are all +methods of instances of \verb\C\ --- \verb\h\ being exactly equivalent +to \verb\g\. Note that this practice usually only serves to confuse +the reader of a program. + + +Methods may call other methods by using method attributes of the +\verb\self\ argument, e.g.: + +\begin{verbatim} + class Bag: + def empty(self): + self.data = [] + def add(self, x): + self.data.append(x) + def addtwice(self, x): + self.add(x) + self.add(x) +\end{verbatim} + + +The instantiation operation (``calling'' a class object) creates an +empty object. Many classes like to create objects in a known initial +state. There is no special syntax to enforce this, but a convention +works almost as well: add a method named \verb\init\ to the class, +which initializes the instance (by assigning to some important data +attributes) and returns the instance itself. For example, class +\verb\Bag\ above could have the following method: + +\begin{verbatim} + def init(self): + self.empty() + return self +\end{verbatim} + +The client can then create and initialize an instance in one +statement, as follows: + +\begin{verbatim} + x = Bag().init() +\end{verbatim} + +Of course, the \verb\init\ method may have arguments for greater +flexibility. + +Warning: a common mistake is to forget the \verb\return self\ at the +end of an \verb\init\ method!
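The function-object and \verb\init\ conventions described above can be exercised in a short Python 3 sketch. \verb\f1\, \verb\C\ and \verb\Bag\ follow the tutorial's examples; \verb\BrokenBag\ is a hypothetical class added only to show what happens when \verb\return self\ is forgotten.

```python
# Python 3 sketch of the tutorial's examples; BrokenBag is hypothetical.

# A function defined outside the class still becomes a method once it is
# bound to a name in the class body.
def f1(self, x, y):
    return min(x, x + y)


class C:
    f = f1

    def g(self):
        return 'hello world'

    h = g  # h and g name the same function object


c = C()
print(c.f(3, -1))       # 2
print(c.h() == c.g())   # True


class Bag:
    def empty(self):
        self.data = []

    def add(self, x):
        self.data.append(x)

    def addtwice(self, x):
        # Methods call other methods through the self argument.
        self.add(x)
        self.add(x)

    def init(self):
        # The init convention: set up state, then return the instance.
        self.empty()
        return self


x = Bag().init()
x.addtwice('apple')
print(x.data)           # ['apple', 'apple']


class BrokenBag:
    def init(self):
        self.data = []
        # No "return self" here, so the call evaluates to None.


y = BrokenBag().init()
print(y)                # None
```

Because \verb\BrokenBag().init()\ evaluates to \verb\None\, the freshly created instance is unreachable after that statement, which is why the warning about \verb\return self\ matters.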
+ + +Methods may reference global names in the same way as ordinary +functions. The global scope associated with a method is the module +containing the class definition. (The class itself is never used as a +global scope!) While one rarely encounters a good reason for using +global data in a method, there are many legitimate uses of the global +scope: for one thing, functions and modules imported into the global +scope can be used by methods, as well as functions and classes defined +in it. Usually, the class containing the method is itself defined in +this global scope, and in the next section we'll find some good +reasons why a method would want to reference its own class! + + +\section{Inheritance} + +Of course, a language feature would not be worthy of the name ``class'' +without supporting inheritance. The syntax for a derived class +definition looks as follows: + +\begin{verbatim} + class DerivedClassName(BaseClassName): + <statement-1> + . + . + . + <statement-N> +\end{verbatim} + +The name \verb\BaseClassName\ must be defined in a scope containing +the derived class definition. Instead of a base class name, an +expression is also allowed. This is useful when the base class is +defined in another module, e.g., + +\begin{verbatim} + class DerivedClassName(modname.BaseClassName): +\end{verbatim} + +Execution of a derived class definition proceeds the same as for a +base class. When the class object is constructed, the base class is +remembered. This is used for resolving attribute references: if a +requested attribute is not found in the class, it is searched in the +base class. This rule is applied recursively if the base class itself +is derived from some other class. + +There's nothing special about instantiation of derived classes: +\verb\DerivedClassName()\ creates a new instance of the class. 
Method +references are resolved as follows: the corresponding class attribute +is searched, descending down the chain of base classes if necessary, +and the method reference is valid if this yields a function object. + +Derived classes may override methods of their base classes. Because +methods have no special privileges when calling other methods of the +same object, a method of a base class that calls another method +defined in the same base class may in fact end up calling a method of +a derived class that overrides it. (For C++ programmers: all methods +in Python are ``virtual functions''.) + +An overriding method in a derived class may in fact want to extend +rather than simply replace the base class method of the same name. +There is a simple way to call the base class method directly: just +call \verb\BaseClassName.methodname(self, arguments)\. This is +occasionally useful to clients as well. (Note that this only works if +the base class is defined or imported directly in the global scope.) + + +\subsection{Multiple inheritance} + +Python supports a limited form of multiple inheritance as well. A +class definition with multiple base classes looks as follows: + +\begin{verbatim} + class DerivedClassName(Base1, Base2, Base3): + <statement-1> + . + . + . + <statement-N> +\end{verbatim} + +The only rule necessary to explain the semantics is the resolution +rule used for class attribute references. This is depth-first, +left-to-right. Thus, if an attribute is not found in +\verb\DerivedClassName\, it is searched in \verb\Base1\, then +(recursively) in the base classes of \verb\Base1\, and only if it is +not found there, it is searched in \verb\Base2\, and so on. + +(To some people breadth-first --- searching \verb\Base2\ and +\verb\Base3\ before the base classes of \verb\Base1\ --- looks more +natural.
However, this would require you to know whether a particular +attribute of \verb\Base1\ is actually defined in \verb\Base1\ or in +one of its base classes before you can figure out the consequences of +a name conflict with an attribute of \verb\Base2\. The depth-first +rule makes no distinction between direct and inherited attributes of +\verb\Base1\.) + +It is clear that indiscriminate use of multiple inheritance is a +maintenance nightmare, given the reliance in Python on conventions to +avoid accidental name conflicts. A well-known problem with multiple +inheritance is a class derived from two classes that happen to have a +common base class. While it is easy enough to figure out what happens +in this case (the instance will have a single copy of ``instance +variables'' or data attributes used by the common base class), it is +not clear that these semantics are in any way useful. + + +\section{Odds and ends} + +Sometimes it is useful to have a data type similar to the Pascal +``record'' or C ``struct'', bundling together a couple of named data +items. An empty class definition will do nicely, e.g.: + +\begin{verbatim} + class Employee: + pass + + john = Employee() # Create an empty employee record + + # Fill the fields of the record + john.name = 'John Doe' + john.dept = 'computer lab' + john.salary = 1000 +\end{verbatim} + + +A piece of Python code that expects a particular abstract data type +can often be passed an instance of a class that emulates the methods of that data +type instead. For instance, if you have a function that formats some +data from a file object, you can define a class with methods +\verb\read()\ and \verb\readline()\ that get the data from a string +buffer instead, and pass an instance of it as an argument.
(Unfortunately, this +technique has its limitations: a class can't define operations that +are accessed by special syntax such as sequence subscripting or +arithmetic operators, and assigning such a ``pseudo-file'' to +\verb\sys.stdin\ will not cause the interpreter to read further input +from it.) + + +Instance method objects have attributes, too: \verb\m.im_self\ is the +object of which the method is an instance, and \verb\m.im_func\ is the +function object corresponding to the method. + + +XXX Mention bw compat hacks. + + \end{document} diff --git a/Doc/tut/tut.tex b/Doc/tut/tut.tex index 83a0d8b..ac6c5f5 100644 --- a/Doc/tut/tut.tex +++ b/Doc/tut/tut.tex @@ -57,6 +57,7 @@ a more formal definition of the language. \pagenumbering{arabic} + \chapter{Whetting Your Appetite} If you ever wrote a large shell script, you probably know this @@ -141,6 +142,7 @@ should read the Library Reference, which gives complete (though terse) reference material about built-in and standard types, functions and modules that can save you a lot of time when writing Python programs. + \chapter{Using the Python Interpreter} \section{Invoking the Interpreter} @@ -380,6 +382,7 @@ completion mechanism might use the interpreter's symbol table. A command to check (or even suggest) matching parentheses, quotes etc. would also be useful. + \chapter{An Informal Introduction to Python} In the following examples, input and output are distinguished by the @@ -786,6 +789,7 @@ prompt if the last line was not completed. \end{itemize} + \chapter{More Control Flow Tools} Besides the {\tt while} statement just introduced, Python knows the @@ -1065,6 +1069,7 @@ it is equivalent to {\tt result = result + [b]}, but more efficient. \end{itemize} + \chapter{Odds and Ends} This chapter describes some things you've learned about already in @@ -1359,6 +1364,7 @@ to their numeric value, so 0 equals 0.0, etc.% the language. 
} + \chapter{Modules} If you quit from the Python interpreter and enter it again, the @@ -1581,6 +1587,7 @@ meError', 'SystemError', 'TypeError', 'abs', 'chr', 'dir', 'divmod', 'eval', >>> \end{verbatim}\ecode + \chapter{Output Formatting} So far we've encountered two ways of writing values: {\em expression @@ -1675,6 +1682,7 @@ signs:% >>> \end{verbatim}\ecode + \chapter{Errors and Exceptions} Until now error messages haven't been more than mentioned, but if you @@ -1963,4 +1971,594 @@ handler (and even if another exception occurred in the handler). It is also executed when the {\tt try} statement is left via a {\tt break} or {\tt return} statement. + +\chapter{Classes} + +Python's class mechanism adds classes to the language with a minimum +of new syntax and semantics. It is a mixture of the class mechanisms +found in C++ and Modula-3. As is true for modules, classes in Python +do not put an absolute barrier between definition and user, but rather +rely on the politeness of the user not to ``break into the +definition.'' The most important features of classes are retained +with full power, however: the class inheritance mechanism allows +multiple base classes, a derived class can override any methods of its +base class(es), a method can call the method of a base class with the +same name. Objects can contain an arbitrary amount of private data. + +In C++ terminology, all class members (including the data members) are +{\em public}, and all member functions are {\em virtual}. There are +no special constructors or desctructors. As in Modula-3, there are no +shorthands for referencing the object's members from its methods: the +method function is declared with an explicit first argument +representing the object, which is provided implicitly by the call. As +in Smalltalk, classes themselves are objects, albeit in the wider +sense of the word: in Python, all data types are objects. This +provides semantics for importing and renaming. 
But, just like in C++ +or Modula-3, built-in types cannot be used as base classes for +extension by the user. Also, like in Modula-3 but unlike in C++, the +built-in operators with special syntax (arithmetic operators, +subscriptong etc.) cannot be redefined for class members. + + +\section{A word about terminology} + +Lacking universally accepted terminology to talk about classes, I'll +make occasional use of Smalltalk and C++ terms. (I'd use Modula-3 +terms, since its object-oriented semantics are closer to those of +Python than C++, but I expect that few readers have heard of it...) + +I also have to warn you that there's a terminological pitfall for +object-oriented readers: the word ``object'' in Python does not +necessarily mean a class instance. Like C++ and Modula-3, and unlike +Smalltalk, not all types in Python are classes: the basic built-in +types like integers and lists aren't, and even somewhat more exotic +types like files aren't. However, {\em all} Python types share a little +bit of common semantics that is best described by using the word +object. + +Objects have individuality, and multiple names (in multiple scopes) +can be bound to the same object. This is known as aliasing in other +languages. This is usually not appreciated on a first glance at +Python, and can be safely ignored when dealing with immutable basic +types (numbers, strings, tuples). However, aliasing has an +(intended!) effect on the semantics of Python code involving mutable +objects such as lists, dictionaries, and most types representing +entities outside the program (files, windows, etc.). This is usually +used to the benefit of the program, since aliases behave like pointers +in some respects. For example, passing an object is cheap since only +a pointer is passed by the implementation; and if a function modifies +an object passed as an argument, the caller will see the change --- this +obviates the need for two different argument passing mechanisms as in +Pascal. 
+ + +\section{Python scopes and name spaces} + +Before introducing classes, I first have to tell you something about +Python's scope rules. Class definitions play some neat tricks with +name spaces, and you need to know how scopes and name spaces work to +fully understand what's going on. Incidentally, knowledge about this +subject is useful for any advanced Python programmer. + +Let's begin with some definitions. + +A {\em name space} is a mapping from names to objects. Most name +spaces are currently implemented as Python dictionaries, but that's +normally not noticeable in any way (except for performance), and it +may change in the future. Examples of name spaces are: the set of +built-in names (functions such as \verb\abs()\, and built-in exception +names); the global names in a module; and the local names in a +function invocation. In a sense the set of attributes of an object +also forms a name space. The important thing to know about name +spaces is that there is absolutely no relation between names in +different name spaces; for instance, two different modules may both +define a function ``maximize'' without confusion --- users of the +modules must prefix it with the module name. + +By the way, I use the word {\em attribute} for any name following a +dot --- for example, in the expression \verb\z.real\, \verb\real\ is +an attribute of the object \verb\z\. Strictly speaking, references to +names in modules are attribute references: in the expression +\verb\modname.funcname\, \verb\modname\ is a module object and +\verb\funcname\ is an attribute of it. In this case there happens to +be a straightforward mapping between the module's attributes and the +global names defined in the module: they share the same name space!% +\footnote{ + Except for one thing.
Module objects have a secret read-only + attribute called {\tt __dict__} which returns the dictionary + used to implement the module's name space; the name + {\tt __dict__} is an attribute but not a global name. + Obviously, using this violates the abstraction of name space + implementation, and should be restricted to things like + post-mortem debuggers... +} + +Attributes may be read-only or writable. In the latter case, +assignment to attributes is possible. Module attributes are writable: +you can write \verb\modname.the_answer = 42\. Writable attributes may +also be deleted with the del statement, e.g. +\verb\del modname.the_answer\. + +Name spaces are created at different moments and have different +lifetimes. The name space containing the built-in names is created +when the Python interpreter starts up, and is never deleted. The +global name space for a module is created when the module definition +is read in; normally, module name spaces also last until the +interpreter quits. The statements executed by the top-level +invocation of the interpreter, either read from a script file or +interactively, are considered part of a module called \verb\__main__\, +so they have their own global name space. (The built-in names +actually also live in a module; this is called \verb\builtin\, +although it should really have been called \verb\__builtin__\.) + +The local name space for a function is created when the function is +called, and deleted when the function returns or raises an exception +that is not handled within the function. (Actually, forgetting would +be a better way to describe what actually happens.) Of course, +recursive invocations each have their own local name space. + +A {\em scope} is a textual region of a Python program where a name space +is directly accessible. ``Directly accessible'' here means that an +unqualified reference to a name attempts to find the name in the name +space. 
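The sketch below illustrates several of these points in a current Python 3 interpreter. The module objects \verb\mod_a\ and \verb\mod_b\ and the function \verb\countdown\ are hypothetical: the modules are built at run time with \verb\types.ModuleType\ rather than imported from files, and in Python 3 the module holding the built-in names is called \verb\builtins\.

```python
import builtins
import types

# Two hypothetical modules, created at run time instead of imported.
mod_a = types.ModuleType('mod_a')
mod_b = types.ModuleType('mod_b')

# Both define "maximize"; the two names live in separate name spaces.
mod_a.maximize = lambda seq: max(seq)
mod_b.maximize = lambda seq: max(seq, key=len)

print(mod_a.maximize([3, 1, 2]))            # 3
print(mod_b.maximize(['a', 'abc', 'ab']))   # abc

# Module attributes are writable and deletable; the module's name
# space is an ordinary dictionary reachable through __dict__.
mod_a.the_answer = 42
print(mod_a.__dict__['the_answer'])         # 42
del mod_a.the_answer
print('the_answer' in mod_a.__dict__)       # False

# The built-in names also live in a module of their own.
print(builtins.abs is abs)                  # True


def countdown(n):
    # Every call gets a fresh local name space, so the recursive
    # call's binding of n leaves this invocation's n untouched.
    if n > 0:
        countdown(n - 1)
    return n


print(countdown(3))                         # 3
```

Nothing here depends on the modules being synthetic; a module imported from a file behaves the same way.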
+ +Although scopes are determined statically, they are used dynamically. +At any time during execution, exactly three nested scopes are in use +(i.e., exactly three name spaces are directly accessible): the +innermost scope, which is searched first, contains the local names; +the middle scope, searched next, contains the current module's global +names; and the outermost scope (searched last) is the name space +containing built-in names. + +Usually, the local scope references the local names of the (textually) +current function. Outside functions, the local scope references +the same name space as the global scope: the module's name space. +Class definitions place yet another name space in the local scope. + +It is important to realize that scopes are determined textually: the +global scope of a function defined in a module is that module's name +space, no matter from where or by what alias the function is called. +On the other hand, the actual search for names is done dynamically, at +run time --- however, the language definition is evolving towards +static name resolution, at ``compile'' time, so don't rely on dynamic +name resolution! (In fact, local variables are already determined +statically.) + +A special quirk of Python is that assignments always go into the +innermost scope. Assignments do not copy data --- they just +bind names to objects. The same is true for deletions: the statement +\verb\del x\ removes the binding of \verb\x\ from the name space referenced by the +local scope. In fact, all operations that introduce new names use the +local scope: in particular, import statements and function definitions +bind the module or function name in the local scope. (The +\verb\global\ statement can be used to indicate that particular +variables live in the global scope.) + + +\section{A first look at classes} + +Classes introduce a little bit of new syntax, three new object types, +and some new semantics.
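As a preview of the three object types covered in the following subsections, here is a short sketch for a current Python 3 interpreter. The class \verb\Greeter\ is hypothetical. The attributes \verb\__self__\ and \verb\__func__\ are the Python 3 names of the method attributes that older releases call \verb\im_self\ and \verb\im_func\.

```python
import types


class Greeter:
    # A hypothetical class used only to show the three kinds of object.
    def greet(self):
        return 'hello world'


g = Greeter()   # calling the class object creates an instance object
m = g.greet     # looking up a function via the instance gives a method object

print(isinstance(Greeter, type))         # True: a class object
print(type(g) is Greeter)                # True: an instance object
print(isinstance(m, types.MethodType))   # True: a method object
print(m())                               # hello world

# A method object packs together the instance and the function.
print(m.__self__ is g)                   # True
print(m.__func__ is Greeter.greet)       # True
```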
+ + +\subsection{Class definition syntax} + +The simplest form of class definition looks like this: + +\begin{verbatim} + class ClassName: + <statement-1> + . + . + . + <statement-N> +\end{verbatim} + +Class definitions, like function definitions (\verb\def\ statements), +must be executed before they have any effect. (You could conceivably +place a class definition in a branch of an \verb\if\ statement, or +inside a function.) + +In practice, the statements inside a class definition will usually be +function definitions, but other statements are allowed, and sometimes +useful --- we'll come back to this later. The function definitions +inside a class normally have a peculiar form of argument list, +dictated by the calling conventions for methods --- again, this is +explained later. + +When a class definition is entered, a new name space is created, and +used as the local scope --- thus, all assignments to local variables +go into this new name space. In particular, function definitions bind +the name of the new function here. + +When a class definition is left normally (via the end), a {\em class +object} is created. This is basically a wrapper around the contents +of the name space created by the class definition; we'll learn more +about class objects in the next subsection. The original local scope +(the one in effect just before the class definition was entered) is +reinstated, and the class object is bound here to the class name given in +the class definition header (\verb\ClassName\ in the example). + + +\subsection{Class objects} + +Class objects support two kinds of operations: attribute references +and instantiation. + +{\em Attribute references} use the standard syntax used for all +attribute references in Python: \verb\obj.name\. Valid attribute +names are all the names that were in the class's name space when the +class object was created.
So, if the class definition looked like +this: + +\begin{verbatim} + class MyClass: + i = 12345 + def f(x): + return 'hello world' +\end{verbatim} + +then \verb\MyClass.i\ and \verb\MyClass.f\ are valid attribute +references, returning an integer and a function object, respectively. +Class attributes can also be assigned to, so you can change the +value of \verb\MyClass.i\ by assignment. + +Class {\em instantiation} uses function notation. Just pretend that +the class object is a parameterless function that returns a new +instance of the class. For example (assuming the above class): + +\begin{verbatim} + x = MyClass() +\end{verbatim} + +creates a new {\em instance} of the class and assigns this object to +the local variable \verb\x\. + + +\subsection{Instance objects} + +Now what can we do with instance objects? The only operations +understood by instance objects are attribute references. There are +two kinds of valid attribute names. + +The first I'll call {\em data attributes}. These correspond to +``instance variables'' in Smalltalk, and to ``data members'' in C++. +Data attributes need not be declared; like local variables, they +spring into existence when they are first assigned to. For example, +if \verb\x\ is the instance of \verb\MyClass\ created above, the +following piece of code will print the value 16, without leaving a +trace: + +\begin{verbatim} + x.counter = 1 + while x.counter < 10: + x.counter = x.counter * 2 + print x.counter + del x.counter +\end{verbatim} + +The second kind of attribute reference understood by instance objects +is the {\em method}. A method is a function that ``belongs to'' an +object. (In Python, the term method is not unique to class instances: +other object types can have methods as well, e.g., list objects have +methods called append, insert, remove, sort, and so on. However, +below, we'll use the term method exclusively to mean methods of class +instance objects, unless explicitly stated otherwise.)
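To try the data-attribute example above in a current Python 3 interpreter, where \verb\print\ is a function rather than a statement, the sketch below may help. It repeats \verb\MyClass\ so that it is self-contained, naming the method's argument \verb\self\ instead of \verb\x\ to avoid confusion with the instance \verb\x\; the list \verb\items\ is a hypothetical example of methods on an object that is not a class instance.

```python
class MyClass:
    i = 12345

    def f(self):
        return 'hello world'


x = MyClass()

# Data attributes need no declaration: counter springs into
# existence on first assignment and disappears again after del.
x.counter = 1
while x.counter < 10:
    x.counter = x.counter * 2
print(x.counter)                 # 16
del x.counter
print(hasattr(x, 'counter'))     # False: no trace is left

# Methods are not unique to class instances: lists have them too.
items = [3, 1, 2]
items.append(0)
items.sort()
print(items)                     # [0, 1, 2, 3]
```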
+ +Valid method names of an instance object depend on its class. By +definition, all attributes of a class that are (user-defined) function +objects define corresponding methods of its instances. So in our +example, \verb\x.f\ is a valid method reference, since +\verb\MyClass.f\ is a function, but \verb\x.i\ is not, since +\verb\MyClass.i\ is not. But \verb\x.f\ is not the +same thing as \verb\MyClass.f\ --- it is a {\em method object}, not a +function object. + + +\subsection{Method objects} + +Usually, a method is called immediately, e.g.: + +\begin{verbatim} + x.f() +\end{verbatim} + +In our example, this will return the string \verb\'hello world'\. +However, it is not necessary to call a method right away: \verb\x.f\ +is a method object, and can be stored away and called at a later +moment, for example: + +\begin{verbatim} + xf = x.f + while 1: + print xf() +\end{verbatim} + +will continue to print \verb\hello world\ until the end of time. + +What exactly happens when a method is called? You may have noticed +that \verb\x.f()\ was called without an argument above, even though +the function definition for \verb\f\ specified an argument. What +happened to the argument? Surely Python raises an exception when a +function that requires an argument is called without any --- even if +the argument isn't actually used... + +Actually, you may have guessed the answer: the special thing about +methods is that the object is passed as the first argument of the +function. In our example, the call \verb\x.f()\ is exactly equivalent +to \verb\MyClass.f(x)\. In general, calling a method with a list of +{\em n} arguments is equivalent to calling the corresponding function +with an argument list that is created by inserting the method's object +before the first argument. + +If you still don't understand how methods work, a look at the +implementation can perhaps clarify matters. When an instance +attribute is referenced that isn't a data attribute, its class is +searched. 
If the name denotes a valid class attribute that is a +function object, a method object is created by packing (pointers to) +the instance object and the function object just found together in an +abstract object: this is the method object. When the method object is +called with an argument list, it is unpacked again, a new argument +list is constructed from the instance object and the original argument +list, and the function object is called with this new argument list. + + +\section{Random remarks} + + +[These should perhaps be placed more carefully...] + + +Data attributes override method attributes with the same name; to +avoid accidental name conflicts, which may cause hard-to-find bugs in +large programs, it is wise to use some kind of convention that +minimizes the chance of conflicts, e.g., capitalize method names, +prefix data attribute names with a small unique string (perhaps just +an underscore), or use verbs for methods and nouns for data attributes. + + +Data attributes may be referenced by methods as well as by ordinary +users (``clients'') of an object. In other words, classes are not +usable to implement pure abstract data types. In fact, nothing in +Python makes it possible to enforce data hiding --- it is all based +upon convention. (On the other hand, the Python implementation, +written in C, can completely hide implementation details and control +access to an object if necessary; this can be used by extensions to +Python written in C.) + + +Clients should use data attributes with care --- clients may mess up +invariants maintained by the methods by stamping on their data +attributes. Note that clients may add data attributes of their own to +an instance object without affecting the validity of the methods, as +long as name conflicts are avoided --- again, a naming convention can +save a lot of headaches here. + + +There is no shorthand for referencing data attributes (or other +methods!) from within methods.
I find that this actually increases +the readability of methods: there is no chance of confusing local +variables and instance variables when glancing through a method. + + +By convention, the first argument of a method is called +\verb\self\. This is nothing more than a convention: the name +\verb\self\ has absolutely no special meaning to Python. (Note, +however, that by not following the convention your code may be less +readable by other Python programmers, and it is also conceivable that +a {\em class browser} program be written which relies upon such a +convention.) + + +Any function object that is a class attribute defines a method for +instances of that class. It is not necessary that the function +definition be textually enclosed in the class definition: assigning a +function object to a local variable in the class is also ok. For +example: + +\begin{verbatim} + # Function defined outside the class + def f1(self, x, y): + return min(x, x+y) + + class C: + f = f1 + def g(self): + return 'hello world' + h = g +\end{verbatim} + +Now \verb\f\, \verb\g\ and \verb\h\ are all attributes of class +\verb\C\ that refer to function objects, and consequently they are all +methods of instances of \verb\C\ --- \verb\h\ being exactly equivalent +to \verb\g\. Note that this practice usually only serves to confuse +the reader of a program. + + +Methods may call other methods by using method attributes of the +\verb\self\ argument, e.g.: + +\begin{verbatim} + class Bag: + def empty(self): + self.data = [] + def add(self, x): + self.data.append(x) + def addtwice(self, x): + self.add(x) + self.add(x) +\end{verbatim} + + +The instantiation operation (``calling'' a class object) creates an +empty object. Many classes like to create objects in a known initial +state.
There is no special syntax to enforce this, but a convention +works almost as well: add a method named \verb\init\ to the class, +which initializes the instance (by assigning to some important data +attributes) and returns the instance itself. For example, class +\verb\Bag\ above could have the following method: + +\begin{verbatim} + def init(self): + self.empty() + return self +\end{verbatim} + +The client can then create and initialize an instance in one +statement, as follows: + +\begin{verbatim} + x = Bag().init() +\end{verbatim} + +Of course, the \verb\init\ method may have arguments for greater +flexibility. + +Warning: a common mistake is to forget the \verb\return self\ at the +end of an init method! + + +Methods may reference global names in the same way as ordinary +functions. The global scope associated with a method is the module +containing the class definition. (The class itself is never used as a +global scope!) While one rarely encounters a good reason for using +global data in a method, there are many legitimate uses of the global +scope: for one thing, functions and modules imported into the global +scope can be used by methods, as well as functions and classes defined +in it. Usually, the class containing the method is itself defined in +this global scope, and in the next section we'll find some good +reasons why a method would want to reference its own class! + + +\section{Inheritance} + +Of course, a language feature would not be worthy of the name ``class'' +without supporting inheritance. The syntax for a derived class +definition looks as follows: + +\begin{verbatim} + class DerivedClassName(BaseClassName): + <statement-1> + . + . + . + <statement-N> +\end{verbatim} + +The name \verb\BaseClassName\ must be defined in a scope containing +the derived class definition. Instead of a base class name, an +expression is also allowed. 
This is useful when the base class is +defined in another module, e.g., + +\begin{verbatim} + class DerivedClassName(modname.BaseClassName): +\end{verbatim} + +Execution of a derived class definition proceeds the same as for a +base class. When the class object is constructed, the base class is +remembered. This is used for resolving attribute references: if a +requested attribute is not found in the class, it is searched for in the +base class. This rule is applied recursively if the base class itself +is derived from some other class. + +There's nothing special about instantiation of derived classes: +\verb\DerivedClassName()\ creates a new instance of the class. Method +references are resolved as follows: the corresponding class attribute +is searched for, descending down the chain of base classes if necessary, +and the method reference is valid if this yields a function object. + +Derived classes may override methods of their base classes. Because +methods have no special privileges when calling other methods of the +same object, a method of a base class that calls another method +defined in the same base class may in fact end up calling a method of +a derived class that overrides it. (For C++ programmers: all methods +in Python are ``virtual functions''.) + +An overriding method in a derived class may in fact want to extend +rather than simply replace the base class method of the same name. +There is a simple way to call the base class method directly: just +call \verb\BaseClassName.methodname(self, arguments)\. This is +occasionally useful to clients as well. (Note that this only works if +the base class is defined or imported directly in the global scope.) + + +\subsection{Multiple inheritance} + +Python supports a limited form of multiple inheritance as well. A +class definition with multiple base classes looks as follows: + +\begin{verbatim} + class DerivedClassName(Base1, Base2, Base3): + <statement-1> + . + . + .
+ <statement-N> +\end{verbatim} + +The only rule necessary to explain the semantics is the resolution +rule used for class attribute references. This is depth-first, +left-to-right. Thus, if an attribute is not found in +\verb\DerivedClassName\, it is searched for in \verb\Base1\, then +(recursively) in the base classes of \verb\Base1\, and only if it is +not found there, it is searched for in \verb\Base2\, and so on. + +(To some people breadth-first --- searching \verb\Base2\ and +\verb\Base3\ before the base classes of \verb\Base1\ --- looks more +natural. However, this would require you to know whether a particular +attribute of \verb\Base1\ is actually defined in \verb\Base1\ or in +one of its base classes before you can figure out the consequences of +a name conflict with an attribute of \verb\Base2\. The depth-first +rule makes no distinction between direct and inherited attributes of +\verb\Base1\.) + +It is clear that indiscriminate use of multiple inheritance is a +maintenance nightmare, given the reliance in Python on conventions to +avoid accidental name conflicts. A well-known problem with multiple +inheritance is a class derived from two classes that happen to have a +common base class. While it is easy enough to figure out what happens +in this case (the instance will have a single copy of ``instance +variables'' or data attributes used by the common base class), it is +not clear that these semantics are in any way useful. + + +\section{Odds and ends} + +Sometimes it is useful to have a data type similar to the Pascal +``record'' or C ``struct'', bundling together a couple of named data +items.
An empty class definition will do nicely, e.g.: + +\begin{verbatim} + class Employee: + pass + + john = Employee() # Create an empty employee record + + # Fill the fields of the record + john.name = 'John Doe' + john.dept = 'computer lab' + john.salary = 1000 +\end{verbatim} + + +A piece of Python code that expects a particular abstract data type +can often be passed an instance of a class that emulates the methods of that data +type instead. For instance, if you have a function that formats some +data from a file object, you can define a class with methods +\verb\read()\ and \verb\readline()\ that get the data from a string +buffer instead, and pass an instance of it as an argument. (Unfortunately, this +technique has its limitations: a class can't define operations that +are accessed by special syntax such as sequence subscripting or +arithmetic operators, and assigning such a ``pseudo-file'' to +\verb\sys.stdin\ will not cause the interpreter to read further input +from it.) + + +Instance method objects have attributes, too: \verb\m.im_self\ is the +instance object to which the method is bound, and \verb\m.im_func\ is the +function object corresponding to the method. + + +XXX Mention bw compat hacks. + + \end{document} |