-rw-r--r-- | Doc/whatsnew/whatsnew22.tex | 507 |
1 files changed, 480 insertions, 27 deletions
diff --git a/Doc/whatsnew/whatsnew22.tex b/Doc/whatsnew/whatsnew22.tex index 3ebb803..e38d31f 100644 --- a/Doc/whatsnew/whatsnew22.tex +++ b/Doc/whatsnew/whatsnew22.tex @@ -29,9 +29,143 @@ for a particular new feature. The final release of Python 2.2 is planned for October 2001. %====================================================================== +% It looks like this set of changes will likely get into 2.2, +% so I need to read and digest the relevant PEPs. +%\section{PEP 252: Type and Class Changes} + +%XXX + +%\begin{seealso} + +%\seepep{252}{Making Types Look More Like Classes}{Written and implemented +%by GvR.} + +%\end{seealso} + +%====================================================================== \section{PEP 234: Iterators} -XXX +A significant addition to 2.2 is an iteration interface at both the C +and Python levels. Objects can define how they can be looped over by +callers. + +In Python versions up to 2.1, the usual way to make \code{for item in +obj} work is to define a \method{__getitem__()} method that looks +something like this: + +\begin{verbatim} + def __getitem__(self, index): + return <next item> +\end{verbatim} + +\method{__getitem__()} is more properly used to define an indexing +operation on an object so that you can write \code{obj[5]} to retrieve +the fifth element. It's a bit misleading when you're using this only +to support \keyword{for} loops. Consider some file-like object that +wants to be looped over; the \var{index} parameter is essentially +meaningless, as the class probably assumes that a series of +\method{__getitem__()} calls will be made, with \var{index} +incrementing by one each time. In other words, the presence of the +\method{__getitem__()} method doesn't mean that \code{file[5]} will +work, though it really should. + +In Python 2.2, iteration can be implemented separately, and +\method{__getitem__()} methods can be limited to classes that really +do support random access. 
The basic idea of iterators is quite +simple. A new built-in function, \function{iter(obj)}, returns an +iterator for the object \var{obj}. (It can also take two arguments: +\code{iter(\var{C}, \var{sentinel})} will call the callable \var{C}, until it +returns \var{sentinel}, which will signal that the iterator is done. This form probably won't be used very often.) + +Python classes can define an \method{__iter__()} method, which should +create and return a new iterator for the object; if the object is its +own iterator, this method can just return \code{self}. In particular, +iterators will usually be their own iterators. Extension types +implemented in C can implement a \code{tp_iter} function in order to +return an iterator, too. + +So what do iterators do? They have one required method, +\method{next()}, which takes no arguments and returns the next value. +When there are no more values to be returned, calling \method{next()} +should raise the \exception{StopIteration} exception. + +\begin{verbatim} +>>> L = [1,2,3] +>>> i = iter(L) +>>> print i +<iterator object at 0x8116870> +>>> i.next() +1 +>>> i.next() +2 +>>> i.next() +3 +>>> i.next() +Traceback (most recent call last): + File "<stdin>", line 1, in ? +StopIteration +>>> +\end{verbatim} + +In 2.2, Python's \keyword{for} statement no longer expects a sequence; +it expects something for which \function{iter()} will return something. +For backward compatibility, and convenience, an iterator is +automatically constructed for sequences that don't implement +\method{__iter__()} or a \code{tp_iter} slot, so \code{for i in +[1,2,3]} will still work. Wherever the Python interpreter loops over +a sequence, it's been changed to use the iterator protocol. This +means you can do things like this: + +\begin{verbatim} +>>> i = iter(L) +>>> a,b,c = i +>>> a,b,c +(1, 2, 3) +>>> +\end{verbatim} + +Iterator support has been added to some of Python's basic types. 
The +\keyword{in} operator now works on dictionaries, so \code{\var{key} in +dict} is now equivalent to \code{dict.has_key(\var{key})}. +Calling \function{iter()} on a dictionary will return an iterator which loops over its keys: + +\begin{verbatim} +>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6, +... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12} +>>> for key in m: print key, m[key] +... +Mar 3 +Feb 2 +Aug 8 +Sep 9 +May 5 +Jun 6 +Jul 7 +Jan 1 +Apr 4 +Nov 11 +Dec 12 +Oct 10 +>>> +\end{verbatim} + +That's just the default behaviour. If you want to iterate over keys, +values, or key/value pairs, you can explicitly call the +\method{iterkeys()}, \method{itervalues()}, or \method{iteritems()} +methods to get an appropriate iterator. + +Files also provide an iterator, which calls its \method{readline()} +method until there are no more lines in the file. This means you can +now read each line of a file using code like this: + +\begin{verbatim} +for line in file: + # do something for each line +\end{verbatim} + +Note that you can only go forward in an iterator; there's no way to +get the previous element, reset the iterator, or make a copy of it. +An iterator object could provide such additional capabilities, but the iterator protocol only requires a \method{next()} method. \begin{seealso} @@ -43,33 +177,144 @@ by the Python Labs crew, mostly by GvR and Tim Peters.} %====================================================================== \section{PEP 255: Simple Generators} -XXX +Generators are another new feature, one that interacts with the +introduction of iterators. + +You're doubtless familiar with how function calls work in Python or +C. When you call a function, it gets a private area where its local +variables are created. When the function reaches a \keyword{return} +statement, the local variables are destroyed and the resulting value +is returned to the caller. 
A later call to the same function will get +a fresh new set of local variables. But, what if the local variables +weren't destroyed on exiting a function? What if you could later +resume the function where it left off? This is what generators +provide; they can be thought of as resumable functions. + +Here's the simplest example of a generator function: + +\begin{verbatim} +def generate_ints(N): + for i in range(N): + yield i +\end{verbatim} + +A new keyword, \keyword{yield}, was introduced for generators. Any +function containing a \keyword{yield} statement is a generator +function; this is detected by Python's bytecode compiler which +compiles the function specially. When you call a generator function, +it doesn't return a single value; instead it returns a generator +object that supports the iterator interface. On executing the +\keyword{yield} statement, the generator outputs the value of +\code{i}, similar to a \keyword{return} statement. The big difference +between \keyword{yield} and a \keyword{return} statement is that, on +reaching a \keyword{yield} the generator's state of execution is +suspended and local variables are preserved. On the next call to the +generator's \code{.next()} method, the function will resume executing +immediately after the \keyword{yield} statement. (For complicated +reasons, the \keyword{yield} statement isn't allowed inside the +\keyword{try} block of a \code{try...finally} statement; read PEP 255 +for a full explanation of the interaction between \keyword{yield} and +exceptions.) + +Here's a sample usage of the \function{generate_ints} generator: + +\begin{verbatim} +>>> gen = generate_ints(3) +>>> gen +<generator object at 0x8117f90> +>>> gen.next() +0 +>>> gen.next() +1 +>>> gen.next() +2 +>>> gen.next() +Traceback (most recent call last): + File "<stdin>", line 1, in ? 
+ File "<stdin>", line 2, in generate_ints +StopIteration +>>> +\end{verbatim} + +You could equally write \code{for i in generate_ints(5)}, or +\code{a,b,c = generate_ints(3)}. + +Inside a generator function, the \keyword{return} statement can only +be used without a value, and is equivalent to raising the +\exception{StopIteration} exception; afterwards the generator cannot +return any further values. \keyword{return} with a value, such as +\code{return 5}, is a syntax error inside a generator function. You +can also raise \exception{StopIteration} manually, or just let the +thread of execution fall off the bottom of the function, to achieve +the same effect. + +You could achieve the effect of generators manually by writing your +own class, and storing all the local variables of the generator as +instance variables. For example, returning a list of integers could +be done by setting \code{self.count} to 0, and having the +\method{next()} method increment \code{self.count} and return it. +However, for a +moderately complicated generator, writing a corresponding class would +be much messier. \file{Lib/test/test_generators.py} contains a number +of more interesting examples. The simplest one implements an in-order +traversal of a tree using generators recursively. + +\begin{verbatim} +# A recursive generator that generates Tree leaves in in-order. +def inorder(t): + if t: + for x in inorder(t.left): + yield x + yield t.label + for x in inorder(t.right): + yield x +\end{verbatim} + +Two other examples in \file{Lib/test/test_generators.py} produce +solutions for the N-Queens problem (placing $N$ queens on an $NxN$ +chess board so that no queen threatens another) and the Knight's Tour +(a route that takes a knight to every square of an $NxN$ chessboard +without visiting any square twice). 
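As a rough illustration of the earlier remark that a generator's effect can be achieved with a hand-written class, the sketch below places such a class next to the \function{generate_ints} generator from the text. The class name \code{CountTo} and its \code{limit} attribute are hypothetical names chosen only for this example. The code is written to run on current Python, where the iterator method is spelled \method{__next__()}; the \code{next} alias is an assumption meant to mirror the 2.2 spelling described above.

```python
# Hypothetical class-based counterpart of generate_ints(); the names
# CountTo and limit are illustrative and do not come from the PEP.
class CountTo:
    """Hand-written iterator producing 0, 1, ..., limit - 1."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0  # the generator's local variable, kept as instance state

    def __iter__(self):
        # An iterator is its own iterator.
        return self

    def __next__(self):
        # Signal exhaustion the same way a finished generator does.
        if self.count >= self.limit:
            raise StopIteration
        value = self.count
        self.count += 1
        return value

    # Alias for the method name used by the 2.2 iterator protocol.
    next = __next__


def generate_ints(N):
    # Generator version from the text: local state is preserved automatically.
    for i in range(N):
        yield i


print(list(CountTo(3)))
print(list(generate_ints(3)))
```

Both objects can be used wherever an iterator is accepted, for example in a \keyword{for} loop or in \code{a,b,c = CountTo(3)}. The class needs explicit bookkeeping for state that the generator keeps implicitly, which is the extra work the text refers to.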
+ +The idea of generators comes from other programming languages, +especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the +idea of generators is central to the language. In Icon, every +expression and function call behaves like a generator. One example +from ``An Overview of the Icon Programming Language'' at +\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of +what this looks like: + +\begin{verbatim} +sentence := "Store it in the neighboring harbor" +if (i := find("or", sentence)) > 5 then write(i) +\end{verbatim} + +The \function{find()} function returns the indexes at which the +substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement, +\code{i} is first assigned a value of 3, but 3 is less than 5, so the +comparison fails, and Icon retries it with the second value of 23. 23 +is greater than 5, so the comparison now succeeds, and the code prints +the value 23 to the screen. + +Python doesn't go nearly as far as Icon in adopting generators as a +central concept. Generators are considered a new part of the core +Python language, but learning or using them isn't compulsory; if they +don't solve any problems that you have, feel free to ignore them. +This is different from Icon where the idea of generators is a basic +concept. One novel feature of Python's interface as compared to +Icon's is that a generator's state is represented as a concrete object +that can be passed around to other functions or stored in a data +structure. \begin{seealso} \seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. 
Implemented mostly by Neil -Schemenauer, with fixes from the Python Labs crew, mostly by GvR and -Tim Peters.} +Schemenauer, with fixes from the Python Labs crew.} \end{seealso} %====================================================================== -% It looks like this set of changes isn't going to be getting into 2.2, -% unless someone plans to merge the descr-branch back into the mainstream -% very quickly. -%\section{PEP 252: Type and Class Changes} - -%XXX - -%\begin{seealso} - -%\seepep{252}{Making Types Look More Like Classes}{Written and implemented -%by GvR.} - -%\end{seealso} - -%====================================================================== \section{Unicode Changes} XXX I have to figure out what the changes mean to users. @@ -78,13 +323,178 @@ XXX I have to figure out what the changes mean to users. References: http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html and following thread. +%====================================================================== +\section{PEP 227: Nested Scopes} + +In Python 2.1, statically nested scopes were added as an optional +feature, to be enabled by a \code{from __future__ import +nested_scopes} directive. In 2.2 nested scopes no longer need to be +specially enabled, but are always enabled. The rest of this section +is a copy of the description of nested scopes from my ``What's New in +Python 2.1'' document; if you read it when 2.1 came out, you can skip +the rest of this section. + +The largest change introduced in Python 2.1, and made complete in 2.2, +is to Python's scoping rules. In Python 2.0, at any given time there +are at most three namespaces used to look up variable names: local, +module-level, and the built-in namespace. This often surprised people +because it didn't match their intuitive expectations. For example, a +nested recursive function definition doesn't work: + +\begin{verbatim} +def f(): + ... + def g(value): + ... + return g(value-1) + 1 + ... 
+\end{verbatim} + +The function \function{g()} will always raise a \exception{NameError} +exception, because the binding of the name \samp{g} isn't in either +its local namespace or in the module-level namespace. This isn't much +of a problem in practice (how often do you recursively define interior +functions like this?), but this also made using the \keyword{lambda} +statement clumsier, and this was a problem in practice. In code which +uses \keyword{lambda} you can often find local variables being copied +by passing them as the default values of arguments. + +\begin{verbatim} +def find(self, name): + "Return list of any entries equal to 'name'" + L = filter(lambda x, name=name: x == name, + self.list_attribute) + return L +\end{verbatim} + +The readability of Python code written in a strongly functional style +suffers greatly as a result. + +The most significant change to Python 2.2 is that static scoping has +been added to the language to fix this problem. As a first effect, +the \code{name=name} default argument is now unnecessary in the above +example. Put simply, when a given variable name is not assigned a +value within a function (by an assignment, or the \keyword{def}, +\keyword{class}, or \keyword{import} statements), references to the +variable will be looked up in the local namespace of the enclosing +scope. A more detailed explanation of the rules, and a dissection of +the implementation, can be found in the PEP. + +This change may cause some compatibility problems for code where the +same variable name is used both at the module level and as a local +variable within a function that contains further function definitions. +This seems rather unlikely though, since such code would have been +pretty confusing to read in the first place. + +One side effect of the change is that the \code{from \var{module} +import *} and \keyword{exec} statements have been made illegal inside +a function scope under certain conditions. 
The Python reference +manual has said all along that \code{from \var{module} import *} is +only legal at the top level of a module, but the CPython interpreter +has never enforced this before. As part of the implementation of +nested scopes, the compiler which turns Python source into bytecodes +has to generate different code to access variables in a containing +scope. \code{from \var{module} import *} and \keyword{exec} make it +impossible for the compiler to figure this out, because they add names +to the local namespace that are unknowable at compile time. +Therefore, if a function contains function definitions or +\keyword{lambda} expressions with free variables, the compiler will +flag this by raising a \exception{SyntaxError} exception. + +To make the preceding explanation a bit clearer, here's an example: + +\begin{verbatim} +x = 1 +def f(): + # The next line is a syntax error + exec 'x=2' + def g(): + return x +\end{verbatim} + +Line 4 containing the \keyword{exec} statement is a syntax error, +since \keyword{exec} would define a new local variable named \samp{x} +whose value should be accessed by \function{g()}. + +This shouldn't be much of a limitation, since \keyword{exec} is rarely +used in most Python code (and when it is used, it's often a sign of a +poor design anyway). +======= +%\end{seealso} + +\begin{seealso} + +\seepep{227}{Statically Nested Scopes}{Written and implemented by +Jeremy Hylton.} + +\end{seealso} + %====================================================================== \section{New and Improved Modules} \begin{itemize} - \item xmlrpclib added to standard library. + \item The \module{xmlrpclib} module was contributed to the standard +library by Fredrik Lundh. It provides support for writing XML-RPC +clients; XML-RPC is a simple remote procedure call protocol built on +top of HTTP and XML. 
For example, the following snippet retrieves a +list of RSS channels from the O'Reilly Network, and then retrieves a +list of the recent headlines for one channel: + +\begin{verbatim} +import xmlrpclib +s = xmlrpclib.Server( + 'http://www.oreillynet.com/meerkat/xml-rpc/server.php') +channels = s.meerkat.getChannels() +# channels is a list of dictionaries, like this: +# [{'id': 4, 'title': 'Freshmeat Daily News'}, +# {'id': 190, 'title': '32Bits Online'}, +# {'id': 4549, 'title': '3DGamers'}, ... ] + +# Get the items for one channel +items = s.meerkat.getItems( {'channel': 4} ) + +# 'items' is another list of dictionaries, like this: +# [{'link': 'http://freshmeat.net/releases/52719/', +# 'description': 'A utility which converts HTML to XSL FO.', +# 'title': 'html2fo 0.3 (Default)'}, ... ] +\end{verbatim} + +See \url{http://www.xmlrpc.com} for more information about XML-RPC. + + \item The \module{socket} module can be compiled to support IPv6; + specify the \code{--enable-ipv6} option to Python's configure + script. (Contributed by Jun-ichiro ``itojun'' Hagino.) + + \item Two new format characters were added to the \module{struct} + module for 64-bit integers on platforms that support the C + \ctype{long long} type. \samp{q} is for a signed 64-bit integer, + and \samp{Q} is for an unsigned one. The value is returned in + Python's long integer type. (Contributed by Tim Peters.) + + \item In the interpreter's interactive mode, there's a new built-in + function \function{help()}, that uses the \module{pydoc} module + introduced in Python 2.1 to provide interactive help. + \code{help(\var{object})} displays any available help text about + \var{object}. \code{help()} with no argument puts you in an online + help utility, where you can enter the names of functions, classes, + or modules to read their help text. + (Contributed by Guido van Rossum, using Ka-Ping Yee's \module{pydoc} module.) 
+ + \item Various bugfixes and performance improvements have been made +to the SRE engine underlying the \module{re} module. For example, +\function{re.sub()} will now use \function{string.replace()} +automatically when the pattern and its replacement are both just +literal strings without regex metacharacters. Another contributed +patch speeds up certain Unicode character ranges by a factor of +two. (SRE is maintained by Fredrik Lundh. The BIGCHARSET patch +was contributed by Martin von L\"owis.) + + \item The \module{imaplib} module now has support for the IMAP +NAMESPACE extension defined in \rfc{2342}. (Contributed by Michel +Pelletier.) + \end{itemize} @@ -92,20 +502,63 @@ and following thread. %====================================================================== \section{Other Changes and Fixes} -XXX +As usual there were a bunch of other improvements and bugfixes +scattered throughout the source tree. A search through the CVS change +logs finds there were XXX patches applied, and XXX bugs fixed; both +figures are likely to be underestimates. Some of the more notable +changes are: \begin{itemize} - \item XXX Nested scoping enabled by default - \item XXX C API: Reorganization of object calling \item XXX .encode(), .decode() string methods. Interesting new codecs such -as zlib. +as zlib. -%Original log message: - -%The call_object() function, originally in ceval.c, begins a new life + \item MacOS code now in main CVS tree. + + \item SF patch \#418147 Fixes to allow compiling w/ Borland, from Stephen Hansen. + + \item Add support for Windows using "mbcs" as the default Unicode encoding when dealing with the file system. As discussed on python-dev and in patch 410465. + +\item Lots of patches to dictionaries; measure performance improvement, if any. + + \item Patch \#430754: Makes ftpmirror.py .netrc aware + +\item Fix bug reported by Tim Peters on python-dev: + +Keyword arguments passed to builtin functions that don't take them are +ignored. 
+ +>>> {}.clear(x=2) +>>> + +instead of + +>>> {}.clear(x=2) +Traceback (most recent call last): + File "<stdin>", line 1, in ? +TypeError: clear() takes no keyword arguments + +\item Make the license GPL-compatible. + +\item This change adds two new C-level APIs: PyEval_SetProfile() and +PyEval_SetTrace(). These can be used to install profile and trace +functions implemented in C, which can operate at much higher speeds +than Python-based functions. The overhead for calling a C-based +profile function is a very small fraction of a percent of the overhead +involved in calling a Python-based function. + +The machinery required to call a Python-based profile or trace +function has been moved to sysmodule.c, where sys.setprofile() and +sys.settrace() simply become users of the new interface. + +\item 'Advanced' xrange() features now deprecated: repeat, slice, +contains, tolist(), and the start/stop/step attributes. This includes +removing the 4th ('repeat') argument to PyRange_New(). + + +\item The call_object() function, originally in ceval.c, begins a new life %as the official API PyObject_Call(). It is also much simplified: all %it does is call the tp_call slot, or raise an exception if that's %NULL. |