\documentclass{howto}
% $Id$
\title{What's New in Python 2.2}
\release{0.01}
\author{A.M. Kuchling}
\authoraddress{\email{akuchlin@mems-exchange.org}}
\begin{document}
\maketitle\tableofcontents
\section{Introduction}
{\large This document is a draft, and is subject to change until the
final version of Python 2.2 is released.  Currently it's not up to
date at all.  Please send any comments, bug reports, or questions, no
matter how minor, to \email{akuchlin@mems-exchange.org}. }

This article explains the new features in Python 2.2.  Python 2.2
includes some significant changes that go far toward cleaning up the
language's darkest corners, and some exciting new features.

This article doesn't attempt to provide a complete specification for
the new features, but instead provides a convenient overview of them.
For full details, you should refer to the 2.2 documentation,
such as the Library Reference and the Reference Guide, or to the PEP
for a particular new feature.

The final release of Python 2.2 is planned for October 2001.
%======================================================================
% It looks like this set of changes will likely get into 2.2,
% so I need to read and digest the relevant PEPs.
%\section{PEP 252: Type and Class Changes}
%XXX
%\begin{seealso}
%\seepep{252}{Making Types Look More Like Classes}{Written and implemented
%by GvR.}
%\end{seealso}
%======================================================================
\section{PEP 234: Iterators}
A significant addition to 2.2 is an iteration interface at both the C
and Python levels. Objects can define how they can be looped over by
callers.
In Python versions up to 2.1, the usual way to make \code{for item in
obj} work is to define a \method{__getitem__()} method that looks
something like this:
\begin{verbatim}
def __getitem__(self, index):
    return <next item>
\end{verbatim}
\method{__getitem__()} is more properly used to define an indexing
operation on an object so that you can write \code{obj[5]} to retrieve
the sixth element.  It's a bit misleading when you're using this only
to support \keyword{for} loops. Consider some file-like object that
wants to be looped over; the \var{index} parameter is essentially
meaningless, as the class probably assumes that a series of
\method{__getitem__()} calls will be made, with \var{index}
incrementing by one each time. In other words, the presence of the
\method{__getitem__()} method doesn't mean that \code{file[5]} will
work, though it really should.
In Python 2.2, iteration can be implemented separately, and
\method{__getitem__()} methods can be limited to classes that really
do support random access. The basic idea of iterators is quite
simple. A new built-in function, \function{iter(obj)}, returns an
iterator for the object \var{obj}.  (It can also take two arguments:
\code{iter(\var{C}, \var{sentinel})} will call the callable \var{C}
repeatedly until it returns \var{sentinel}, which signals that the
iterator is done.  This form probably won't be used very often.)
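
As a rough sketch of the two-argument form, the snippet below drains a
hypothetical zero-argument callable, \function{read_chunk()}, until it
returns the empty string.  The function name and the sample data are
invented for illustration and are not part of any library.

```python
# Hypothetical data source: each call hands back the next chunk,
# and an empty string once the real data is exhausted.
pending = ['spam', 'eggs', 'ham', '']

def read_chunk():
    return pending.pop(0)

# iter(callable, sentinel) keeps calling read_chunk() and stops as
# soon as the returned value equals the sentinel ''.
collected = []
for chunk in iter(read_chunk, ''):
    collected.append(chunk)
```

The sentinel itself is never handed to the loop body; it only marks
the end of the iteration.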
Python classes can define an \method{__iter__()} method, which should
create and return a new iterator for the object; if the object is its
own iterator, this method can just return \code{self}. In particular,
iterators will usually be their own iterators. Extension types
implemented in C can implement a \code{tp_iter} function in order to
return an iterator, too.
So what do iterators do? They have one required method,
\method{next()}, which takes no arguments and returns the next value.
When there are no more values to be returned, calling \method{next()}
should raise the \exception{StopIteration} exception.
\begin{verbatim}
>>> L = [1,2,3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>
\end{verbatim}
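
To make the protocol concrete, here is a minimal sketch of a
user-defined iterator.  The \class{Countdown} class is hypothetical,
and the \code{__next__} alias is an assumption added only so that the
same code also runs on later Python versions that use that spelling
for \method{next()}.

```python
class Countdown:
    """Hypothetical iterator that counts from 'start' down to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator is its own iterator, so just return self.
        return self

    def next(self):
        # Signal the end of the iteration once the count reaches zero.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    # Assumption: later Python versions call this method __next__(),
    # so alias it to keep the sketch runnable there as well.
    __next__ = next

result = []
for n in Countdown(3):
    result.append(n)
```

Because \method{__iter__()} returns \code{self}, a \class{Countdown}
instance can be looped over only once; a fresh instance is needed for
a second pass.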
In 2.2, Python's \keyword{for} statement no longer expects a sequence;
it expects something for which \function{iter()} will return an iterator.
For backward compatibility, and convenience, an iterator is
automatically constructed for sequences that don't implement
\method{__iter__()} or a \code{tp_iter} slot, so \code{for i in
[1,2,3]} will still work. Wherever the Python interpreter loops over
a sequence, it's been changed to use the iterator protocol. This
means you can do things like this:
\begin{verbatim}
>>> i = iter(L)
>>> a,b,c = i
>>> a,b,c
(1, 2, 3)
>>>
\end{verbatim}
Iterator support has been added to some of Python's basic types. The
\keyword{in} operator now works on dictionaries, so \code{\var{key} in
dict} is now equivalent to \code{dict.has_key(\var{key})}.
Calling \function{iter()} on a dictionary will return an iterator
which loops over its keys:
\begin{verbatim}
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m: print key, m[key]
...
Mar 3
Feb 2
Aug 8
Sep 9
May 5
Jun 6
Jul 7
Jan 1
Apr 4
Nov 11
Dec 12
Oct 10
>>>
\end{verbatim}
That's just the default behaviour. If you want to iterate over keys,
values, or key/value pairs, you can explicitly call the
\method{iterkeys()}, \method{itervalues()}, or \method{iteritems()}
methods to get an appropriate iterator.
Files also provide an iterator, which calls the file's \method{readline()}
method until there are no more lines in the file. This means you can
now read each line of a file using code like this:
\begin{verbatim}
for line in file:
    # do something for each line
\end{verbatim}
Note that you can only go forward in an iterator; there's no way to
get the previous element, reset the iterator, or make a copy of it.
An iterator object could provide such additional capabilities, but the iterator protocol only requires a \method{next()} method.
\begin{seealso}
\seepep{234}{Iterators}{Written by Ka-Ping Yee and GvR; implemented
by the Python Labs crew, mostly by GvR and Tim Peters.}
\end{seealso}
%======================================================================
\section{PEP 255: Simple Generators}
Generators are another new feature, one that interacts with the
introduction of iterators.
You're doubtless familiar with how function calls work in Python or
C. When you call a function, it gets a private area where its local
variables are created. When the function reaches a \keyword{return}
statement, the local variables are destroyed and the resulting value
is returned to the caller. A later call to the same function will get
a fresh set of local variables.  But what if the local variables
weren't destroyed on exiting a function? What if you could later
resume the function where it left off? This is what generators
provide; they can be thought of as resumable functions.
Here's the simplest example of a generator function:
\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i
\end{verbatim}
A new keyword, \keyword{yield}, was introduced for generators. Any
function containing a \keyword{yield} statement is a generator
function; this is detected by Python's bytecode compiler, which
compiles the function specially. When you call a generator function,
it doesn't return a single value; instead it returns a generator
object that supports the iterator interface. On executing the
\keyword{yield} statement, the generator outputs the value of
\code{i}, similar to a \keyword{return} statement. The big difference
between \keyword{yield} and a \keyword{return} statement is that, on
reaching a \keyword{yield} the generator's state of execution is
suspended and local variables are preserved. On the next call to the
generator's \code{.next()} method, the function will resume executing
immediately after the \keyword{yield} statement. (For complicated
reasons, the \keyword{yield} statement isn't allowed inside the
\keyword{try} block of a \code{try...finally} statement; read PEP 255
for a full explanation of the interaction between \keyword{yield} and
exceptions.)
Here's a sample usage of the \function{generate_ints} generator:
\begin{verbatim}
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration
>>>
\end{verbatim}
You could equally write \code{for i in generate_ints(5)}, or
\code{a,b,c = generate_ints(3)}.
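
Both alternatives can be sketched as follows; the \code{squares} list
is just an illustrative way of consuming the generated values.

```python
def generate_ints(N):
    for i in range(N):
        yield i

# A for loop asks the generator for one value at a time.
squares = []
for i in generate_ints(5):
    squares.append(i * i)

# Tuple unpacking consumes exactly three values from the generator.
a, b, c = generate_ints(3)
```

Unpacking only works when the generator produces exactly as many
values as there are names on the left-hand side.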
Inside a generator function, the \keyword{return} statement can only
be used without a value, and is equivalent to raising the
\exception{StopIteration} exception; afterwards the generator cannot
return any further values. \keyword{return} with a value, such as
\code{return 5}, is a syntax error inside a generator function. You
can also raise \exception{StopIteration} manually, or just let the
thread of execution fall off the bottom of the function, to achieve
the same effect.
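
A small sketch of a bare \keyword{return} ending a generator early;
\function{take_until_negative()} is a made-up helper, not something
from the standard library.

```python
def take_until_negative(values):
    """Hypothetical generator: yield values until a negative one appears."""
    for v in values:
        if v < 0:
            return      # a bare return ends the generator for good
        yield v

result = list(take_until_negative([3, 1, 4, -1, 5, 9]))
```

The values after the first negative number are never produced,
because the generator has already finished.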
You could achieve the effect of generators manually by writing your
own class, and storing all the local variables of the generator as
instance variables. For example, returning a list of integers could
be done by setting \code{self.count} to 0, and having the
\method{next()} method increment \code{self.count} and return it.
However, for a
moderately complicated generator, writing a corresponding class would
be much messier. \file{Lib/test/test_generators.py} contains a number
of more interesting examples. The simplest one implements an in-order
traversal of a tree using generators recursively.
\begin{verbatim}
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
\end{verbatim}
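
The \function{inorder()} generator assumes nodes with \code{left},
\code{label}, and \code{right} attributes.  The sketch below pairs it
with a hypothetical, minimal \class{Tree} class (not the one used in
the test suite) so that it can be run as is.

```python
class Tree:
    """Hypothetical binary tree node; None stands for an empty subtree."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def inorder(t):
    # An empty subtree (None) yields nothing at all.
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

#        4
#       / \
#      2   6
#     / \
#    1   3
root = Tree(4, Tree(2, Tree(1), Tree(3)), Tree(6))
labels = list(inorder(root))
```

Each recursive call creates its own generator object, and the outer
loops simply pass the inner values along to the caller.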
Two other examples in \file{Lib/test/test_generators.py} produce
solutions for the N-Queens problem (placing $N$ queens on an $N\times N$
chessboard so that no queen threatens another) and the Knight's Tour
(a route that takes a knight to every square of an $N\times N$ chessboard
without visiting any square twice).
The idea of generators comes from other programming languages,
especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
idea of generators is central to the language. In Icon, every
expression and function call behaves like a generator. One example
from ``An Overview of the Icon Programming Language'' at
\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
what this looks like:
\begin{verbatim}
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
\end{verbatim}
The \function{find()} function returns the indexes at which the
substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement,
\code{i} is first assigned a value of 3, but 3 is less than 5, so the
comparison fails, and Icon retries it with the second value of 23. 23
is greater than 5, so the comparison now succeeds, and the code prints
the value 23 to the screen.
Python doesn't go nearly as far as Icon in adopting generators as a
central concept. Generators are considered a new part of the core
Python language, but learning or using them isn't compulsory; if they
don't solve any problems that you have, feel free to ignore them.
This is different from Icon where the idea of generators is a basic
concept. One novel feature of Python's interface as compared to
Icon's is that a generator's state is represented as a concrete object
that can be passed around to other functions or stored in a data
structure.
\begin{seealso}
\seepep{255}{Simple Generators}{Written by Neil Schemenauer,
Tim Peters, Magnus Lie Hetland. Implemented mostly by Neil
Schemenauer, with fixes from the Python Labs crew.}
\end{seealso}
%======================================================================
\section{Unicode Changes}
XXX I have to figure out what the changes mean to users.
(\code{--enable-unicode} configure switch)

References: \url{http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html}
and following thread.
%======================================================================
\section{PEP 227: Nested Scopes}
In Python 2.1, statically nested scopes were added as an optional
feature, to be enabled by a \code{from __future__ import
nested_scopes} directive. In 2.2 nested scopes no longer need to be
specially enabled, but are always enabled. The rest of this section
is a copy of the description of nested scopes from my ``What's New in
Python 2.1'' document; if you read it when 2.1 came out, you can skip
the rest of this section.
The largest change introduced in Python 2.1, and made complete in 2.2,
is to Python's scoping rules. In Python 2.0, at any given time there
are at most three namespaces used to look up variable names: local,
module-level, and the built-in namespace. This often surprised people
because it didn't match their intuitive expectations. For example, a
nested recursive function definition doesn't work:
\begin{verbatim}
def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...
\end{verbatim}
The function \function{g()} will always raise a \exception{NameError}
exception, because the binding of the name \samp{g} isn't in either
its local namespace or in the module-level namespace. This isn't much
of a problem in practice (how often do you recursively define interior
functions like this?), but this also made using the \keyword{lambda}
expression clumsier, and this was a problem in practice.  In code which
uses \keyword{lambda} you can often find local variables being copied
by passing them as the default values of arguments.
\begin{verbatim}
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
\end{verbatim}
The readability of Python code written in a strongly functional style
suffers greatly as a result.
The most significant change to Python 2.2 is that static scoping has
been added to the language to fix this problem. As a first effect,
the \code{name=name} default argument is now unnecessary in the above
example. Put simply, when a given variable name is not assigned a
value within a function (by an assignment, or the \keyword{def},
\keyword{class}, or \keyword{import} statements), references to the
variable will be looked up in the local namespace of the enclosing
scope. A more detailed explanation of the rules, and a dissection of
the implementation, can be found in the PEP.
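
As a sketch of the new rules, the \method{find()} example can drop its
default argument, and a nested function can call itself recursively.
The \class{Registry} class and \function{make_counter()} are
hypothetical names invented for the illustration, and the
\function{list()} call is an assumption so that the code also works on
Python versions where \function{filter()} returns an iterator.

```python
class Registry:
    """Hypothetical container used only to illustrate nested scopes."""

    def __init__(self, entries):
        self.list_attribute = entries

    def find(self, name):
        "Return list of any entries equal to 'name'"
        # 'name' is looked up in the enclosing scope of find(), so the
        # name=name default argument is no longer needed.
        return list(filter(lambda x: x == name, self.list_attribute))

def make_counter():
    def g(value):
        # g can refer to itself through the enclosing scope.
        if value <= 0:
            return 0
        return g(value - 1) + 1
    return g

matches = Registry(['spam', 'eggs', 'spam']).find('spam')
depth = make_counter()(4)
```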
This change may cause some compatibility problems for code where the
same variable name is used both at the module level and as a local
variable within a function that contains further function definitions.
This seems rather unlikely though, since such code would have been
pretty confusing to read in the first place.
One side effect of the change is that the \code{from \var{module}
import *} and \keyword{exec} statements have been made illegal inside
a function scope under certain conditions. The Python reference
manual has said all along that \code{from \var{module} import *} is
only legal at the top level of a module, but the CPython interpreter
has never enforced this before. As part of the implementation of
nested scopes, the compiler which turns Python source into bytecodes
has to generate different code to access variables in a containing
scope. \code{from \var{module} import *} and \keyword{exec} make it
impossible for the compiler to figure this out, because they add names
to the local namespace that are unknowable at compile time.
Therefore, if a function that uses either statement also contains
function definitions or \keyword{lambda} expressions with free
variables, the compiler will flag this by raising a
\exception{SyntaxError} exception.
To make the preceding explanation a bit clearer, here's an example:
\begin{verbatim}
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
\end{verbatim}
Line 4 containing the \keyword{exec} statement is a syntax error,
since \keyword{exec} would define a new local variable named \samp{x}
whose value should be accessed by \function{g()}.
This shouldn't be much of a limitation, since \keyword{exec} is rarely
used in most Python code (and when it is used, it's often a sign of a
poor design anyway).
\begin{seealso}
\seepep{227}{Statically Nested Scopes}{Written and implemented by
Jeremy Hylton.}
\end{seealso}
%======================================================================
\section{New and Improved Modules}
\begin{itemize}
\item The \module{xmlrpclib} module was contributed to the standard
library by Fredrik Lundh. It provides support for writing XML-RPC
clients; XML-RPC is a simple remote procedure call protocol built on
top of HTTP and XML. For example, the following snippet retrieves a
list of RSS channels from the O'Reilly Network, and then retrieves a
list of the recent headlines for one channel:
\begin{verbatim}
import xmlrpclib
s = xmlrpclib.Server(
      'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
channels = s.meerkat.getChannels()
# channels is a list of dictionaries, like this:
# [{'id': 4, 'title': 'Freshmeat Daily News'},
#  {'id': 190, 'title': '32Bits Online'},
#  {'id': 4549, 'title': '3DGamers'}, ... ]
# Get the items for one channel
items = s.meerkat.getItems( {'channel': 4} )
# 'items' is another list of dictionaries, like this:
# [{'link': 'http://freshmeat.net/releases/52719/',
#   'description': 'A utility which converts HTML to XSL FO.',
#   'title': 'html2fo 0.3 (Default)'}, ... ]
\end{verbatim}
See \url{http://www.xmlrpc.com} for more information about XML-RPC.
\item The \module{socket} module can be compiled to support IPv6;
specify the \code{--enable-ipv6} option to Python's configure
script. (Contributed by Jun-ichiro ``itojun'' Hagino.)
\item Two new format characters were added to the \module{struct}
module for 64-bit integers on platforms that support the C
\ctype{long long} type. \samp{q} is for a signed 64-bit integer,
and \samp{Q} is for an unsigned one. The value is returned in
Python's long integer type. (Contributed by Tim Peters.)
\item In the interpreter's interactive mode, there's a new built-in
function \function{help()} that uses the \module{pydoc} module
introduced in Python 2.1 to provide interactive help.
\code{help(\var{object})} displays any available help text about
\var{object}. \code{help()} with no argument puts you in an online
help utility, where you can enter the names of functions, classes,
or modules to read their help text.
(Contributed by Guido van Rossum, using Ka-Ping Yee's \module{pydoc} module.)
\item Various bugfixes and performance improvements have been made
to the SRE engine underlying the \module{re} module. For example,
\function{re.sub()} will now use \function{string.replace()}
automatically when the pattern and its replacement are both just
literal strings without regex metacharacters. Another contributed
patch speeds up certain Unicode character ranges by a factor of
two. (SRE is maintained by Fredrik Lundh. The BIGCHARSET patch
was contributed by Martin von L\"owis.)
\item The \module{imaplib} module now has support for the IMAP
NAMESPACE extension defined in \rfc{2342}. (Contributed by Michel
Pelletier.)
\end{itemize}
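
For the \module{struct} change listed above, a minimal sketch of
packing and unpacking 64-bit values might look like this.  The
\samp{<} prefix (little-endian, standard sizes) is a choice made for
the example so that each value occupies exactly 8 bytes, and the
sample numbers are arbitrary.

```python
import struct

# 'q' is a signed and 'Q' an unsigned 64-bit integer; with the '<'
# prefix each one takes exactly 8 bytes.
packed = struct.pack('<qQ', -2**40, 2**63)

# Unpacking returns the original values as Python integers.
signed_value, unsigned_value = struct.unpack('<qQ', packed)
```

Note that \code{2**63} is too large for the signed \samp{q} code but
fits in the unsigned \samp{Q} code.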
%======================================================================
\section{Other Changes and Fixes}
As usual there were a bunch of other improvements and bugfixes
scattered throughout the source tree. A search through the CVS change
logs finds there were XXX patches applied, and XXX bugs fixed; both
figures are likely to be underestimates. Some of the more notable
changes are:
\begin{itemize}
\item XXX C API: Reorganization of object calling
\item XXX .encode(), .decode() string methods. Interesting new codecs such
as zlib.
\item MacOS code now in main CVS tree.
\item SF patch \#418147 Fixes to allow compiling w/ Borland, from Stephen Hansen.
\item Add support for Windows using ``mbcs'' as the default Unicode
encoding when dealing with the file system.  As discussed on
python-dev and in patch \#410465.
\item Lots of patches to dictionaries; measure performance improvement, if any.
\item Patch \#430754: Makes ftpmirror.py .netrc aware
\item Fix bug reported by Tim Peters on python-dev:
Keyword arguments passed to builtin functions that don't take them are
ignored.

\begin{verbatim}
>>> {}.clear(x=2)
>>>
\end{verbatim}

instead of

\begin{verbatim}
>>> {}.clear(x=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: clear() takes no keyword arguments
\end{verbatim}
\item Make the license GPL-compatible.
\item Two new C-level APIs were added: \cfunction{PyEval_SetProfile()} and
\cfunction{PyEval_SetTrace()}.  These can be used to install profile and trace
functions implemented in C, which can operate at much higher speeds
than Python-based functions.  The overhead for calling a C-based
profile function is a very small fraction of a percent of the overhead
involved in calling a Python-based function.

The machinery required to call a Python-based profile or trace
function has been moved to \file{sysmodule.c}, where
\function{sys.setprofile()} and \function{sys.settrace()} simply
become users of the new interface.
\item `Advanced' \function{xrange()} features are now deprecated: repeat, slice,
contains, \method{tolist()}, and the start/stop/step attributes.  This includes
removing the 4th (`repeat') argument to \cfunction{PyRange_New()}.
\item The \cfunction{call_object()} function, originally in \file{ceval.c}, begins a new life
as the official API \cfunction{PyObject_Call()}.
%as the official API PyObject_Call(). It is also much simplified: all
%it does is call the tp_call slot, or raise an exception if that's
%NULL.
%The subsidiary functions (call_eval_code2(), call_cfunction(),
%call_instance(), and call_method()) have all been moved to the file
%implementing their particular object type, renamed according to the
%local convention, and added to the type's tp_call slot. Note that
%call_eval_code2() became function_call(); the tp_slot for class
%objects now simply points to PyInstance_New(), which already has the
%correct signature.
%Because of these moves, there are some more new APIs that expose
%helpers in ceval.c that are now needed outside: PyEval_GetFuncName(),
%PyEval_GetFuncDesc(), PyEval_EvalCodeEx() (formerly get_func_name(),
%get_func_desc(), and eval_code2().
\end{itemize}
%======================================================================
\section{Acknowledgements}
The author would like to thank the following people for offering
suggestions on various drafts of this article: No one yet.
\end{document}