\documentclass{howto}
% $Id$
\title{What's New in Python 2.2}
\release{1.02}
\author{A.M. Kuchling}
\authoraddress{
\strong{Python Software Foundation}\\
Email: \email{amk@amk.ca}
}
\begin{document}
\maketitle\tableofcontents
\section{Introduction}
This article explains the new features in Python 2.2.2, released on
October 14, 2002. Python 2.2.2 is a bugfix release of Python 2.2,
originally released on December 21, 2001.
Python 2.2 can be thought of as the ``cleanup release''. There are some
features such as generators and iterators that are completely new, but
most of the changes, significant and far-reaching though they may be,
are aimed at cleaning up irregularities and dark corners of the
language design.
This article doesn't attempt to provide a complete specification of
the new features, but instead provides a convenient overview. For
full details, you should refer to the documentation for Python 2.2,
such as the
\citetitle[http://www.python.org/doc/2.2/lib/lib.html]{Python
Library Reference} and the
\citetitle[http://www.python.org/doc/2.2/ref/ref.html]{Python
Reference Manual}. If you want to understand the complete
implementation and design rationale for a change, refer to the PEP for
a particular new feature.
\begin{seealso}
\seeurl{http://www.unixreview.com/documents/s=1356/urm0109h/0109h.htm}
{``What's So Special About Python 2.2?'' is also about the new 2.2
features, and was written by Cameron Laird and Kathryn Soraiz.}
\end{seealso}
%======================================================================
\section{PEPs 252 and 253: Type and Class Changes}
The largest and most far-reaching changes in Python 2.2 are to
Python's model of objects and classes. The changes should be backward
compatible, so it's likely that your code will continue to run
unchanged, but the changes provide some amazing new capabilities.
Before beginning this, the longest and most complicated section of
this article, I'll provide an overview of the changes and offer some
comments.
A long time ago I wrote a Web page
(\url{http://www.amk.ca/python/writing/warts.html}) listing flaws in
Python's design. One of the most significant flaws was that it's
impossible to subclass Python types implemented in C. In particular,
it's not possible to subclass built-in types, so you can't just
subclass, say, lists in order to add a single useful method to them.
The \module{UserList} module provides a class that supports all of the
methods of lists and that can be subclassed further, but there's lots
of C code that expects a regular Python list and won't accept a
\class{UserList} instance.
Python 2.2 fixes this, and in the process adds some exciting new
capabilities. A brief summary:
\begin{itemize}
\item You can subclass built-in types such as lists and even integers,
and your subclasses should work in every place that requires the
original type.
\item It's now possible to define static and class methods, in addition
to the instance methods available in previous versions of Python.
\item It's also possible to automatically call methods on accessing or
setting an instance attribute by using a new mechanism called
\dfn{properties}. Many uses of \method{__getattr__} can be rewritten
to use properties instead, making the resulting code simpler and
faster. As a small side benefit, attributes can now have docstrings,
too.
\item The list of legal attributes for an instance can be limited to a
particular set using \dfn{slots}, making it possible to safeguard
against typos and perhaps make more optimizations possible in future
versions of Python.
\end{itemize}
Some users have voiced concern about all these changes. Sure, they
say, the new features are neat and lend themselves to all sorts of
tricks that weren't possible in previous versions of Python, but
they also make the language more complicated. Some people have said
that they've always recommended Python for its simplicity, and feel
that its simplicity is being lost.
Personally, I think there's no need to worry. Many of the new
features are quite esoteric, and you can write a lot of Python code
without ever needing to be aware of them. Writing a simple class is no
more difficult than it ever was, so you don't need to bother learning
or teaching them unless they're actually needed. Some very
complicated tasks that were previously only possible from C will now
be possible in pure Python, and to my mind that's all for the better.
I'm not going to attempt to cover every single corner case and small
change that was required to make the new features work. Instead, this
section will paint only the broad strokes. See section~\ref{sect-rellinks},
``Related Links'', for further sources of information about Python 2.2's new
object model.
\subsection{Old and New Classes}
First, you should know that Python 2.2 really has two kinds of
classes: classic or old-style classes, and new-style classes. The
old-style class model is exactly the same as the class model in
earlier versions of Python. All the new features described in this
section apply only to new-style classes. This divergence isn't
intended to last forever; eventually old-style classes will be
dropped, possibly in Python 3.0.
So how do you define a new-style class? You do it by subclassing an
existing new-style class. Most of Python's built-in types, such as
integers, lists, dictionaries, and even files, are new-style classes
now. A new-style class named \class{object}, the base class for all
built-in types, has also been added, so if no built-in type is
suitable, you can just subclass \class{object}:
\begin{verbatim}
class C(object):
    def __init__ (self):
        ...
    ...
\end{verbatim}
This means that \keyword{class} statements that don't have any base
classes are always classic classes in Python 2.2. (Actually, you can
also change this by setting a module-level variable named
\member{__metaclass__}; see \pep{253} for the details. But it's
easier to just subclass \class{object}.)
The type objects for the built-in types are available as built-ins,
named using a clever trick. Python has always had built-in functions
named \function{int()}, \function{float()}, and \function{str()}. In
2.2, they aren't functions any more, but type objects that behave as
factories when called.
\begin{verbatim}
>>> int
<type 'int'>
>>> int('123')
123
\end{verbatim}
To make the set of types complete, new type objects such as
\function{dict} and \function{file} have been added. Here's a
more interesting example, adding a \method{lock()} method to file
objects:
\begin{verbatim}
class LockableFile(file):
    def lock (self, operation, length=0, start=0, whence=0):
        import fcntl
        return fcntl.lockf(self.fileno(), operation,
                           length, start, whence)
\end{verbatim}
The now-obsolete \module{posixfile} module contained a class that
emulated all of a file object's methods and also added a
\method{lock()} method, but this class couldn't be passed to internal
functions that expected a built-in file, something which is possible
with our new \class{LockableFile}.
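The same technique works for other built-in types. As a minimal sketch (the \class{Stack} class and its \method{peek()} method are hypothetical, and the code is written so that it also runs on current Python 3), here is a list subclass that adds a single useful method while remaining a real list:

\begin{verbatim}
class Stack(list):
    """Hypothetical list subclass that adds one method, peek()."""

    def peek(self):
        # Return the top item without removing it.
        return self[-1]

s = Stack([1, 2])
s.append(3)            # inherited list method
top = s.peek()         # the new method
combined = [0] + s     # a Stack can go wherever a list is expected
\end{verbatim}

Because \class{Stack} really is a list, C code that insists on a list accepts it, which was never true of a \class{UserList} instance.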
\subsection{Descriptors}
In previous versions of Python, there was no consistent way to
discover what attributes and methods were supported by an object.
There were some informal conventions, such as defining
\member{__members__} and \member{__methods__} attributes that were
lists of names, but often the author of an extension type or a class
wouldn't bother to define them. You could fall back on inspecting the
\member{__dict__} of an object, but when class inheritance or an
arbitrary \method{__getattr__} hook was in use, this could still be
inaccurate.
The one big idea underlying the new class model is that an API for
describing the attributes of an object using \dfn{descriptors} has
been formalized. Descriptors specify the value of an attribute,
stating whether it's a method or a field. With the descriptor API,
static methods and class methods become possible, as well as more
exotic constructs.
Attribute descriptors are objects that live inside class objects, and
have a few attributes of their own:
\begin{itemize}
\item \member{__name__} is the attribute's name.
\item \member{__doc__} is the attribute's docstring.
\item \method{__get__(\var{object})} is a method that retrieves the
attribute value from \var{object}.
\item \method{__set__(\var{object}, \var{value})} sets the attribute
on \var{object} to \var{value}.
\item \method{__delete__(\var{object})} deletes the attribute from
\var{object}.
\end{itemize}
For example, when you write \code{obj.x}, the steps that Python
actually performs are, in simplified form (ignoring base classes and
the instance dictionary):
\begin{verbatim}
descriptor = obj.__class__.__dict__['x']
descriptor.__get__(obj)
\end{verbatim}
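To make the protocol concrete, here is a minimal sketch of a descriptor class written in Python; the \class{Logged} and \class{Point} names are hypothetical. Because \class{Logged} defines \method{__set__} as well as \method{__get__}, it takes precedence over the instance dictionary, so it can keep the real value there under the same name. Note that \method{__get__} also accepts the class as an optional second argument, and receives \code{None} as the instance when the attribute is looked up on the class itself. The code is written so that it also runs on current Python 3.

\begin{verbatim}
class Logged(object):
    """Hypothetical data descriptor that records every access."""

    def __init__(self, name):
        self.name = name     # key used in the instance's __dict__
        self.log = []        # shared by all instances of the owner class

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Looked up on the class itself: return the descriptor.
            return self
        self.log.append(('get', self.name))
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        self.log.append(('set', self.name))
        obj.__dict__[self.name] = value

    def __delete__(self, obj):
        self.log.append(('delete', self.name))
        del obj.__dict__[self.name]


class Point(object):
    x = Logged('x')

p = Point()
p.x = 3        # calls Logged.__set__(descriptor, p, 3)
value = p.x    # calls Logged.__get__(descriptor, p, Point)
del p.x        # calls Logged.__delete__(descriptor, p)
\end{verbatim}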
For methods, \method{descriptor.__get__} returns a temporary object that's
callable, and wraps up the instance and the method to be called on it.
This is also why static methods and class methods are now possible;
they have descriptors that wrap up just the method, or the method and
the class. As a brief explanation of these new kinds of methods,
static methods aren't passed the instance, and therefore resemble
regular functions. Class methods are passed the class of the object,
but not the object itself. Static and class methods are defined like
this:
\begin{verbatim}
class C(object):
    def f(arg1, arg2):
        ...
    f = staticmethod(f)

    def g(cls, arg1, arg2):
        ...
    g = classmethod(g)
\end{verbatim}
The \function{staticmethod()} function takes the function
\function{f}, and returns it wrapped up in a descriptor so it can be
stored in the class object. You might expect there to be special
syntax for creating such methods (\code{def static f()},
\code{defstatic f()}, or something like that) but no such syntax has
been defined yet; that's been left for future versions of Python.
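Here is a runnable sketch using the same \function{staticmethod()} and \function{classmethod()} wrapping style; the \class{Counter} and \class{SubCounter} classes are hypothetical. The class method receives whichever class it was called through, including a subclass, while the static method receives neither the instance nor the class:

\begin{verbatim}
class Counter(object):
    instances = 0

    def __init__(self):
        # Count instances of Counter and of its subclasses.
        Counter.instances = Counter.instances + 1

    def describe(kind):
        # Static method: no instance or class is passed in.
        return 'a %s counter' % kind
    describe = staticmethod(describe)

    def how_many(cls):
        # Class method: receives the class, not the instance.
        return '%s: %d' % (cls.__name__, cls.instances)
    how_many = classmethod(how_many)


class SubCounter(Counter):
    pass

c = Counter()
s = SubCounter()
\end{verbatim}

Both \code{Counter.describe('simple')} and \code{c.describe('simple')} work, and \code{SubCounter.how_many()} reports the name \code{SubCounter} even though the method was defined in \class{Counter}.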
More new features, such as slots and properties, are also implemented
as new kinds of descriptors, and it's not difficult to write a
descriptor class that does something novel. For example, you could
write a descriptor class that supports Eiffel-style preconditions and
postconditions for a method. A class that used this feature might be
defined like this:
\begin{verbatim}
from eiffel import eiffelmethod

class C(object):
    def f(self, arg1, arg2):
        # The actual function
        ...
    def pre_f(self):
        # Check preconditions
        ...
    def post_f(self):
        # Check postconditions
        ...
    f = eiffelmethod(f, pre_f, post_f)
\end{verbatim}
Note that a person using the new \function{eiffelmethod()} doesn't
have to understand anything about descriptors. This is why I think
the new features don't increase the basic complexity of the language.
There will be a few wizards who need to know about it in order to
write \function{eiffelmethod()} or the ZODB or whatever, but most
users will just write code on top of the resulting libraries and
ignore the implementation details.
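For the curious, here is one way such a descriptor might look. This is a minimal sketch, not an existing library: \function{eiffelmethod()} is assumed to take the method plus optional precondition and postcondition functions, each called with the instance, and the \class{Account} class is a hypothetical user of it. The descriptor's \method{__get__} returns a wrapper that runs the checks around the real call. It relies on nested scopes, which are enabled by default in 2.2, and it also runs on current Python 3.

\begin{verbatim}
class eiffelmethod(object):
    """Hypothetical descriptor adding pre/postcondition checks."""

    def __init__(self, method, pre=None, post=None):
        self.method = method
        self.pre = pre
        self.post = post

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Looked up on the class itself: return the descriptor.
            return self

        def checked(*args, **kwargs):
            if self.pre is not None:
                self.pre(obj)        # check preconditions
            result = self.method(obj, *args, **kwargs)
            if self.post is not None:
                self.post(obj)       # check postconditions
            return result
        return checked


class Account(object):
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        self.balance = self.balance - amount
        return self.balance

    def pre_withdraw(self):
        if self.balance <= 0:
            raise ValueError("precondition failed: no funds")

    def post_withdraw(self):
        if self.balance < 0:
            raise ValueError("postcondition failed: overdrawn")

    withdraw = eiffelmethod(withdraw, pre_withdraw, post_withdraw)
\end{verbatim}

With this in place, \code{Account(10).withdraw(3)} returns 7, while a withdrawal that would leave the balance negative raises \exception{ValueError} from the postcondition.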
\subsection{Multiple Inheritance: The Diamond Rule}
Multiple inheritance has also been made more useful through changing
the rules under which names are resolved. Consider this set of classes
(diagram taken from \pep{253} by Guido van Rossum):
\begin{verbatim}
              class A:
                ^ ^  def save(self): ...
               /   \
              /     \
             /       \
            /         \
        class B     class C:
            ^         ^  def save(self): ...
             \       /
              \     /
               \   /
                \ /
              class D
\end{verbatim}
The lookup rule for classic classes is simple but not very smart; the
base classes are searched depth-first, going from left to right. A
reference to \method{D.save} will search the classes \class{D},
\class{B}, and then \class{A}, where \method{save()} would be found
and returned. \method{C.save()} would never be found at all. This is
bad, because if \class{C}'s \method{save()} method is saving some
internal state specific to \class{C}, not calling it will result in
that state never getting saved.
New-style classes follow a different algorithm that's a bit more
complicated to explain, but does the right thing in this situation.
(Note that Python 2.3 changes this algorithm to one that produces the
same results in most cases, but produces more useful results for
really complicated inheritance graphs.)
\begin{enumerate}
\item List all the base classes, following the classic lookup rule and
including a class multiple times if it's visited repeatedly. In the
above example, the list of visited classes is [\class{D}, \class{B},
\class{A}, \class{C}, \class{A}].
\item Scan the list for duplicated classes. If any are found, remove
all but one occurrence, leaving the \emph{last} one in the list. In
the above example, the list becomes [\class{D}, \class{B}, \class{C},
\class{A}] after dropping duplicates.
\end{enumerate}
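The two steps above are easy to express in code. This is a minimal sketch of the 2.2 rule, not the interpreter's actual implementation: the class graph is given as a hypothetical dictionary mapping each class name to its base-class names, and the function names are made up for illustration.

\begin{verbatim}
def classic_order(cls, bases):
    # Step 1: depth-first, left-to-right walk, keeping repeats.
    order = [cls]
    for base in bases.get(cls, ()):
        order.extend(classic_order(base, bases))
    return order

def mro_22(cls, bases):
    # Step 2: drop duplicates, keeping only the *last* occurrence.
    order = classic_order(cls, bases)
    result = []
    for i, c in enumerate(order):
        if c not in order[i + 1:]:
            result.append(c)
    return result

# The diamond from the diagram, with classes given by name.
bases = {'D': ('B', 'C'), 'B': ('A',), 'C': ('A',), 'A': ()}
\end{verbatim}

For the diamond above, \code{classic_order('D', bases)} gives \code{['D', 'B', 'A', 'C', 'A']} and \code{mro_22('D', bases)} gives \code{['D', 'B', 'C', 'A']}. For this simple diamond the Python 2.3 algorithm produces the same order.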
Following this rule, referring to \method{D.save()} will return
\method{C.save()}, which is the behaviour we're after. This lookup
rule is the same as the one followed by Common Lisp. A new built-in
function, \function{super()}, provides a way to get at a class's
superclasses without having to reimplement Python's algorithm.
The most commonly used form will be
\function{super(\var{class}, \var{obj})}, which returns
a bound superclass object (not the actual class object). This form
will be used in methods to call a method in the superclass; for
example, \class{D}'s \method{save()} method would look like this:
\begin{verbatim}
class D (B,C):
    def save (self):
        # Call superclass .save()
        super(D, self).save()
        # Save D's private information here
        ...
\end{verbatim}
\function{super()} can also return unbound superclass objects
when called as \function{super(\var{class})} or
\function{super(\var{class1}, \var{class2})}, but this probably won't
often be useful.
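Here is a runnable sketch of the diamond using \function{super()} cooperatively; the \code{saved} list is a hypothetical stand-in for real saving. Because \class{C} also calls \function{super()}, the chain follows \class{D}'s lookup order, so \class{A}'s \method{save()} runs exactly once, after \class{C}'s, even though \class{C}'s \keyword{class} statement only names \class{A}:

\begin{verbatim}
saved = []    # records which save() methods ran, in order

class A(object):
    def save(self):
        saved.append('A')

class B(A):
    pass      # no save() of its own

class C(A):
    def save(self):
        saved.append('C')
        super(C, self).save()

class D(B, C):
    def save(self):
        saved.append('D')
        super(D, self).save()   # finds C.save, not A.save

D().save()
\end{verbatim}

After the call, \code{saved} is \code{['D', 'C', 'A']}.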
\subsection{Attribute Access}
A fair number of sophisticated Python classes define hooks for
attribute access using \method{__getattr__}; most commonly this is
done for convenience, to make code more readable by automatically
mapping an attribute access such as \code{obj.parent} into a method
call such as \code{obj.get_parent()}. Python 2.2 adds some new ways
of controlling attribute access.
First, \method{__getattr__(\var{attr_name})} is still supported by
new-style classes, and nothing about it has changed. As before, it
will be called when an attempt is made to access \code{obj.foo} and no
attribute named \samp{foo} is found in the instance's dictionary.
New-style classes also support a new method,
\method{__getattribute__(\var{attr_name})}. The difference between
the two methods is that \method{__getattribute__} is \emph{always}
called whenever any attribute is accessed, while the old
\method{__getattr__} is only called if \samp{foo} isn't found in the
instance's dictionary.
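The following sketch (the \class{Tracing} class and the \code{log} list are hypothetical) records which hook runs for each lookup. \method{__getattribute__} sees every access, while \method{__getattr__} only runs after the normal lookup has failed. Inside \method{__getattribute__} the real lookup is delegated to \code{object.__getattribute__}, since writing \code{self.}\var{name} there would call the method again and recurse forever:

\begin{verbatim}
log = []    # (hook, attribute name) pairs, in call order

class Tracing(object):
    present = 'class attribute'

    def __getattribute__(self, name):
        # Runs for *every* attribute lookup on an instance.
        log.append(('__getattribute__', name))
        # Delegate to object to avoid infinite recursion.
        return object.__getattribute__(self, name)

    def __getattr__(self, name):
        # Runs only when the normal lookup raised AttributeError.
        log.append(('__getattr__', name))
        return 'default for ' + name

t = Tracing()
found = t.present      # only __getattribute__ runs
fallback = t.missing   # __getattribute__ fails, then __getattr__ runs
\end{verbatim}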
However, Python 2.2's support for \dfn{properties} will often be a
simpler way to trap attribute references. Writing a
\method{__getattr__} method is complicated because to avoid recursion
you can't use regular attribute accesses inside them, and instead have
to mess around with the contents of \member{__dict__}.
\method{__getattr__} methods also end up being called by Python when
it checks for other methods such as \method{__repr__} or
\method{__coerce__}, and so have to be written with this in mind.
Finally, calling a function on every attribute access results in a
sizable performance loss.
\class{property} is a new built-in type that packages up three
functions that get, set, or delete an attribute, and a docstring. For
example, if you want to define a \member{size} attribute that's
computed, but also settable, you could write:
\begin{verbatim}
class C(object):
    def get_size (self):
        result = ... computation ...
        return result
    def set_size (self, size):
        ... compute something based on the size
        and set internal state appropriately ...

    # Define a property.  The 'delete this attribute'
    # method is defined as None, so the attribute
    # can't be deleted.
    size = property(get_size, set_size,
                    None,
                    "Storage size of this instance")
\end{verbatim}
That is certainly clearer and easier to write than a pair of
\method{__getattr__}/\method{__setattr__} methods that check for the
\member{size} attribute and handle it specially while retrieving all
other attributes from the instance's \member{__dict__}. Accesses to
\member{size} are also the only ones which have to perform the work of
calling a function, so references to other attributes run at
their usual speed.
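As a concrete, runnable version of the outline above, here is a minimal sketch in which the size is computed from an internal list; the \class{Buffer} class and its zero-padding behaviour are hypothetical. Since the delete function is \code{None}, \code{del b.size} raises \exception{AttributeError}:

\begin{verbatim}
class Buffer(object):
    def __init__(self):
        self._data = []

    def get_size(self):
        # Computed on demand from the internal state.
        return len(self._data)

    def set_size(self, size):
        # Truncate, or pad with zeros, so that len(self._data) == size.
        if size < len(self._data):
            del self._data[size:]
        else:
            self._data.extend([0] * (size - len(self._data)))

    size = property(get_size, set_size, None,
                    "Storage size of this instance")

b = Buffer()
b.size = 3     # calls set_size(b, 3)
\end{verbatim}

Only \member{size} goes through a function call; \member{_data} is an ordinary attribute and is read at full speed.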
Finally, it's possible to constrain the list of attributes that can be
referenced on an object using the new \member{__slots__} class attribute.
Python objects are usually very dynamic; at any time it's possible to
define a new attribute on an instance by just doing
\code{obj.new_attr=1}. A new-style class can define a class attribute named
\member{__slots__} to limit the legal attributes
to a particular set of names. An example will make this clear:
\begin{verbatim}
>>> class C(object):
...     __slots__ = ('template', 'name')
...
>>> obj = C()
>>> print obj.template
None
>>> obj.template = 'Test'
>>> print obj.template
Test
>>> obj.newattr = None
Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: 'C' object has no attribute 'newattr'
\end{verbatim}
Note how you get an \exception{AttributeError} on the attempt to
assign to an attribute not listed in \member{__slots__}.
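Here is the same class as a script rather than an interactive session, written so that it also runs on current Python 3. One difference to be aware of: on current Python 3, reading a slot that was never assigned raises \exception{AttributeError} rather than returning \code{None} as in the session above. Instances of this class also have no \member{__dict__}:

\begin{verbatim}
class C(object):
    __slots__ = ('template', 'name')

obj = C()
obj.template = 'Test'

try:
    obj.newattr = None    # 'newattr' is not listed in __slots__
except AttributeError:
    rejected = True
else:
    rejected = False
\end{verbatim}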
\subsection{Related Links}
\label{sect-rellinks}
This section has just been a quick overview of the new features,
giving enough of an explanation to start you programming, but many
details have been simplified or ignored. Where should you go to get a
more complete picture?
\url{http://www.python.org/2.2/descrintro.html} is a lengthy tutorial
introduction to the descriptor features, written by Guido van Rossum.
If my description has whetted your appetite, go read this tutorial
next, because it goes into much more detail about the new features
while still remaining quite easy to read.
Next, there are two relevant PEPs, \pep{252} and \pep{253}. \pep{252}
is titled ``Making Types Look More Like Classes'', and covers the
descriptor API. \pep{253} is titled ``Subtyping Built-in Types'', and
describes the changes to type objects that make it possible to subtype
built-in objects. \pep{253} is the more complicated PEP of the two,
and at a few points the necessary explanations of types and meta-types
may cause your head to explode. Both PEPs were written and
implemented by Guido van Rossum, with substantial assistance from the
rest of the Zope Corp. team.
Finally, there's the ultimate authority: the source code. Most of the
machinery for the type handling is in \file{Objects/typeobject.c}, but
you should only resort to it after all other avenues have been
exhausted, including posting a question to python-list or python-dev.
%======================================================================
\section{PEP 234: Iterators}
Another significant addition to 2.2 is an iteration interface at both
the C and Python levels. Objects can define how they can be looped
over by callers.
In Python versions up to 2.1, the usual way to make \code{for item in
obj} work is to define a \method{__getitem__()} method that looks
something like this:
\begin{verbatim}
def __getitem__(self, index):
    return <next item>
\end{verbatim}
\method{__getitem__()} is more properly used to define an indexing
operation on an object so that you can write \code{obj[5]} to retrieve
the sixth element. It's a bit misleading when you're using this only
to support \keyword{for} loops. Consider some file-like object that
wants to be looped over; the \var{index} parameter is essentially
meaningless, as the class probably assumes that a series of
\method{__getitem__()} calls will be made with \var{index}
incrementing by one each time. In other words, the presence of the
\method{__getitem__()} method doesn't mean that using \code{file[5]}
to randomly access the sixth element will work, though it really should.
In Python 2.2, iteration can be implemented separately, and
\method{__getitem__()} methods can be limited to classes that really
do support random access. The basic idea of iterators is
simple. A new built-in function, \function{iter(obj)} or
\code{iter(\var{C}, \var{sentinel})}, is used to get an iterator.
\function{iter(obj)} returns an iterator for the object \var{obj},
while \code{iter(\var{C}, \var{sentinel})} returns an iterator that
will invoke the callable object \var{C} until it returns
\var{sentinel} to signal that the iterator is done.
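A minimal sketch of the two-argument form: the hypothetical \class{Countdown} object is callable, and \function{iter()} keeps calling it until it returns the sentinel value \code{0}, which is not itself produced:

\begin{verbatim}
class Countdown(object):
    """Hypothetical callable: returns start-1, start-2, ... per call."""

    def __init__(self, start):
        self.value = start

    def __call__(self):
        self.value = self.value - 1
        return self.value

# Stop as soon as a call returns the sentinel 0.
values = list(iter(Countdown(5), 0))
\end{verbatim}

Here \code{values} ends up as \code{[4, 3, 2, 1]}.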
Python classes can define an \method{__iter__()} method, which should
create and return a new iterator for the object; if the object is its
own iterator, this method can just return \code{self}. In particular,
iterators will usually be their own iterators. Extension types
implemented in C can implement a \member{tp_iter} function in order to
return an iterator, and extension types that want to behave as
iterators can define a \member{tp_iternext} function.
So, after all this, what do iterators actually do? They have one
required method, \method{next()}, which takes no arguments and returns
the next value. When there are no more values to be returned, calling
\method{next()} should raise the \exception{StopIteration} exception.
\begin{verbatim}
>>> L = [1,2,3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
StopIteration
>>>
\end{verbatim}
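Here is a minimal sketch of a Python class that implements the protocol directly; \class{Squares} is a hypothetical example that produces the squares below a limit. Because it is its own iterator, \method{__iter__} simply returns \code{self}. Python 3 later renamed \method{next()} to \method{__next__()}, so the sketch defines both names and runs on either version:

\begin{verbatim}
class Squares(object):
    """Hypothetical iterator over the squares 0, 1, 4, ... below limit."""

    def __init__(self, limit):
        self.n = 0
        self.limit = limit

    def __iter__(self):
        # An iterator is its own iterator.
        return self

    def next(self):
        square = self.n * self.n
        if square >= self.limit:
            raise StopIteration     # no more values
        self.n = self.n + 1
        return square

    __next__ = next    # Python 3 spelling of the same method
\end{verbatim}

A \keyword{for} loop over \code{Squares(20)} produces 0, 1, 4, 9, and 16.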
In 2.2, Python's \keyword{for} statement no longer expects a sequence;
it expects something for which \function{iter()} will return an iterator.
For backward compatibility and convenience, an iterator is
automatically constructed for sequences that don't implement
\method{__iter__()} or a \member{tp_iter} slot, so \code{for i in
[1,2,3]} will still work. Wherever the Python interpreter loops over
a sequence, it's been changed to use the iterator protocol. This
means you can do things like this:
\begin{verbatim}
>>> L = [1,2,3]
>>> i = iter(L)
>>> a,b,c = i
>>> a,b,c
(1, 2, 3)
\end{verbatim}
Iterator support has been added to some of Python's basic types.
Calling \function{iter()} on a dictionary will return an iterator
which loops over its keys:
\begin{verbatim}
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m: print key, m[key]
...
Mar 3
Feb 2
Aug 8
Sep 9
May 5
Jun 6
Jul 7
Jan 1
Apr 4
Nov 11
Dec 12
Oct 10
\end{verbatim}
That's just the default behaviour. If you want to iterate over keys,
values, or key/value pairs, you can explicitly call the
\method{iterkeys()}, \method{itervalues()}, or \method{iteritems()}
methods to get an appropriate iterator. In a minor related change,
the \keyword{in} operator now works on dictionaries, so
\code{\var{key} in dict} is now equivalent to
\code{dict.has_key(\var{key})}.
Files also provide an iterator, which calls the \method{readline()}
method until there are no more lines in the file. This means you can
now read each line of a file using code like this:
\begin{verbatim}
for line in file:
    # do something for each line
    ...
\end{verbatim}
Note that you can only go forward in an iterator; there's no way to
get the previous element, reset the iterator, or make a copy of it.
An iterator object could provide such additional capabilities, but the
iterator protocol only requires a \method{next()} method.
\begin{seealso}
\seepep{234}{Iterators}{Written by Ka-Ping Yee and GvR; implemented
by the Python Labs crew, mostly by GvR and Tim Peters.}
\end{seealso}
%======================================================================
\section{PEP 255: Simple Generators}
Generators are another new feature, one that interacts with the
introduction of iterators.
You're doubtless familiar with how function calls work in Python or
C. When you call a function, it gets a private namespace where its local
variables are created. When the function reaches a \keyword{return}
statement, the local variables are destroyed and the resulting value
is returned to the caller. A later call to the same function will get
a fresh new set of local variables. But, what if the local variables
weren't thrown away on exiting a function? What if you could later
resume the function where it left off? This is what generators
provide; they can be thought of as resumable functions.
Here's the simplest example of a generator function:
\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i
\end{verbatim}
A new keyword, \keyword{yield}, was introduced for generators. Any
function containing a \keyword{yield} statement is a generator
function; this is detected by Python's bytecode compiler which
compiles the function specially as a result. Because a new keyword was
introduced, generators must be explicitly enabled in a module by
including a \code{from __future__ import generators} statement near
the top of the module's source code. In Python 2.3 this statement
will become unnecessary.
When you call a generator function, it doesn't return a single value;
instead it returns a generator object that supports the iterator
protocol. On executing the \keyword{yield} statement, the generator
outputs the value of \code{i}, similar to a \keyword{return}
statement. The big difference between \keyword{yield} and a
\keyword{return} statement is that on reaching a \keyword{yield} the
generator's state of execution is suspended and local variables are
preserved. On the next call to the generator's \code{next()} method,
the function will resume executing immediately after the
\keyword{yield} statement. (For complicated reasons, the
\keyword{yield} statement isn't allowed inside the \keyword{try} block
of a \keyword{try}...\keyword{finally} statement; read \pep{255} for a full
explanation of the interaction between \keyword{yield} and
exceptions.)
Here's a sample usage of the \function{generate_ints} generator:
\begin{verbatim}
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration
\end{verbatim}
You could equally write \code{for i in generate_ints(5)}, or
\code{a,b,c = generate_ints(3)}.
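A small sketch of those forms, written so that it also runs on a modern Python; the variable names are only for illustration:
\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i

# A for loop calls next() behind the scenes until StopIteration.
total = 0
for i in generate_ints(5):
    total = total + i

# Unpacking consumes exactly three values from the generator.
a, b, c = generate_ints(3)

# list() drains the generator completely.
values = list(generate_ints(4))
\end{verbatim}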
Inside a generator function, the \keyword{return} statement can only
be used without a value, and signals the end of the procession of
values; afterwards the generator cannot return any further values.
\keyword{return} with a value, such as \code{return 5}, is a syntax
error inside a generator function. The end of the generator's results
can also be indicated by raising \exception{StopIteration} manually,
or by just letting the flow of execution fall off the bottom of the
function.
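The following sketch shows a generator that stops early with a bare \keyword{return} and otherwise ends by falling off the bottom; \function{read_until_blank()} is a hypothetical name. It deliberately avoids raising \exception{StopIteration} by hand, because much later versions of Python (3.7 and newer, following \pep{479}) turn a \exception{StopIteration} raised inside a generator into a \exception{RuntimeError}.
\begin{verbatim}
def read_until_blank(lines):
    """Yield lines up to, but not including, the first empty one."""
    for line in lines:
        if line == '':
            return          # a bare return ends the generator
        yield line
    # Falling off the bottom of the function also ends the generator.

result = list(read_until_blank(['alpha', 'beta', '', 'gamma']))
\end{verbatim}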
You could achieve the effect of generators manually by writing your
own class and storing all the local variables of the generator as
instance variables. For example, returning a list of integers could
be done by setting \code{self.count} to 0, and having the
\method{next()} method increment \code{self.count} and return it.
However, for a moderately complicated generator, writing a
corresponding class would be much messier.
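Here is a minimal sketch of such a class, a hand-written counterpart of \function{generate_ints()}. The class name \class{IntGenerator} is hypothetical, and the \code{__next__ = next} alias is there only so the sketch also runs under Python 3, which renamed the iterator method:
\begin{verbatim}
class IntGenerator(object):
    """Hand-written equivalent of generate_ints(N)."""

    def __init__(self, N):
        # The generator's local variables become instance variables.
        self.N = N
        self.count = 0

    def __iter__(self):
        return self

    def next(self):
        # "Resume where we left off" by consulting the saved state.
        if self.count >= self.N:
            raise StopIteration
        value = self.count
        self.count = self.count + 1
        return value

    # Python 3 spells the iterator method __next__.
    __next__ = next

values = list(IntGenerator(3))
\end{verbatim}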
\file{Lib/test/test_generators.py} contains a number of more
interesting examples. The simplest one implements an in-order
traversal of a tree using generators recursively.
\begin{verbatim}
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
\end{verbatim}
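To try \function{inorder()} on its own you need a tree. The \class{Tree} class below is a hypothetical minimal stand-in, not the one defined in \file{Lib/test/test_generators.py}; any object with \member{left}, \member{right}, and \member{label} attributes would do:
\begin{verbatim}
class Tree(object):
    """Minimal binary tree node (hypothetical stand-in)."""
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

#       b
#      / \
#     a   d
#        /
#       c
root = Tree('b', Tree('a'), Tree('d', Tree('c')))
labels = list(inorder(root))
\end{verbatim}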
Two other examples in \file{Lib/test/test_generators.py} produce
solutions for the N-Queens problem (placing $N$ queens on an $N\times N$
chessboard so that no queen threatens another) and the Knight's Tour
(a route that takes a knight to every square of an $N\times N$ chessboard
without visiting any square twice).
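The core idea of the N-Queens solver fits in a few lines. This is an independent sketch, not the code from \file{test_generators.py}: each recursive call extends a partial placement by one row and yields every complete board it can reach, so solutions are produced lazily, one at a time. Each solution is a tuple giving the column of the queen in each row.
\begin{verbatim}
def queens(n, placed=()):
    """Yield every solution for an n x n board as a tuple of columns."""
    row = len(placed)
    if row == n:
        yield placed
        return
    for col in range(n):
        # Safe if no earlier queen shares this column or a diagonal.
        safe = True
        for r, c in enumerate(placed):
            if c == col or abs(c - col) == row - r:
                safe = False
                break
        if safe:
            for solution in queens(n, placed + (col,)):
                yield solution
\end{verbatim}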
The idea of generators comes from other programming languages,
especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
idea of generators is central. In Icon, every
expression and function call behaves like a generator. One example
from ``An Overview of the Icon Programming Language'' at
\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
what this looks like:
\begin{verbatim}
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
\end{verbatim}
In Icon the \function{find()} function returns the indexes at which the
substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement,
\code{i} is first assigned a value of 3, but 3 is less than 5, so the
comparison fails, and Icon retries it with the second value of 23. 23
is greater than 5, so the comparison now succeeds, and the code prints
the value 23 to the screen.
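A Python generator can mimic Icon's behaviour, although the retry has to be written out explicitly. In this sketch the hypothetical \function{find()} yields 1-based positions to match Icon's numbering, and the loop stops at the first value that passes the test:
\begin{verbatim}
def find(sub, s):
    """Yield every position of sub in s, counting from 1 as Icon does."""
    i = s.find(sub)
    while i >= 0:
        yield i + 1
        i = s.find(sub, i + 1)

sentence = "Store it in the neighboring harbor"
positions = list(find("or", sentence))

# Emulate Icon's retry: take the first generated value that succeeds.
match = None
for i in find("or", sentence):
    if i > 5:
        match = i
        break
\end{verbatim}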
Python doesn't go nearly as far as Icon in adopting generators as a
central concept. Generators are considered a new part of the core
Python language, but learning or using them isn't compulsory; if they
don't solve any problems that you have, feel free to ignore them.
One novel feature of Python's interface as compared to
Icon's is that a generator's state is represented as a concrete object
(the iterator) that can be passed around to other functions or stored
in a data structure.
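As a sketch of what that buys you, the hypothetical helper \function{take()} below receives a generator object, consumes part of it, and leaves the rest for the next caller; the generator can also sit in a list like any other object. It uses the \function{next()} built-in, which was added in a later version of Python and is not available in 2.2:
\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i

def take(n, iterator):
    """Pull at most n values from an iterator, leaving the rest."""
    result = []
    while len(result) < n:
        try:
            result.append(next(iterator))
        except StopIteration:
            break
    return result

gen = generate_ints(6)
first = take(2, gen)     # the generator's state travels with the object,
second = take(3, gen)    # so this call resumes where the last one stopped
pending = [gen]          # it can also be stored in a data structure
rest = list(pending[0])
\end{verbatim}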
\begin{seealso}
\seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim
Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer
and Tim Peters, with other fixes from the Python Labs crew.}
\end{seealso}
%======================================================================
\section{PEP 237: Unifying Long Integers and Integers}
In recent versions, the distinction between regular integers, which
are 32-bit values on most machines, and long integers, which can be of
arbitrary size, was becoming an annoyance. For example, on platforms
that support files larger than \code{2**32} bytes, the
\method{tell()} method of file objects has to return a long integer.
However, there were various bits of Python that expected plain
integers and would raise an error if a long integer was provided
instead. For example, in Python 1.5, only regular integers
could be used as a slice index, and \code{'abc'[1L:]} would raise a
\exception{TypeError} exception with the message `slice index must be
int'.
Python 2.2 will shift values from short to long integers as required.
The `L' suffix is no longer needed to indicate a long integer literal,
as now the compiler will choose the appropriate type. (Using the `L'
suffix will be discouraged in future 2.x versions of Python,
triggering a warning in Python 2.4, and probably dropped in Python
3.0.) Many operations that used to raise an \exception{OverflowError}
will now return a long integer as their result. For example:
\begin{verbatim}
>>> 1234567890123
1234567890123L
>>> 2 ** 64
18446744073709551616L
\end{verbatim}
In most cases, integers and long integers will now be treated
identically. You can still distinguish them with the
\function{type()} built-in function, but that's rarely needed.
\begin{seealso}
\seepep{237}{Unifying Long Integers and Integers}{Written by
Moshe Zadka and Guido van Rossum. Implemented mostly by Guido van
Rossum.}
\end{seealso}
%======================================================================
\section{PEP 238: Changing the Division Operator}
The most controversial change in Python 2.2 heralds the start of an effort
to fix an old design flaw that's been in Python from the beginning.
Currently Python's division operator, \code{/}, behaves like C's
division operator when presented with two integer arguments: it
returns an integer result that's truncated down when there would be
a fractional part. For example, \code{3/2} is 1, not 1.5, and
\code{(-1)/2} is -1, not -0.5. This means that the results of division
can vary unexpectedly depending on the type of the two operands, and
because Python is dynamically typed, it can be difficult to determine
the possible types of the operands.
(The controversy is over whether this is \emph{really} a design flaw,
and whether it's worth breaking existing code to fix this. It's
caused endless discussions on python-dev, and in July 2001 erupted into a
storm of acidly sarcastic postings on \newsgroup{comp.lang.python}. I
won't argue for either side here and will stick to describing what's
implemented in 2.2. Read \pep{238} for a summary of arguments and
counter-arguments.)
Because this change might break code, it's being introduced very
gradually. Python 2.2 begins the transition, but the switch won't be
complete until Python 3.0.
First, I'll borrow some terminology from \pep{238}. ``True division'' is the
division that most non-programmers are familiar with: 3/2 is 1.5, 1/4
is 0.25, and so forth. ``Floor division'' is what Python's \code{/}
operator currently does when given integer operands; the result is the
floor of the value returned by true division. ``Classic division'' is
the current mixed behaviour of \code{/}; it returns the result of
floor division when the operands are integers, and returns the result
of true division when one of the operands is a floating-point number.
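To make the three definitions concrete, here is a sketch of classic division written in terms of the other two. \function{classic_div()} is a hypothetical helper; \function{operator.truediv()} and \function{operator.floordiv()} perform true and floor division respectively, whatever \code{/} means in the current module. Under Python 2 a complete version would also have to treat long integers as integers, which this sketch leaves out:
\begin{verbatim}
import operator

def classic_div(a, b):
    """Mimic classic "/": floor division for two ints, else true division."""
    if isinstance(a, int) and isinstance(b, int):
        return operator.floordiv(a, b)   # both integers: floor division
    return operator.truediv(a, b)        # a float is involved: true division
\end{verbatim}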
Here are the changes 2.2 introduces:
\begin{itemize}
\item A new operator, \code{//}, is the floor division operator.
(Yes, we know it looks like \Cpp's comment symbol.) \code{//}
\emph{always} performs floor division no matter what the types of
its operands are, so \code{1 // 2} is 0 and \code{1.0 // 2.0} is also
0.0.
\code{//} is always available in Python 2.2; you don't need to enable
it using a \code{__future__} statement.
\item By including a \code{from __future__ import division} in a
module, the \code{/} operator will be changed to return the result of
true division, so \code{1/2} is 0.5. Without the \code{__future__}
statement, \code{/} still means classic division. The default meaning
of \code{/} will not change until Python 3.0.
\item Classes can define methods called \method{__truediv__} and
\method{__floordiv__} to overload the two division operators. At the
C level, there are also slots in the \ctype{PyNumberMethods} structure
so extension types can define the two operators.
\item Python 2.2 supports some command-line arguments for testing
whether code will work with the changed division semantics. Running
python with \programopt{-Q warn} will cause a warning to be issued
whenever division is applied to two integers. You can use this to
find code that's affected by the change and fix it. By default,
Python 2.2 will simply perform classic division without a warning; the
warning will be turned on by default in Python 2.3.
\end{itemize}
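A minimal sketch of a class that overloads both operators; \class{Meters} is a hypothetical example type. Calling \function{operator.truediv()} rather than writing \code{/} keeps the example independent of whether \code{from __future__ import division} is in effect:
\begin{verbatim}
import operator

class Meters(object):
    """Hypothetical distance type overloading both division operators."""

    def __init__(self, value):
        self.value = value

    def __truediv__(self, divisor):
        # True division keeps the fractional part.
        return Meters(float(self.value) / divisor)

    def __floordiv__(self, divisor):
        # Floor division rounds the quotient down.
        return Meters(self.value // divisor)

    def __repr__(self):
        return 'Meters(%r)' % (self.value,)

half = operator.truediv(Meters(7), 2)   # dispatches to __truediv__
whole = Meters(7) // 2                  # dispatches to __floordiv__
\end{verbatim}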
\begin{seealso}
\seepep{238}{Changing the Division Operator}{Written by Moshe Zadka and
Guido van Rossum. Implemented by Guido van Rossum.}
\end{seealso}
%======================================================================
\section{Unicode Changes}
Python's Unicode support has been enhanced a bit in 2.2. Unicode
strings are usually stored as UCS-2, as 16-bit unsigned integers.
Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
integers, as its internal encoding by supplying
\longprogramopt{enable-unicode=ucs4} to the configure script.
(It's also possible to specify
\longprogramopt{disable-unicode} to completely disable Unicode
support.)
When built to use UCS-4 (a ``wide Python''), the interpreter can
natively handle Unicode characters from U+000000 to U+10FFFF, so the
range of legal values for the \function{unichr()} function is expanded
accordingly. Using an interpreter compiled to use UCS-2 (a ``narrow
Python''), values greater than 65535 will still cause
\function{unichr()} to raise a \exception{ValueError} exception.
This is all described in \pep{261}, ``Support for `wide' Unicode
characters''; consult it for further details.
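On a modern Python every build behaves like a wide one (Python 3.3 and later dropped the narrow/wide distinction, and \function{unichr()} was renamed \function{chr()}), so the edges of the legal range can be checked directly. A small sketch:
\begin{verbatim}
import sys

# 0x10FFFF on a wide build and on every Python 3.3 or later.
highest = sys.maxunicode

last = chr(0x10FFFF)        # U+10FFFF, the last legal code point
try:
    chr(0x110000)           # one past the end of the Unicode range
    out_of_range = False
except ValueError:
    out_of_range = True
\end{verbatim}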
Another change is simpler to explain. Since their introduction,
Unicode strings have supported an \method{encode()} method to convert
the string to a selected encoding such as UTF-8 or Latin-1. A
symmetric \method{decode(\optional{\var{encoding}})} method has been
added to 8-bit strings (though not to Unicode strings) in 2.2.
\method{decode()} assumes that the string is in the specified encoding
and decodes it, returning whatever is returned by the codec.
Using this new feature, codecs have been added for tasks not directly
related to Unicode. For example, codecs have been added for
uu-encoding, MIME's base64 encoding, and compression with the
\module{zlib} module:
\begin{verbatim}
>>> s = """Here is a lengthy piece of redundant, overly verbose,
... and repetitive text.
... """
>>> data = s.encode('zlib')
>>> data
'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
>>> data.decode('zlib')
'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
>>> print s.encode('uu')
begin 666 <data>
M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@
>=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X*
end
>>> "sheesh".encode('rot-13')
'furrfu'
\end{verbatim}
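Python 3 later restricted \method{encode()} and \method{decode()} to text encodings, so these non-Unicode codecs are reached through \function{codecs.encode()} and \function{codecs.decode()} instead, functions added to the \module{codecs} module after 2.2. A sketch of the same round trip in that style:
\begin{verbatim}
import codecs

text = b"Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n"

# zlib_codec is a bytes-to-bytes codec: it compresses rather than
# translating characters.
packed = codecs.encode(text, 'zlib_codec')
unpacked = codecs.decode(packed, 'zlib_codec')

# rot_13 operates on text strings.
scrambled = codecs.encode('sheesh', 'rot_13')
\end{verbatim}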
To convert a class instance to Unicode, a \method{__unicode__} method
can be defined by a class, analogous to \method{__str__}.
\method{encode()}, \method{decode()}, and \method{__unicode__} were
implemented by Marc-Andr\'e Lemburg. The changes to support using
UCS-4 internally were implemented by Fredrik Lundh and Martin von
L\"owis.
\begin{seealso}
\seepep{261}{Support for `wide' Unicode characters}{Written by
Paul Prescod.}
\end{seealso}
%======================================================================
\section{PEP 227: Nested Scopes}
In Python 2.1, statically nested scopes were added as an optional
feature, to be enabled by a \code{from __future__ import
nested_scopes} directive. In 2.2 nested scopes no longer need to be
specially enabled, and are now always present. The rest of this section
is a copy of the description of nested scopes from my ``What's New in
Python 2.1'' document; if you read it when 2.1 came out, you can skip
the rest of this section.
The largest change introduced in Python 2.1, and made complete in 2.2,
is to Python's scoping rules. In Python 2.0, at any given time there
are at most three namespaces used to look up variable names: local,
module-level, and the built-in namespace. This often surprised people
because it didn't match their intuitive expectations. For example, a
nested recursive function definition doesn't work:
\begin{verbatim}
def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...
\end{verbatim}
The function \function{g()} will always raise a \exception{NameError}
exception, because the binding of the name \samp{g} isn't in either
its local namespace or in the module-level namespace. This isn't much
of a problem in practice (how often do you recursively define interior
functions like this?), but this also made using the \keyword{lambda}
expression clumsier, and this was a problem in practice. In code which
uses \keyword{lambda} you can often find local variables being copied
by passing them as the default values of arguments.
\begin{verbatim}
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
\end{verbatim}
The readability of Python code written in a strongly functional style
suffers greatly as a result.
The most significant change to Python 2.2 is that static scoping has
been added to the language to fix this problem. As a first effect,
the \code{name=name} default argument is now unnecessary in the above
example. Put simply, when a given variable name is not assigned a
value within a function (by an assignment, or the \keyword{def},
\keyword{class}, or \keyword{import} statements), references to the
variable will be looked up in the local namespace of the enclosing
scope. A more detailed explanation of the rules, and a dissection of
the implementation, can be found in the PEP.
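Both earlier examples work unchanged under the new rules. This sketch uses a hypothetical \function{triangle()} function with a recursive inner helper, and a hypothetical \class{Registry} class whose \method{find()} method drops the \code{name=name} default argument. \function{filter()} is wrapped in \function{list()} only so the result is a list on Python 3 as well:
\begin{verbatim}
def triangle(n):
    """Return n + (n-1) + ... + 1, using a nested recursive helper."""
    def g(value):
        if value == 0:
            return 0
        # The name 'g' is found in the enclosing scope of triangle().
        return g(value - 1) + value
    return g(n)

class Registry(object):
    """Hypothetical container for the find() example."""
    def __init__(self, items):
        self.list_attribute = items

    def find(self, name):
        "Return list of any entries equal to 'name'"
        # 'name' is a free variable of the lambda, resolved in find().
        return list(filter(lambda x: x == name, self.list_attribute))
\end{verbatim}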
This change may cause some compatibility problems for code where the
same variable name is used both at the module level and as a local
variable within a function that contains further function definitions.
This seems rather unlikely though, since such code would have been
pretty confusing to read in the first place.
One side effect of the change is that the \code{from \var{module}
import *} and \keyword{exec} statements have been made illegal inside
a function scope under certain conditions. The Python reference
manual has said all along that \code{from \var{module} import *} is
only legal at the top level of a module, but the CPython interpreter
has never enforced this before. As part of the implementation of
nested scopes, the compiler which turns Python source into bytecodes
has to generate different code to access variables in a containing
scope. \code{from \var{module} import *} and \keyword{exec} make it
impossible for the compiler to figure this out, because they add names
to the local namespace that are unknowable at compile time.
Therefore, if a function contains \code{from \var{module} import *} or
\keyword{exec} as well as function definitions or \keyword{lambda}
expressions with free variables, the compiler will flag this by raising
a \exception{SyntaxError} exception.
To make the preceding explanation a bit clearer, here's an example:
\begin{verbatim}
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
\end{verbatim}
Line 4 containing the \keyword{exec} statement is a syntax error,
since \keyword{exec} would define a new local variable named \samp{x}
whose value should be accessed by \function{g()}.
This shouldn't be much of a limitation, since \keyword{exec} is rarely
used in most Python code (and when it is used, it's often a sign of a
poor design anyway).
\begin{seealso}
\seepep{227}{Statically Nested Scopes}{Written and implemented by
Jeremy Hylton.}
\end{seealso}
%======================================================================
\section{New and Improved Modules}
\begin{itemize}
\item The \module{xmlrpclib} module was contributed to the standard
library by Fredrik Lundh, providing support for writing XML-RPC
clients. XML-RPC is a simple remote procedure call protocol built on
top of HTTP and XML. For example, the following snippet retrieves a
list of RSS channels from the O'Reilly Network, and then
lists the recent headlines for one channel:
\begin{verbatim}
import xmlrpclib
s = xmlrpclib.Server(
      'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
channels = s.meerkat.getChannels()
# channels is a list of dictionaries, like this:
# [{'id': 4, 'title': 'Freshmeat Daily News'},
#  {'id': 190, 'title': '32Bits Online'},
#  {'id': 4549, 'title': '3DGamers'}, ... ]

# Get the items for one channel
items = s.meerkat.getItems( {'channel': 4} )

# 'items' is another list of dictionaries, like this:
# [{'link': 'http://freshmeat.net/releases/52719/',
#   'description': 'A utility which converts HTML to XSL FO.',
#   'title': 'html2fo 0.3 (Default)'}, ... ]
\end{verbatim}
The \module{SimpleXMLRPCServer} module makes it easy to create
straightforward XML-RPC servers. See \url{http://www.xmlrpc.com/} for
more information about XML-RPC.
\item The new \module{hmac} module implements the HMAC
algorithm described by \rfc{2104}.
(Contributed by Gerhard H\"aring.)
\item Several functions that originally returned lengthy tuples now
return pseudo-sequences that still behave like tuples but also have
mnemonic attributes such as \member{st_mtime} or \member{tm_year}.
The enhanced functions include \function{stat()},
\function{fstat()}, \function{statvfs()}, and \function{fstatvfs()}
in the \module{os} module, and \function{localtime()},
\function{gmtime()}, and \function{strptime()} in the \module{time}
module.
For example, to obtain a file's size using the old tuples, you'd end
up writing something like \code{file_size =
os.stat(filename)[stat.ST_SIZE]}, but now this can be written more
clearly as \code{file_size = os.stat(filename).st_size}.
The original patch for this feature was contributed by Nick Mathewson.
\item The Python profiler has been extensively reworked and various
errors in its output have been corrected. (Contributed by
Fred~L. Drake, Jr. and Tim Peters.)
\item The \module{socket} module can be compiled to support IPv6;
specify the \longprogramopt{enable-ipv6} option to Python's configure
script. (Contributed by Jun-ichiro ``itojun'' Hagino.)
\item Two new format characters were added to the \module{struct}
module for 64-bit integers on platforms that support the C
\ctype{long long} type. \samp{q} is for a signed 64-bit integer,
and \samp{Q} is for an unsigned one. The value is returned in
Python's long integer type. (Contributed by Tim Peters.)
\item In the interpreter's interactive mode, there's a new built-in
function \function{help()} that uses the \module{pydoc} module
introduced in Python 2.1 to provide interactive help.
\code{help(\var{object})} displays any available help text about
\var{object}. \function{help()} with no argument puts you in an online
help utility, where you can enter the names of functions, classes,
or modules to read their help text.
(Contributed by Guido van Rossum, using Ka-Ping Yee's \module{pydoc} module.)
\item Various bugfixes and performance improvements have been made
to the SRE engine underlying the \module{re} module. For example,
the \function{re.sub()} and \function{re.split()} functions have
been rewritten in C. Another contributed patch speeds up certain
Unicode character ranges by a factor of two, and a new \method{finditer()}
method returns an iterator over all the non-overlapping matches in
a given string.
(SRE is maintained by
Fredrik Lundh. The BIGCHARSET patch was contributed by Martin von
L\"owis.)
\item The \module{smtplib} module now supports \rfc{2487}, ``Secure
SMTP over TLS'', so it's now possible to encrypt the SMTP traffic
between a Python program and the mail transport agent being handed a
message. \module{smtplib} also supports SMTP authentication.
(Contributed by Gerhard H\"aring.)
\item The \module{imaplib} module, maintained by Piers Lauder, has
support for several new extensions: the NAMESPACE extension defined
in \rfc{2342}, SORT, GETACL and SETACL. (Contributed by Anthony
Baxter and Michel Pelletier.)
\item The \module{rfc822} module's parsing of email addresses is now
compliant with \rfc{2822}, an update to \rfc{822}. (The module's
name is \emph{not} going to be changed to \samp{rfc2822}.) A new
package, \module{email}, has also been added for parsing and
generating e-mail messages. (Contributed by Barry Warsaw, and
arising out of his work on Mailman.)
\item The \module{difflib} module now contains a new \class{Differ}
class for producing human-readable lists of changes (a ``delta'')
between two sequences of lines of text. There are also two
generator functions, \function{ndiff()} and \function{restore()},
which respectively return a delta from two sequences, or one of the
original sequences from a delta. (Grunt work contributed by David
Goodger, from ndiff.py code by Tim Peters who then did the
generatorization.)
\item New constants \constant{ascii_letters},
\constant{ascii_lowercase}, and \constant{ascii_uppercase} were
added to the \module{string} module. There were several modules in
the standard library that used \constant{string.letters} to mean the
ranges A-Za-z, but that assumption is incorrect when locales are in
use, because \constant{string.letters} varies depending on the set
of legal characters defined by the current locale. The buggy
modules have all been fixed to use \constant{ascii_letters} instead.
(Reported by an unknown person; fixed by Fred~L. Drake, Jr.)
\item The \module{mimetypes} module now makes it easier to use
alternative MIME-type databases by the addition of a
\class{MimeTypes} class, which takes a list of filenames to be
parsed. (Contributed by Fred~L. Drake, Jr.)
\item A \class{Timer} class was added to the \module{threading}
module that allows scheduling an activity to happen at some future
time. (Contributed by Itamar Shtull-Trauring.)
\end{itemize}
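As one example from the list above, the \module{difflib} generator functions can round-trip a delta. This is a minimal sketch with made-up input lines:
\begin{verbatim}
import difflib

before = ['one\n', 'two\n', 'three\n']
after = ['one\n', 'tree\n', 'three\n', 'four\n']

# ndiff() produces a delta: lines prefixed with '  ', '- ', '+ ' or '? '.
delta = list(difflib.ndiff(before, after))

# restore() rebuilds either original sequence from the delta.
recovered_before = list(difflib.restore(delta, 1))
recovered_after = list(difflib.restore(delta, 2))
\end{verbatim}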
%======================================================================
\section{Interpreter Changes and Fixes}
Some of the changes only affect people who deal with the Python
interpreter at the C level because they're writing Python extension modules,
embedding the interpreter, or just hacking on the interpreter itself.
If you only write Python code, none of the changes described here will
affect you very much.
\begin{itemize}
\item Profiling and tracing functions can now be implemented in C,
which can operate at much higher speeds than Python-based functions
and should reduce the overhead of profiling and tracing. This
will be of interest to authors of development environments for
Python. Two new C functions were added to Python's API,
\cfunction{PyEval_SetProfile()} and \cfunction{PyEval_SetTrace()}.
The existing \function{sys.setprofile()} and
\function{sys.settrace()} functions still exist, and have simply
been changed to use the new C-level interface. (Contributed by
Fred~L. Drake, Jr.)
\item Another low-level API, primarily of interest to implementors
of Python debuggers and development tools, was added.
\cfunction{PyInterpreterState_Head()} and
\cfunction{PyInterpreterState_Next()} let a caller walk through all
the existing interpreter objects;
\cfunction{PyInterpreterState_ThreadHead()} and
\cfunction{PyThreadState_Next()} allow looping over all the thread
states for a given interpreter. (Contributed by David Beazley.)
\item The C-level interface to the garbage collector has been changed
to make it easier to write extension types that support garbage
collection and to debug misuses of the functions.
Various functions have slightly different semantics, so a bunch of
functions had to be renamed. Extensions that use the old API will
still compile but will \emph{not} participate in garbage collection,
so updating them for 2.2 should be considered fairly high priority.
To upgrade an extension module to the new API, perform the following
steps:
\begin{itemize}
\item Rename \constant{Py_TPFLAGS_GC} to \constant{Py_TPFLAGS_HAVE_GC}.
\item Use \cfunction{PyObject_GC_New} or \cfunction{PyObject_GC_NewVar} to
allocate objects, and \cfunction{PyObject_GC_Del} to deallocate them.
\item Rename \cfunction{PyObject_GC_Init} to \cfunction{PyObject_GC_Track} and
\cfunction{PyObject_GC_Fini} to \cfunction{PyObject_GC_UnTrack}.
\item Remove \cfunction{PyGC_HEAD_SIZE} from object size calculations.
\item Remove calls to \cfunction{PyObject_AS_GC} and \cfunction{PyObject_FROM_GC}.
\end{itemize}
\item A new \samp{et} format sequence was added to
\cfunction{PyArg_ParseTuple}; \samp{et} takes both a parameter and
an encoding name, and converts the parameter to the given encoding
if the parameter turns out to be a Unicode string, or leaves it
alone if it's an 8-bit string, assuming it to already be in the
desired encoding. This differs from the \samp{es} format character,
which assumes that 8-bit strings are in Python's default ASCII
encoding and converts them to the specified new encoding.
(Contributed by M.-A. Lemburg, and used for the MBCS support on
Windows described in the following section.)
\item A different argument parsing function,
\cfunction{PyArg_UnpackTuple()}, has been added that's simpler and
presumably faster. Instead of specifying a format string, the
caller simply gives the minimum and maximum number of arguments
expected, and a set of pointers to \ctype{PyObject*} variables that
will be filled in with argument values.
\item Two new flags \constant{METH_NOARGS} and \constant{METH_O} are
available in method definition tables to simplify implementation of
methods with no arguments or a single untyped argument. Calling
such methods is more efficient than calling a corresponding method
that uses \constant{METH_VARARGS}.
Also, the old \constant{METH_OLDARGS} style of writing C methods is
now officially deprecated.
\item
Two new wrapper functions, \cfunction{PyOS_snprintf()} and
\cfunction{PyOS_vsnprintf()} were added to provide
cross-platform implementations for the relatively new
\cfunction{snprintf()} and \cfunction{vsnprintf()} C lib APIs. In
contrast to the standard \cfunction{sprintf()} and
\cfunction{vsprintf()} functions, the Python versions check the
bounds of the destination buffer to protect against buffer overruns.
(Contributed by M.-A. Lemburg.)
\item The \cfunction{_PyTuple_Resize()} function has lost an unused
parameter, so now it takes 2 parameters instead of 3. The third
argument was never used, and can simply be discarded when porting
code from earlier versions to Python 2.2.
\end{itemize}
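The C-level hooks sit underneath \function{sys.setprofile()}, so the same mechanism can be sketched from Python code. The profile function and the \function{work()}/\function{helper()} pair below are hypothetical; the profiler records the name of every Python function that is entered:
\begin{verbatim}
import sys

calls = []

def profiler(frame, event, arg):
    # 'call' fires whenever a Python-level function is entered.
    if event == 'call':
        calls.append(frame.f_code.co_name)

def helper():
    return 41

def work():
    return helper() + 1

sys.setprofile(profiler)
try:
    result = work()
finally:
    sys.setprofile(None)    # always remove the hook again
\end{verbatim}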
%======================================================================
\section{Other Changes and Fixes}
As usual there were a bunch of other improvements and bugfixes
scattered throughout the source tree. A search through the CVS change
logs finds there were 527 patches applied and 683 bugs fixed between
Python 2.1 and 2.2; 2.2.1 applied 139 patches and fixed 143 bugs;
2.2.2 applied 106 patches and fixed 82 bugs. These figures are likely
to be underestimates.
Some of the more notable changes are:
\begin{itemize}
\item The code for the MacOS port for Python, maintained by Jack
Jansen, is now kept in the main Python CVS tree, and many changes
have been made to support MacOS~X.
The most significant change is the ability to build Python as a
framework, enabled by supplying the \longprogramopt{enable-framework}
option to the configure script when compiling Python. According to
Jack Jansen, ``This installs a self-contained Python installation plus
the OS~X framework `glue' into
\file{/Library/Frameworks/Python.framework} (or another location of
choice). For now there is little immediate added benefit to this
(actually, there is the disadvantage that you have to change your PATH
to be able to find Python), but it is the basis for creating a
full-blown Python application, porting the MacPython IDE, possibly
using Python as a standard OSA scripting language and much more.''
Most of the MacPython toolbox modules, which interface to MacOS APIs
such as windowing, QuickTime, scripting, etc. have been ported to OS~X,
but they've been left commented out in \file{setup.py}. People who want
to experiment with these modules can uncomment them manually.
% Jack's original comments:
%The main change is the possibility to build Python as a
%framework. This installs a self-contained Python installation plus the
%OSX framework "glue" into /Library/Frameworks/Python.framework (or
%another location of choice). For now there is little immedeate added
%benefit to this (actually, there is the disadvantage that you have to
%change your PATH to be able to find Python), but it is the basis for
%creating a fullblown Python application, porting the MacPython IDE,
%possibly using Python as a standard OSA scripting language and much
%more. You enable this with "configure --enable-framework".
%The other change is that most MacPython toolbox modules, which
%interface to all the MacOS APIs such as windowing, quicktime,
%scripting, etc. have been ported. Again, most of these are not of
%immedeate use, as they need a full application to be really useful, so
%they have been commented out in setup.py. People wanting to experiment
%can uncomment them. Gestalt and Internet Config modules are enabled by
%default.
\item Keyword arguments passed to builtin functions that don't take them
now cause a \exception{TypeError} exception to be raised, with the
message ``\var{function} takes no keyword arguments''.
\item Weak references, added in Python 2.1 as an extension module,
are now part of the core because they're used in the implementation
of new-style classes. The \exception{ReferenceError} exception has
therefore moved from the \module{weakref} module to become a
built-in exception.
\item A new script, \file{Tools/scripts/cleanfuture.py} by Tim
Peters, automatically removes obsolete \code{__future__} statements
from Python source code.
\item An additional \var{flags} argument has been added to the
built-in function \function{compile()}, so the behaviour of
\code{__future__} statements can now be correctly observed in
simulated shells, such as those presented by IDLE and other
development environments. This is described in \pep{264}.
(Contributed by Michael Hudson.)
\item The new license introduced with Python 1.6 wasn't
GPL-compatible. This is fixed by some minor textual changes to the
2.2 license, so it's now legal to embed Python inside a GPLed
program again. Note that Python itself is not GPLed, but instead is
under a license that's essentially equivalent to the BSD license,
same as it always was. The license changes were also applied to the
Python 2.0.1 and 2.1.1 releases.
\item When presented with a Unicode filename on Windows, Python will
now convert it to an MBCS encoded string, as used by the Microsoft
file APIs. As MBCS is explicitly used by the file APIs, Python's
choice of ASCII as the default encoding turns out to be an
annoyance. On \UNIX, the locale's character set is used if
\function{locale.nl_langinfo(CODESET)} is available. (Windows
support was contributed by Mark Hammond with assistance from
Marc-Andr\'e Lemburg. \UNIX{} support was added by Martin von L\"owis.)
\item Large file support is now enabled on Windows. (Contributed by
Tim Peters.)
\item The \file{Tools/scripts/ftpmirror.py} script
now parses a \file{.netrc} file, if you have one.
(Contributed by Mike Romberg.)
\item Some features of the object returned by the
\function{xrange()} function are now deprecated, and trigger
warnings when they're accessed; they'll disappear in Python 2.3.
\class{xrange} objects tried to pretend they were full sequence
types by supporting slicing, sequence multiplication, and the
\keyword{in} operator, but these features were rarely used and
therefore buggy. The \method{tolist()} method and the
\member{start}, \member{stop}, and \member{step} attributes are also
being deprecated. At the C level, the fourth argument to the
\cfunction{PyRange_New()} function, \samp{repeat}, has also been
deprecated.
\item There were a bunch of patches to the dictionary
implementation, mostly to fix potential core dumps if a dictionary
contained objects that sneakily changed their hash value or mutated
the dictionary they were contained in. For a while python-dev fell
into a gentle rhythm of Michael Hudson finding a case that dumped
core, Tim Peters fixing the bug, Michael finding another case, and round
and round it went.
\item On Windows, Python can now be compiled with Borland C thanks
to a number of patches contributed by Stephen Hansen, though the
result isn't fully functional yet. (But this \emph{is} progress...)
\item Another Windows enhancement: Wise Solutions generously offered
PythonLabs use of their InstallerMaster 8.1 system. Earlier
PythonLabs Windows installers used Wise 5.0a, which was beginning to
show its age. (Packaged up by Tim Peters.)
\item Files ending in \samp{.pyw} can now be imported on Windows.
\samp{.pyw} is a Windows-only thing, used to indicate that a script
needs to be run using PYTHONW.EXE instead of PYTHON.EXE in order to
prevent a DOS console from popping up to display the output. This
patch makes it possible to import such scripts, in case they're also
usable as modules. (Implemented by David Bolen.)
\item On platforms where Python uses the C \cfunction{dlopen()} function
to load extension modules, it's now possible to set the flags used
by \cfunction{dlopen()} using the \function{sys.getdlopenflags()} and
\function{sys.setdlopenflags()} functions. (Contributed by Bram Stolk.)
\item The \function{pow()} built-in function no longer supports 3
arguments when floating-point numbers are supplied.
\code{pow(\var{x}, \var{y}, \var{z})} returns \code{(x**y) \% z}, but
this is never useful for floating-point numbers, and the final
result varies unpredictably depending on the platform. A call such
as \code{pow(2.0, 8.0, 7.0)} will now raise a \exception{TypeError}
exception.
\end{itemize}
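The \function{pow()} change in the list above can be illustrated with
a short sketch. The helper \function{modpow_or_none()} is hypothetical
and not part of Python; it simply reports whether the three-argument
form of \function{pow()} accepted its arguments. Current interpreters
still reject the three-argument form for floating-point arguments, so
the sketch behaves the same way there.

\begin{verbatim}
def modpow_or_none(x, y, z):
    # Hypothetical helper: return (x**y) % z as computed by the
    # three-argument pow(), or None if pow() rejects the arguments.
    try:
        return pow(x, y, z)
    except TypeError:
        # Raised for floating-point arguments as of Python 2.2.
        return None
\end{verbatim}

With integer arguments, \code{modpow_or_none(2, 8, 7)} returns 4, the
same as \code{(2**8) \% 7}, while \code{modpow_or_none(2.0, 8.0, 7.0)}
returns \code{None} because \code{pow(2.0, 8.0, 7.0)} raises
\exception{TypeError}. The two-argument form \code{pow(2.0, 8.0)} is
unaffected and still returns \code{256.0}.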
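As a sketch of how \function{sys.getdlopenflags()} and
\function{sys.setdlopenflags()} from the list above might be used,
the helper below temporarily adds extra flags for \cfunction{dlopen()}
and restores the previous setting afterwards, so that only the
intended import is affected. The helper \function{call_with_dlopen_flags()}
and the module name \module{someext} are hypothetical. The code assumes
a current Python on a \UNIX{} platform, where the \constant{RTLD_*}
constants are available in the \module{os} module; in the 2.2 era the
symbolic names came from the \module{dl} or \module{DLFCN} modules
instead. Neither \module{sys} function exists on platforms that don't
load extensions with \cfunction{dlopen()}, such as Windows.

\begin{verbatim}
import os
import sys

def call_with_dlopen_flags(extra_flags, func):
    # Hypothetical helper: run func() with extra_flags OR-ed into the
    # flags Python passes to dlopen(), then restore the old flags.
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | extra_flags)
    try:
        return func()
    finally:
        # Restore the previous flags even if func() raises.
        sys.setdlopenflags(old_flags)

# Usage, with a hypothetical extension module name:
#   call_with_dlopen_flags(os.RTLD_GLOBAL, lambda: __import__("someext"))
\end{verbatim}

\constant{RTLD_GLOBAL} is typically wanted when one extension module
must resolve C symbols exported by another; restoring the old flags
keeps that behaviour from leaking into unrelated imports.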
%======================================================================
\section{Acknowledgements}
The author would like to thank the following people for offering
suggestions, corrections and assistance with various drafts of this
article: Fred Bremmer, Keith Briggs, Andrew Dalke, Fred~L. Drake, Jr.,
Carel Fellinger, David Goodger, Mark Hammond, Stephen Hansen, Michael
Hudson, Jack Jansen, Marc-Andr\'e Lemburg, Martin von L\"owis, Fredrik
Lundh, Michael McLay, Nick Mathewson, Paul Moore, Gustavo Niemeyer,
Don O'Donnell, Joonas Paalasma, Tim Peters, Jens Quade, Tom Reinhardt, Neil
Schemenauer, Guido van Rossum, Greg Ward, Edward Welbourne.
\end{document}