summaryrefslogtreecommitdiffstats
path: root/PLAN.txt
blob: 559b125b25d3cdff7bbe99c3a3d32197a61ee99a (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
Project: core implementation
****************************

Still to do
-----------

Fix comparisons.  There's some nasty stuff here: when two types are
not the same, and they're not instances, the fallback code doesn't
account for the possibility that they might be subtypes of a common
base type that defines a comparison.

Check for conflicts between base classes.  I fear that the rules used
to decide whether multiple bases have conflicting instance variables
aren't strict enough.  I think that sometimes two different classes
adding __dict__ may be incompatible after all.

Check for order conflicts.  Suppose there are two base classes X and
Y.  Suppose class B derives from X and Y, and class C from Y and X (in
that order).  Now suppose class D derives from B and C.  In which
order should the base classes X and Y be searched?  This is an order
conflict, and should be disallowed; currently the test for this is not
implemented.

Allow __class__ assignment.

Make __dynamic__ the default.

Add __del__ handlers.

Done (mostly)
-------------

Do binary operators properly.  nb_add should try to call self.__add__
and other.__radd__.  I think I'll exclude base types that define any
binary operator without setting the CHECKTYPES flag.  *** This is
done, AFAICT.  Even supports __truediv__ and __floordiv__. ***

Fix subtype_dealloc().  This currently searches through the list of
base types until it finds a type whose tp_dealloc is not
subtype_dealloc.  I think this is not safe.  I think the alloc/dealloc
policy needs to be rethought.  *** There's an idea here that I haven't
worked out yet: just as object creation now has separate API's tp_new,
tp_alloc, and tp_init, destruction has tp_dealloc and tp_free.  (Maybe
tp_fini should be added to correspond to tp_init?)  Something
could/should be done with this. ***

Clean up isinstance(), issubclass() and their C equivalents.  There
are a bunch of different APIs here and not all of them do the right
thing yet.  There should be fewer APIs and their implementation should
be simpler.  The old "abstract subclass" test should probably
disappear (if we want to root out ExtensionClass).  *** I think I've
done 90% of this by creating PyType_IsSubtype() and using it
appropriately.  For now, the old "abstract subclass" test is still
there, and there may be some places where PyObject_IsSubclass() is
called where PyType_IsSubtype() would be more appropriate. ***

Clean up the GC interface.  Currently, tp_basicsize includes the GC
head size iff tp_flags includes the GC flag bit.  This makes object
size math a pain (e.g. to see if two object types have the same
instance size, you can't just compare the tp_basicsize fields -- you
have to conditionally subtract the GC head size).  Neil has a patch
that improves the API in this area, but it's backwards incompatible.
(http://sf.net/tracker/?func=detail&aid=421893&group_id=5470&atid=305470)
I think I know of a way to fix the incompatibility (by switching to a
different flag bit).  *** Tim proposed a better idea: macros to access
tp_basicsize while hiding the nastiness.  This is done now, so I think
the rest of this task needn't be done. ***  *** Neil checked in a
much improved version of his idea, and it's all squared away. ***

Make the __dict__ of types declared with Python class statements
writable -- only statically declared types must have an immutable
dict, because they're shared between interpreter instances.  Possibly
trap writes to the __dict__ to update the corresponding tp_<slot> if
an __<slot>__ name is affected.  *** Done as part of the next task. ***

It should be an option (maybe a different metaclass, maybe a flag) to
*not* merge __dict__ with all the bases, but instead search the
__dict__ (or __introduced__?) of all bases in __mro__ order.  (This is
needed anyway to unify classes completely.)  *** Partly done.
Inheritance of slots from bases is still icky: (1) MRO is not always
respected when inheriting slots; (2) dynamic classes can't add slot
implementations in Python after creation (e.g., setting C.__hash__
doesn't set the tp_hash slot).  ***

Universal base class (object).  How can we make the object class
subclassable and define simple default methods for everything without
having these inherited by built-in types that don't want these
defaults?  *** Done, really. ***

Add error checking to the MRO calculation.  *** Done. ***

Make __new__ overridable through a Python class method (!).  Make more
of the sub-algorithms of type construction available as methods.  ***
After I implemented class methods, I found that in order to be able
to make an upcall to Base.__new__() and have it create an instance of
your class (rather than a Base instance), you can't use class methods
-- you must use static methods.  So I've implemented those too.  I've
hooked up __new__ in the right places, so the first part of this is
now done.  I've also exported the MRO calculation and made it
overridable, as metamethod mro().  I believe that closes this topic
for now.  I expect that some warts will only be really debugged when
we try to use this for some, eh, interesting types such as tuples. ***

    There was a sequel to the __new__ story (see checkins).  There
    still is a problem: object.__new__ now no longer exists, because
    it was inherited by certain extension types that could break.  But
    now when I write

        class C(object):
            def __new__(cls, *args):
                "How do I call the default __new__ implementation???"

    This was resolved nicely by putting object.__new__ back but not
    inheriting __new__ from object when the subtype is a built-in or
    extension type.

More -- I'm sure new issues will crop up as we go.


Project: loose ends and follow-through
**************************************

Still to do
-----------

Exceptions should be types.  This changes the rules, since now almost
anything can be raised (as maybe it should).  Or should we strive for
enforcement of the convention that all exceptions should be derived
from Exception?  String exceptions will be another hassle, to be
deprecated and eventually ruled out.

Standardize a module containing names for all built-in types, and
standardize on names.  E.g. should the official name of the string
type be 'str', 'string', or 'StringType'?

Create a hierarchy of types, so that e.g. int and long are both
subtypes of an abstract base type integer, which is itself a subtype
of number, etc.  A lot of thinking can go into this!

*** NEW TASK??? ***
Implement "signature" objects.  These are alluded to in PEP 252 but
not yet specified.  Supposedly they provide an easily usable API to
find out about function/method arguments.  Building these for Python
functions is simple.  Building these for built-in functions will
require a change to the PyMethodDef structure, so that a type can
provide signature information for its C methods.  (This would also
help in supporting keyword arguments for C methods with less work than
PyArg_ParseTupleAndKeywords() currently requires.)  But should we do
this?  It's additional work and not required for any of the other
parts.

Done (mostly)
-------------

Make more (most?) built-in types act as their own factory functions.
*** Done for all reasonable built-in types. ***

Make more (most?) built-in types subtypable -- with or without
overridable allocation.  *** This includes descriptors!  It should be
possible to write descriptors in Python, so metaclasses can do clever
things with them. *** *** Done for most reasonable built-in types,
except for descriptors ***


Project: making classes use the new machinery
*********************************************

Tasks:

Try to get rid of all code in classobject.c by deferring to the new
mechanisms.  How far can we get without breaking backwards
compatibility?  This is underspecified because I haven't thought much
about it yet.  Can we lose the use of PyInstance_Check() everywhere?
I would hope so!  *** I'm dropping this goal for now -- classic
classes will be 99% unchanged. ***


Project: backwards compatibility
********************************

Tasks:

Make sure all code checks the proper tp_flags bit before accessing
type object fields.

Identify areas of incompatibility with Python 2.1.  Design solutions.
Implement and test.

Some specific areas: a fair amount of code probably depends on
specific types having __members__ and/or __methods__ attributes.
These are currently not present (conformant to PEP 252, which proposes
to drop them) but we may have to add them back.  This can be done in a
generic way with not too much effort.  Tim adds:  Perhaps that dir(object)
rarely returns anything but [] now is a consequence of this.  I'm very
used to doing, e.g., dir([]) or dir("") in an interactive shell to jog my
memory; also one of the reasons test_generators failed.

Another area: going all the way with classes and instances means that
type(x) == types.InstanceType won't work any more to detect instances.
Should there be a mode where this still works?  Maybe this should be
the default mode, with a warning, and an explicit way to get the new
way to work?  (Instead of a __future__ statement, I'm thinking of a
module global __metaclass__ which would provide the default metaclass
for baseless class statements.)


Project: testing
****************

Tasks:

Identify new functionality that needs testing.  Conceive unit tests
for all new functionality.  Conceive stress tests for critical
features.  Run the tests.  Fix bugs.  Repeat until satisfied.

Note: this may interact with the branch integration task.


Project: integration with main branch *** This is done - tim ***
*************************************

Tasks:

Merge changes in the HEAD branch into the descr-branch.  Then merge
the descr-branch back into the HEAD branch.

The longer we wait, the more effort this will be -- the descr-branch
forked off quite a long time ago, and there are changes everywhere in
the HEAD branch (e.g. the dict object has been radically rewritten).

On the other hand, if we do this too early, we'll have to do it again
later.

Note from Tim:  We should never again wait until literally 100s of files
are out of synch.  I don't care how often I need to do this, provided only
that it's a tractable task each time.  Once per week sounds like a good
idea.  As is, even the trunk change to rangeobject.c created more than its
proper share of merge headaches, because it confused all the other reasons
include file merges were getting conflicts (the more changes there are, the
worse diff does; indeed, I came up with the ndiff algorithm in the 80s
precisely because the source-control diff program Cray used at the time
produced minimal but *senseless* diffs, thus creating artificial conflicts;
paying unbounded attention to context does a much better job of putting
changes where they make semantic sense too; but we're stuck with Unix diff
here, and it isn't robust in this sense; if we don't keep its job simple,
it will make my job hell).

Done:
To undo or rename before final merge:  Modules/spam.c has worked its
way into the branch Unix and Windows builds (pythoncore.dsp and
PC/config.c); also imported by test_descr.py.  How about renaming to
xxsubtype.c (whatever) now? *** this is done - tim ***


Project: performance tuning
***************************

Tasks:

Pick or create a general performance benchmark for Python.  Benchmark
the new system vs. the old system.  Profile the new system.  Improve
hotspots.  Repeat until satisfied.

Note: this may interact with the branch integration task.


Project: documentation
**********************

Tasks:

Update PEP 252 (descriptors).  Describe more of the prototype
implementation

Update PEP 253 (subtyping).  Complicated architectural wrangling with
metaclasses.  There is an interaction between implementation and
description.

Write PEP 254 (unification of classes).  This should discuss what
changes for ordinary classes, and how we can make it more b/w
compatible.

Other documentation.  There needs to be user documentation,
eventually.


Project: community interaction
******************************

Tasks:

Once the PEPs are written, solicit community feedback, and formulate
responses to the feedback.  Give the community enough time to think
over this complicated proposal.  Provide the community with a
prototype implementation to test.  Try to do this *before* casting
everything in stone!