summaryrefslogtreecommitdiffstats
path: root/Objects
Commit message (Collapse)AuthorAgeFilesLines
* Aggressive reordering of dict comparisons. In case of collision, it standsTim Peters2001-05-131-30/+21
| | | | | | | | | | | | to reason that me_key is much more likely to match the key we're looking for than to match dummy, and if the key is absent me_key is much more likely to be NULL than dummy: most dicts don't even have a dummy entry. Running instrumented dict code over the test suite and some apps confirmed that matching dummy was 200-300x less frequent than matching key in practice. So this reorders the tests to try the common case first. It can lose if a large dict with many collisions is mostly deleted, not resized, and then frequently searched, but that's hardly a case we should be favoring.
* Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask".Tim Peters2001-05-131-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comment following used to say: /* We use ~hash instead of hash, as degenerate hash functions, such as for ints <sigh>, can have lots of leading zeros. It's not really a performance risk, but better safe than sorry. 12-Dec-00 tim: so ~hash produces lots of leading ones instead -- what's the gain? */ That is, there was never a good reason for doing it. And to the contrary, as explained on Python-Dev last December, it tended to make the *sum* (i + incr) & mask (which is the first table index examined in case of collison) the same "too often" across distinct hashes. Changing to the simpler "i = hash & mask" reduced the number of string-dict collisions (== # number of times we go around the lookup for-loop) from about 6 million to 5 million during a full run of the test suite (these are approximate because the test suite does some random stuff from run to run). The number of collisions in non-string dicts also decreased, but not as dramatically. Note that this may, for a given dict, change the order (wrt previous releases) of entries exposed by .keys(), .values() and .items(). A number of std tests suffered bogus failures as a result. For dicts keyed by small ints, or (less so) by characters, the order is much more likely to be in increasing order of key now; e.g., >>> d = {} >>> for i in range(10): ... d[i] = i ... >>> d {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} >>> Unfortunately. people may latch on to that in small examples and draw a bogus conclusion. test_support.py Moved test_extcall's sortdict() into test_support, made it stronger, and imported sortdict into other std tests that needed it. test_unicode.py Excluced cp875 from the "roundtrip over range(128)" test, because cp875 doesn't have a well-defined inverse for unicode("?", "cp875"). See Python-Dev for excruciating details. Cookie.py Chaged various output functions to sort dicts before building strings from them. test_extcall Fiddled the expected-result file. This remains sensitive to native dict ordering, because, e.g., if there are multiple errors in a keyword-arg dict (and test_extcall sets up many cases like that), the specific error Python complains about first depends on native dict ordering.
* Repair "module has no attribute xxx" error msg; bug introduced whenTim Peters2001-05-121-1/+1
| | | | switching from tp_getattr to tp_getattro.
* Variant of patch #423262: Change module attribute get & setTim Peters2001-05-111-34/+35
| | | | | | Allow module getattr and setattr to exploit string interning, via the previously null module object tp_getattro and tp_setattro slots. Yields a very nice speedup for things like random.random and os.path etc.
* Variant of SF patch 423181Jeremy Hylton2001-05-111-21/+51
| | | | | | | For rich comparisons, use instance_getattr2() when possible to avoid the expense of setting an AttributeError. Also intern the name_op[] table and use the interned strings rather than creating a new string and interning it each time through.
* Cosmetic: code under "else" clause was missing indent.Tim Peters2001-05-111-1/+1
|
* Restore dicts' tp_compare slot, and change dict_richcompare to say itTim Peters2001-05-101-15/+3
| | | | | | | | | | | | | | | | | | | | doesn't know how to do LE, LT, GE, GT. dict_richcompare can't do the latter any faster than dict_compare can. More importantly, for cmp(dict1, dict2), Python *first* tries rich compares with EQ, LT, and GT one at a time, even if the tp_compare slot is defined, and dict_richcompare called dict_compare for the latter two because it couldn't do them itself. The result was a lot of wasted calls to dict_compare. Now dict_richcompare gives up at once the times Python calls it with LT and GT from try_rich_to_3way_compare(), and dict_compare is called only once (when Python gets around to trying the tp_compare slot). Continued mystery: despite that this cut the number of calls to dict_compare approximately in half in test_mutants.py, the latter still runs amazingly slowly. Running under the debugger doesn't show excessive activity in the dict comparison code anymore, so I'm guessing the culprit is somewhere else -- but where? Perhaps in the element (key/value) comparison code? We clearly spend a lot of time figuring out how to compare things.
* Repair typo in comment.Tim Peters2001-05-101-1/+1
|
* SF bug #422121 Insecurities in dict comparison.Tim Peters2001-05-101-34/+95
| | | | | | | Fixed a half dozen ways in which general dict comparison could crash Python (even cause Win98SE to reboot) in the presence of kay and/or value comparison routines that mutate the dict during dict comparison. Bugfix candidate.
* Heh. I need a break. After this: stropmodule & stringobject were moreTim Peters2001-05-101-8/+6
| | | | | | out of synch than I realized, and I managed to break replace's "count" argument when it was 0. All is well again. Maybe. Bugfix candidate.
* Fudge. stropmodule and stringobject both had copies of the buggyTim Peters2001-05-101-32/+41
| | | | | | mymemXXX stuff, and they were already out of synch. Fix the remaining bugs in both and get them back in synch. Bugfix release candidate.
* SF patch #416247 2.1c1 stringobject: unused vrbl cleanup.Tim Peters2001-05-091-2/+0
| | | | Thanks to Mark Favas.
* Sheesh -- repair the dodge around "cast isn't an lvalue" complaints toTim Peters2001-05-091-0/+4
| | | | restore correct semantics.
* Mark Favas reported that gcc caught me using casts as lvalues. Dodge it.Tim Peters2001-05-091-6/+10
|
* Ack! Restore the COUNT_ALLOCS one_strings code.Tim Peters2001-05-091-1/+5
|
* My change to string_item() left an extra reference to each 1-characterTim Peters2001-05-091-4/+3
| | | | | interned string created by "string"[i]. Since they're immortal anyway, this was hard to notice, but it was still wrong <wink>.
* Intern 1-character strings as soon as they're created. As-is, they aren'tTim Peters2001-05-081-15/+12
| | | | | | | | | | | | | | | | | | | interned when created, so the cached versions generally aren't ever interned. With the patch, the Py_INCREF(t); *p = t; Py_DECREF(s); return; indirection block in PyString_InternInPlace() is never executed during a full run of the test suite, but was executed very many times before. So I'm trading more work when creating one-character strings for doing less work later. Note that the "more work" here can happen at most 256 times per program run, so it's trivial. The same reasoning accounts for the patch's simplification of string_item (the new version can call PyString_FromStringAndSize() no more than 256 times per run, so there's no point to inlining that stuff -- if we were serious about saving time here, we'd pre-initialize the characters vector so that no runtime testing at all was needed!).
* SF bug #422177: Results from .pyc differs from .pyTim Peters2001-05-081-0/+6
| | | | | | | | Store floats and doubles to full precision in marshal. Test that floats read from .pyc/.pyo closely match those read from .py. Declare PyFloat_AsString() in floatobject header file. Add new PyFloat_AsReprString() API function. Document the functions declared in floatobject.h.
* SF patch #421922: Implement rich comparison for dicts.Tim Peters2001-05-081-2/+72
| | | | | | d1 == d2 and d1 != d2 now work even if the keys and values in d1 and d2 don't support comparisons other than ==, and testing dicts for equality is faster now (especially when inequality obtains).
* SF patch 419176 from MvL; fixed bug 418977Jeremy Hylton2001-05-081-6/+3
| | | | Two errors in dict_to_map() helper used by PyFrame_LocalsToFast().
* Remove unused variableJeremy Hylton2001-05-081-1/+0
|
* SF bug #422108 - Error in rich comparisons.Tim Peters2001-05-071-1/+7
| | | | | | 2.1.1 bugfix candidate too. Fix a bad (albeit unlikely) return value in try_rich_to_3way_compare(). Also document do_cmp()'s return values.
* Reimplement PySequence_Contains() and instance_contains(), so they workTim Peters2001-05-052-58/+50
| | | | | | | | | safely together and don't duplicate logic (the common logic was factored out into new private API function _PySequence_IterContains()). Visible change: some_complex_number in some_instance no longer blows up if some_instance has __getitem__ but neither __contains__ nor __iter__. test_iter changed to ensure that remains true.
* Generalize PySequence_Count() (operator.countOf) to work with iterators.Tim Peters2001-05-051-13/+31
|
* Make 'x in y' and 'x not in y' (PySequence_Contains) play nice w/ iterators.Tim Peters2001-05-052-29/+35
| | | | | | | | | | | | | NEEDS DOC CHANGES A few more AttributeErrors turned into TypeErrors, but in test_contains this time. The full story for instance objects is pretty much unexplainable, because instance_contains() tries its own flavor of iteration-based containment testing first, and PySequence_Contains doesn't get a chance at it unless instance_contains() blows up. A consequence is that some_complex_number in some_instance dies with a TypeError unless some_instance.__class__ defines __iter__ but does not define __getitem__.
* Make unicode.join() work nice with iterators. This also required a changeTim Peters2001-05-052-12/+23
| | | | | | | | to string.join(), so that when the latter figures out in midstream that it really needs unicode.join() instead, unicode.join() can actually get all the sequence elements (i.e., there's no guarantee that the sequence passed to string.join() can be iterated over *again* by unicode.join(), so string.join() must not pass on the original sequence object anymore).
* Fix a tiny and unlikely memory leak. Was there before too, and actuallyTim Peters2001-05-051-1/+3
| | | | several of these turned up and got fixed during the iteration crusade.
* Generalize tuple() to work nicely with iterators.Tim Peters2001-05-051-40/+47
| | | | | | | | | | | | | | | | | | | | | NEEDS DOC CHANGES. This one surprised me! While I expected tuple() to be a no-brainer, turns out it's actually dripping with consequences: 1. It will *allow* the popular PySequence_Fast() to work with any iterable object (code for that not yet checked in, but should be trivial). 2. It caused two std tests to fail. This because some places used PyTuple_Sequence() (the C spelling of tuple()) as an indirect way to test whether something *is* a sequence. But tuple() code only looked for the existence of sq->item to determine that, and e.g. an instance passed that test whether or not it supported the other operations tuple() needed (e.g., __len__). So some things the tests *expected* to fail with an AttributeError now fail with a TypeError instead. This looks like an improvement to me; e.g., test_coercion used to produce 559 TypeErrors and 2 AttributeErrors, and now they're all TypeErrors. The error details are more informative too, because the places calling this were *looking* for TypeErrors in order to replace the generic tuple() "not a sequence" msg with their own more specific text, and AttributeErrors snuck by that.
* Make PyIter_Next() a little smarter (wrt its knowledge of iteratorTim Peters2001-05-051-11/+16
| | | | internals) so clients can be a lot dumber (wrt their knowledge).
* The weakref support in PyObject_InitVar() as well; this should have come outFred Drake2001-05-031-4/+0
| | | | at the same time as it did from PyObject_Init() .
* Remove unnecessary intialization for the case of weakly-referencable objects;Fred Drake2001-05-031-4/+0
| | | | | | | the code necessary to accomplish this is simpler and faster if confined to the object implementations, so we only do this there. This causes no behaviorial changes beyond a (very slight) speedup.
* Since Py_TPFLAGS_HAVE_WEAKREFS is set in Py_TPFLAGS_DEFAULT, it does notFred Drake2001-05-032-21/+21
| | | | | | | need to be specified in the type structures independently. The flag exists only for binary compatibility. This is a "source cleanliness" issue and introduces no behavioral changes.
* Mchael Hudson pointed out that the code for detecting changes inGuido van Rossum2001-05-021-4/+4
| | | | | | dictionary size was comparing ma_size, the hash table size, which is always a power of two, rather than ma_used, wich changes on each insertion or deletion. Fixed this.
* Fix for bug #417030: "print '%*s' fails for unicode string"Marc-André Lemburg2001-05-021-2/+3
|
* Plug a memory leak in list(), when appending to the result list.Tim Peters2001-05-021-5/+9
|
* Generalize list(seq) to work with iterators. This also generalizes list()Tim Peters2001-05-011-31/+57
| | | | | | | | | | | | | to no longer insist that len(seq) be defined. NEEDS DOC CHANGES. This is meant to be a model for how other functions of this ilk (max, filter, etc) can be generalized similarly. Feel encouraged to grab your favorite and convert it! Note some cute consequences: list(file) == file.readlines() == list(file.xreadlines()) list(dict) == dict.keys() list(dict.iteritems()) = dict.items() list(xrange(i, j, k)) == range(i, j, k)
* Discard a misleading comment about iter_iternext().Guido van Rossum2001-05-011-1/+0
|
* Printing objects to a real file still wasn't done right: if theGuido van Rossum2001-05-011-32/+14
| | | | | | | | | | | | | | | | | | | object's type didn't define tp_print, there were still cases where the full "print uses str() which falls back to repr()" semantics weren't honored. This resulted in >>> print None <None object at 0x80bd674> >>> print type(u'') <type object at 0x80c0a80> Fixed this by always using the appropriate PyObject_Repr() or PyObject_Str() call, rather than trying to emulate what they would do. Also simplified PyObject_Str() to always fall back on PyObject_Repr() when tp_str is not defined (rather than making an extra check for instances with a __str__ method). And got rid of the special case for strings.
* Add a proper implementation for the tp_str slot (returning self, ofGuido van Rossum2001-05-011-1/+8
| | | | | course), so I can get rid of the special case for strings in PyObject_Str().
* Add experimental iterkeys(), itervalues(), iteritems() to dictGuido van Rossum2001-05-011-11/+85
| | | | | | | objects. Tests show that iteritems() is 5-10% faster than iterating over the dict and extracting the value with dict[key].
* Well darnit! The innocuous fix I made to PyObject_Print() causedGuido van Rossum2001-04-301-1/+20
| | | | printing of instances not to look for __str__(). Fix this.
* A different approach to the problem reported inTim Peters2001-04-282-37/+54
| | | | | | | | | | | Patch #419651: Metrowerks on Mac adds 0x itself C std says %#x and %#X conversion of 0 do not add the 0x/0X base marker. Metrowerks apparently does. Mark Favas reported the same bug under a Compaq compiler on Tru64 Unix, but no other libc broken in this respect is known (known to be OK under MSVC and gcc). So just try the damn thing at runtime and see what the platform does. Note that we've always had bugs here, but never knew it before because a relevant test case didn't exist before 2.1.
* (Adding this to the trunk as well.)Guido van Rossum2001-04-271-1/+4
| | | | | | | | Fix a very old flaw in PyObject_Print(). Amazing! When an object type defines tp_str but not tp_repr, 'print x' to a real file object would not call the tp_str slot but rather print a default style representation: <foo object at 0x....>. This even though 'print x' to a file-like-object would correctly call the tp_str slot.
* This patch originated from an idea by Martin v. Loewis who submitted aMarc-André Lemburg2001-04-231-51/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | patch for sharing single character Unicode objects. Martin's patch had to be reworked in a number of ways to take Unicode resizing into consideration as well. Here's what the updated patch implements: * Single character Unicode strings in the Latin-1 range are shared (not only ASCII chars as in Martin's original patch). * The ASCII and Latin-1 codecs make use of this optimization, providing a noticable speedup for single character strings. Most Unicode methods can use the optimization as well (by virtue of using PyUnicode_FromUnicode()). * Some code cleanup was done (replacing memcpy with Py_UNICODE_COPY) * The PyUnicode_Resize() can now also handle the case of resizing unicode_empty which previously resulted in an error. * Modified the internal API _PyUnicode_Resize() and the public PyUnicode_Resize() API to handle references to shared objects correctly. The _PyUnicode_Resize() signature changed due to this. * Callers of PyUnicode_FromUnicode() may now only modify the Unicode object contents of the returned object in case they called the API with NULL as content template. Note that even though this patch passes the regression tests, there may still be subtle bugs in the sharing code.
* Mondo changes to the iterator stuff, without changing how Python codeGuido van Rossum2001-04-235-28/+148
| | | | | | | | | | | | | | | | | | | | | | | | sees it (test_iter.py is unchanged). - Added a tp_iternext slot, which calls the iterator's next() method; this is much faster for built-in iterators over built-in types such as lists and dicts, speeding up pybench's ForLoop with about 25% compared to Python 2.1. (Now there's a good argument for iterators. ;-) - Renamed the built-in sequence iterator SeqIter, affecting the C API functions for it. (This frees up the PyIter prefix for generic iterator operations.) - Added PyIter_Check(obj), which checks that obj's type has a tp_iternext slot and that the proper feature flag is set. - Added PyIter_Next(obj) which calls the tp_iternext slot. It has a somewhat complex return condition due to the need for speed: when it returns NULL, it may not have set an exception condition, meaning the iterator is exhausted; when the exception StopIteration is set (or a derived exception class), it means the same thing; any other exception means some other error occurred.
* Oops, forgot to merge this from the iter-branch to the trunk.Guido van Rossum2001-04-211-9/+37
| | | | This adds "for line in file" iteration, as promised.
* SF but #417587: compiler warnings compiling 2.1.Tim Peters2001-04-211-3/+0
| | | | Repaired *some* of the SGI compiler warnings Sjoerd Mullender reported.
* Adding iterobject.[ch], which were accidentally not added. Sorry\!Guido van Rossum2001-04-201-0/+188
|
* Iterators phase 1. This comprises:Guido van Rossum2001-04-204-2/+151
| | | | | | | | | | | | | | | | | | | | | | new slot tp_iter in type object, plus new flag Py_TPFLAGS_HAVE_ITER new C API PyObject_GetIter(), calls tp_iter new builtin iter(), with two forms: iter(obj), and iter(function, sentinel) new internal object types iterobject and calliterobject new exception StopIteration new opcodes for "for" loops, GET_ITER and FOR_ITER (also supported by dis.py) new magic number for .pyc files new special method for instances: __iter__() returns an iterator iteration over dictionaries: "for x in dict" iterates over the keys iteration over files: "for x in file" iterates over lines TODO: documentation test suite decide whether to use a different way to spell iter(function, sentinal) decide whether "for key in dict" is a good idea use iterators in map/filter/reduce, min/max, and elsewhere (in/not in?) speed tuning (make next() a slot tp_next???)
* Oops. Removed dictiter_new decl that wasn't supposed to go in yet.Guido van Rossum2001-04-201-2/+0
|