summaryrefslogtreecommitdiffstats
path: root/Objects/dictobject.c
Commit message (Collapse)AuthorAgeFilesLines
* Changed the dict implementation to take "string shortcuts" only whenTim Peters2001-09-141-13/+16
| | | | | | | | | | | | | | | | keys are true strings -- no subclasses need apply. This may be debatable. The problem is that a str subclass may very well want to override __eq__ and/or __hash__ (see the new example of case-insensitive strings in test_descr), but go-fast shortcuts for strings are ubiquitous in our dicts (and subclass overrides aren't even looked for then). Another go-fast reason for the change is that PyCheck_StringExact() is a quicker test than PyCheck_String(), and we make such a test on virtually every access to every dict. OTOH, a str subclass may also be perfectly happy using the base str eq and hash, and this change slows them a lot. But those cases are still hypothetical, while Python's own reliance on true-string dicts is not.
* Repair typo in comment.Tim Peters2001-09-021-1/+1
|
* Make dictionary() a real constructor. Accepts at most one argument, "aTim Peters2001-09-021-2/+29
| | | | | | | | | | | | mapping object", in the same sense dict.update(x) requires of x (that x has a keys() method and a getitem). Questionable: The other type constructors accept a keyword argument, so I did that here too (e.g., dictionary(mapping={1:2}) works). But type_call doesn't pass the keyword args to the tp_new slot (it passes NULL), it only passes them to the tp_init slot, so getting at them required adding a tp_init slot to dicts. Looks like that makes the normal case (i.e., no args at all) a little slower (the time it takes to call dict.tp_init and have it figure out there's nothing to do).
* Use new GC API.Neil Schemenauer2001-08-291-7/+6
|
* Patch #427190: Implement and use METH_NOARGS and METH_O.Martin v. Löwis2001-08-161-50/+25
|
* Add PyDict_Merge(a, b, override):Guido van Rossum2001-08-101-2/+18
| | | | | | PyDict_Merge(a, b, 1) is the same as PyDict_Update(a, b). PyDict_Merge(a, b, 0) does something similar but leaves existing items unchanged.
* Merge of descr-branch back into trunk.Tim Peters2001-08-021-123/+107
|
* dict_update(): Generalize this method so {}.update() accepts anyBarry Warsaw2001-06-261-17/+70
| | | | | | | | | | | | | | | "mapping" object, specifically one that supports PyMapping_Keys() and PyObject_GetItem(). This allows you to say e.g. {}.update(UserDict()) We keep the special case for concrete dict objects, although that seems moderately questionable. OTOH, the code exists and works, so why change that? .update()'s docstring already claims that D.update(E) implies calling E.keys() so it's appropriate not to transform AttributeErrors in PyMapping_Keys() to TypeErrors. Patch eyeballed by Tim.
* dict_repr: Reuse one of the int vars (minor code simplification).Tim Peters2001-06-161-3/+3
|
* SF bug 433228: repr(list) woes when len(list) big.Tim Peters2001-06-161-30/+68
| | | | | | | | | | | | Gave Python linear-time repr() implementations for dicts, lists, strings. This means, e.g., that repr(range(50000)) is no longer 50x slower than pprint.pprint() in 2.2 <wink>. I don't consider this a bugfix candidate, as it's a performance boost. Added _PyString_Join() to the internal string API. If we want that in the public API, fine, but then it requires runtime error checks instead of asserts.
* Store the mask instead of the size in dictobjects. The mask is moreTim Peters2001-06-041-23/+29
| | | | | | | frequently used, and in particular this allows to drop the last remaining obvious time-waster in the crucial lookdict() and lookdict_string() functions. Other changes consist mostly of changing "i < ma_size" to "i <= ma_mask" everywhere.
* lookdict: stop more insane core-dump mutating comparison cases. ShouldTim Peters2001-06-031-6/+31
| | | | | | | be possible to provoke unbounded recursion now, but leaving that to someone else to provoke and repair. Bugfix candidate -- although this is getting harder to backstitch, and the cases it's protecting against are mondo contrived.
* lookdict: Reduce obfuscating code duplication with a judicious goto.Tim Peters2001-06-031-25/+21
| | | | | This code is likely to get even hairier to squash core dumps due to mutating comparisons, and it's hard enough to follow without that.
* Finish the dict->string coredump fix. Need sleep.Tim Peters2001-06-021-1/+1
| | | | Bugfix candidate.
* Coredumpers from Michael Hudson, mutating dicts while printing orTim Peters2001-06-021-7/+19
| | | | | converting to string. Critical bugfix candidate -- if you take this seriously <wink>.
* dict_popitem(): Repaired last-second 2.1 comment, which misidentified theTim Peters2001-06-021-5/+8
| | | | true reason for allocating the tuple before checking the dict size.
* New collision resolution scheme: no polynomials, simpler, faster, lessTim Peters2001-06-021-163/+124
| | | | | | | code, less memory. Tests have uncovered no drawbacks. Christian and Vladimir are the other two people who have burned many brain cells on the dict code in recent years, and they like the approach too, so I'm checking it in without further ado.
* Implement an old idea of Christian Tismer's: use polynomial divisionTim Peters2001-05-271-18/+72
| | | | | | | | | | | | | | | instead of multiplication to generate the probe sequence. The idea is recorded in Python-Dev for Dec 2000, but that version is prone to rare infinite loops. The value is in getting *all* the bits of the hash code to participate; and, e.g., this speeds up querying every key in a dict with keys [i << 16 for i in range(20000)] by a factor of 500. Should be equally valuable in any bad case where the high-order hash bits were getting ignored. Also wrote up some of the motivations behind Python's ever-more-subtle hash table strategy.
* Patch #424335: Implement string_richcompare, remove string_compare.Martin v. Löwis2001-05-241-4/+3
| | | | Use new _PyString_Eq in lookdict_string.
* dictresize(): Rebuild small tables if there are any dummies, not just ifTim Peters2001-05-241-7/+11
| | | | | they're entirely full. Not a question of correctness, but of temporarily misplaced common sense.
* Jack Jansen hit a bug in the new dict code, reported on python-dev.Tim Peters2001-05-231-9/+28
| | | | | | | | | | dictresize() was too aggressive about never ever resizing small dicts. If a small dict is entirely full, it needs to rebuild it despite that it won't actually resize it, in order to purge old dummy entries thus creating at least one virgin slot (lookdict assumes at least one such exists). Also took the opportunity to add some high-level comments to dictresize.
* Remove unused variable.Fred Drake2001-05-221-1/+0
|
* SF patch #425242: Patch which "inlines" small dictionaries.Tim Peters2001-05-221-81/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The idea is Marc-Andre Lemburg's, the implementation is Tim's. Add a new ma_smalltable member to dictobjects, an embedded vector of MINSIZE (8) dictentry structs. Short course is that this lets us avoid additional malloc(s) for dicts with no more than 5 entries. The changes are widespread but mostly small. Long course: WRT speed, all scalar operations (getitem, setitem, delitem) on non-empty dicts benefit from no longer needing NULL-pointer checks (ma_table is never NULL anymore). Bulk operations (copy, update, resize, clearing slots during dealloc) benefit in some cases from now looping on the ma_fill count rather than on ma_size, but that was an unexpected benefit: the original reason to loop on ma_fill was to let bulk operations on empty dicts end quickly (since the NULL-pointer checks went away, empty dicts aren't special-cased any more). Special considerations: For dicts that remain empty, this change is a lose on two counts: the dict object contains 8 new dictentry slots now that weren't needed before, and dict object creation also spends time memset'ing these doomed-to-be-unsused slots to NULLs. For dicts with one or two entries that never get larger than 2, it's a mix: a malloc()/free() pair is no longer needed, and the 2-entry case gets to use 8 slots (instead of 4) thus decreasing the chance of collision. Against that, dict object creation spends time memset'ing 4 slots that aren't strictly needed in this case. For dicts with 3 through 5 entries that never get larger than 5, it's a pure win: the dict is created with all the space they need, and they never need to resize. Before they suffered two malloc()/free() calls, plus 1 dict resize, to get enough space. In addition, the 8-slot table they ended with consumed more memory overall, because of the hidden overhead due to the additional malloc. For dicts with 6 or more entries, the ma_smalltable member is wasted space, but then these are large(r) dicts so 8 slots more or less doesn't make much difference. They still benefit all the time from removing ubiquitous dynamic null-pointer checks, and get a small benefit (but relatively smaller the larger the dict) from not having to do two mallocs, two frees, and a resize on the way *to* getting their sixth entry. All in all it appears a small but definite general win, with larger benefits in specific cases. It's especially nice that it allowed to get rid of several branches, gotos and labels, and overall made the code smaller.
* Bugfix candidate.Tim Peters2001-05-191-2/+3
| | | | | | | | | | | | Two exceedingly unlikely errors in dictresize(): 1. The loop for finding the new size had an off-by-one error at the end (could over-index the polys[] vector). 2. The polys[] vector ended with a 0, apparently intended as a sentinel value but never used as such; i.e., it was never checked, so 0 could have been used *as* a polynomial. Neither bug could trigger unless a dict grew to 2**30 slots; since that would consume at least 12GB of memory just to hold the dict pointers, I'm betting it's not the cause of the bug Fred's tracking down <wink>.
* Speed dictresize by collapsing its two passes into one; the reason givenTim Peters2001-05-171-8/+9
| | | | | | | in the comments for using two passes was bogus, as the only object that can get decref'ed due to the copy is the dummy key, and decref'ing dummy can't have side effects (for one thing, dummy is immortal! for another, it's a string object, not a potentially dangerous user-defined object).
* Aggressive reordering of dict comparisons. In case of collision, it standsTim Peters2001-05-131-30/+21
| | | | | | | | | | | | to reason that me_key is much more likely to match the key we're looking for than to match dummy, and if the key is absent me_key is much more likely to be NULL than dummy: most dicts don't even have a dummy entry. Running instrumented dict code over the test suite and some apps confirmed that matching dummy was 200-300x less frequent than matching key in practice. So this reorders the tests to try the common case first. It can lose if a large dict with many collisions is mostly deleted, not resized, and then frequently searched, but that's hardly a case we should be favoring.
* Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask".Tim Peters2001-05-131-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comment following used to say: /* We use ~hash instead of hash, as degenerate hash functions, such as for ints <sigh>, can have lots of leading zeros. It's not really a performance risk, but better safe than sorry. 12-Dec-00 tim: so ~hash produces lots of leading ones instead -- what's the gain? */ That is, there was never a good reason for doing it. And to the contrary, as explained on Python-Dev last December, it tended to make the *sum* (i + incr) & mask (which is the first table index examined in case of collison) the same "too often" across distinct hashes. Changing to the simpler "i = hash & mask" reduced the number of string-dict collisions (== # number of times we go around the lookup for-loop) from about 6 million to 5 million during a full run of the test suite (these are approximate because the test suite does some random stuff from run to run). The number of collisions in non-string dicts also decreased, but not as dramatically. Note that this may, for a given dict, change the order (wrt previous releases) of entries exposed by .keys(), .values() and .items(). A number of std tests suffered bogus failures as a result. For dicts keyed by small ints, or (less so) by characters, the order is much more likely to be in increasing order of key now; e.g., >>> d = {} >>> for i in range(10): ... d[i] = i ... >>> d {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} >>> Unfortunately. people may latch on to that in small examples and draw a bogus conclusion. test_support.py Moved test_extcall's sortdict() into test_support, made it stronger, and imported sortdict into other std tests that needed it. test_unicode.py Excluced cp875 from the "roundtrip over range(128)" test, because cp875 doesn't have a well-defined inverse for unicode("?", "cp875"). See Python-Dev for excruciating details. Cookie.py Chaged various output functions to sort dicts before building strings from them. test_extcall Fiddled the expected-result file. This remains sensitive to native dict ordering, because, e.g., if there are multiple errors in a keyword-arg dict (and test_extcall sets up many cases like that), the specific error Python complains about first depends on native dict ordering.
* Restore dicts' tp_compare slot, and change dict_richcompare to say itTim Peters2001-05-101-15/+3
| | | | | | | | | | | | | | | | | | | | doesn't know how to do LE, LT, GE, GT. dict_richcompare can't do the latter any faster than dict_compare can. More importantly, for cmp(dict1, dict2), Python *first* tries rich compares with EQ, LT, and GT one at a time, even if the tp_compare slot is defined, and dict_richcompare called dict_compare for the latter two because it couldn't do them itself. The result was a lot of wasted calls to dict_compare. Now dict_richcompare gives up at once the times Python calls it with LT and GT from try_rich_to_3way_compare(), and dict_compare is called only once (when Python gets around to trying the tp_compare slot). Continued mystery: despite that this cut the number of calls to dict_compare approximately in half in test_mutants.py, the latter still runs amazingly slowly. Running under the debugger doesn't show excessive activity in the dict comparison code anymore, so I'm guessing the culprit is somewhere else -- but where? Perhaps in the element (key/value) comparison code? We clearly spend a lot of time figuring out how to compare things.
* Repair typo in comment.Tim Peters2001-05-101-1/+1
|
* SF bug #422121 Insecurities in dict comparison.Tim Peters2001-05-101-34/+95
| | | | | | | Fixed a half dozen ways in which general dict comparison could crash Python (even cause Win98SE to reboot) in the presence of kay and/or value comparison routines that mutate the dict during dict comparison. Bugfix candidate.
* SF patch #421922: Implement rich comparison for dicts.Tim Peters2001-05-081-2/+72
| | | | | | d1 == d2 and d1 != d2 now work even if the keys and values in d1 and d2 don't support comparisons other than ==, and testing dicts for equality is faster now (especially when inequality obtains).
* Mchael Hudson pointed out that the code for detecting changes inGuido van Rossum2001-05-021-4/+4
| | | | | | dictionary size was comparing ma_size, the hash table size, which is always a power of two, rather than ma_used, wich changes on each insertion or deletion. Fixed this.
* Add experimental iterkeys(), itervalues(), iteritems() to dictGuido van Rossum2001-05-011-11/+85
| | | | | | | objects. Tests show that iteritems() is 5-10% faster than iterating over the dict and extracting the value with dict[key].
* Mondo changes to the iterator stuff, without changing how Python codeGuido van Rossum2001-04-231-2/+21
| | | | | | | | | | | | | | | | | | | | | | | | sees it (test_iter.py is unchanged). - Added a tp_iternext slot, which calls the iterator's next() method; this is much faster for built-in iterators over built-in types such as lists and dicts, speeding up pybench's ForLoop with about 25% compared to Python 2.1. (Now there's a good argument for iterators. ;-) - Renamed the built-in sequence iterator SeqIter, affecting the C API functions for it. (This frees up the PyIter prefix for generic iterator operations.) - Added PyIter_Check(obj), which checks that obj's type has a tp_iternext slot and that the proper feature flag is set. - Added PyIter_Next(obj) which calls the tp_iternext slot. It has a somewhat complex return condition due to the need for speed: when it returns NULL, it may not have set an exception condition, meaning the iterator is exhausted; when the exception StopIteration is set (or a derived exception class), it means the same thing; any other exception means some other error occurred.
* Iterators phase 1. This comprises:Guido van Rossum2001-04-201-0/+103
| | | | | | | | | | | | | | | | | | | | | | new slot tp_iter in type object, plus new flag Py_TPFLAGS_HAVE_ITER new C API PyObject_GetIter(), calls tp_iter new builtin iter(), with two forms: iter(obj), and iter(function, sentinel) new internal object types iterobject and calliterobject new exception StopIteration new opcodes for "for" loops, GET_ITER and FOR_ITER (also supported by dis.py) new magic number for .pyc files new special method for instances: __iter__() returns an iterator iteration over dictionaries: "for x in dict" iterates over the keys iteration over files: "for x in file" iterates over lines TODO: documentation test suite decide whether to use a different way to spell iter(function, sentinal) decide whether "for key in dict" is a good idea use iterators in map/filter/reduce, min/max, and elsewhere (in/not in?) speed tuning (make next() a slot tp_next???)
* Oops. Removed dictiter_new decl that wasn't supposed to go in yet.Guido van Rossum2001-04-201-2/+0
|
* Implement, test and document "key in dict" and "key not in dict".Guido van Rossum2001-04-201-1/+35
| | | | | | | | | I know some people don't like this -- if it's really controversial, I'll take it out again. (If it's only Alex Martelli who doesn't like it, that doesn't count as "real controversial" though. :-) That's why this is a separate checkin from the iterators stuff I'm about to check in next.
* Tim pointed out a remaining vulnerability in popitem(): theGuido van Rossum2001-04-161-5/+6
| | | | | | | | | | | PyTuple_New() could *conceivably* clear the dict, so move the test for an empty dict after the tuple allocation. It means that we waste time allocating and deallocating a 2-tuple when the dict is empty, but who cares. It also means that when the dict is empty *and* there's no memory to allocate a 2-tuple, we raise MemoryError, not KeyError -- but that may actually a good idea: if there's no room for a lousy 2-tuple, what are the chances that there's room for a KeyError instance?
* Tentative fix for a problem that Tim discovered at the last moment,Guido van Rossum2001-04-151-61/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and reported to python-dev: because we were calling dict_resize() in PyDict_Next(), and because GC's dict_traverse() uses PyDict_Next(), and because PyTuple_New() can cause GC, and because dict_items() calls PyTuple_New(), it was possible for dict_items() to have the dict resized right under its nose. The solution is convoluted, and touches several places: keys(), values(), items(), popitem(), PyDict_Next(), and PyDict_SetItem(). There are two parts to it. First, we no longer call dict_resize() in PyDict_Next(), which seems to solve the immediate problem. But then PyDict_SetItem() must have a different policy about when *it* calls dict_resize(), because we want to guarantee (e.g. for an algorithm that Jeremy uses in the compiler) that you can loop over a dict using PyDict_Next() and make changes to the dict as long as those changes are only value replacements for existing keys using PyDict_SetItem(). This is done by resizing *after* the insertion instead of before, and by remembering the size before we insert the item, and if the size is still the same, we don't bother to even check if we might need to resize. An additional detail is that if the dict starts out empty, we must still resize it before the insertion. That was the first part. :-) The second part is to make keys(), values(), items(), and popitem() safe against side effects on the dict caused by allocations, under the assumption that if the GC can cause arbitrary Python code to run, it can cause other threads to run, and it's not inconceivable that our dict could be resized -- it would be insane to write code that relies on this, but not all code is sane. Now, I have this nagging feeling that the loops in lookdict probably are blissfully assuming that doing a simple key comparison does not change the dict's size. This is not necessarily true (the keys could be class instances after all). But that's a battle for another day.
* Make PyDict_Next safe to use for loops that merely modify the valuesTim Peters2001-03-211-8/+32
| | | | | | associated with existing dict keys. This is a variant of part of Michael Hudson's patch #409864 "lazy fix for Pings bizarre scoping crash".
* Rich comparisons:Guido van Rossum2001-01-181-118/+45
| | | | | | | | | | | | | | | | | | | | | - Use PyObject_RichCompareBool() when comparing keys; this makes the error handling cleaner. - There were two implementations for dictionary comparison, an old one (#ifdef'ed out) and a new one. Got rid of the old one, which was abandoned years ago. - In the characterize() function, part of dictionary comparison, use PyObject_RichCompareBool() to compare keys and values instead. But continue to use PyObject_Compare() for comparing the final (deciding) elements. - Align the comments in the type struct initializer. Note: I don't implement rich comparison for dictionaries -- there doesn't seem to be much to be gained. (The existing comparison already decides that shorter dicts are always smaller than longer dicts.)
* dict_update has two boundary conditions: a.update(a) and a.update({})Jeremy Hylton2001-01-031-2/+2
| | | | Added test for second one.
* Add long-overdue docstrings to dict methods.Tim Peters2000-12-131-11/+53
|
* Typo repair in comments. Fell for GregS's .popitem() poke.Tim Peters2000-12-131-2/+6
|
* Bring comments up to date (e.g., they still said the table had to beTim Peters2000-12-131-23/+40
| | | | a prime size, which is in fact never true anymore ...).
* Add popitem() -- SF patch #102733.Guido van Rossum2000-12-121-0/+53
|
* Backing out my changes.Moshe Zadka2000-11-301-72/+0
| | | | Improved version coming soon to a Source Forge near you!
* Added .first{item,value,key}() to dictionaries.Moshe Zadka2000-11-301-0/+72
| | | | | Complete with docos and tests. OKed by Guido.
* REMOVED all CWI, CNRI and BeOpen copyright markings.Guido van Rossum2000-09-011-9/+0
| | | | This should match the situation in the 1.6b1 tree.
* Slight performance hack that also avoids requiring the existence of threadFred Drake2000-08-311-12/+124
| | | | | | | | state for dictionaries that have only been indexed by string keys. See the comments in SourceForge for more. This closes SourceForge patch #101309.