path: root/Objects
Commit message | Author | Date | Files | Lines (-/+)
* Fix for bug #432384: Recursion in PyString_AsEncodedString? (Marc-André Lemburg, 2001-06-12; 1 file, -1/+1)
* Added q/Q standard (x-platform 8-byte ints) mode in struct module. (Tim Peters, 2001-06-12; 1 file, -6/+19)
  This completes the q/Q project.
  longobject.c, _PyLong_AsByteArray: The original code had a gross bug: the most-significant Python digit doesn't necessarily have SHIFT significant bits, and you really need to count how many copies of the sign bit it has, else spurious overflow errors result.
  test_struct.py: This now does exhaustive std q/Q testing at, and on both sides of, all relevant power-of-2 boundaries, both positive and negative.
  NEWS: Added brief dict news while I was at it.
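  A quick check of the standard q/Q behavior described above: the codes pack to exactly 8 bytes on every platform (shown in Python 2-era REPL form; they still behave the same today):

      >>> import struct
      >>> struct.pack('>q', -2)             # standard mode: always 8 bytes
      '\xff\xff\xff\xff\xff\xff\xff\xfe'
      >>> struct.unpack('>q', struct.pack('>q', -2))[0] == -2
      True
      >>> struct.pack('>Q', 2**64 - 1)      # unsigned full range
      '\xff\xff\xff\xff\xff\xff\xff\xff'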
* Two new private longobject API functions (Tim Peters, 2001-06-11; 1 file, -0/+213)
  _PyLong_FromByteArray and _PyLong_AsByteArray. Untested and probably buggy -- they compile OK, but nothing calls them yet. Will soon be called by the struct module, to implement x-platform 'q' and 'Q'. If other people have uses for them, we could move them into the public API. See longobject.h for usage details.
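  The actual C signatures are documented in longobject.h; as a rough model only, here is a Python sketch of the AsByteArray direction. The function name and the little-endian/signed choices are illustrative, not the real API:

      def long_to_bytes(n, size):
          # Hypothetical analogue of _PyLong_AsByteArray: little-endian,
          # signed two's-complement, with an overflow check.
          if not -(1 << (8 * size - 1)) <= n < (1 << (8 * size - 1)):
              raise OverflowError("value doesn't fit in %d bytes" % size)
          n &= (1 << (8 * size)) - 1      # two's-complement wrap for n < 0
          return bytes(bytearray((n >> (8 * i)) & 0xFF for i in range(size)))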
* Added a missing cast to the hashfunc initializer. (Jack Jansen, 2001-06-10; 1 file, -1/+1)
* Patch #424475: Speed-up tp_compare usage, by special-casing the common case of objects with equal types which support tp_compare. (Martin v. Löwis, 2001-06-09; 2 files, -22/+61)
  Give type objects a tp_compare function. Also add c<0 tests before a few PyErr_Occurred tests.
* Fixes [ #430986 ] Buglet in PyUnicode_FromUnicode. (Marc-André Lemburg, 2001-06-07; 1 file, -1/+1)
* Store the mask instead of the size in dictobjects. (Tim Peters, 2001-06-04; 1 file, -23/+29)
  The mask is more frequently used, and in particular this allows us to drop the last remaining obvious time-waster in the crucial lookdict() and lookdict_string() functions. Other changes consist mostly of changing "i < ma_size" to "i <= ma_mask" everywhere.
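  The mask works because dict table sizes are always powers of two, so reducing a hash to a slot index needs no division; a minimal sketch:

      size = 8                  # table sizes are powers of 2
      mask = size - 1           # what ma_mask stores
      h = hash('some key')
      slot = h & mask           # same slot as h % size, one AND instead
                                # of a modulo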
* lookdict: stop more insane core-dump mutating comparison cases. (Tim Peters, 2001-06-03; 1 file, -6/+31)
  Should be possible to provoke unbounded recursion now, but leaving that to someone else to provoke and repair. Bugfix candidate -- although this is getting harder to backstitch, and the cases it's protecting against are mondo contrived.
* lookdict: Reduce obfuscating code duplication with a judicious goto. (Tim Peters, 2001-06-03; 1 file, -25/+21)
  This code is likely to get even hairier to squash core dumps due to mutating comparisons, and it's hard enough to follow without that.
* Finish the dict->string coredump fix. Need sleep. (Tim Peters, 2001-06-02; 1 file, -1/+1)
  Bugfix candidate.
* Coredumpers from Michael Hudson, mutating dicts while printing or converting to string. (Tim Peters, 2001-06-02; 1 file, -7/+19)
  Critical bugfix candidate -- if you take this seriously <wink>.
* dict_popitem(): Repaired last-second 2.1 comment, which misidentified the true reason for allocating the tuple before checking the dict size. (Tim Peters, 2001-06-02; 1 file, -5/+8)
* New collision resolution scheme: no polynomials, simpler, faster, less code, less memory. (Tim Peters, 2001-06-02; 1 file, -163/+124)
  Tests have uncovered no drawbacks. Christian and Vladimir are the other two people who have burned many brain cells on the dict code in recent years, and they like the approach too, so I'm checking it in without further ado.
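  The checkin comment doesn't spell the scheme out, but what replaced the polynomials in dictobject.c is the perturb recurrence; a minimal Python model (assuming a non-negative machine-word hash):

      def probe(h, mask, PERTURB_SHIFT=5):
          # The full hash feeds into the probe via 'perturb', so every
          # hash bit eventually influences the sequence of slots tried.
          i = h & mask
          perturb = h
          while True:
              yield i
              i = (5 * i + perturb + 1) & mask
              perturb >>= PERTURB_SHIFT   # once perturb hits 0, the
                                          # recurrence i = 5*i+1 (mod 2**k)
                                          # visits every slot exactly once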
* fix bogus indentation (Jeremy Hylton, 2001-05-29; 1 file, -1/+1)
* _PyTuple_Resize: guard against PyTuple_New() returning NULL, using Tim's suggestion (modulo style). (Thomas Wouters, 2001-05-29; 1 file, -1/+1)
* Cruft cleanup: Removed the unused last_is_sticky argument from the internal _PyTuple_Resize(). (Tim Peters, 2001-05-28; 2 files, -6/+5)
* _PyTuple_Resize: take into account the empty tuple. There can be only one. (Thomas Wouters, 2001-05-28; 1 file, -2/+11)
  Instead of raising a SystemError, just create a new tuple of the desired size. This fixes (at least) SF bug #420343.
* Implement an old idea of Christian Tismer's: use polynomial division instead of multiplication to generate the probe sequence. (Tim Peters, 2001-05-27; 1 file, -18/+72)
  The idea is recorded in Python-Dev for Dec 2000, but that version is prone to rare infinite loops. The value is in getting *all* the bits of the hash code to participate; and, e.g., this speeds up querying every key in a dict with keys [i << 16 for i in range(20000)] by a factor of 500. Should be equally valuable in any bad case where the high-order hash bits were getting ignored. Also wrote up some of the motivations behind Python's ever-more-subtle hash table strategy.
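  "Polynomial division" here means stepping the probe increment like a Galois LFSR over GF(2), so all bits of the hash participate; a hedged sketch of the mechanism, with an 8-slot table and an illustrative polynomial standing in for the per-size polys[] table the real code kept:

      def poly_increments(h, mask=7, poly=11):    # 11 encodes x**3 + x + 1
          # Each step doubles the increment; overflow past the table size
          # folds back via XOR with the polynomial. With a primitive
          # polynomial this cycles through every nonzero increment:
          # from 1 it visits 1, 2, 4, 3, 6, 7, 5, 1, ...
          incr = (h & mask) or 1                  # increment must be nonzero
          while True:
              yield incr
              incr <<= 1
              if incr > mask:
                  incr ^= poly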
* Change list.extend() error msgs and NEWS to reflect that list.extend() now takes any iterable argument, not only sequences. (Tim Peters, 2001-05-26; 1 file, -2/+2)
  NEEDS DOC CHANGES -- but I don't think we settled on a concise way to say this stuff.
* Cruft cleanup: removed the #ifdef'ery in support of compiling to allow multi-argument list.append(1, 2, 3) (as opposed to .append((1,2,3))). (Tim Peters, 2001-05-26; 1 file, -18/+4)
* roundupsize() and friends: fiddle over-allocation strategy for list resizing. (Tim Peters, 2001-05-26; 1 file, -8/+31)
  Accurate timings are impossible on my Win98SE box, but this is obviously faster even on this box for reasonable list.append() cases. I give credit for this not to the resizing strategy but to getting rid of integer multiplication and division (in favor of shifting) when computing the rounded-up size.
  For unreasonable list.append() cases, Win98SE now displays linear behavior for one-at-a-time appends up to a list with about 35 million elements. Then it dies with a MemoryError, due to fatally fragmented *address space* (there's plenty of VM available, but by this point Win9x has broken user space into many distinct heaps, none of which has enough contiguous space left to resize the list, and for whatever reason Win9x isn't coalescing the dead heaps). Before the patch it got a MemoryError for the same reason, but once the list reached about 2 million elements.
  Haven't yet tried on Win2K, but have high hopes extreme list.append() will be much better behaved now (NT & Win2K didn't fragment address space, but suffered obvious quadratic-time behavior before as lists got large). For other systems I'm relying on common sense: replacing integer * and / by << and >> can't plausibly hurt, the number of function calls hasn't changed, and the total operation count for reasonably small lists is about the same (while the operations are cheaper now).
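  A Python rendering of the shift-only rounding described above, modeled on the listobject.c roundupsize() of that era (treat the exact constants as approximate):

      def roundupsize(n):
          # Round n up to the next multiple of a power of two that grows
          # with n (multiples of 8 below 256, of 64 below 2048, and so on),
          # using only shifts -- no multiply or divide.
          nbits = 0
          n2 = n >> 5
          while True:                     # C original: do {...} while (n2)
              n2 >>= 3
              nbits += 3
              if not n2:
                  break
          return ((n >> nbits) + 1) << nbits

      # roundupsize(3) == 8, roundupsize(100) == 104, roundupsize(300) == 320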
* Patch #424335: Implement string_richcompare, remove string_compare. (Martin v. Löwis, 2001-05-24; 2 files, -16/+80)
  Use new _PyString_Eq in lookdict_string.
* dictresize(): Rebuild small tables if there are any dummies, not just if they're entirely full. (Tim Peters, 2001-05-24; 1 file, -7/+11)
  Not a question of correctness, but of temporarily misplaced common sense.
* Jack Jansen hit a bug in the new dict code, reported on python-dev. (Tim Peters, 2001-05-23; 1 file, -9/+28)
  dictresize() was too aggressive about never resizing small dicts. If a small dict is entirely full, it needs to be rebuilt even though it won't actually be resized, in order to purge old dummy entries, thus creating at least one virgin slot (lookdict assumes at least one such exists). Also took the opportunity to add some high-level comments to dictresize.
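  The invariant being protected: an unsuccessful lookup can only terminate at a never-used slot, so every table must keep at least one. A toy model of the slot states involved (all names here are illustrative):

      EMPTY, DUMMY = object(), object()   # DUMMY marks a deleted entry

      def toy_lookup(table, key, probes):
          # 'probes' yields candidate indices (see the probe sketches
          # elsewhere on this page). Only an EMPTY slot ends a miss;
          # DUMMY slots are probed past, so a table holding nothing but
          # active and DUMMY entries would loop forever.
          freeslot = None
          for i in probes:
              entry = table[i]
              if entry is EMPTY:
                  return freeslot if freeslot is not None else i
              if entry is DUMMY:
                  if freeslot is None:
                      freeslot = i        # remember first reusable slot
              elif entry[0] == key:       # active entries are (key, value)
                  return i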
* Remove unused variable. (Fred Drake, 2001-05-22; 1 file, -1/+0)
* SF patch #425242: Patch which "inlines" small dictionaries. (Tim Peters, 2001-05-22; 1 file, -81/+145)
  The idea is Marc-Andre Lemburg's, the implementation is Tim's. Add a new ma_smalltable member to dictobjects, an embedded vector of MINSIZE (8) dictentry structs. The short course is that this lets us avoid additional malloc(s) for dicts with no more than 5 entries. The changes are widespread but mostly small.
  Long course: WRT speed, all scalar operations (getitem, setitem, delitem) on non-empty dicts benefit from no longer needing NULL-pointer checks (ma_table is never NULL anymore). Bulk operations (copy, update, resize, clearing slots during dealloc) benefit in some cases from now looping on the ma_fill count rather than on ma_size, but that was an unexpected benefit: the original reason to loop on ma_fill was to let bulk operations on empty dicts end quickly (since the NULL-pointer checks went away, empty dicts aren't special-cased any more).
  Special considerations: For dicts that remain empty, this change is a loss on two counts: the dict object contains 8 new dictentry slots now that weren't needed before, and dict object creation also spends time memset'ing these doomed-to-be-unused slots to NULLs.
  For dicts with one or two entries that never get larger than 2, it's a mix: a malloc()/free() pair is no longer needed, and the 2-entry case gets to use 8 slots (instead of 4), thus decreasing the chance of collision. Against that, dict object creation spends time memset'ing 4 slots that aren't strictly needed in this case.
  For dicts with 3 through 5 entries that never get larger than 5, it's a pure win: the dict is created with all the space they need, and they never need to resize. Before, they suffered two malloc()/free() calls plus one dict resize to get enough space. In addition, the 8-slot table they ended with consumed more memory overall, because of the hidden overhead due to the additional malloc.
  For dicts with 6 or more entries, the ma_smalltable member is wasted space, but then these are large(r) dicts, so 8 slots more or less doesn't make much difference. They still benefit all the time from removing ubiquitous dynamic NULL-pointer checks, and get a small benefit (but relatively smaller the larger the dict) from not having to do two mallocs, two frees, and a resize on the way *to* getting their sixth entry.
  All in all it appears a small but definite general win, with larger benefits in specific cases. It's especially nice that it allowed us to get rid of several branches, gotos and labels, and overall made the code smaller.
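  A toy model of the layout change, with a Python class standing in for the C struct (only ma_smalltable and ma_table are real member names):

      class DictModel:
          MINSIZE = 8
          def __init__(self):
              # ma_smalltable: 8 slots embedded in the object itself, so
              # dicts with no more than 5 entries never malloc a table.
              self.ma_smalltable = [None] * self.MINSIZE
              self.ma_table = self.ma_smalltable      # never None anymore
          def resize(self, newsize):
              if newsize <= self.MINSIZE:
                  self.ma_table = self.ma_smalltable  # reuse embedded slots
              else:
                  self.ma_table = [None] * newsize    # only big dicts allocate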
* file_getiter(): make iter(file) be equivalent to file.xreadlines(). (Guido van Rossum, 2001-05-22; 1 file, -12/+3)
  This should be faster. This means: (1) "for line in file:" won't work if the xreadlines module can't be imported. (2) The body of "for line in file:" shouldn't use the file directly; the effects (e.g. of file.readline(), file.seek() or even file.tell()) would be undefined because of the buffering that goes on in the xreadlines module.
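  In usage terms (2.2-era Python; xreadlines disappeared once files grew their own iteration buffering in later versions), the caveats above boil down to:

      lines = []
      f = open('data.txt')        # assumes data.txt exists
      for line in f:              # in 2.2, iter(f) is f.xreadlines()
          lines.append(line)      # fine -- but avoid f.readline(),
      f.close()                   # f.seek() or f.tell() inside the loop:
                                  # the buffering makes them undefined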
* init_name_op(): add (void) to the argument list to make it a valid prototype, for gcc -Wstrict-prototypes. (Guido van Rossum, 2001-05-22; 1 file, -1/+1)
* This patch changes the behaviour of the UTF-16 codec family. (Marc-André Lemburg, 2001-05-21; 1 file, -17/+25)
  Only the UTF-16 codec will now interpret and remove a *leading* BOM mark. Subsequent BOM characters are no longer interpreted and removed. UTF-16-LE and -BE pass through all BOM mark characters. These changes should get the UTF-16 codec more in line with what the Unicode FAQ recommends w/r to BOM marks.
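  The rule described above is easy to check, and it still holds in modern Python:

      data = u'\ufeffhi'.encode('utf-16-le')   # BOM written explicitly
      print(repr(data.decode('utf-16')))       # 'hi': leading BOM consumed
      print(repr(data.decode('utf-16-le')))    # '\ufeffhi': BOM passed through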
* Bugfix candidate: two exceedingly unlikely errors in dictresize(). (Tim Peters, 2001-05-19; 1 file, -2/+3)
  1. The loop for finding the new size had an off-by-one error at the end (could over-index the polys[] vector).
  2. The polys[] vector ended with a 0, apparently intended as a sentinel value but never used as such; i.e., it was never checked, so 0 could have been used *as* a polynomial.
  Neither bug could trigger unless a dict grew to 2**30 slots; since that would consume at least 12GB of memory just to hold the dict pointers, I'm betting it's not the cause of the bug Fred's tracking down <wink>.
* Speed dictresize by collapsing its two passes into one. (Tim Peters, 2001-05-17; 1 file, -8/+9)
  The reason given in the comments for using two passes was bogus, as the only object that can get decref'ed due to the copy is the dummy key, and decref'ing dummy can't have side effects (for one thing, dummy is immortal! for another, it's a string object, not a potentially dangerous user-defined object).
* Speed tuple comparisons in two ways: (Tim Peters, 2001-05-15; 1 file, -22/+23)
  1. Omit the early-out EQ/NE "lengths different?" test. Was unable to find any real code where it triggered, but it always costs. The same is not true of list richcmps, where different-size lists appeared to get compared about half the time.
  2. Because tuples are immutable, there's no need to refetch the lengths of both tuples from memory again on each loop trip.
  BUG ALERT: The tuple (and list) richcmp algorithm is arguably wrong, because it won't believe there's any difference unless Py_EQ returns false for some corresponding elements:
      >>> class C:
      ...     def __lt__(x, y): return 1
      ...     __eq__ = __lt__
      ...
      >>> C() < C()
      1
      >>> (C(),) < (C(),)
      0
  That doesn't make sense -- provided you believe the defn. of C makes sense.
* This patch changes the way the string .encode() method works slightly and introduces a new method .decode(). (Marc-André Lemburg, 2001-05-15; 1 file, -20/+90)
  The major change is that strg.encode() will no longer try to convert Unicode returns from the codec into a string, but instead pass along the Unicode object as-is. The same is now true for all other codec return types. The underlying C APIs were changed accordingly.
  Note that even though this does have the potential of breaking existing code, the chances are low, since conversion from Unicode previously took place using the default encoding, which is normally set to ASCII, rendering this auto-conversion mechanism useless for most Unicode encodings.
  The good news is that you can now use .encode() and .decode() with much greater ease, and the door was opened for better accessibility of the builtin codecs. As a demonstration of the new feature, the patch includes a few new codecs which allow string-to-string encoding and decoding (rot13, hex, zip, uu, base64).
  Written by Marc-Andre Lemburg. Copyright assigned to the PSF.
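  The new string-to-string codecs in action, as they worked on Python 2 of that era (Python 3 later moved this behind codecs.encode()/codecs.decode()):

      >>> "hello".encode('rot13')
      'uryyb'
      >>> "hello".encode('hex')
      '68656c6c6f'
      >>> '68656c6c6f'.decode('hex')
      'hello'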
* Aggressive reordering of dict comparisons. (Tim Peters, 2001-05-13; 1 file, -30/+21)
  In case of collision, it stands to reason that me_key is much more likely to match the key we're looking for than to match dummy, and if the key is absent, me_key is much more likely to be NULL than dummy: most dicts don't even have a dummy entry. Running instrumented dict code over the test suite and some apps confirmed that matching dummy was 200-300x less frequent than matching key in practice. So this reorders the tests to try the common case first. It can lose if a large dict with many collisions is mostly deleted, not resized, and then frequently searched, but that's hardly a case we should be favoring.
* Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask". (Tim Peters, 2001-05-13; 1 file, -10/+2)
  The comment following used to say:
      /* We use ~hash instead of hash, as degenerate hash functions, such
         as for ints <sigh>, can have lots of leading zeros. It's not
         really a performance risk, but better safe than sorry.
         12-Dec-00 tim: so ~hash produces lots of leading ones instead --
         what's the gain? */
  That is, there was never a good reason for doing it. And to the contrary, as explained on Python-Dev last December, it tended to make the *sum* (i + incr) & mask (which is the first table index examined in case of collision) the same "too often" across distinct hashes.
  Changing to the simpler "i = hash & mask" reduced the number of string-dict collisions (== number of times we go around the lookup for-loop) from about 6 million to 5 million during a full run of the test suite (these are approximate because the test suite does some random stuff from run to run). The number of collisions in non-string dicts also decreased, but not as dramatically.
  Note that this may, for a given dict, change the order (wrt previous releases) of entries exposed by .keys(), .values() and .items(). A number of std tests suffered bogus failures as a result. For dicts keyed by small ints, or (less so) by characters, the order is much more likely to be in increasing order of key now; e.g.,
      >>> d = {}
      >>> for i in range(10):
      ...     d[i] = i
      ...
      >>> d
      {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
  Unfortunately, people may latch on to that in small examples and draw a bogus conclusion.
  test_support.py: Moved test_extcall's sortdict() into test_support, made it stronger, and imported sortdict into other std tests that needed it.
  test_unicode.py: Excluded cp875 from the "roundtrip over range(128)" test, because cp875 doesn't have a well-defined inverse for unicode("?", "cp875"). See Python-Dev for excruciating details.
  Cookie.py: Changed various output functions to sort dicts before building strings from them.
  test_extcall: Fiddled the expected-result file. This remains sensitive to native dict ordering, because, e.g., if there are multiple errors in a keyword-arg dict (and test_extcall sets up many cases like that), the specific error Python complains about first depends on native dict ordering.
* Repair "module has no attribute xxx" error msg; bug introduced whenTim Peters2001-05-121-1/+1
| | | | switching from tp_getattr to tp_getattro.
* Variant of patch #423262: Change module attribute get & set. (Tim Peters, 2001-05-11; 1 file, -34/+35)
  Allow module getattr and setattr to exploit string interning, via the previously null module object tp_getattro and tp_setattro slots. Yields a very nice speedup for things like random.random and os.path etc.
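  Why interning pays off: an interned string is unique, so comparing attribute names degenerates to a pointer comparison in C. The effect is observable from Python (sys.intern in Python 3; the intern() builtin in Python 2):

      import sys
      a = sys.intern('random')
      b = sys.intern(''.join(['ran', 'dom']))  # same text, built at runtime
      print(a is b)                            # True: for interned strings,
                                               # identity decides equality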
* Variant of SF patch 423181. (Jeremy Hylton, 2001-05-11; 1 file, -21/+51)
  For rich comparisons, use instance_getattr2() when possible to avoid the expense of setting an AttributeError. Also intern the name_op[] table and use the interned strings rather than creating a new string and interning it each time through.
* Cosmetic: code under "else" clause was missing indent. (Tim Peters, 2001-05-11; 1 file, -1/+1)
* Restore dicts' tp_compare slot, and change dict_richcompare to say it doesn't know how to do LE, LT, GE, GT. (Tim Peters, 2001-05-10; 1 file, -15/+3)
  dict_richcompare can't do the latter any faster than dict_compare can. More importantly, for cmp(dict1, dict2), Python *first* tries rich compares with EQ, LT, and GT one at a time, even if the tp_compare slot is defined, and dict_richcompare called dict_compare for the latter two because it couldn't do them itself. The result was a lot of wasted calls to dict_compare. Now dict_richcompare gives up at once the times Python calls it with LT and GT from try_rich_to_3way_compare(), and dict_compare is called only once (when Python gets around to trying the tp_compare slot).
  Continued mystery: despite that this cut the number of calls to dict_compare approximately in half in test_mutants.py, the latter still runs amazingly slowly. Running under the debugger doesn't show excessive activity in the dict comparison code anymore, so I'm guessing the culprit is somewhere else -- but where? Perhaps in the element (key/value) comparison code? We clearly spend a lot of time figuring out how to compare things.
* Repair typo in comment. (Tim Peters, 2001-05-10; 1 file, -1/+1)
* SF bug #422121: Insecurities in dict comparison. (Tim Peters, 2001-05-10; 1 file, -34/+95)
  Fixed a half dozen ways in which general dict comparison could crash Python (even cause Win98SE to reboot) in the presence of key and/or value comparison routines that mutate the dict during dict comparison. Bugfix candidate.
* Heh. I need a break. (Tim Peters, 2001-05-10; 1 file, -8/+6)
  After this: stropmodule & stringobject were more out of synch than I realized, and I managed to break replace's "count" argument when it was 0. All is well again. Maybe. Bugfix candidate.
* Fudge. stropmodule and stringobject both had copies of the buggy mymemXXX stuff, and they were already out of synch. (Tim Peters, 2001-05-10; 1 file, -32/+41)
  Fix the remaining bugs in both and get them back in synch. Bugfix release candidate.
* SF patch #416247: 2.1c1 stringobject: unused vrbl cleanup. (Tim Peters, 2001-05-09; 1 file, -2/+0)
  Thanks to Mark Favas.
* Sheesh -- repair the dodge around "cast isn't an lvalue" complaints to restore correct semantics. (Tim Peters, 2001-05-09; 1 file, -0/+4)
* Mark Favas reported that gcc caught me using casts as lvalues. Dodge it. (Tim Peters, 2001-05-09; 1 file, -6/+10)
* Ack! Restore the COUNT_ALLOCS one_strings code. (Tim Peters, 2001-05-09; 1 file, -1/+5)
* My change to string_item() left an extra reference to each 1-character interned string created by "string"[i]. (Tim Peters, 2001-05-09; 1 file, -4/+3)
  Since they're immortal anyway, this was hard to notice, but it was still wrong <wink>.
* Intern 1-character strings as soon as they're created. (Tim Peters, 2001-05-08; 1 file, -15/+12)
  As-is, they aren't interned when created, so the cached versions generally aren't ever interned. With the patch, the
      Py_INCREF(t);
      *p = t;
      Py_DECREF(s);
      return;
  indirection block in PyString_InternInPlace() is never executed during a full run of the test suite, but was executed very many times before. So I'm trading more work when creating one-character strings for doing less work later. Note that the "more work" here can happen at most 256 times per program run, so it's trivial. The same reasoning accounts for the patch's simplification of string_item (the new version can call PyString_FromStringAndSize() no more than 256 times per run, so there's no point to inlining that stuff -- if we were serious about saving time here, we'd pre-initialize the characters vector so that no runtime testing at all was needed!).
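  The effect is still visible in CPython today (an implementation detail, not a language guarantee):

      s = "hello"
      print(s[2] is s[3])     # True: both index operations return the one
                              # interned "l" rather than fresh strings
      print(s[2] is "l")      # True for the same reason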