| |
necessarily always set before used. Between Tim, Armin & me, we
couldn't prove GCC wrong, so we decided to fix the algorithm. This
version is Armin's.
|
| |
Py_USING_UNICODE is defined
|
| |
This gives another 30% speedup for operations such as
map(func, d.iteritems()) or list(d.iteritems()), which can both take
advantage of length information when provided.
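A minimal sketch of the effect at the Python level, written with the modern
operator.length_hint spelling (assumed here to surface the same length
information the message describes):

    from operator import length_hint

    d = {'a': 1, 'b': 2, 'c': 3}
    it = iter(d.items())        # today's spelling of d.iteritems()
    print(length_hint(it))      # 3 -- list() and map() can presize from this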
|
| |
* Split into three separate types that share everything except the
code for iternext. Saves run time decision making and allows
each iternext function to be specialized.
* Inlined PyDict_Next(). In addition to saving a function call, this
allows a redundant test to be eliminated and further specialization
of the code for the unique needs of each iterator type.
* Created a reusable result tuple for iteritems(). Saves the malloc
time for tuples when the previous result was not kept by client code
(this is the typical use case for iteritems). If the client code
does keep the reference, then a new tuple is created.
Results in a 20% to 30% speedup depending on the size and sparsity
of the dictionary.
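A rough pure-Python model of the reusable-result idea, under the assumption
that a mutable list may stand in for the C-level result tuple (Python-level
tuples cannot be recycled) and sys.getrefcount for the C-level refcount test:

    import sys

    class pairs:
        def __init__(self, d):
            self._it = iter(d.items())
            self._result = None

        def __iter__(self):
            return self

        def __next__(self):
            key, value = next(self._it)
            result = self._result
            # getrefcount sees self._result, the local name, and its own
            # argument, so 3 means no caller kept the previous result.
            if result is not None and sys.getrefcount(result) == 3:
                result[0] = key         # recycle the previous container
                result[1] = value
            else:                       # caller kept it: build a fresh one
                result = self._result = [key, value]
            return result

    # list() keeps every result, so each pair is a distinct object here;
    # a plain "for k, v in pairs(d):" loop typically recycles one container.
    print(len({id(r) for r in list(pairs({'a': 1, 'b': 2, 'c': 3}))}))   # 3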
|
| |
* Factored constant structure references out of the inner loops for
PyDict_Next(), dict_keys(), dict_values(), and dict_items().
Gave measurable speedups to each (the improvement varies depending
on the sparseness of the dictionary being measured).
* Added a freelist scheme styled after that for tuples. Saves around
80% of the calls to malloc and free. About 10% of the time, the
previous dictionary was completely empty; in those cases, the
dictionary initialization with memset() can be skipped.
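A toy sketch of a freelist of this general shape (hypothetical names; the
real code recycles C-level dictionary structs rather than Python objects,
and also skips re-initialization when the recycled dict was already empty):

    MAXFREE = 80            # assumed cap on how many dead objects are cached
    _free_list = []

    class Node:
        __slots__ = ('table',)

    def alloc_node():
        if _free_list:
            node = _free_list.pop()     # reuse a recycled shell: no malloc
        else:
            node = Node()               # fall back to a real allocation
        node.table = {}                 # (re)initialize before handing it out
        return node

    def free_node(node):
        node.table = None               # drop the contents, keep the shell
        if len(_free_list) < MAXFREE:
            _free_list.append(node)     # cache it for the next alloc_node()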
|
| |
exact turned on. The tiny space savings wasn't worth the additional time
and code.
|
| |
scheme in situations that likely won't benefit from it. This further
improves memory utilization compared with Py2.3, which always over-allocates
except for PyList_New().
Situations expected to benefit from over-allocation:
list.insert(), list.pop(), list.append(), and list.extend()
Situations deemed unlikely to benefit:
list_inplace_repeat, list_ass_slice, list_ass_subscript
The grayest area was listextend_internal(), which only runs
when the argument is a list or a tuple. This could be viewed as
a one-time fixed-length addition or it could be viewed as wrapping
a series of appends. I left its over-allocation turned on but
could be convinced otherwise.
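The distinction can usually be seen from Python with sys.getsizeof; exact
byte counts vary by version and platform, so this small sketch only compares
the two:

    import sys

    grown = []
    for i in range(5):
        grown.append(i)         # append over-allocates to amortize growth

    exact = [0, 1, 2, 3, 4]     # built in one step: no spare room needed

    print(sys.getsizeof(grown), sys.getsizeof(exact))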
|
| |
(Spotted by Michael Hudson.)
* Now that "selflen" is no longer inside a loop, it should not be a
register variable.
|
| |
three recent optimizations. Aside from reducing code volume, it
increases readability.
|
| |
worth it to in-line the call to PyIter_Next().
Saves another 15% on most list operations that accept a general
iterable argument (such as the list constructor).
|
| |
exposing _PyList_Extend().
|
| |
avoids creating an intermediate tuple for iterable arguments other than
lists or tuples.
In other words, a+=b no longer requires extra memory when b is not a
list or tuple. The list and tuple cases are unchanged.
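For reference, the Python-level behavior being optimized (the memory saving
itself is internal and not visible from this minimal example):

    a = [1, 2, 3]
    a += (n * n for n in range(3))   # any iterable works on the right-hand side
    print(a)                         # [1, 2, 3, 0, 1, 4]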
|
| |
for xrange and list objects).
* list.__reversed__ now checks the length of the sequence object before
calling PyList_GET_ITEM() because the mutable list could have changed length.
* all three implementations are now transparent with respect to length and
maintain the invariant len(it) == len(list(it)) even when the underlying
sequence mutates.
* __builtin__.reversed() now frees the underlying sequence as soon
as the iterator is exhausted.
* the code paths were rearranged so that the most common paths
do not require a jump.
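A small check of the stated invariant, written with the modern
operator.length_hint spelling:

    from operator import length_hint

    it = reversed([10, 20, 30])
    hint = length_hint(it)                         # 3 before consuming anything
    next(it)
    print(hint, length_hint(it), len(list(it)))    # 3 2 2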
|
| |
was academic and it was potentially confusing to use.
|
| |
* Replace sprintf message with a constant message string -- this error
message ran on every invocation except straight deletions but it was
only needed when the rhs was not iterable. The message was also
out-of-date and did not reflect that iterable arguments were allowed.
* For inner loops that do not make ref count adjustments, use memmove()
for fast copying and better readability.
* For inner loops that do make ref count adjustments, speed them up by
factoring out the constant structure reference and using vitem[] instead.
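For reference, the Python-level behavior the corrected message describes,
shown as a minimal example:

    a = list(range(6))
    a[1:4] = iter('xyz')    # the right-hand side may be any iterable
    a[0:2] = []             # a straight deletion needs no error message at all
    print(a)                # ['y', 'z', 4, 5]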
|
| |
longer needed.
|
| |
and list.extend(). Factoring the inner loops to remove the constant
structure references and fixed offsets gives speedups ranging from
20% to 30%.
|
| |
* Using addition instead of subtraction on array indices allows the
compiler to use a fast addressing mode. Saves about 10%.
* Using PyTuple_GET_ITEM and PyList_SET_ITEM is about 7% faster than
PySequence_Fast_GET_ITEM, which has to make a list check on every pass.
|
| |
(Championed by Bob Ippolito.)
The update() method for mappings now accepts all the same argument forms
as the dict() constructor. This includes item lists and/or keyword
arguments.
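A minimal demonstration of the accepted forms:

    d = {}
    d.update([('a', 1), ('b', 2)])   # a list of items, as with dict([...])
    d.update(c=3)                    # keyword arguments, as with dict(c=3)
    d.update({'d': 4}, e=5)          # a mapping plus keywords
    print(d)                         # {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}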
|
| |
float_richcompare. Reported on c.l.py by Helmut Jarausch.
|
| |
recent gcc on Linux/x86)
[ 899109 ] 1==float('nan')
by implementing rich comparisons for floats.
Seems to make comparisons involving NaNs somewhat less surprising
when the underlying C compiler actually implements C99 semantics.
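A quick illustration of the resulting behavior (assuming an IEEE-754
platform, where a NaN compares unequal to everything, itself included):

    nan = float('nan')
    print(1 == nan)           # False
    print(nan == nan)         # False
    print(nan != nan)         # True
    print(nan < 1, nan > 1)   # False False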
|
| |
for types other than PyInt being accepted for the optional argument.
(Spotted by Neal Norwitz.)
|
| |
This change probably isn't worth a bug fix. It's unlikely that anyone
was calling this method without passing it a real dict.
|
| |
arguments.
|
| |
utilization, and speed:
* Moved the responsibility for emptying the previous list from list_fill
to list_init.
* Replaced the code in list_extend with the superior code from list_fill.
* Eliminated list_fill.
Results:
* list.extend() no longer creates an intermediate tuple except to handle
the special case of x.extend(x). This saves memory and time.
* list.extend(x) runs
5 to 10% faster when x is a list or tuple
15% faster when x is an iterable not defining __len__
twice as fast when x is an iterable defining __len__
* the code is about 15 lines shorter and no longer duplicates
functionality.
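A small demonstration of the cases enumerated above:

    x = [1, 2]
    x.extend(x)                      # the x.extend(x) special case: [1, 2, 1, 2]
    x.extend(n for n in range(2))    # an iterable without __len__
    x.extend(range(2))               # an iterable that reports its length
    print(x)                         # [1, 2, 1, 2, 0, 1, 0, 1]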
|
| |
The Py2.3 approach overallocated small lists by up to 8 elements.
The last checkin limited this to one but slowed down (by 20% to 30%)
the creation of small lists of 3 to 8 elements.
This tune-up balances the two, limiting overallocation to 3 elements
(significantly reducing space consumption from Py2.3) and running faster
than the previous checkin.
The first part of the growth pattern (0, 4, 8, 16) neatly meshes with
allocators that trigger data movement only when crossing a power-of-two
boundary. Also, the even numbers mesh well with common data alignments.
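A pure-Python sketch of a growth rule with this shape; the exact expression
is an assumption, chosen only because simulating repeated appends with it
reproduces the 4, 8, 16 steps quoted above:

    allocated = 0
    steps = []
    for newsize in range(1, 26):          # append one element at a time
        if newsize > allocated:           # grow only when the spare room is gone
            allocated = newsize + (newsize >> 3) + (3 if newsize < 9 else 6)
            steps.append(allocated)
    print(steps)                          # [4, 8, 16, 25]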
|
| |
More than doubles its speed.
|
| |
realloc(). This is achieved by tracking the overallocation size in a new
field and using that information to skip calls to realloc() whenever
possible.
* Simplified and tightened the amount of overallocation. For larger lists,
this overallocates by 1/8th (compared to the previous scheme which ranged
between 1/4th to 1/32nd over-allocation). For smaller lists (n<6), the
maximum overallocation is one byte (formerly it could be upto eight bytes).
This saves memory in applications with large numbers of small lists.
* Eliminated the NRESIZE macro in favor of a new, static list_resize function
that encapsulates the resizing logic. Converting this back to a macro would
give a small (under 1%) speed-up. This was too small to warrant the loss
of readability, maintainability, and de-coupling.
* Some functions using NRESIZE had grown unnecessarily complex in their
efforts to bend to the macro's calling pattern. With the new list_resize
function in place, those other functions could be simplified. That is
being saved for a separate patch.
* The ob_item==NULL check could be eliminated from the new list_resize
function. This would entail finding each piece of code that sets ob_item
to NULL and adding a new line to invalidate the overallocation tracking
field. Rather than impose a new requirement on other pieces of list code,
it was preferred to leave the NULL check in place and retain the benefits
of decoupling, maintainability and information hiding (only PyList_New()
and list_sort() need to know about the new field). This approach also
reduces the odds of breaking an extension module.
(Collaborative effort by Raymond Hettinger, Hye-Shik Chang, Tim Peters,
and Armin Rigo.)
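A toy model of the bookkeeping described above (a hypothetical Python class,
not the C code): remember how much room was really allocated and skip the
simulated realloc() whenever the requested size still fits.

    class GrowableBuffer:
        def __init__(self):
            self._items = []        # stands in for the malloc'ed ob_item array
            self._allocated = 0     # the new over-allocation tracking field
            self._size = 0
            self.reallocs = 0       # count simulated realloc() calls

        def resize(self, newsize):
            if newsize <= self._allocated:
                self._size = newsize          # fast path: no realloc at all
                return
            # Over-allocate by roughly 1/8th so future growth is amortized.
            self._allocated = newsize + (newsize >> 3) + 1
            self._items += [None] * (self._allocated - len(self._items))
            self._size = newsize
            self.reallocs += 1

    buf = GrowableBuffer()
    for n in range(1, 100):
        buf.resize(n)                # grow one element at a time
    print(buf.reallocs)              # far fewer than 99 simulated reallocations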
|
| |
(Contributed by Mike Pall.)
Make sure fill_free_list() is called only once rather than 106 times
when pre-allocating small ints.
|
| |
2. Failing to clear the error after a failed attempt to get the __getstate__
attribute caused intermittent errors and odd behavior.
|
| |
* Let deques support reversed().
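For reference, a one-line illustration of the new behavior:

    from collections import deque

    print(list(reversed(deque([1, 2, 3]))))   # [3, 2, 1]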
|
| |
characters instead of character pointers to determine space requirements.
|