Commit messages
| |
- Check the correct variable (str_obj, not str) for NULL
- sep_len was already verified to be non-zero
| |
patterns in a string, only the number needed by the max limit.
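The entry above is truncated, but the idea it records (stop counting separator matches once the split limit is reached) can be sketched like this; the helper below is illustrative, not the committed code:

    #include <stddef.h>

    /* Count occurrences of 'ch' in s[0..n), but never more than 'maxcount':
       once a maxsplit limit is satisfied, any further matches are wasted work. */
    static size_t
    count_char_limited(const char *s, size_t n, char ch, size_t maxcount)
    {
        size_t i, count = 0;
        for (i = 0; i < n && count < maxcount; i++) {
            if (s[i] == ch)
                count++;
        }
        return count;
    }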
| |
find
| |
find helpers; updated unicodeobject to use stringlib_count
| |
(If compiled without FAST search support, changed the pre-memcmp test
to check the last character as well as the first. This gave a 25%
speedup for my test case.)
Rewrote the split algorithms so they stop when maxsplit reaches 0.
Previously they did a string match first and then checked whether maxsplit
had been reached. The new way avoids a needless string search.
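Both ideas are easy to sketch in plain C (the names and structure here are illustrative, not the stringobject code): check the remaining maxsplit budget before starting the next search, and compare the first and last characters of a candidate position before paying for a full memcmp.

    #include <stddef.h>
    #include <string.h>

    /* Cheap pre-test: look at the first and last characters before memcmp.
       Assumes sep_len > 0 and i + sep_len <= length of s. */
    static int
    matches_at(const char *s, size_t i, const char *sep, size_t sep_len)
    {
        if (s[i] != sep[0] || s[i + sep_len - 1] != sep[sep_len - 1])
            return 0;
        return memcmp(s + i, sep, sep_len) == 0;
    }

    /* Count the pieces a split would produce, but stop as soon as the maxsplit
       budget is used up: no search is even started once it reaches 0. */
    static size_t
    count_split_pieces(const char *s, size_t n,
                       const char *sep, size_t sep_len, size_t maxsplit)
    {
        size_t i = 0, pieces = 1;
        while (maxsplit > 0 && i + sep_len <= n) {
            if (matches_at(s, i, sep, sep_len)) {
                pieces++;
                maxsplit--;
                i += sep_len;
            }
            else {
                i++;
            }
        }
        return pieces;
    }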
| |
feel free to add more tests and improve the documentation.
| |
broken, someone would have noticed by now ;-)
| |
the algorithm.
| |
results list.
Originally it allocated 0 items and used the list growth during append. Now
it preallocates 12 items so the first few appends don't need list reallocs.
("Here are some words ."*2).split(None, 1) is 7% faster
("Here are some words ."*2).split() is is 15% faster
(Your milage may vary, see dealership for details.)
File parsing like this
    for line in f:
        count += len(line.split())
is also about 15% faster. There is a slowdown of about 3% for large
strings because of the additional overhead of checking if the append is
to a preallocated region of the list or not. This will be the rare case.
It could be improved with special case code but we decided it was not
useful enough.
There is a cost of 12*sizeof(PyObject *) bytes per list. For the normal
case of file parsing this is not a problem because the lists have
a short lifetime. We have not come up with cases where this is a problem
in real life.
I chose 12 because human text averages about 11 words per line in books,
one of my data sets averages 6.2 words with a final peak at 11 words per
line, and I work with a tab delimited data set with 8 tabs per line (or
9 words per line). 12 encompasses all of these.
Also changed the last rstrip code to append then reverse, rather than
doing insert(0). The strip() and rstrip() times are now comparable.
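The preallocation can be sketched with the public 2.x C API roughly as follows; the real split code uses its own macros, so this only shows the shape of the idea (splitting a C string on a single character, with 12 slots reserved up front):

    #include "Python.h"
    #include <string.h>

    #define MAX_PREALLOC 12   /* roughly the typical words-per-line figure above */

    /* Illustrative only: split a NUL-terminated C string on 'sep' and return a
       Python list.  The first MAX_PREALLOC pieces go into preallocated slots via
       PyList_SET_ITEM, so short results never hit PyList_Append's resize path. */
    static PyObject *
    split_on_char(const char *s, char sep)
    {
        PyObject *list = PyList_New(MAX_PREALLOC);   /* 12 placeholder slots */
        Py_ssize_t count = 0;
        const char *start = s;

        if (list == NULL)
            return NULL;
        for (;;) {
            const char *end = strchr(start, sep);
            size_t len = end ? (size_t)(end - start) : strlen(start);
            PyObject *piece = PyString_FromStringAndSize(start, (Py_ssize_t)len);
            if (piece == NULL)
                goto error;
            if (count < MAX_PREALLOC)
                PyList_SET_ITEM(list, count, piece);  /* fills a slot, no realloc */
            else if (PyList_Append(list, piece) == 0)
                Py_DECREF(piece);                     /* append took its own ref */
            else {
                Py_DECREF(piece);
                goto error;
            }
            count++;
            if (end == NULL)
                break;
            start = end + 1;
        }
        /* Trim the slots that were reserved but never used (the real code
           shrinks the list more directly). */
        if (count < MAX_PREALLOC &&
                PyList_SetSlice(list, count, MAX_PREALLOC, NULL) < 0)
            goto error;
        return list;

      error:
        Py_DECREF(list);
        return NULL;
    }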
| |
length (thanks, neal!). and yes, I've verified that this doesn't
slow things down ;-)
| |
~15% faster for the current tests (which is noticeably faster than a
corresponding find call). thanks to neal-who-never-sleeps for the tip.
| |
feel free to improve the documentation and the docstrings.
| |
this is on par with a corresponding find, and nearly twice as fast
as split(sep, 1).
full tests, a unicode version, and documentation will follow tomorrow.
| |
The SIGCHECK macro defined here has always been bizarre, but
it apparently causes compiler warnings on "Sun Studio 11".
I believe the warnings are bogus, but it doesn't hurt to make
the macro definition saner.
Bugfix candidate (but I'm not going to bother).
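For readers without longobject.c at hand: the usual "saner" shape for a multi-statement macro is the do { ... } while (0) wrapper, which makes the expansion a single statement. The demo below only illustrates that idiom; it is not the actual SIGCHECK body.

    #include <stdio.h>

    static int budget = 3;
    static int time_to_check(void) { return --budget < 0; }   /* stand-in test */

    /* Wrapping the body in do { ... } while (0) turns the whole macro into one
       statement, so it nests safely in if/else and keeps picky compilers quiet. */
    #define SIGCHECK_DEMO(on_interrupt)     \
        do {                                \
            if (time_to_check()) {          \
                on_interrupt;               \
            }                               \
        } while (0)

    int main(void)
    {
        int i;
        for (i = 0; i < 10; i++) {
            if (i % 2 == 0)
                SIGCHECK_DEMO(goto interrupted);   /* expands to one statement */
            else
                continue;   /* the if/else stays well-formed thanks to do/while */
        }
        printf("finished without interruption\n");
        return 0;
      interrupted:
        printf("interrupted at i=%d\n", i);
        return 0;
    }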
| |
with PyObject_CallFunctionObjArgs, which is 30% faster.
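For reference, the two call styles differ like this (the callable and argument here are placeholders): PyObject_CallFunction has to parse a format string and build the argument tuple from it on every call, while PyObject_CallFunctionObjArgs takes the argument objects directly.

    #include "Python.h"

    /* Call 'callable' with one object argument, both ways. */
    static PyObject *
    call_both_ways(PyObject *callable, PyObject *arg)
    {
        PyObject *slow, *fast;

        /* Parses the "O" format string and packs a tuple on every call. */
        slow = PyObject_CallFunction(callable, "O", arg);
        Py_XDECREF(slow);

        /* No format string: the varargs are the argument objects themselves,
           terminated by NULL. */
        fast = PyObject_CallFunctionObjArgs(callable, arg, NULL);
        return fast;
    }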
| |
related tests are now about 10x faster.
| |
new string is over max Py_ssize_t. I have no way to test it on my
box or any box I have access to. At least it doesn't break anything.
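The condition being guarded against is simple to state even though it is hard to trigger: if each of 'count' replacements grows the string by 'delta' bytes, the final length must still fit in a Py_ssize_t. A sketch of that kind of check (not the exact committed code; the names are the sketch's own):

    #include "Python.h"

    /* Return the length after 'count' replacements that each add 'delta' bytes,
       or -1 with OverflowError set if it would not fit in a Py_ssize_t. */
    static Py_ssize_t
    grown_length(Py_ssize_t old_len, Py_ssize_t count, Py_ssize_t delta)
    {
        if (delta > 0 && count > (PY_SSIZE_T_MAX - old_len) / delta) {
            PyErr_SetString(PyExc_OverflowError,
                            "replace() result would be too long");
            return -1;
        }
        return old_len + count * delta;
    }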
| |
PyInt_FromLong.
Now using PyInt_FromSsize_t.
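In 2.x C API terms the change is just about building the result from a Py_ssize_t instead of squeezing it through a C long; the helper name below is made up:

    #include "Python.h"

    /* Return an index/count without truncation on platforms where Py_ssize_t
       is wider than long (for example 64-bit Windows). */
    static PyObject *
    index_result(Py_ssize_t value)
    {
        return PyInt_FromSsize_t(value);   /* previously PyInt_FromLong((long)value) */
    }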
| |
for the related stringbench tests.
| |
made a copy of the string using PyString_FromStringAndSize(s, n) and modified
the copied string in-place. However, 1- (and 0-) character strings are shared
from a cache. This caused "A".replace("A", "a") to change the cached version
of "A" -- used by everyone.
Now we make the copy with NULL as the string and do the memcpy manually. I've
added regression tests to catch this if it happens again. Perhaps
there should be a PyString_Copy for this case?
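The safe copying pattern described above looks roughly like this in the 2.x API (the shape of the fix, not the literal diff): asking for a NULL buffer yields a fresh object whose contents are then filled in by hand, so the cached short strings are never handed out for in-place modification.

    #include "Python.h"
    #include <string.h>

    /* Make a private, writable copy of s[0..n).  With a non-NULL pointer,
       PyString_FromStringAndSize(s, n) can return a shared cached object for
       very short strings, which must never be modified.  Passing NULL allocates
       a fresh uninitialised buffer that we then fill ourselves. */
    static PyObject *
    private_string_copy(const char *s, Py_ssize_t n)
    {
        PyObject *copy = PyString_FromStringAndSize(NULL, n);
        if (copy == NULL)
            return NULL;
        memcpy(PyString_AS_STRING(copy), s, (size_t)n);
        return copy;
    }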
| |
both mystrtoul.c and longobject.c. Share the table instead. Also
cut its size by 64 entries (they had been used for an inscrutable
trick originally, but the code no longer tries to use that trick).
| |
now about 3x faster on my machine, for the replace tests from stringbench.
| |
versions if they're not defined.
| |
this and it is more efficient than using !Py_IS_INFINITE(X) && !Py_IS_NAN(X).
No change on other platforms.
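One plausible way such macros end up defined; this is a sketch of the pattern, not the literal pyport.h/pyconfig.h text (the Microsoft intrinsic referred to is _finite() from <float.h>):

    #include <float.h>                       /* _finite() on Microsoft compilers */

    #ifndef Py_IS_NAN
    #define Py_IS_NAN(X) ((X) != (X))        /* NaN compares unequal to itself */
    #endif

    #ifndef Py_IS_INFINITY
    #define Py_IS_INFINITY(X) ((X) && (X) * 0.5 == (X))
    #endif

    #ifndef Py_IS_FINITE
    #ifdef _MSC_VER
    #define Py_IS_FINITE(X) _finite(X)       /* one intrinsic test on Windows */
    #else
    #define Py_IS_FINITE(X) (!Py_IS_INFINITY(X) && !Py_IS_NAN(X))
    #endif
    #endif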
| |
strings too... (thanks to georg brandl for spotting the exact problem
faster than anyone else)
| |
(the unicode versions of these are still 2x faster on windows,
though...)
based on work by Andrew Dalke, with tweaks by yours truly.
| |
``long(str, base)`` is now up to 6x faster for non-power-of-2 bases. The
largest speedup is for inputs with about 1000 decimal digits. Conversion
from non-power-of-2 bases remains quadratic-time in the number of input
digits (it was and remains linear-time for bases 2, 4, 8, 16 and 32).
Speedups at various lengths for decimal inputs, comparing 2.4.3 with
current trunk. Note that it's actually a bit slower for 1-digit strings:
len speedup
---- -------
1 -4.5%
2 4.6%
3 8.3%
4 12.7%
5 16.9%
6 28.6%
7 35.5%
8 44.3%
9 46.6%
10 55.3%
11 65.7%
12 77.7%
13 73.4%
14 75.3%
15 85.2%
16 103.0%
17 95.1%
18 112.8%
19 117.9%
20 128.3%
30 174.5%
40 209.3%
50 236.3%
60 254.3%
70 262.9%
80 295.8%
90 297.3%
100 324.5%
200 374.6%
300 403.1%
400 391.1%
500 388.7%
600 440.6%
700 468.7%
800 498.0%
900 507.2%
1000 501.2%
2000 450.2%
3000 463.2%
4000 452.5%
5000 440.6%
6000 439.6%
7000 424.8%
8000 418.1%
9000 417.7%
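The log records the results above but not the mechanism. The standard way to get this kind of win (and, as far as I can tell, the idea behind the change) is to fold several input digits into one machine word and pay for only one multiply-add on the big result per group, instead of one per digit. The sketch below shows just that grouping idea; it uses unsigned long long in place of real arbitrary-precision arithmetic, so it only handles small inputs.

    #include <stdio.h>
    #include <string.h>

    #define GROUP 4   /* 4 decimal digits: one multiply by 10000 per group */

    /* Grouping illustration only: real code would apply the same loop to an
       arbitrary-precision integer; unsigned long long keeps the sketch
       self-contained but limits it to small values. */
    static unsigned long long
    parse_decimal_grouped(const char *s)
    {
        unsigned long long result = 0;
        size_t i, n = strlen(s);

        for (i = 0; i < n; ) {
            unsigned long chunk = 0, factor = 1;
            size_t k;
            /* fold up to GROUP digits into a machine word with cheap arithmetic */
            for (k = 0; k < GROUP && i < n; k++, i++) {
                chunk = chunk * 10 + (unsigned long)(s[i] - '0');
                factor *= 10;
            }
            /* one expensive multiply-add on the accumulated result per group */
            result = result * factor + chunk;
        }
        return result;
    }

    int main(void)
    {
        printf("%llu\n", parse_decimal_grouped("123456789"));   /* prints 123456789 */
        return 0;
    }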
| |
constant-length changes; use fastsearch to locate the first match.
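A rough plain-C sketch of that strategy, with strstr() standing in for the fastsearch routine: because the pattern and its replacement have the same length, the output is a single copy of the input with each match overwritten in place, and the first search happens up front (in the real code a miss returns the original object untouched; the sketch just returns a plain copy).

    #include <stdlib.h>
    #include <string.h>

    /* Replace every occurrence of 'from' by 'to', assuming they have the same
       length: the result has exactly the input's length, so one copy is made
       and each match is overwritten in place. */
    static char *
    replace_same_length(const char *s, const char *from, const char *to)
    {
        size_t n = strlen(s), flen = strlen(from);
        const char *hit = flen ? strstr(s, from) : NULL;  /* first match, up front */
        char *result = (char *)malloc(n + 1);
        char *p;

        if (result == NULL)
            return NULL;
        memcpy(result, s, n + 1);   /* constant-length change: one copy suffices */
        if (hit == NULL)
            return result;          /* no match (the real code returns the
                                       original object here without copying) */
        p = result + (hit - s);
        while (p != NULL) {
            memcpy(p, to, flen);    /* overwrite the match in place */
            p = strstr(p + flen, from);
        }
        return result;
    }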
| |
results in a 2.5x speedup on the stringbench count tests, and a 20x (!)
speedup on the stringbench search/find/contains test, compared to 2.5a2.
for more on the algorithm, see:
http://effbot.org/zone/stringlib.htm
if you get weird results, you can disable the new algorithm by undefining
USE_FAST in Objects/unicodeobject.c.
enjoy /F
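A compressed illustration of the kind of algorithm described at that URL; this is a simplified variant for readability, not the committed stringlib code. It combines a last-character test with a small bloom-style mask of the characters occurring in the pattern, so the scan can leap a full pattern length whenever the next character cannot be part of any match.

    #include <stdio.h>
    #include <string.h>

    #define BLOOM_ADD(mask, ch)  ((mask) |= 1UL << ((unsigned char)(ch) & 31))
    #define BLOOM_HAS(mask, ch)  ((mask) &  (1UL << ((unsigned char)(ch) & 31)))

    /* Simplified fastsearch-style find: index of the first occurrence of p
       (length m) in s (length n), or -1 if there is none. */
    static long
    fast_find(const char *s, long n, const char *p, long m)
    {
        unsigned long mask = 0;
        long i, j;

        if (m == 0)
            return 0;
        for (i = 0; i < m; i++)
            BLOOM_ADD(mask, p[i]);             /* remember which chars occur in p */

        for (i = 0; i + m <= n; ) {
            if (s[i + m - 1] == p[m - 1]) {    /* cheap last-character test first */
                for (j = 0; j < m - 1 && s[i + j] == p[j]; j++)
                    ;
                if (j == m - 1)
                    return i;                  /* full match */
            }
            if (i + m < n && !BLOOM_HAS(mask, s[i + m]))
                i += m + 1;   /* s[i+m] occurs nowhere in p: skip past it entirely */
            else
                i += 1;
        }
        return -1;
    }

    int main(void)
    {
        const char *text = "a string search example";
        printf("%ld\n", fast_find(text, (long)strlen(text), "search", 6));  /* 9 */
        return 0;
    }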