author Andrew Dalke <dalke@dalkescientific.com> 2006-05-26 14:00:45 (GMT)
committer Andrew Dalke <dalke@dalkescientific.com> 2006-05-26 14:00:45 (GMT)
commit 525eab37127373cf6f3c4007b816095aef9d2b0c
tree 59d81b45bc7b61ca1374d50e4e53e87651a1c80e /Include/intrcheck.h
parent b1f3251ceb07af3a3915f2fa2bd8cb106fa0f23a
Changes to string.split/rsplit on whitespace to preallocate space in the
results list.

Originally it allocated 0 items and relied on list growth during append.
Now it preallocates 12 items so the first few appends don't need list
reallocs.

  ("Here are some words ."*2).split(None, 1) is 7% faster
  ("Here are some words ."*2).split() is 15% faster

  (Your mileage may vary, see dealership for details.)

File parsing like this

  for line in f:
      count += len(line.split())

is also about 15% faster.

There is a slowdown of about 3% for large strings because of the
additional overhead of checking whether the append is to a preallocated
region of the list or not. This will be the rare case. It could be
improved with special-case code, but we decided it was not useful enough.

There is a cost of 12*sizeof(PyObject *) bytes per list. For the normal
case of file parsing this is not a problem because the lists have a short
lifetime. We have not come up with cases where this is a problem in real
life.

I chose 12 because human text averages about 11 words per line in books,
one of my data sets averages 6.2 words with a final peak at 11 words per
line, and I work with a tab-delimited data set with 8 tabs per line (or 9
words per line). 12 encompasses all of these.

Also changed the last rsplit code to append then reverse, rather than
doing insert(0). The split() and rsplit() times are now comparable.
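The preallocation idea described above can be modelled in pure Python. This is only an illustrative sketch, not the C code from the commit: the function name split_whitespace_prealloc and the constant PREALLOC_SIZE are hypothetical, and the real change lives in CPython's C string implementation. It shows the two paths the message mentions, filling a reserved slot for the first 12 pieces and falling back to ordinary append after that.

```python
PREALLOC_SIZE = 12  # the figure chosen in the commit message


def split_whitespace_prealloc(text, maxsplit=-1):
    """Model of whitespace split that fills a preallocated result list.

    Hypothetical illustration only; it mirrors str.split(None, maxsplit).
    """
    result = [None] * PREALLOC_SIZE  # slots reserved up front
    count = 0                        # number of pieces stored so far

    def add(piece):
        nonlocal count
        if count < PREALLOC_SIZE:
            result[count] = piece    # common case: fill a reserved slot
        else:
            result.append(piece)     # rare case: normal list growth
        count += 1

    i, n = 0, len(text)
    splits_left = maxsplit
    while i < n:
        # Skip the run of whitespace before the next word.
        while i < n and text[i].isspace():
            i += 1
        if i == n:
            break
        if splits_left == 0:
            add(text[i:])            # remainder is kept unsplit
            break
        # Scan to the end of the current word.
        j = i
        while j < n and not text[j].isspace():
            j += 1
        add(text[i:j])
        i = j
        if splits_left > 0:
            splits_left -= 1

    del result[count:]               # drop any unused reserved slots
    return result
```

The extra `count < PREALLOC_SIZE` check on every piece corresponds to the overhead the message blames for the small slowdown on large strings.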
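The "append then reverse" change can be sketched the same way. Scanning from the right produces pieces in reverse order; inserting each one at index 0 shifts every existing element, whereas appending and reversing once at the end does not. The function name rsplit_whitespace is hypothetical and this is a pure-Python model under that assumption, not the C code from the commit.

```python
def rsplit_whitespace(text, maxsplit=-1):
    """Model of whitespace rsplit using append followed by one reverse.

    Hypothetical illustration only; it mirrors str.rsplit(None, maxsplit).
    """
    result = []
    j = len(text)
    splits_left = maxsplit
    while j > 0:
        # Skip the run of whitespace to the right of the next word.
        while j > 0 and text[j - 1].isspace():
            j -= 1
        if j == 0:
            break
        if splits_left == 0:
            result.append(text[:j])  # remainder on the left is kept unsplit
            break
        # Scan leftwards to the start of the current word.
        i = j
        while i > 0 and not text[i - 1].isspace():
            i -= 1
        result.append(text[i:j])     # cheap append instead of insert(0, ...)
        j = i
        if splits_left > 0:
            splits_left -= 1

    result.reverse()                 # restore left-to-right order once
    return result
```

For example, rsplit_whitespace("  a b c ", 1) gives the same two pieces as "  a b c ".rsplit(None, 1).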
Diffstat (limited to 'Include/intrcheck.h')
0 files changed, 0 insertions, 0 deletions