Diffstat (limited to 'Tools/stringbench/README')
-rw-r--r-- | Tools/stringbench/README | 68 |
1 files changed, 68 insertions, 0 deletions
diff --git a/Tools/stringbench/README b/Tools/stringbench/README
new file mode 100644
index 0000000..a271f12
--- /dev/null
+++ b/Tools/stringbench/README
@@ -0,0 +1,68 @@
+stringbench is a set of performance tests comparing byte string
+operations with unicode operations.  The two string implementations
+are loosely based on each other and sometimes the algorithm for one is
+faster than the other.
+
+This test set was started at the Need For Speed sprint in Reykjavik
+to identify which string methods could be sped up quickly and to
+identify obvious places for improvement.
+
+Here is an example of a benchmark
+
+
+@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
+def startswith_single(STR):
+    s1 = STR("Andrew")
+    s2 = STR("A")
+    s1_startswith = s1.startswith
+    for x in _RANGE_1000:
+        s1_startswith(s2)
+
+The bench decorator takes three parameters.  The first is a short
+description of how the code works.  In most cases this is a Python code
+snippet.  It is not the code which is actually run because the real
+code is hand-optimized to focus on the method being tested.
+
+The second parameter is a group title.  All benchmarks with the same
+group title are listed together.  This lets you compare different
+implementations of the same algorithm, such as "t in s"
+vs. "s.find(t)".
+
+The last is a count.  Each benchmark loops over the algorithm either
+100 or 1000 times, depending on the algorithm performance.  The output
+time is the time per benchmark call so the reader needs a way to know
+how to scale the performance.
+
+These parameters become function attributes.
+
+
+Here is an example of the output
+
+
+========== count newlines
+38.54   41.60   92.7    ...text.with.2000.newlines.count("\n") (*100)
+========== early match, single character
+1.14    1.18    96.8    ("A"*1000).find("A") (*1000)
+0.44    0.41    105.6   "A" in "A"*1000 (*1000)
+1.15    1.17    98.1    ("A"*1000).index("A") (*1000)
+
+The first column is the run time in milliseconds for byte strings.
+The second is the run time for unicode strings.  The third is a
+percentage: 100 * byte time / unicode time.  Values above 100 mean
+unicode was faster than byte strings; values below 100 mean slower.
+
+The last column contains the code snippet and the repeat count for the
+internal benchmark loop.
+
+The times are computed with 'timeit.py' which repeats the test more
+and more times until the total time takes over 0.2 seconds, returning
+the best time for a single iteration.
+
+The final line of the output is the cumulative time for byte and
+unicode strings, and the overall performance of unicode relative to
+bytes.  For example
+
+4079.83 5432.25 75.1    TOTAL
+
+However, this has no meaning as it evenly weights every test.
+
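
The README says that the three `bench` parameters become function attributes, and that the third output column relates the byte string time to the unicode time. A minimal Python sketch of both ideas follows. The attribute names (`comment`, `group`, `repeat_count`) and the helper `percent` are assumptions chosen for illustration; the README does not name them, and this is not the stringbench code itself.

```python
def bench(comment, group, repeat_count):
    """Attach the three decorator arguments to the function as attributes.

    The attribute names are hypothetical; the README only says the
    parameters "become function attributes".
    """
    def decorate(func):
        func.comment = comment            # short description / code snippet
        func.group = group                # group title used to list tests together
        func.repeat_count = repeat_count  # size of the internal loop (100 or 1000)
        return func
    return decorate


# Stand-in for the precomputed range used by the README example.
_RANGE_1000 = range(1000)


@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    s1_startswith = s1.startswith  # hoist the attribute lookup out of the loop
    for x in _RANGE_1000:
        s1_startswith(s2)


def percent(byte_time, unicode_time):
    """Third output column: byte time as a percentage of unicode time."""
    return 100.0 * byte_time / unicode_time


# Run the benchmark body once, using str as the string type under test.
startswith_single(str)
```

Under these assumptions, a driver could collect every decorated function, sort by `group`, and print `comment` with `(*repeat_count)` appended, which matches the shape of the sample output. Recomputing the percentage from the rounded times shown in the sample rows may differ slightly in the last digit from the printed value.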