summaryrefslogtreecommitdiffstats
path: root/Tools/stringbench/README
diff options
context:
space:
mode:
Diffstat (limited to 'Tools/stringbench/README')
-rw-r--r--Tools/stringbench/README68
1 files changed, 68 insertions, 0 deletions
diff --git a/Tools/stringbench/README b/Tools/stringbench/README
new file mode 100644
index 0000000..a271f12
--- /dev/null
+++ b/Tools/stringbench/README
@@ -0,0 +1,68 @@
+stringbench is a set of performance tests comparing byte string
+operations with unicode operations. The two string implementations
+are loosely based on each other and sometimes the algorithm for one is
+faster than the other.
+
+These test set was started at the Need For Speed sprint in Reykjavik
+to identify which string methods could be sped up quickly and to
+identify obvious places for improvement.
+
+Here is an example of a benchmark
+
+
+@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
+def startswith_single(STR):
+ s1 = STR("Andrew")
+ s2 = STR("A")
+ s1_startswith = s1.startswith
+ for x in _RANGE_1000:
+ s1_startswith(s2)
+
+The bench decorator takes three parameters. The first is a short
+description of how the code works. In most cases this is Python code
+snippet. It is not the code which is actually run because the real
+code is hand-optimized to focus on the method being tested.
+
+The second parameter is a group title. All benchmarks with the same
+group title are listed together. This lets you compare different
+implementations of the same algorithm, such as "t in s"
+vs. "s.find(t)".
+
+The last is a count. Each benchmark loops over the algorithm either
+100 or 1000 times, depending on the algorithm performance. The output
+time is the time per benchmark call so the reader needs a way to know
+how to scale the performance.
+
+These parameters become function attributes.
+
+
+Here is an example of the output
+
+
+========== count newlines
+38.54 41.60 92.7 ...text.with.2000.newlines.count("\n") (*100)
+========== early match, single character
+1.14 1.18 96.8 ("A"*1000).find("A") (*1000)
+0.44 0.41 105.6 "A" in "A"*1000 (*1000)
+1.15 1.17 98.1 ("A"*1000).index("A") (*1000)
+
+The first column is the run time in milliseconds for byte strings.
+The second is the run time for unicode strings. The third is a
+percentage; byte time / unicode time. It's the percentage by which
+unicode is faster than byte strings.
+
+The last column contains the code snippet and the repeat count for the
+internal benchmark loop.
+
+The times are computed with 'timeit.py' which repeats the test more
+and more times until the total time takes over 0.2 seconds, returning
+the best time for a single iteration.
+
+The final line of the output is the cumulative time for byte and
+unicode strings, and the overall performance of unicode relative to
+bytes. For example
+
+4079.83 5432.25 75.1 TOTAL
+
+However, this has no meaning as it evenly weights every test.
+