Mostly in SequenceMatcher.{__chain_b, find_longest_match}:

This now does a dynamic analysis of which elements are so frequently repeated as to constitute noise. The primary benefit is an enormous speedup in find_longest_match: when the sequences have many duplicate elements, the innermost loop can have hundreds of times fewer potential matches to consider. In effect, this now zooms in on sequences of non-ubiquitous elements. While I like what I've seen of the effects so far, I still consider this experimental. Please give it a try!
author: Tim Peters <tim.peters@gmail.com> 2002-04-29 01:37:32 (GMT)
committer: Tim Peters <tim.peters@gmail.com> 2002-04-29 01:37:32 (GMT)
commit: 81b9251d5996ec89bcc016c29ecc0b5f0204e59b (patch)
tree: 642fedf48a570b1c7c9b5ecc360ffef8954b58b9 /Doc
parent: 29c0afcfecdd52f6980274e0d613df7bf52bc6a6 (diff)
1 files changed, 18 insertions, 14 deletions
diff --git a/Doc/lib/libdifflib.tex b/Doc/lib/libdifflib.tex
index 3bc103b..37e401e 100644
--- a/Doc/lib/libdifflib.tex
+++ b/Doc/lib/libdifflib.tex
@@ -90,13 +90,19 @@
   Optional keyword parameters \var{linejunk} and \var{charjunk} are
   for filter functions (or \code{None}):
 
-  \var{linejunk}: A function that should accept a single string
-  argument, and return true if the string is junk (or false if it is
-  not). The default is module-level function
+  \var{linejunk}: A function that accepts a single string
+  argument, and returns true if the string is junk, or false if not.
+  The default is (\code{None}), starting with Python 2.3.  Before then,
+  the default was the module-level function
   \function{IS_LINE_JUNK()}, which filters out lines without visible
   characters, except for at most one pound character (\character{\#}).
+  As of Python 2.3, the underlying \class{SequenceMatcher} class
+  does a dynamic analysis of which lines are so frequent as to
+  constitute noise, and this usually works better than the pre-2.3
+  default.
 
-  \var{charjunk}: A function that should accept a string of length 1.
+  \var{charjunk}: A function that accepts a character (a string of
+  length 1), and returns if the character is junk, or false if not.
   The default is module-level function \function{IS_CHARACTER_JUNK()},
   which filters out whitespace characters (a blank or tab; note: bad
   idea to include newline in this!).
@@ -150,7 +156,7 @@ emu
   Return true for ignorable lines.  The line \var{line} is ignorable
   if \var{line} is blank or contains a single \character{\#},
   otherwise it is not ignorable.  Used as a default for parameter
-  \var{linejunk} in \function{ndiff()}.
+  \var{linejunk} in \function{ndiff()} before Python 2.3.
 \end{funcdesc}
 
 
@@ -443,16 +449,14 @@ The \class{Differ} class has this constructor:
   Optional keyword parameters \var{linejunk} and \var{charjunk} are
   for filter functions (or \code{None}):
 
-  \var{linejunk}: A function that should accept a single string
-  argument, and return true if the string is junk.  The default is
-  module-level function \function{IS_LINE_JUNK()}, which filters out
-  lines without visible characters, except for at most one pound
-  character (\character{\#}).
+  \var{linejunk}: A function that accepts a single string
+  argument, and returns true if the string is junk.  The default is
+  \code{None}, meaning that no line is considered junk.
 
-  \var{charjunk}: A function that should accept a string of length 1.
-  The default is module-level function \function{IS_CHARACTER_JUNK()},
-  which filters out whitespace characters (a blank or tab; note: bad
-  idea to include newline in this!).
+  \var{charjunk}: A function that accepts a single character argument
+  (a string of length 1), and returns true if the character is junk.
+  The default is \code{None}, meaning that no character is
+  considered junk.
 \end{classdesc}
 
 \class{Differ} objects are used (deltas generated) via a single
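
The junk filters and the dynamic analysis described above can be tried out with a short sketch. It assumes a current Python 3 difflib rather than the code in this commit: the autojunk argument and the bpopular attribute were added in later releases, and the 200-element threshold is how the present-day implementation behaves. The helper is_blank_line and the sample data are hypothetical, made up for illustration.

```python
from difflib import (IS_CHARACTER_JUNK, IS_LINE_JUNK, SequenceMatcher,
                     ndiff, restore)

# A junk filter is a plain predicate: one string in, truth value out.
line_checks = [IS_LINE_JUNK(s) for s in ("\n", "  #   \n", "hello\n")]
char_checks = [IS_CHARACTER_JUNK(c) for c in (" ", "\t", "\n", "x")]


def is_blank_line(line):
    """Hypothetical custom linejunk filter: whitespace-only lines are junk."""
    return line.strip() == ""


old = ["alpha\n", "\n", "beta\n"]
new = ["alpha\n", "\n", "gamma\n"]

# Filters are passed by keyword; None means "no junk filter at all".
delta = list(ndiff(old, new, linejunk=is_blank_line,
                   charjunk=IS_CHARACTER_JUNK))
roundtrip = list(restore(delta, 1))  # recover the first sequence from the delta

# Dynamic analysis (modern difflib, assumed here): when the second sequence
# has at least 200 elements, any element making up more than roughly 1% of it
# is treated as noise, without the caller supplying a junk function.
a = [str(i) for i in range(100)]
b = ["x"] * 150 + [str(i) for i in range(100)]

auto = SequenceMatcher(None, a, b)                   # heuristic enabled
plain = SequenceMatcher(None, a, b, autojunk=False)  # heuristic disabled

print("popular with heuristic:   ", auto.bpopular)
print("popular without heuristic:", plain.bpopular)
```

With the heuristic enabled, the heavily repeated element "x" is set aside before matching, which is where the speedup in find_longest_match comes from. Passing autojunk=False is the way to opt out in current versions when a frequent element is actually significant.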