author: Terry Reedy <tjreedy@udel.edu> 2010-11-25 06:12:34 (GMT)
committer: Terry Reedy <tjreedy@udel.edu> 2010-11-25 06:12:34 (GMT)
commit: 99f9637de8893ecdb08ade606fe3a988e6a8c848
tree: 0808c60efc0a58b46601438d012aa29ee73afced /Doc/library/difflib.rst
parent: bd86301070e38726532ae57e7d1bdc01143b298b
Issue 2986: Add autojunk parameter to SequenceMatcher to turn off the heuristic. Patch by Terry Reedy, Eli Bendersky, and Simon Cross.
Diffstat (limited to 'Doc/library/difflib.rst')
-rw-r--r-- Doc/library/difflib.rst 14
1 files changed, 13 insertions, 1 deletions
diff --git a/Doc/library/difflib.rst b/Doc/library/difflib.rst
index 58bbe45..c7efa96 100644
--- a/Doc/library/difflib.rst
+++ b/Doc/library/difflib.rst
@@ -17,6 +17,7 @@ can be used for example, for comparing files, and can produce difference
information in various formats, including HTML and context and unified
diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
+
.. class:: SequenceMatcher
This is a flexible class for comparing pairs of sequences of any type, so long
@@ -35,6 +36,14 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
complicated way on how many elements the sequences have in common; best case
time is linear.
+ **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that
+ automatically treats certain sequence items as junk. The heuristic counts how many
+ times each individual item appears in the sequence. If an item's duplicates (after
+ the first one) account for more than 1% of the sequence and the sequence is at least
+ 200 items long, this item is marked as "popular" and is treated as junk for
+ the purpose of sequence matching. This heuristic can be turned off by setting
+ the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.
+
.. class:: Differ
@@ -324,7 +333,7 @@ SequenceMatcher Objects
The :class:`SequenceMatcher` class has this constructor:
-.. class:: SequenceMatcher(isjunk=None, a='', b='')
+.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
Optional argument *isjunk* must be ``None`` (the default) or a one-argument
function that takes a sequence element and returns true if and only if the
@@ -340,6 +349,9 @@ The :class:`SequenceMatcher` class has this constructor:
The optional arguments *a* and *b* are sequences to be compared; both default to
empty strings. The elements of both sequences must be :term:`hashable`.
+ The optional argument *autojunk* can be used to disable the automatic junk
+ heuristic.
+
:class:`SequenceMatcher` objects have the following methods:
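
The effect of the new *autojunk* argument documented in this patch can be sketched with a small comparison. This is only an illustrative example, not part of the commit: the strings ``a`` and ``b`` and the variable names are made up, and the inputs are deliberately artificial so that one item (``"x"``) is repeated often enough in a sequence of at least 200 items to be treated as "popular".

```python
from difflib import SequenceMatcher

# Hypothetical inputs: the second sequence is 250 items long and consists
# almost entirely of one repeated item, so the heuristic should mark "x"
# as popular and ignore it when looking for matching blocks.
a = "y" + "x" * 250
b = "x" * 250

# Default behaviour: the automatic junk heuristic is enabled.
with_heuristic = SequenceMatcher(None, a, b).ratio()

# Heuristic disabled through the autojunk argument added by this patch.
without_heuristic = SequenceMatcher(None, a, b, autojunk=False).ratio()

print("autojunk=True: ", with_heuristic)
print("autojunk=False:", without_heuristic)
```

With the heuristic disabled, the long run of ``"x"`` characters is matched and the similarity ratio comes out close to 1; with it enabled, the ratio should be markedly lower, because the repeated item is excluded from matching. Disabling the heuristic costs more time on long, repetitive inputs, so it seems best reserved for cases where the default gives misleading results.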