author | Terry Reedy <tjreedy@udel.edu> | 2010-11-25 06:12:34 (GMT) |
---|---|---|
committer | Terry Reedy <tjreedy@udel.edu> | 2010-11-25 06:12:34 (GMT) |
commit | 99f9637de8893ecdb08ade606fe3a988e6a8c848 (patch) | |
tree | 0808c60efc0a58b46601438d012aa29ee73afced /Doc/library/difflib.rst | |
parent | bd86301070e38726532ae57e7d1bdc01143b298b (diff) | |
Issue 2986: Add autojunk parameter to SequenceMatcher to turn off heuristic. Patch by Terry Reedy, Eli Bendersky, and Simon Cross.
Diffstat (limited to 'Doc/library/difflib.rst')
-rw-r--r-- | Doc/library/difflib.rst | 14 |
1 files changed, 13 insertions, 1 deletions
diff --git a/Doc/library/difflib.rst b/Doc/library/difflib.rst
index 58bbe45..c7efa96 100644
--- a/Doc/library/difflib.rst
+++ b/Doc/library/difflib.rst
@@ -17,6 +17,7 @@ can be used for example, for comparing files, and can produce difference
 information in various formats, including HTML and context and unified
 diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 
+
 .. class:: SequenceMatcher
 
    This is a flexible class for comparing pairs of sequences of any type, so long
@@ -35,6 +36,14 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
    complicated way on how many elements the sequences have in common; best case
    time is linear.
 
+   **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that
+   automatically treats certain sequence items as junk. The heuristic counts how many
+   times each individual item appears in the sequence. If an item's duplicates (after
+   the first one) account for more than 1% of the sequence and the sequence is at least
+   200 items long, this item is marked as "popular" and is treated as junk for
+   the purpose of sequence matching. This heuristic can be turned off by setting
+   the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.
+
 
 .. class:: Differ
 
@@ -324,7 +333,7 @@ SequenceMatcher Objects
 
 The :class:`SequenceMatcher` class has this constructor:
 
-.. class:: SequenceMatcher(isjunk=None, a='', b='')
+.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
 
    Optional argument *isjunk* must be ``None`` (the default) or a one-argument
    function that takes a sequence element and returns true if and only if the
@@ -340,6 +349,9 @@ The :class:`SequenceMatcher` class has this constructor:
    The optional arguments *a* and *b* are sequences to be compared; both default to
    empty strings. The elements of both sequences must be :term:`hashable`.
 
+   The optional argument *autojunk* can be used to disable the automatic junk
+   heuristic.
+
    :class:`SequenceMatcher` objects have the following methods:
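
The effect of the new *autojunk* argument can be seen with a small experiment. The following is a minimal sketch, not part of the patch: the input strings and variable names are invented for illustration, and it assumes a Python version whose ``difflib`` includes this change. The second sequence is exactly 200 items long and consists of a single repeated character, so the heuristic described in the diff should mark that character as "popular" and exclude it from matching.

```python
from difflib import SequenceMatcher

# Hypothetical inputs chosen to trigger the heuristic: the second sequence
# has 200 items, and 'b' accounts for far more than 1% of it.
a = "a" + "b" * 200
b = "b" * 200

# Default behaviour (autojunk=True): 'b' is treated as junk in the second
# sequence, so no matching blocks are found.
with_heuristic = SequenceMatcher(None, a, b).ratio()

# Heuristic disabled: the run of 200 'b' characters is matched normally.
without_heuristic = SequenceMatcher(None, a, b, autojunk=False).ratio()

print(f"autojunk=True:  {with_heuristic:.4f}")
print(f"autojunk=False: {without_heuristic:.4f}")
```

With the heuristic on, the reported similarity falls to zero even though the sequences differ by a single leading character; with ``autojunk=False`` the ratio is close to 1. This is the kind of surprising result that motivated making the heuristic optional.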