Diffstat (limited to 'Doc/library/difflib.rst')
-rw-r--r-- | Doc/library/difflib.rst | 235 |
1 files changed, 120 insertions, 115 deletions
diff --git a/Doc/library/difflib.rst b/Doc/library/difflib.rst
index 585fa74..01c9c85 100644
--- a/Doc/library/difflib.rst
+++ b/Doc/library/difflib.rst
@@ -343,166 +343,171 @@ The :class:`SequenceMatcher` class has this constructor:
    The optional arguments *a* and *b* are sequences to be compared; both default to
    empty strings. The elements of both sequences must be :term:`hashable`.

-:class:`SequenceMatcher` objects have the following methods:
+   :class:`SequenceMatcher` objects have the following methods:


-.. method:: SequenceMatcher.set_seqs(a, b)
+   .. method:: set_seqs(a, b)

-   Set the two sequences to be compared.
+      Set the two sequences to be compared.

-:class:`SequenceMatcher` computes and caches detailed information about the
-second sequence, so if you want to compare one sequence against many sequences,
-use :meth:`set_seq2` to set the commonly used sequence once and call
-:meth:`set_seq1` repeatedly, once for each of the other sequences.
+   :class:`SequenceMatcher` computes and caches detailed information about the
+   second sequence, so if you want to compare one sequence against many
+   sequences, use :meth:`set_seq2` to set the commonly used sequence once and
+   call :meth:`set_seq1` repeatedly, once for each of the other sequences.


-.. method:: SequenceMatcher.set_seq1(a)
+   .. method:: set_seq1(a)

-   Set the first sequence to be compared. The second sequence to be compared is
-   not changed.
+      Set the first sequence to be compared. The second sequence to be compared
+      is not changed.


-.. method:: SequenceMatcher.set_seq2(b)
+   .. method:: set_seq2(b)

-   Set the second sequence to be compared. The first sequence to be compared is
-   not changed.
+      Set the second sequence to be compared. The first sequence to be compared
+      is not changed.


-.. method:: SequenceMatcher.find_longest_match(alo, ahi, blo, bhi)
+   .. method:: find_longest_match(alo, ahi, blo, bhi)

-   Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
+      Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.

-   If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns ``(i, j,
-   k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo <= i <= i+k <=
-   ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j', k')`` meeting those
-   conditions, the additional conditions ``k >= k'``, ``i <= i'``, and if ``i ==
-   i'``, ``j <= j'`` are also met. In other words, of all maximal matching blocks,
-   return one that starts earliest in *a*, and of all those maximal matching blocks
-   that start earliest in *a*, return the one that starts earliest in *b*.
+      If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
+      ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
+      <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
+      k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
+      <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
+      all maximal matching blocks, return one that starts earliest in *a*, and
+      of all those maximal matching blocks that start earliest in *a*, return
+      the one that starts earliest in *b*.

-   >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
-   >>> s.find_longest_match(0, 5, 0, 9)
-   Match(a=0, b=4, size=5)
+      >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
+      >>> s.find_longest_match(0, 5, 0, 9)
+      Match(a=0, b=4, size=5)

-   If *isjunk* was provided, first the longest matching block is determined as
-   above, but with the additional restriction that no junk element appears in the
-   block. Then that block is extended as far as possible by matching (only) junk
-   elements on both sides. So the resulting block never matches on junk except as
-   identical junk happens to be adjacent to an interesting match.
+      If *isjunk* was provided, first the longest matching block is determined
+      as above, but with the additional restriction that no junk element appears
+      in the block. Then that block is extended as far as possible by matching
+      (only) junk elements on both sides. So the resulting block never matches
+      on junk except as identical junk happens to be adjacent to an interesting
+      match.

-   Here's the same example as before, but considering blanks to be junk. That
-   prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the second
-   sequence directly. Instead only the ``'abcd'`` can match, and matches the
-   leftmost ``'abcd'`` in the second sequence:
+      Here's the same example as before, but considering blanks to be junk. That
+      prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
+      second sequence directly. Instead only the ``'abcd'`` can match, and
+      matches the leftmost ``'abcd'`` in the second sequence:

-   >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
-   >>> s.find_longest_match(0, 5, 0, 9)
-   Match(a=1, b=0, size=4)
+      >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
+      >>> s.find_longest_match(0, 5, 0, 9)
+      Match(a=1, b=0, size=4)

-   If no blocks match, this returns ``(alo, blo, 0)``.
+      If no blocks match, this returns ``(alo, blo, 0)``.

-   This method returns a :term:`named tuple` ``Match(a, b, size)``.
+      This method returns a :term:`named tuple` ``Match(a, b, size)``.


-.. method:: SequenceMatcher.get_matching_blocks()
+   .. method:: get_matching_blocks()

-   Return list of triples describing matching subsequences. Each triple is of the
-   form ``(i, j, n)``, and means that ``a[i:i+n] == b[j:j+n]``. The triples are
-   monotonically increasing in *i* and *j*.
+      Return list of triples describing matching subsequences. Each triple is of
+      the form ``(i, j, n)``, and means that ``a[i:i+n] == b[j:j+n]``. The
+      triples are monotonically increasing in *i* and *j*.

-   The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It is
-   the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')`` are
-   adjacent triples in the list, and the second is not the last triple in the list,
-   then ``i+n != i'`` or ``j+n != j'``; in other words, adjacent triples always
-   describe non-adjacent equal blocks.
+      The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It
+      is the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')``
+      are adjacent triples in the list, and the second is not the last triple in
+      the list, then ``i+n != i'`` or ``j+n != j'``; in other words, adjacent
+      triples always describe non-adjacent equal blocks.

-   .. XXX Explain why a dummy is used!
+      .. XXX Explain why a dummy is used!

-   .. doctest::
+      .. doctest::

-      >>> s = SequenceMatcher(None, "abxcd", "abcd")
-      >>> s.get_matching_blocks()
-      [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
+         >>> s = SequenceMatcher(None, "abxcd", "abcd")
+         >>> s.get_matching_blocks()
+         [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]


-.. method:: SequenceMatcher.get_opcodes()
+   .. method:: get_opcodes()

-   Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is of
-   the form ``(tag, i1, i2, j1, j2)``. The first tuple has ``i1 == j1 == 0``, and
-   remaining tuples have *i1* equal to the *i2* from the preceding tuple, and,
-   likewise, *j1* equal to the previous *j2*.
+      Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is
+      of the form ``(tag, i1, i2, j1, j2)``. The first tuple has ``i1 == j1 ==
+      0``, and remaining tuples have *i1* equal to the *i2* from the preceding
+      tuple, and, likewise, *j1* equal to the previous *j2*.

-   The *tag* values are strings, with these meanings:
+      The *tag* values are strings, with these meanings:

-   +---------------+---------------------------------------------+
-   | Value         | Meaning                                     |
-   +===============+=============================================+
-   | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
-   |               | ``b[j1:j2]``.                               |
-   +---------------+---------------------------------------------+
-   | ``'delete'``  | ``a[i1:i2]`` should be deleted. Note that   |
-   |               | ``j1 == j2`` in this case.                  |
-   +---------------+---------------------------------------------+
-   | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
-   |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
-   |               | this case.                                  |
-   +---------------+---------------------------------------------+
-   | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
-   |               | are equal).                                 |
-   +---------------+---------------------------------------------+
+      +---------------+---------------------------------------------+
+      | Value         | Meaning                                     |
+      +===============+=============================================+
+      | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
+      |               | ``b[j1:j2]``.                               |
+      +---------------+---------------------------------------------+
+      | ``'delete'``  | ``a[i1:i2]`` should be deleted. Note that   |
+      |               | ``j1 == j2`` in this case.                  |
+      +---------------+---------------------------------------------+
+      | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
+      |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
+      |               | this case.                                  |
+      +---------------+---------------------------------------------+
+      | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
+      |               | are equal).                                 |
+      +---------------+---------------------------------------------+

-   For example:
+      For example:

-   >>> a = "qabxcd"
-   >>> b = "abycdf"
-   >>> s = SequenceMatcher(None, a, b)
-   >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
-   ...    print(("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
-   ...           (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2])))
-    delete a[0:1] (q) b[0:0] ()
-     equal a[1:3] (ab) b[0:2] (ab)
-   replace a[3:4] (x) b[2:3] (y)
-     equal a[4:6] (cd) b[3:5] (cd)
-    insert a[6:6] () b[5:6] (f)
+      >>> a = "qabxcd"
+      >>> b = "abycdf"
+      >>> s = SequenceMatcher(None, a, b)
+      >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
+      ...    print(("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
+      ...           (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2])))
+       delete a[0:1] (q) b[0:0] ()
+        equal a[1:3] (ab) b[0:2] (ab)
+      replace a[3:4] (x) b[2:3] (y)
+        equal a[4:6] (cd) b[3:5] (cd)
+       insert a[6:6] () b[5:6] (f)


-.. method:: SequenceMatcher.get_grouped_opcodes([n])
+   .. method:: get_grouped_opcodes([n])

-   Return a :term:`generator` of groups with up to *n* lines of context.
+      Return a :term:`generator` of groups with up to *n* lines of context.

-   Starting with the groups returned by :meth:`get_opcodes`, this method splits out
-   smaller change clusters and eliminates intervening ranges which have no changes.
+      Starting with the groups returned by :meth:`get_opcodes`, this method
+      splits out smaller change clusters and eliminates intervening ranges which
+      have no changes.

-   The groups are returned in the same format as :meth:`get_opcodes`.
+      The groups are returned in the same format as :meth:`get_opcodes`.


-.. method:: SequenceMatcher.ratio()
+   .. method:: ratio()

-   Return a measure of the sequences' similarity as a float in the range [0, 1].
+      Return a measure of the sequences' similarity as a float in the range [0,
+      1].

-   Where T is the total number of elements in both sequences, and M is the number
-   of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the sequences are
-   identical, and ``0.0`` if they have nothing in common.
+      Where T is the total number of elements in both sequences, and M is the
+      number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
+      sequences are identical, and ``0.0`` if they have nothing in common.

-   This is expensive to compute if :meth:`get_matching_blocks` or
-   :meth:`get_opcodes` hasn't already been called, in which case you may want to
-   try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an upper bound.
+      This is expensive to compute if :meth:`get_matching_blocks` or
+      :meth:`get_opcodes` hasn't already been called, in which case you may want
+      to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
+      upper bound.


-.. method:: SequenceMatcher.quick_ratio()
+   .. method:: quick_ratio()

-   Return an upper bound on :meth:`ratio` relatively quickly.
+      Return an upper bound on :meth:`ratio` relatively quickly.

-   This isn't defined beyond that it is an upper bound on :meth:`ratio`, and is
-   faster to compute.
+      This isn't defined beyond that it is an upper bound on :meth:`ratio`, and
+      is faster to compute.


-.. method:: SequenceMatcher.real_quick_ratio()
+   .. method:: real_quick_ratio()

-   Return an upper bound on :meth:`ratio` very quickly.
+      Return an upper bound on :meth:`ratio` very quickly.

-   This isn't defined beyond that it is an upper bound on :meth:`ratio`, and is
-   faster to compute than either :meth:`ratio` or :meth:`quick_ratio`.
+      This isn't defined beyond that it is an upper bound on :meth:`ratio`, and
+      is faster to compute than either :meth:`ratio` or :meth:`quick_ratio`.

 The three methods that return the ratio of matching to total characters can give
 different results due to differing levels of approximation, although
@@ -589,17 +594,17 @@ The :class:`Differ` class has this constructor:
    length 1), and returns true if the character is junk. The default is ``None``,
    meaning that no character is considered junk.

-:class:`Differ` objects are used (deltas generated) via a single method:
+   :class:`Differ` objects are used (deltas generated) via a single method:


-.. method:: Differ.compare(a, b)
+   .. method:: Differ.compare(a, b)

-   Compare two sequences of lines, and generate the delta (a sequence of lines).
+      Compare two sequences of lines, and generate the delta (a sequence of lines).

-   Each sequence must contain individual single-line strings ending with newlines.
-   Such sequences can be obtained from the :meth:`readlines` method of file-like
-   objects. The delta generated also consists of newline-terminated strings, ready
-   to be printed as-is via the :meth:`writelines` method of a file-like object.
+      Each sequence must contain individual single-line strings ending with newlines.
+      Such sequences can be obtained from the :meth:`readlines` method of file-like
+      objects. The delta generated also consists of newline-terminated strings, ready
+      to be printed as-is via the :meth:`writelines` method of a file-like object.


 .. _differ-examples:
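
The patch only re-indents the method descriptions, so the documented behaviour should be unchanged. As a rough sanity check of the text being moved, the sketch below exercises a few of the documented calls. The sample strings and the helper name ``apply_opcodes`` are hypothetical and are not part of the patch.

```python
from difflib import Differ, SequenceMatcher

# One sequence against many: set the shared sequence once with set_seq2()
# (the docs say the second sequence is the one that is cached) and swap
# the other one in with set_seq1().
matcher = SequenceMatcher(None)
matcher.set_seq2("abcd")
ratios = {}
for candidate in ["abxcd", "abcd", "wxyz"]:  # hypothetical sample inputs
    matcher.set_seq1(candidate)
    exact = matcher.ratio()
    # Both quick variants are documented as upper bounds on ratio().
    assert matcher.quick_ratio() >= exact
    assert matcher.real_quick_ratio() >= exact
    ratios[candidate] = exact


def apply_opcodes(a, b):
    """Rebuild b from a by following get_opcodes() (hypothetical helper)."""
    pieces = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            pieces.append(a[i1:i2])
        elif tag in ("replace", "insert"):
            pieces.append(b[j1:j2])
        # "delete" contributes nothing to the result
    return "".join(pieces)


# Differ.compare() takes newline-terminated lines and yields
# newline-terminated delta lines.
delta = list(Differ().compare(["one\n", "two\n"], ["one\n", "tree\n"]))
```

Here ``ratios["abxcd"]`` works out to 2.0\*M / T with M = 4 and T = 9, and ``apply_opcodes("qabxcd", "abycdf")`` returns the second string, in line with the opcode table in the patch.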