author | Serhiy Storchaka <storchaka@gmail.com> | 2018-01-04 09:06:13 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2018-01-04 09:06:13 (GMT) |
commit | fbb490fd2f38bd817d99c20c05121ad0168a38ee (patch) | |
tree | 417d39bc824e6e62503f94033dbd2b37a8b5545b /Doc | |
parent | 0cc99c8cd70d422e4b345837a907db30e9180ab9 (diff) | |
download | cpython-fbb490fd2f38bd817d99c20c05121ad0168a38ee.zip cpython-fbb490fd2f38bd817d99c20c05121ad0168a38ee.tar.gz cpython-fbb490fd2f38bd817d99c20c05121ad0168a38ee.tar.bz2 |
bpo-32308: Replace empty matches adjacent to a previous non-empty match in re.sub(). (#4846)
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/howto/regex.rst | 4 |
-rw-r--r-- | Doc/library/re.rst | 14 |
-rw-r--r-- | Doc/whatsnew/3.7.rst | 13 |
3 files changed, 22 insertions, 9 deletions
diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst
index fa8c693..87a6b1a 100644
--- a/Doc/howto/regex.rst
+++ b/Doc/howto/regex.rst
@@ -1140,12 +1140,12 @@ new string value and the number of replacements that were performed::
    >>> p.subn('colour', 'no colours at all')
    ('no colours at all', 0)
 
-Empty matches are replaced only when they're not adjacent to a previous match.
+Empty matches are replaced only when they're not adjacent to a previous empty match.
 ::
 
    >>> p = re.compile('x*')
    >>> p.sub('-', 'abxd')
-   '-a-b-d-'
+   '-a-b--d-'
 
 If *replacement* is a string, any backslash escapes in it are processed. That
 is, ``\n`` is converted to a single newline character, ``\r`` is converted to a
diff --git a/Doc/library/re.rst b/Doc/library/re.rst
index dae1d7e..9b175f4 100644
--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -708,12 +708,15 @@ form.
    That way, separator components are always found at the same relative
    indices within the result list.
 
-   The pattern can match empty strings. ::
+   Empty matches for the pattern split the string only when not adjacent
+   to a previous empty match.
 
       >>> re.split(r'\b', 'Words, words, words.')
       ['', 'Words', ', ', 'words', ', ', 'words', '.']
+      >>> re.split(r'\W*', '...words...')
+      ['', '', 'w', 'o', 'r', 'd', 's', '', '']
       >>> re.split(r'(\W*)', '...words...')
-      ['', '...', 'w', '', 'o', '', 'r', '', 'd', '', 's', '...', '']
+      ['', '...', '', '', 'w', '', 'o', '', 'r', '', 'd', '', 's', '...', '', '', '']
 
    .. versionchanged:: 3.1
       Added the optional flags argument.
@@ -778,8 +781,8 @@ form.
    The optional argument *count* is the maximum number of pattern occurrences to be
    replaced; *count* must be a non-negative integer. If omitted or zero, all
    occurrences will be replaced. Empty matches for the pattern are replaced only
-   when not adjacent to a previous match, so ``sub('x*', '-', 'abc')`` returns
-   ``'-a-b-c-'``.
+   when not adjacent to a previous empty match, so ``sub('x*', '-', 'abxd')`` returns
+   ``'-a-b--d-'``.
 
    In string-type *repl* arguments, in addition to the character escapes and
    backreferences described above,
@@ -805,6 +808,9 @@ form.
       Unknown escapes in *repl* consisting of ``'\'`` and an ASCII letter
       now are errors.
 
+      Empty matches for the pattern are replaced when adjacent to a previous
+      non-empty match.
+
 
 .. function:: subn(pattern, repl, string, count=0, flags=0)
 
diff --git a/Doc/whatsnew/3.7.rst b/Doc/whatsnew/3.7.rst
index 1924881..1311e9e 100644
--- a/Doc/whatsnew/3.7.rst
+++ b/Doc/whatsnew/3.7.rst
@@ -881,8 +881,9 @@ Changes in the Python API
 * The result of splitting a string on a :mod:`regular expression <re>`
   that could match an empty string has been changed. For example
   splitting on ``r'\s*'`` will now split not only on whitespaces as it
-  did previously, but also between any pair of non-whitespace
-  characters. The previous behavior can be restored by changing the pattern
+  did previously, but also on empty strings before all non-whitespace
+  characters and just before the end of the string.
+  The previous behavior can be restored by changing the pattern
   to ``r'\s+'``. A :exc:`FutureWarning` was emitted for such patterns since
   Python 3.5.
 
@@ -893,7 +894,13 @@ Changes in the Python API
   positions 2--3. To match only blank lines, the pattern should be rewritten
   as ``r'(?m)^[^\S\n]*$'``.
 
-  (Contributed by Serhiy Storchaka in :issue:`25054`.)
+  :func:`re.sub()` now replaces empty matches adjacent to a previous
+  non-empty match. For example ``re.sub('x*', '-', 'abxd')`` returns now
+  ``'-a-b--d-'`` instead of ``'-a-b-d-'`` (the first minus between 'b' and
+  'd' replaces 'x', and the second minus replaces an empty string between
+  'x' and 'd').
+
+  (Contributed by Serhiy Storchaka in :issue:`25054` and :issue:`32308`.)
 
 * :class:`tracemalloc.Traceback` frames are now sorted from oldest to most
   recent to be more consistent with :mod:`traceback`.
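The behavior documented by this change can be checked with a short script. This is a minimal sketch, not part of the commit: it assumes an interpreter of Python 3.7 or later, and the variable names are illustrative only. The inputs and expected values are the ones quoted in the documentation hunks above.

```python
import re

# Assumption: Python 3.7 or later, where bpo-32308 applies.
# For 'x*' against 'abxd' the matches are: empty at 0, empty at 1,
# 'x' at 2, empty at 3 (adjacent to the non-empty 'x' match), empty at 4.
sub_result = re.sub('x*', '-', 'abxd')

# subn() reports the same string together with the number of replacements.
subn_result = re.subn('x*', '-', 'abxd')

# split() follows the matching rule: an empty match splits the string
# unless it is adjacent to a previous empty match.
split_result = re.split(r'\W*', '...words...')

print(sub_result)    # '-a-b--d-'
print(subn_result)   # ('-a-b--d-', 5)
print(split_result)  # ['', '', 'w', 'o', 'r', 'd', 's', '', '']
```

On interpreters older than 3.7 the first call is expected to give ``'-a-b-d-'``, as the removed documentation line states, so the script doubles as a quick check of which behavior is in effect.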