author Serhiy Storchaka <storchaka@gmail.com> 2018-01-04 09:06:13 (GMT)
committer GitHub <noreply@github.com> 2018-01-04 09:06:13 (GMT)
commit fbb490fd2f38bd817d99c20c05121ad0168a38ee
tree 417d39bc824e6e62503f94033dbd2b37a8b5545b /Doc
parent 0cc99c8cd70d422e4b345837a907db30e9180ab9
bpo-32308: Replace empty matches adjacent to a previous non-empty match in re.sub(). (#4846)
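The behavior named in the commit title can be checked with a short sketch. It assumes Python 3.7 or later, where this change applies; earlier versions skip the empty match that follows ``x`` and return ``'-a-b-d-'``. The variable names are illustrative only.

```python
import re

# Assumes Python 3.7+ (the release this commit targets).
pattern = re.compile('x*')

# The non-empty match 'x' is replaced, and so is the empty match
# immediately after it, which yields two adjacent dashes.
replaced = pattern.sub('-', 'abxd')
print(replaced)  # -a-b--d- on Python 3.7+

# subn() returns the new string together with the replacement count.
replaced_n, count = pattern.subn('-', 'abxd')
print(replaced_n, count)
```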
Diffstat (limited to 'Doc')
-rw-r--r-- Doc/howto/regex.rst 4
-rw-r--r-- Doc/library/re.rst 14
-rw-r--r-- Doc/whatsnew/3.7.rst 13
3 files changed, 22 insertions, 9 deletions
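The documentation changes in this diff also describe how ``re.split()`` treats patterns that can match the empty string. A minimal sketch of that behavior follows. It assumes Python 3.7 or later, and the input string ``'a b'`` is an illustrative choice that does not appear in the patch.

```python
import re

# Assumes Python 3.7+.
# Empty matches at word boundaries split the string.
by_boundary = re.split(r'\b', 'Words, words, words.')

# r'\s*' can match the empty string, so it also splits before each
# non-whitespace character and just before the end of the string.
loose = re.split(r'\s*', 'a b')

# r'\s+' never matches the empty string, so it splits on whitespace only.
strict = re.split(r'\s+', 'a b')

print(by_boundary)
print(loose)
print(strict)
```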
diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst
index fa8c693..87a6b1a 100644
--- a/Doc/howto/regex.rst
+++ b/Doc/howto/regex.rst
@@ -1140,12 +1140,12 @@ new string value and the number of replacements that were performed::
>>> p.subn('colour', 'no colours at all')
('no colours at all', 0)
-Empty matches are replaced only when they're not adjacent to a previous match.
+Empty matches are replaced only when they're not adjacent to a previous empty match.
::
>>> p = re.compile('x*')
>>> p.sub('-', 'abxd')
- '-a-b-d-'
+ '-a-b--d-'
If *replacement* is a string, any backslash escapes in it are processed. That
is, ``\n`` is converted to a single newline character, ``\r`` is converted to a
diff --git a/Doc/library/re.rst b/Doc/library/re.rst
index dae1d7e..9b175f4 100644
--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -708,12 +708,15 @@ form.
That way, separator components are always found at the same relative
indices within the result list.
- The pattern can match empty strings. ::
+ Empty matches for the pattern split the string only when not adjacent
+ to a previous empty match.
>>> re.split(r'\b', 'Words, words, words.')
['', 'Words', ', ', 'words', ', ', 'words', '.']
+ >>> re.split(r'\W*', '...words...')
+ ['', '', 'w', 'o', 'r', 'd', 's', '', '']
>>> re.split(r'(\W*)', '...words...')
- ['', '...', 'w', '', 'o', '', 'r', '', 'd', '', 's', '...', '']
+ ['', '...', '', '', 'w', '', 'o', '', 'r', '', 'd', '', 's', '...', '', '', '']
.. versionchanged:: 3.1
Added the optional flags argument.
@@ -778,8 +781,8 @@ form.
The optional argument *count* is the maximum number of pattern occurrences to be
replaced; *count* must be a non-negative integer. If omitted or zero, all
occurrences will be replaced. Empty matches for the pattern are replaced only
- when not adjacent to a previous match, so ``sub('x*', '-', 'abc')`` returns
- ``'-a-b-c-'``.
+ when not adjacent to a previous empty match, so ``sub('x*', '-', 'abxd')`` returns
+ ``'-a-b--d-'``.
In string-type *repl* arguments, in addition to the character escapes and
backreferences described above,
@@ -805,6 +808,9 @@ form.
Unknown escapes in *repl* consisting of ``'\'`` and an ASCII letter
now are errors.
+ Empty matches for the pattern are replaced when adjacent to a previous
+ non-empty match.
+
.. function:: subn(pattern, repl, string, count=0, flags=0)
diff --git a/Doc/whatsnew/3.7.rst b/Doc/whatsnew/3.7.rst
index 1924881..1311e9e 100644
--- a/Doc/whatsnew/3.7.rst
+++ b/Doc/whatsnew/3.7.rst
@@ -881,8 +881,9 @@ Changes in the Python API
* The result of splitting a string on a :mod:`regular expression <re>`
that could match an empty string has been changed. For example
splitting on ``r'\s*'`` will now split not only on whitespaces as it
- did previously, but also between any pair of non-whitespace
- characters. The previous behavior can be restored by changing the pattern
+ did previously, but also on empty strings before all non-whitespace
+ characters and just before the end of the string.
+ The previous behavior can be restored by changing the pattern
to ``r'\s+'``. A :exc:`FutureWarning` was emitted for such patterns since
Python 3.5.
@@ -893,7 +894,13 @@ Changes in the Python API
positions 2--3. To match only blank lines, the pattern should be rewritten
as ``r'(?m)^[^\S\n]*$'``.
- (Contributed by Serhiy Storchaka in :issue:`25054`.)
+ :func:`re.sub()` now replaces empty matches adjacent to a previous
+ non-empty match. For example ``re.sub('x*', '-', 'abxd')`` returns now
+ ``'-a-b--d-'`` instead of ``'-a-b-d-'`` (the first minus between 'b' and
+ 'd' replaces 'x', and the second minus replaces an empty string between
+ 'x' and 'd').
+
+ (Contributed by Serhiy Storchaka in :issue:`25054` and :issue:`32308`.)
* :class:`tracemalloc.Traceback` frames are now sorted from oldest to most
recent to be more consistent with :mod:`traceback`.