author: Serhiy Storchaka <storchaka@gmail.com> 2022-03-21 16:28:22 (GMT)
committer: GitHub <noreply@github.com> 2022-03-21 16:28:22 (GMT)
commit: 345b390ed69f36681dbc41187bc8f49cd9135b54
tree: 31ce6451bed718405b29bdb32c7eb4ff96fe5697 /Doc
parent: 2bde6827ea4f136297b2d882480b981ff26262b6
bpo-433030: Add support of atomic grouping in regular expressions (GH-31982)

* Atomic grouping: (?>...).
* Possessive quantifiers: x++, x*+, x?+, x{m,n}+.
  Equivalent to (?>x+), (?>x*), (?>x?), (?>x{m,n}).

Co-authored-by: Jeffrey C. Jacobs <timehorse@users.sourceforge.net>
Diffstat (limited to 'Doc')
-rw-r--r-- Doc/library/re.rst 54
-rw-r--r-- Doc/whatsnew/3.11.rst 6
2 files changed, 60 insertions, 0 deletions
diff --git a/Doc/library/re.rst b/Doc/library/re.rst
index 950a5b1..02c0a84 100644
--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -155,6 +155,30 @@ The special characters are:
only ``'<a>'``.
.. index::
+ single: *+; in regular expressions
+ single: ++; in regular expressions
+ single: ?+; in regular expressions
+
+``*+``, ``++``, ``?+``
+ Like the ``'*'``, ``'+'``, and ``'?'`` qualifiers, those where ``'+'`` is
+ appended also match as many times as possible.
+ However, unlike the true greedy qualifiers, these do not allow
+ back-tracking when the expression following them fails to match.
+ These are known as :dfn:`possessive` qualifiers.
+ For example, ``a*a`` will match ``'aaaa'`` because the ``a*`` will match
+ all 4 ``'a'``\ s, but, when the final ``'a'`` is encountered, the
+ expression is backtracked so that in the end the ``a*`` ends up matching
+ 3 ``'a'``\ s total, and the fourth ``'a'`` is matched by the final ``'a'``.
+ However, when ``a*+a`` is used to match ``'aaaa'``, the ``a*+`` will
+ match all 4 ``'a'``\ s, but when the final ``'a'`` fails to find any more
+ characters to match, the expression cannot be backtracked and will thus
+ fail to match.
+ ``x*+``, ``x++`` and ``x?+`` are equivalent to ``(?>x*)``, ``(?>x+)``
+ and ``(?>x?)`` correspondingly.
+
+ .. versionadded:: 3.11
+
+.. index::
single: {} (curly brackets); in regular expressions
``{m}``
@@ -178,6 +202,21 @@ The special characters are:
6-character string ``'aaaaaa'``, ``a{3,5}`` will match 5 ``'a'`` characters,
while ``a{3,5}?`` will only match 3 characters.
+``{m,n}+``
+ Causes the resulting RE to match from *m* to *n* repetitions of the
+ preceding RE, attempting to match as many repetitions as possible
+ *without* establishing any backtracking points.
+ This is the possessive version of the qualifier above.
+ For example, on the 6-character string ``'aaaaaa'``, ``a{3,5}+aa``
+ attempts to match 5 ``'a'`` characters, then, requiring 2 more ``'a'``\ s,
+ will need more characters than available and thus fail, while
+ ``a{3,5}aa`` will match with ``a{3,5}`` matching 5, then 4 ``'a'``\ s
+ by backtracking and then the final 2 ``'a'``\ s are matched by the final
+ ``aa`` in the pattern.
+ ``x{m,n}+`` is equivalent to ``(?>x{m,n})``.
+
+ .. versionadded:: 3.11
+
.. index:: single: \ (backslash); in regular expressions
``\``
@@ -336,6 +375,21 @@ The special characters are:
.. versionchanged:: 3.7
The letters ``'a'``, ``'L'`` and ``'u'`` also can be used in a group.
+``(?>...)``
+ Attempts to match ``...`` as if it were a separate regular expression, and
+ if successful, continues to match the rest of the pattern following it.
+ If the subsequent pattern fails to match, the stack can only be unwound
+ to a point *before* the ``(?>...)`` because once exited, the expression,
+ known as an :dfn:`atomic group`, has thrown away all stack points within
+ itself.
+ Thus, ``(?>.*).`` would never match anything because first the ``.*``
+ would match all characters possible, then, having nothing left to match,
+ the final ``.`` would fail to match.
+ Since there are no stack points saved in the atomic group, and there is
+ no stack point before it, the entire expression would thus fail to match.
+
+ .. versionadded:: 3.11
+
.. index:: single: (?P<; in regular expressions
``(?P<name>...)``
diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index b7e9dc6..ca7699d 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -295,6 +295,12 @@ os
instead of ``CryptGenRandom()`` which is deprecated.
(Contributed by Dong-hee Na in :issue:`44611`.)
+re
+--
+
+* Atomic grouping (``(?>...)``) and possessive qualifiers (``*+``, ``++``,
+ ``?+``, ``{m,n}+``) are now supported in regular expressions.
+ (Contributed by Jeffrey C. Jacobs and Serhiy Storchaka in :issue:`433030`.)
shutil
------
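
Because the What's New entry ties this syntax to 3.11, code that must also run on older interpreters may want to probe for it first. One possible sketch, relying on the fact that an unsupported pattern makes `re.compile` raise `re.error`; the function names are hypothetical and not part of the `re` module:

```python
import re
import sys


def supports_atomic_groups():
    """Return True if this interpreter's re module accepts (?>...)."""
    try:
        re.compile(r"(?>a)")
    except re.error:
        # Older versions reject the unknown "(?>" extension.
        return False
    return True


def supports_possessive_quantifiers():
    """Return True if this interpreter's re module accepts x++."""
    try:
        re.compile(r"a++")
    except re.error:
        # Older versions report the second '+' as a repeated quantifier.
        return False
    return True


print(sys.version_info[:2], supports_atomic_groups(),
      supports_possessive_quantifiers())
```

Probing the syntax rather than comparing version numbers keeps the check tied to the feature itself.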