author | Serhiy Storchaka <storchaka@gmail.com> | 2022-03-21 16:28:22 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-03-21 16:28:22 (GMT) |
commit | 345b390ed69f36681dbc41187bc8f49cd9135b54 (patch) | |
tree | 31ce6451bed718405b29bdb32c7eb4ff96fe5697 /Doc/library/re.rst | |
parent | 2bde6827ea4f136297b2d882480b981ff26262b6 (diff) | |
bpo-433030: Add support of atomic grouping in regular expressions (GH-31982)

* Atomic grouping: (?>...).
* Possessive quantifiers: x++, x*+, x?+, x{m,n}+.
  Equivalent to (?>x+), (?>x*), (?>x?), (?>x{m,n}).

Co-authored-by: Jeffrey C. Jacobs <timehorse@users.sourceforge.net>
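
The equivalence stated in the commit message can be spot-checked with a short script. This is a minimal sketch, not part of the commit: the helper names `span_or_none` and `compare`, the sample patterns, and the sample text are all hypothetical, and the version guard assumes the syntax is only accepted from Python 3.11 onward.

```python
import re
import sys

# Assumption: possessive quantifiers and atomic groups are only parsed by
# Python 3.11 or later; older versions raise re.error for these patterns.
HAS_ATOMIC = sys.version_info >= (3, 11)

# Each pair holds a possessive form and the atomic group that the commit
# message describes as equivalent (illustrative patterns, not from the commit).
PAIRS = [
    (r"a++", r"(?>a+)"),
    (r"a*+", r"(?>a*)"),
    (r"a?+", r"(?>a?)"),
    (r"a{1,3}+", r"(?>a{1,3})"),
]


def span_or_none(pattern, text):
    """Return the (start, end) span of re.match, or None when nothing matches."""
    m = re.match(pattern, text)
    return m.span() if m else None


def compare(text, suffix=""):
    """Map each possessive pattern to whether it agrees with its atomic form."""
    return {
        p + suffix: span_or_none(p + suffix, text) == span_or_none(a + suffix, text)
        for p, a in PAIRS
    }


if HAS_ATOMIC:
    # Without a suffix every pattern matches a prefix of the text.
    print(compare("aaab"))
    # With the suffix "ab" every pattern fails, because the repeated part
    # refuses to give back an 'a'; both spellings fail in the same way.
    print(compare("aaab", suffix="ab"))
```

Agreement on a handful of inputs is only a sanity check, not a proof of equivalence.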
Diffstat (limited to 'Doc/library/re.rst')
-rw-r--r-- | Doc/library/re.rst | 54 |
1 files changed, 54 insertions, 0 deletions
diff --git a/Doc/library/re.rst b/Doc/library/re.rst
index 950a5b1..02c0a84 100644
--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -155,6 +155,30 @@ The special characters are:
    only ``'<a>'``.
 
 .. index::
+   single: *+; in regular expressions
+   single: ++; in regular expressions
+   single: ?+; in regular expressions
+
+``*+``, ``++``, ``?+``
+   Like the ``'*'``, ``'+'``, and ``'?'`` qualifiers, those where ``'+'`` is
+   appended also match as many times as possible.
+   However, unlike the true greedy qualifiers, these do not allow
+   back-tracking when the expression following it fails to match.
+   These are known as :dfn:`possessive` qualifiers.
+   For example, ``a*a`` will match ``'aaaa'`` because the ``a*`` will match
+   all 4 ``'a'``s, but, when the final ``'a'`` is encountered, the
+   expression is backtracked so that in the end the ``a*`` ends up matching
+   3 ``'a'``s total, and the fourth ``'a'`` is matched by the final ``'a'``.
+   However, when ``a*+a`` is used to match ``'aaaa'``, the ``a*+`` will
+   match all 4 ``'a'``, but when the final ``'a'`` fails to find any more
+   characters to match, the expression cannot be backtracked and will thus
+   fail to match.
+   ``x*+``, ``x++`` and ``x?+`` are equivalent to ``(?>x*)``, ``(?>x+)``
+   and ``(?>x?)`` correspondigly.
+
+   .. versionadded:: 3.11
+
+.. index::
    single: {} (curly brackets); in regular expressions
 
 ``{m}``
@@ -178,6 +202,21 @@ The special characters are:
    6-character string ``'aaaaaa'``, ``a{3,5}`` will match 5 ``'a'``
    characters, while ``a{3,5}?`` will only match 3 characters.
 
+``{m,n}+``
+   Causes the resulting RE to match from *m* to *n* repetitions of the
+   preceding RE, attempting to match as many repetitions as possible
+   *without* establishing any backtracking points.
+   This is the possessive version of the qualifier above.
+   For example, on the 6-character string ``'aaaaaa'``, ``a{3,5}+aa``
+   attempt to match 5 ``'a'`` characters, then, requiring 2 more ``'a'``s,
+   will need more characters than available and thus fail, while
+   ``a{3,5}aa`` will match with ``a{3,5}`` capturing 5, then 4 ``'a'``s
+   by backtracking and then the final 2 ``'a'``s are matched by the final
+   ``aa`` in the pattern.
+   ``x{m,n}+`` is equivalent to ``(?>x{m,n})``.
+
+   .. versionadded:: 3.11
+
 .. index:: single: \ (backslash); in regular expressions
 
 ``\``
@@ -336,6 +375,21 @@ The special characters are:
    .. versionchanged:: 3.7
       The letters ``'a'``, ``'L'`` and ``'u'`` also can be used in a group.
 
+``(?>...)``
+   Attempts to match ``...`` as if it was a separate regular expression, and
+   if successful, continues to match the rest of the pattern following it.
+   If the subsequent pattern fails to match, the stack can only be unwound
+   to a point *before* the ``(?>...)`` because once exited, the expression,
+   known as an :dfn:`atomic group`, has thrown away all stack points within
+   itself.
+   Thus, ``(?>.*).`` would never match anything because first the ``.*``
+   would match all characters possible, then, having nothing left to match,
+   the final ``.`` would fail to match.
+   Since there are no stack points saved in the Atomic Group, and there is
+   no stack point before it, the entire expression would thus fail to match.
+
+   .. versionadded:: 3.11
+
 .. index:: single: (?P<; in regular expressions
 
 ``(?P<name>...)``
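
The matching behaviour described in the added documentation can be reproduced with the examples it gives. The following is a minimal sketch rather than part of the patch: the helper `find` and the sample string `'abc'` are hypothetical, the capturing parentheses are added only to expose how much the greedy part ends up keeping, and the version guard assumes the new syntax is rejected before Python 3.11.

```python
import re
import sys

# Assumption: the new syntax needs Python 3.11+, per the
# ".. versionadded:: 3.11" markers in the diff.
HAS_ATOMIC = sys.version_info >= (3, 11)


def find(pattern, text):
    """Return the text matched by re.search, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# Greedy: a* first takes all four 'a's, then gives one back for the final 'a'.
greedy = re.search(r"(a*)a", "aaaa")
print(greedy.group(1))  # aaa

# Greedy {m,n}: a{3,5} backs off from 5 to 4 repetitions so 'aa' can match.
bounded = re.search(r"(a{3,5})aa", "aaaaaa")
print(bounded.group(1))  # aaaa

if HAS_ATOMIC:
    # Possessive forms keep everything they matched, so the tail cannot match.
    print(find(r"a*+a", "aaaa"))         # None
    print(find(r"a{3,5}+aa", "aaaaaa"))  # None
    # The atomic group consumes the whole line and never gives a character back.
    print(find(r"(?>.*).", "abc"))       # None
    # On its own, the atomic group matches like the plain subpattern.
    print(find(r"(?>.*)", "abc"))        # abc
```

`re.search` is used instead of `re.match` so that a result of `None` means the pattern fails at every starting position, not only at the start of the string.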