author | Serhiy Storchaka <storchaka@gmail.com> | 2022-05-08 16:19:29 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-05-08 16:19:29 (GMT) |
commit | a84a56d80fa3d9a5909d074bbcd2efff7ef8f1b7 (patch) | |
tree | 5a129f41f7e8c49aa7ffa3f3d874ff9cd41751a8 /Doc | |
parent | 7b024e3a3f77027f747da7580ed0a3ed2dec276a (diff) | |
gh-91760: More strict rules for numerical group references and group names in RE (GH-91792)
Only sequence of ASCII digits is now accepted as a numerical reference.
The group name in bytes patterns and replacement strings can now only
contain ASCII letters and digits and underscore.
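
The numerical-reference rule can be probed with a short script. This is a minimal sketch, not part of the commit: the helper `accepts` and the sample reference strings are illustrative assumptions, and the result depends on the interpreter version (3.12 and later reject the odd forms, 3.11 only warned about them).

```python
import re
import sys
import warnings


def accepts(action):
    """Return True if calling `action` succeeds, False if it raises re.error."""
    with warnings.catch_warnings():
        # Python 3.11 only emitted a DeprecationWarning for these forms.
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            action()
        except re.error:
            return False
    return True


# A bare run of ASCII digits is still a valid numerical reference.
ascii_ref = accepts(lambda: re.sub(r"(a)", r"\g<1>", "a"))

# Forms that int() can parse but that are not a bare run of ASCII digits.
# U+0661 is ARABIC-INDIC DIGIT ONE.
odd_refs = {
    name: accepts(lambda name=name: re.sub(r"(a)", rf"\g<{name}>", "a"))
    for name in ("+1", " 1 ", "\u0661")
}

print(sys.version_info[:2], ascii_ref, odd_refs)
```

On an interpreter that includes this change, every entry in `odd_refs` should be `False` while `ascii_ref` stays `True`.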
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/library/re.rst | 19 |
-rw-r--r-- | Doc/whatsnew/3.12.rst | 10 |
2 files changed, 21 insertions, 8 deletions
diff --git a/Doc/library/re.rst b/Doc/library/re.rst
index 3cd9f25..39e7d23 100644
--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -395,7 +395,8 @@ The special characters are:
 ``(?P<name>...)``
    Similar to regular parentheses, but the substring matched by the group is
    accessible via the symbolic group name *name*. Group names must be valid
-   Python identifiers, and each group name must be defined only once within a
+   Python identifiers, and in bytes patterns they must contain only characters
+   in the ASCII range. Each group name must be defined only once within a
    regular expression. A symbolic group is also a numbered group, just as if
    the group were not named.
 
@@ -417,8 +418,9 @@ The special characters are:
    |                                       | * ``\1``                         |
    +---------------------------------------+----------------------------------+
 
-   .. deprecated:: 3.11
-      Group names containing non-ASCII characters in bytes patterns.
+   .. versionchanged:: 3.12
+      In bytes patterns group names must contain only characters in
+      the ASCII range.
 
 .. index:: single: (?P=; in regular expressions
 
@@ -489,8 +491,8 @@ The special characters are:
    will match with ``'<user@host.com>'`` as well as ``'user@host.com'``, but
    not with ``'<user@host.com'`` nor ``'user@host.com>'``.
 
-   .. deprecated:: 3.11
-      Group *id* containing anything except ASCII digits.
+   .. versionchanged:: 3.12
+      Group *id* can only contain ASCII digits.
 
 
 The special sequences consist of ``'\'`` and a character from the list below.
@@ -1001,9 +1003,10 @@ form.
    Empty matches for the pattern are replaced when adjacent to a previous
    non-empty match.
 
-   .. deprecated:: 3.11
-      Group *id* containing anything except ASCII digits.
-      Group names containing non-ASCII characters in bytes replacement strings.
+   .. versionchanged:: 3.12
+      Group *id* can only contain ASCII digits.
+      In bytes replacement strings group names must contain only characters
+      in the ASCII range.
 
 
 .. function:: subn(pattern, repl, string, count=0, flags=0)
diff --git a/Doc/whatsnew/3.12.rst b/Doc/whatsnew/3.12.rst
index dacf041..b73c3db 100644
--- a/Doc/whatsnew/3.12.rst
+++ b/Doc/whatsnew/3.12.rst
@@ -114,3 +114,13 @@ Porting to Python 3.12
 
 This section lists previously described changes and other bugfixes
 that may require changes to your code.
+
+Changes in the Python API
+-------------------------
+
+* More strict rules are now applied for numerical group references and
+  group names in regular expressions.
+  Only sequence of ASCII digits is now accepted as a numerical reference.
+  The group name in bytes patterns and replacement strings can now only
+  contain ASCII letters and digits and underscore.
+  (Contributed by Serhiy Storchaka in :gh:`91760`.)
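
The group-name rule for bytes patterns documented in the diff above can be checked in a similar way. Again this is only a sketch under stated assumptions: `accepts` is a hypothetical helper, the group names are made up for illustration, and the non-ASCII case is rejected only on interpreters that include this change (3.11 emitted a deprecation warning instead).

```python
import re
import sys
import warnings


def accepts(action):
    """Return True if calling `action` succeeds, False if it raises re.error."""
    with warnings.catch_warnings():
        # Python 3.11 only emitted a DeprecationWarning for this form.
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            action()
        except re.error:
            return False
    return True


# ASCII group names work in bytes patterns and bytes replacement strings.
swapped = re.sub(
    rb"(?P<word>\w+) (?P<rest>\w+)",
    rb"\g<rest> \g<word>",
    b"hello world",
)

# U+00B5 (MICRO SIGN) is a valid identifier, so a str pattern may use it...
str_ok = accepts(lambda: re.compile("(?P<\u00b5>x)"))
# ...but in a bytes pattern the name would contain the non-ASCII byte 0xB5.
bytes_ok = accepts(lambda: re.compile(b"(?P<\xb5>x)"))

print(sys.version_info[:2], swapped, str_ok, bytes_ok)
```

The str pattern stays valid because the restriction to the ASCII range applies only to bytes patterns and bytes replacement strings.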