author Serhiy Storchaka <storchaka@gmail.com> 2022-05-08 16:19:29 (GMT)
committer GitHub <noreply@github.com> 2022-05-08 16:19:29 (GMT)
commit a84a56d80fa3d9a5909d074bbcd2efff7ef8f1b7
tree 5a129f41f7e8c49aa7ffa3f3d874ff9cd41751a8 /Doc
parent 7b024e3a3f77027f747da7580ed0a3ed2dec276a
gh-91760: More strict rules for numerical group references and group names in RE (GH-91792)
Only a sequence of ASCII digits is now accepted as a numerical group reference. Group names in bytes patterns and bytes replacement strings may now contain only ASCII letters, digits, and underscores.
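The group-name rule can be checked with a short probe. This is only a sketch, not part of the commit: the helper name `compiles` is hypothetical, and the result for the bytes pattern depends on the interpreter version (accepted before 3.11, accepted with a DeprecationWarning in 3.11, rejected from 3.12 on).

```python
import re
import sys
import warnings


def compiles(pattern):
    """Hypothetical helper: True if re.compile accepts the pattern."""
    with warnings.catch_warnings():
        # Python 3.11 only warns about the forms that 3.12 rejects.
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            re.compile(pattern)
        except re.error:
            return False
    return True


# ASCII group names are accepted in bytes patterns on every version.
ascii_name_ok = compiles(b"(?P<word>x)")

# Non-ASCII identifiers remain valid group names in str patterns.
unicode_name_in_str_ok = compiles("(?P<\u00b5>x)")

# The same name, UTF-8 encoded inside a bytes pattern, is what this
# commit turns into an error.
non_ascii_name_in_bytes_ok = compiles(b"(?P<\xc2\xb5>x)")

print(sys.version_info[:2], ascii_name_ok,
      unicode_name_in_str_ok, non_ascii_name_in_bytes_ok)
```

Code that has to run on both sides of the change can simply keep group names in bytes patterns to ASCII letters, digits, and underscores.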
Diffstat (limited to 'Doc')
-rw-r--r-- Doc/library/re.rst 19
-rw-r--r-- Doc/whatsnew/3.12.rst 10
2 files changed, 21 insertions, 8 deletions
diff --git a/Doc/library/re.rst b/Doc/library/re.rst
index 3cd9f25..39e7d23 100644
--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -395,7 +395,8 @@ The special characters are:
``(?P<name>...)``
Similar to regular parentheses, but the substring matched by the group is
accessible via the symbolic group name *name*. Group names must be valid
- Python identifiers, and each group name must be defined only once within a
+ Python identifiers, and in bytes patterns they must contain only characters
+ in the ASCII range. Each group name must be defined only once within a
regular expression. A symbolic group is also a numbered group, just as if
the group were not named.
@@ -417,8 +418,9 @@ The special characters are:
| | * ``\1`` |
+---------------------------------------+----------------------------------+
- .. deprecated:: 3.11
- Group names containing non-ASCII characters in bytes patterns.
+ .. versionchanged:: 3.12
+ In bytes patterns group names must contain only characters in
+ the ASCII range.
.. index:: single: (?P=; in regular expressions
@@ -489,8 +491,8 @@ The special characters are:
will match with ``'<user@host.com>'`` as well as ``'user@host.com'``, but
not with ``'<user@host.com'`` nor ``'user@host.com>'``.
- .. deprecated:: 3.11
- Group *id* containing anything except ASCII digits.
+ .. versionchanged:: 3.12
+ Group *id* can only contain ASCII digits.
The special sequences consist of ``'\'`` and a character from the list below.
@@ -1001,9 +1003,10 @@ form.
Empty matches for the pattern are replaced when adjacent to a previous
non-empty match.
- .. deprecated:: 3.11
- Group *id* containing anything except ASCII digits.
- Group names containing non-ASCII characters in bytes replacement strings.
+ .. versionchanged:: 3.12
+ Group *id* can only contain ASCII digits.
+ In bytes replacement strings group names must contain only characters
+ in the ASCII range.
.. function:: subn(pattern, repl, string, count=0, flags=0)
diff --git a/Doc/whatsnew/3.12.rst b/Doc/whatsnew/3.12.rst
index dacf041..b73c3db 100644
--- a/Doc/whatsnew/3.12.rst
+++ b/Doc/whatsnew/3.12.rst
@@ -114,3 +114,13 @@ Porting to Python 3.12
This section lists previously described changes and other bugfixes
that may require changes to your code.
+
+Changes in the Python API
+-------------------------
+
+* More strict rules are now applied for numerical group references and
+ group names in regular expressions.
+ Only sequence of ASCII digits is now accepted as a numerical reference.
+ The group name in bytes patterns and replacement strings can now only
+ contain ASCII letters and digits and underscore.
+ (Contributed by Serhiy Storchaka in :gh:`91760`.)
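As a rough illustration of the numerical-reference rule described in the porting note above, the sketch below compares a plain `\g<2>` reference with a signed `\g<+2>` one in a replacement string. The helper `try_sub` is a hypothetical name introduced here, and the signed form is assumed to be one of the spellings that earlier versions tolerated; its result therefore depends on the interpreter version.

```python
import re
import sys
import warnings


def try_sub(pattern, repl, string):
    """Hypothetical helper: result of re.sub, or None if re.error is raised."""
    with warnings.catch_warnings():
        # Python 3.11 emits DeprecationWarning instead of raising.
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            return re.sub(pattern, repl, string)
        except re.error:
            return None


pattern = r"(\w+)@(\w+)"

# A reference made only of ASCII digits works on every version.
plain = try_sub(pattern, r"\g<2>:\g<1>", "user@host")

# A sign inside the group id is not "only ASCII digits"; from 3.12 on
# this replacement template is rejected with re.error.
signed = try_sub(pattern, r"\g<+2>:\g<+1>", "user@host")

print(plain, signed)
```

When porting, rewriting such references to bare digits (`\g<2>` rather than `\g<+2>`) keeps the replacement string valid on old and new versions alike.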