[3.6] bpo-32614: Modify re examples to use a raw string to prevent warning (GH-5265) (GH-5500)

Modify RE examples in documentation to use raw strings to prevent DeprecationWarning. Add text to REGEX HOWTO to highlight the deprecation. Approved by Serhiy Storchaka. (cherry picked from commit 66771422d0541289d0b1287bc3c28e8b5609f6b4)
author: Terry Jan Reedy <tjreedy@udel.edu> 2018-02-02 22:37:30 (GMT)
committer: GitHub <noreply@github.com> 2018-02-02 22:37:30 (GMT)
commit: fbf8e823c02ac1c93a48609cc74e439e19ccb426 (patch)
tree: b99a53c99dae6e0409c290b53e13ad0c80010870
parent: f61951b10cc08d3926a3ebaacc154d4149150ef4 (diff)
4 files changed, 26 insertions, 8 deletions
diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst
index eef6347..a3a6553 100644
--- a/Doc/howto/regex.rst
+++ b/Doc/howto/regex.rst
@@ -289,6 +289,8 @@ Putting REs in strings keeps the Python language simpler, but has one
 disadvantage which is the topic of the next section.
 
 
+.. _the-backslash-plague:
+
 The Backslash Plague
 --------------------
 
@@ -327,6 +329,13 @@ backslashes are not handled in any special way in a string literal prefixed with
 while ``"\n"`` is a one-character string containing a newline. Regular
 expressions will often be written in Python code using this raw string notation.
 
+In addition, special escape sequences that are valid in regular expressions,
+but not valid as Python string literals, now result in a
+:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`,
+which means the sequences will be invalid if raw string notation or escaping
+the backslashes isn't used.
+
+
 +-------------------+------------------+
 | Regular String    | Raw string       |
 +===================+==================+
@@ -457,12 +466,18 @@ In actual programs, the most common style is to store the
 Two pattern methods return all of the matches for a pattern.
 :meth:`~re.pattern.findall` returns a list of matching strings::
 
-   >>> p = re.compile('\d+')
+   >>> p = re.compile(r'\d+')
    >>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
    ['12', '11', '10']
 
-:meth:`~re.pattern.findall` has to create the entire list before it can be returned as the
-result.  The :meth:`~re.pattern.finditer` method returns a sequence of
+The ``r`` prefix, making the literal a raw string literal, is needed in this
+example because escape sequences in a normal "cooked" string literal that are
+not recognized by Python, as opposed to regular expressions, now result in a
+:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`.  See
+:ref:`the-backslash-plague`.
+
+:meth:`~re.Pattern.findall` has to create the entire list before it can be returned as the
+result.  The :meth:`~re.Pattern.finditer` method returns a sequence of
 :ref:`match object <match-objects>` instances as an :term:`iterator`::
 
    >>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
@@ -1096,11 +1111,11 @@ following calls::
 The module-level function :func:`re.split` adds the RE to be used as the first
 argument, but is otherwise the same.   ::
 
-   >>> re.split('[\W]+', 'Words, words, words.')
+   >>> re.split(r'[\W]+', 'Words, words, words.')
    ['Words', 'words', 'words', '']
-   >>> re.split('([\W]+)', 'Words, words, words.')
+   >>> re.split(r'([\W]+)', 'Words, words, words.')
    ['Words', ', ', 'words', ', ', 'words', '.', '']
-   >>> re.split('[\W]+', 'Words, words, words.', 1)
+   >>> re.split(r'[\W]+', 'Words, words, words.', 1)
    ['Words', 'words, words.']
 
 
diff --git a/Doc/howto/unicode.rst b/Doc/howto/unicode.rst
index 9649b9c..b54e150 100644
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst
@@ -463,7 +463,7 @@ The string in this example has the number 57 written in both Thai and
 Arabic numerals::
 
    import re
-   p = re.compile('\d+')
+   p = re.compile(r'\d+')
 
    s = "Over \u0e55\u0e57 57 flavours"
    m = p.search(s)
diff --git a/Doc/library/re.rst b/Doc/library/re.rst
index 874c8dd..db92c48 100644
--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -315,7 +315,7 @@ The special characters are:
 
    This example looks for a word following a hyphen:
 
-      >>> m = re.search('(?<=-)\w+', 'spam-egg')
+      >>> m = re.search(r'(?<=-)\w+', 'spam-egg')
       >>> m.group(0)
       'egg'
 
diff --git a/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst b/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst
new file mode 100644
index 0000000..9e9f3e3
--- /dev/null
+++ b/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst
@@ -0,0 +1,3 @@
+Modify RE examples in documentation to use raw strings to prevent
+:exc:`DeprecationWarning` and add text to REGEX HOWTO to highlight the
+deprecation.
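
The warning this commit works around can be reproduced with a short script. The sketch below is illustrative only and is not part of the patch: `invalid_escape_report` is a hypothetical helper name, and the category reported depends on the interpreter (a `DeprecationWarning` on the 3.6 branch this backport targets, a different category or an error on later versions). It compiles the old and new forms of the HOWTO example from source text and records any warning raised.

```python
import re
import warnings


def invalid_escape_report(source):
    """Compile `source` and return the names of the warning categories raised.

    Hypothetical helper for illustration; it is not part of the commit.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is filtered out
        try:
            compile(source, "<example>", "exec")
        except SyntaxError:
            # Possible on a future Python where the escape is a hard error.
            return ["SyntaxError"]
    return [w.category.__name__ for w in caught]


# Source text of the old and the new form of the HOWTO example.
cooked_source = "p = '\\d+'"   # the code being compiled reads: p = '\d+'
raw_source = "p = r'\\d+'"     # the code being compiled reads: p = r'\d+'

cooked_report = invalid_escape_report(cooked_source)
raw_report = invalid_escape_report(raw_source)
print(cooked_report)  # category name depends on the Python version
print(raw_report)     # the raw literal compiles without a warning

# A raw literal and a literal with doubled backslashes are the same string,
# so either spelling gives the regex engine the same pattern.
same_pattern = r'\d+' == '\\d+'
matches = re.findall(r'\d+', '12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
print(matches)
```

Compiling from source text rather than writing the literal inline keeps the script itself free of the invalid escape, so the warning is only triggered where it can be captured.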