author Benjamin Peterson <benjamin@python.org> 2008-04-28 21:05:10 (GMT)
committer Benjamin Peterson <benjamin@python.org> 2008-04-28 21:05:10 (GMT)
commit a2f837f751bd38c85636c6431a11b216228da92e (patch)
tree f2f28d4e8e1c3f0f63eb0e6047892ce0a416ed14
parent a288faef8ee3179a71d2a112a5b0321c327e6ac1 (diff)
Document that '\U' and '\u' escapes in raw strings are not treated specially in 3.0 (see issue 2541)
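
The behavior documented by this commit can be checked with a short sketch. It assumes a Python 3 interpreter; the variable names are illustrative and do not come from the commit. It reuses the ``r"\u0062\n"`` literal from the paragraph that the diff removes and compares it with the same literal without the ``r`` prefix.

```python
# Raw string: every backslash is kept literally, so \u0062 is NOT decoded.
raw = r"\u0062\n"

# Ordinary string: \u0062 is decoded to 'b' and \n to a newline.
cooked = "\u0062\n"

raw_length = len(raw)        # 8 characters: \ u 0 0 6 2 \ n
cooked_length = len(cooked)  # 2 characters: 'b' and a newline

# A quote can still be escaped in a raw string, but the backslash stays.
escaped_quote = r"\""        # 2 characters: a backslash and a double quote

print(raw_length, cooked_length, len(escaped_quote))
```

Under the old 2.x rule described in the removed paragraph, the raw literal would have started with the character ``b`` instead of a backslash.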
-rw-r--r-- Doc/reference/lexical_analysis.rst 14
-rw-r--r-- Doc/whatsnew/3.0.rst 5
-rw-r--r-- Misc/NEWS 3
3 files changed, 11 insertions, 11 deletions
diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index 566e90b..2a9fd79 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -423,8 +423,9 @@ characters that otherwise have a special meaning, such as newline, backslash
itself, or the quote character.
String literals may optionally be prefixed with a letter ``'r'`` or ``'R'``;
-such strings are called :dfn:`raw strings` and use different rules for
-interpreting backslash escape sequences.
+such strings are called :dfn:`raw strings` and treat backslashes as literal
+characters. As a result, ``'\U'`` and ``'\u'`` escapes in raw strings are not
+treated specially.
Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an
instance of the :class:`bytes` type instead of the :class:`str` type. They
@@ -520,15 +521,6 @@ is more easily recognized as broken.) It is also important to note that the
escape sequences only recognized in string literals fall into the category of
unrecognized escapes for bytes literals.
-When an ``'r'`` or ``'R'`` prefix is used in a string literal, then the
-``\uXXXX`` and ``\UXXXXXXXX`` escape sequences are processed while *all other
-backslashes are left in the string*. For example, the string literal
-``r"\u0062\n"`` consists of three Unicode characters: 'LATIN SMALL LETTER B',
-'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can be escaped with a
-preceding backslash; however, both remain in the string. As a result,
-``\uXXXX`` escape sequences are only recognized when there is an odd number of
-backslashes.
-
Even in a raw string, string quotes can be escaped with a backslash, but the
backslash remains in the string; for example, ``r"\""`` is a valid string
literal consisting of two characters: a backslash and a double quote; ``r"\"``
diff --git a/Doc/whatsnew/3.0.rst b/Doc/whatsnew/3.0.rst
index 7f8ba47..11b56cc 100644
--- a/Doc/whatsnew/3.0.rst
+++ b/Doc/whatsnew/3.0.rst
@@ -167,6 +167,9 @@ Strings and Bytes
explicitly convert between them, using the :meth:`str.encode` (str -> bytes)
or :meth:`bytes.decode` (bytes -> str) methods.
+* All backslashes in raw strings are interpreted literally. This means that
+ Unicode escapes are not treated specially.
+
.. XXX add bytearray
* PEP 3112: Bytes literals, e.g. ``b"abc"``, create :class:`bytes` instances.
@@ -183,6 +186,8 @@ Strings and Bytes
* The :mod:`StringIO` and :mod:`cStringIO` modules are gone. Instead, import
:class:`io.StringIO` or :class:`io.BytesIO`.
+* ``'\U'`` and ``'\u'`` escapes in raw strings are not treated specially.
+
PEP 3101: A New Approach to String Formatting
=============================================
diff --git a/Misc/NEWS b/Misc/NEWS
index 5c3b875..10640a1 100644
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -26,6 +26,9 @@ Core and Builtins
through as unmodified as possible; as a consequence, the C API
related to command line arguments was changed to use wchar_t.
+- All backslashes in raw strings are interpreted literally. This means that
+ '\u' and '\U' escapes are not treated specially.
+
Extension Modules
-----------------