summaryrefslogtreecommitdiffstats
path: root/Doc/howto/unicode.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Doc/howto/unicode.rst')
-rw-r--r--Doc/howto/unicode.rst14
1 files changed, 10 insertions, 4 deletions
diff --git a/Doc/howto/unicode.rst b/Doc/howto/unicode.rst
index b49ac39..ee31a9c 100644
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst
@@ -280,8 +280,9 @@ and optionally an *errors* argument.
The *errors* argument specifies the response when the input string can't be
converted according to the encoding's rules. Legal values for this argument are
``'strict'`` (raise a :exc:`UnicodeDecodeError` exception), ``'replace'`` (use
-``U+FFFD``, ``REPLACEMENT CHARACTER``), or ``'ignore'`` (just leave the
-character out of the Unicode result).
+``U+FFFD``, ``REPLACEMENT CHARACTER``), ``'ignore'`` (just leave the
+character out of the Unicode result), or ``'backslashreplace'`` (inserts a
+``\xNN`` escape sequence).
The following examples show the differences::
>>> b'\x80abc'.decode("utf-8", "strict") #doctest: +NORMALIZE_WHITESPACE
@@ -291,6 +292,8 @@ The following examples show the differences::
invalid start byte
>>> b'\x80abc'.decode("utf-8", "replace")
'\ufffdabc'
+ >>> b'\x80abc'.decode("utf-8", "backslashreplace")
+ '\\x80abc'
>>> b'\x80abc'.decode("utf-8", "ignore")
'abc'
@@ -325,8 +328,9 @@ The *errors* parameter is the same as the parameter of the
:meth:`~bytes.decode` method but supports a few more possible handlers. As well as
``'strict'``, ``'ignore'``, and ``'replace'`` (which in this case
inserts a question mark instead of the unencodable character), there is
-also ``'xmlcharrefreplace'`` (inserts an XML character reference) and
-``backslashreplace`` (inserts a ``\uNNNN`` escape sequence).
+also ``'xmlcharrefreplace'`` (inserts an XML character reference),
+``backslashreplace`` (inserts a ``\uNNNN`` escape sequence) and
+``namereplace`` (inserts a ``\N{...}`` escape sequence).
The following example shows the different results::
@@ -346,6 +350,8 @@ The following example shows the different results::
b'ꀀabcd޴'
>>> u.encode('ascii', 'backslashreplace')
b'\\ua000abcd\\u07b4'
+ >>> u.encode('ascii', 'namereplace')
+ b'\\N{YI SYLLABLE IT}abcd\\u07b4'
The low-level routines for registering and accessing the available
encodings are found in the :mod:`codecs` module. Implementing new