author     Andrew Kuchling <amk@amk.ca>  2013-06-16 16:58:48 (GMT)
committer  Andrew Kuchling <amk@amk.ca>  2013-06-16 16:58:48 (GMT)
commit     c7b6c50f29fac4971e7271ac649ee3b7ef3deac7
tree       84ce5a8e35e7d3da929f85c9f025beb2ddc05d7a /Doc/library
parent     893f2ffc7c8a110f069bb05c66e60632cc49cbef
Describe 'surrogateescape' in the documentation.
Also, improve some docstring descriptions of the 'errors' parameter.
Closes #14015.
Diffstat (limited to 'Doc/library')
-rw-r--r--  Doc/library/codecs.rst     |  6
-rw-r--r--  Doc/library/functions.rst  | 40
2 files changed, 35 insertions, 11 deletions
diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst
index 0d38253..e80fc3a 100644
--- a/Doc/library/codecs.rst
+++ b/Doc/library/codecs.rst
@@ -78,7 +78,11 @@ It defines the following functions:
      reference (for encoding only)
    * ``'backslashreplace'``: replace with backslashed escape sequences (for
      encoding only)
-   * ``'surrogateescape'``: replace with surrogate U+DCxx, see :pep:`383`
+   * ``'surrogateescape'``: on decoding, replace with code points in the Unicode
+     Private Use Area ranging from U+DC80 to U+DCFF. These private code
+     points will then be turned back into the same bytes when the
+     ``surrogateescape`` error handler is used when encoding the data.
+     (See :pep:`383` for more.)
 
    as well as any other error handling name defined via :func:`register_error`.
 
diff --git a/Doc/library/functions.rst b/Doc/library/functions.rst
index 3059e17..04fb95e 100644
--- a/Doc/library/functions.rst
+++ b/Doc/library/functions.rst
@@ -895,16 +895,36 @@ are always available. They are listed here in alphabetical order.
    the list of supported encodings.
 
    *errors* is an optional string that specifies how encoding and decoding
-   errors are to be handled--this cannot be used in binary mode. Pass
-   ``'strict'`` to raise a :exc:`ValueError` exception if there is an encoding
-   error (the default of ``None`` has the same effect), or pass ``'ignore'`` to
-   ignore errors. (Note that ignoring encoding errors can lead to data loss.)
-   ``'replace'`` causes a replacement marker (such as ``'?'``) to be inserted
-   where there is malformed data. When writing, ``'xmlcharrefreplace'``
-   (replace with the appropriate XML character reference) or
-   ``'backslashreplace'`` (replace with backslashed escape sequences) can be
-   used. Any other error handling name that has been registered with
-   :func:`codecs.register_error` is also valid.
+   errors are to be handled--this cannot be used in binary mode.
+   A variety of standard error handlers are available, though any
+   error handling name that has been registered with
+   :func:`codecs.register_error` is also valid. The standard names
+   are:
+
+   * ``'strict'`` to raise a :exc:`ValueError` exception if there is
+     an encoding error. The default value of ``None`` has the same
+     effect.
+
+   * ``'ignore'`` ignores errors. Note that ignoring encoding errors
+     can lead to data loss.
+
+   * ``'replace'`` causes a replacement marker (such as ``'?'``) to be inserted
+     where there is malformed data.
+
+   * ``'surrogateescape'`` will represent any incorrect bytes as code
+     points in the Unicode Private Use Area ranging from U+DC80 to
+     U+DCFF. These private code points will then be turned back into
+     the same bytes when the ``surrogateescape`` error handler is used
+     when writing data. This is useful for processing files in an
+     unknown encoding.
+
+   * ``'xmlcharrefreplace'`` is only supported when writing to a file.
+     Characters not supported by the encoding are replaced with the
+     appropriate XML character reference ``&#nnn;``.
+
+   * ``'backslashreplace'`` (also only supported when writing)
+     replaces unsupported characters with Python's backslashed escape
+     sequences.
 
    .. index::
       single: universal newlines; open() built-in function
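A minimal Python 3 sketch of the 'surrogateescape' round trip described in both hunks, assuming UTF-8 as the codec. The escape_roundtrip helper and the sample bytes are hypothetical, chosen only for illustration. Each byte in 0x80-0xFF that the codec cannot decode comes back as the code point U+DC80-U+DCFF with the same low byte (strictly speaking these are low surrogate code points, which is where the handler's name comes from), and encoding with the same handler restores the original bytes.

```python
def escape_roundtrip(raw, encoding="utf-8"):
    """Decode `raw` with 'surrogateescape', then encode it back with the same handler.

    escape_roundtrip is a hypothetical helper used only for this illustration.
    """
    # Each undecodable byte 0xXX (0x80-0xFF) becomes the code point U+DCXX.
    text = raw.decode(encoding, errors="surrogateescape")
    # Encoding with the same handler maps those code points back to the original bytes.
    restored = text.encode(encoding, errors="surrogateescape")
    return text, restored


# Hypothetical sample: valid UTF-8 for "cafe" with an accented e, then two bytes
# (0xff, 0xfe) that can never appear in UTF-8.
raw = b"caf\xc3\xa9 \xff\xfe"

try:
    raw.decode("utf-8")  # the default 'strict' handler rejects the invalid bytes
except UnicodeDecodeError as exc:
    print("strict failed:", exc.reason)

text, restored = escape_roundtrip(raw)
print(ascii(text))      # 'caf\xe9 \udcff\udcfe'
print(restored == raw)  # True
```

Because nothing is discarded, data read this way can be written back out unchanged, which is why the functions.rst hunk recommends the handler for files in an unknown encoding.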
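A sketch of how the standard names listed for open() behave, assuming Python 3, UTF-8 for reading and ASCII for writing. The helpers read_with and write_with and the sample data are hypothetical. One detail the list leaves implicit: when decoding, 'replace' inserts U+FFFD (REPLACEMENT CHARACTER); the '?' marker appears when encoding.

```python
import os
import tempfile

# Hypothetical sample data: valid Latin-1, but not valid UTF-8.
PAYLOAD = b"na\xefve caf\xe9\n"


def read_with(errors, data=PAYLOAD):
    """Write `data` to a temporary file, then read it back as UTF-8 text using `errors`."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "wb") as f:
            f.write(data)
        with open(path, encoding="utf-8", errors=errors) as f:
            return f.read()


def write_with(errors, text="caf\u00e9 \u2603"):
    """Write `text` as ASCII using `errors`, then return the raw bytes on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        with open(path, "w", encoding="ascii", errors=errors) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()


# errors=None behaves like 'strict': UnicodeDecodeError is a subclass of ValueError.
try:
    read_with(None)
except ValueError as exc:
    print("strict:", type(exc).__name__)

# Handlers usable when reading (decoding).
for handler in ("ignore", "replace", "surrogateescape"):
    print(handler, ascii(read_with(handler)))

# Handlers usable when writing (encoding).
for handler in ("replace", "xmlcharrefreplace", "backslashreplace"):
    print(handler, write_with(handler))
```

The reading loop prints 'nave caf\n', 'na\ufffdve caf\ufffd\n', and 'na\udcefve caf\udce9\n'. The writing loop produces b'caf? ?', b'caf&#233; &#9731;', and b'caf\\xe9 \\u2603'.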
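Both hunks note that any name registered with codecs.register_error() is accepted as well. Below is a sketch of a custom handler; the name 'hexescape' and the function hex_escape are hypothetical, made up for illustration. A handler receives the UnicodeEncodeError or UnicodeDecodeError and returns a tuple of the replacement string and the position at which the codec should resume.

```python
import codecs


def hex_escape(exc):
    """Hypothetical error handler: spell out each offending item in hex."""
    if isinstance(exc, UnicodeEncodeError):
        # exc.object is the str being encoded; show each unencodable character as <U+XXXX>.
        bad = exc.object[exc.start:exc.end]
        replacement = "".join("<U+%04X>" % ord(ch) for ch in bad)
    elif isinstance(exc, UnicodeDecodeError):
        # exc.object is the bytes being decoded; show each undecodable byte as <XX>.
        bad = exc.object[exc.start:exc.end]
        replacement = "".join("<%02X>" % byte for byte in bad)
    else:
        # Other exception types (e.g. translation errors) are not handled here.
        raise exc
    # Return the replacement text and the index at which the codec should resume.
    return replacement, exc.end


# Register the handler under a (hypothetical) name so it can be passed as errors=...
codecs.register_error("hexescape", hex_escape)

print("caf\u00e9".encode("ascii", errors="hexescape"))  # b'caf<U+00E9>'
print(b"caf\xe9".decode("ascii", errors="hexescape"))   # caf<E9>
```

Once registered, the same name can also be passed as the errors argument of open().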