author | Serhiy Storchaka <storchaka@gmail.com> | 2013-11-19 09:32:41 (GMT) |
---|---|---|
committer | Serhiy Storchaka <storchaka@gmail.com> | 2013-11-19 09:32:41 (GMT) |
commit | 58cf607d13c178f41aed05458296b68e985c5fff (patch) | |
tree | d9a39a30200eef16fec17f0ed934186e8e864149 /Doc/library/codecs.rst | |
parent | a938bcfe952975cd117994acfef3712d61221f20 (diff) | |
Issue #12892: The utf-16* and utf-32* codecs now reject (lone) surrogates.
The utf-16* and utf-32* encoders no longer allow surrogate code points
(U+D800-U+DFFF) to be encoded.
The utf-32* decoders no longer decode byte sequences that correspond to
surrogate code points.
The surrogatepass error handler now works with the utf-16* and utf-32* codecs.
Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
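
The behaviour described in the commit message can be checked with a short script. This is a minimal sketch, assuming an interpreter that includes this change (Python 3.4 or later); the helper `try_encode` and the variable names are illustrative and not part of the patch.

```python
# A lone high surrogate: a code point in U+D800-U+DFFF with no partner.
lone = "\ud800"


def try_encode(text, encoding, errors="strict"):
    """Return the encoded bytes, or None if the codec rejects the text.

    Hypothetical helper for this example only.
    """
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError:
        return None


# With the default 'strict' handler the utf-16* and utf-32* encoders
# refuse to encode the surrogate.
strict_results = {
    enc: try_encode(lone, enc) for enc in ("utf-16-le", "utf-32-le")
}

# With 'surrogatepass' the same encoders emit the raw code unit.
passed_16 = lone.encode("utf-16-le", "surrogatepass")
passed_32 = lone.encode("utf-32-le", "surrogatepass")

# The utf-32* decoders reject those bytes unless 'surrogatepass' is used.
try:
    passed_32.decode("utf-32-le")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

round_trip = passed_32.decode("utf-32-le", "surrogatepass")
```

Little-endian codec names are used here so that no byte order mark is prepended to the output, which keeps the resulting bytes easy to inspect.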
Diffstat (limited to 'Doc/library/codecs.rst')
-rw-r--r-- | Doc/library/codecs.rst | 25 |
1 files changed, 18 insertions, 7 deletions
diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst
index 0925e82..358fde7 100644
--- a/Doc/library/codecs.rst
+++ b/Doc/library/codecs.rst
@@ -365,18 +365,23 @@ and implemented by all standard Python codecs:
 |                         | in :pep:`383`.                                |
 +-------------------------+-----------------------------------------------+
 
-In addition, the following error handlers are specific to a single codec:
+In addition, the following error handlers are specific to Unicode encoding
+schemes:
 
-+-------------------+---------+-------------------------------------------+
-| Value             | Codec   | Meaning                                   |
-+===================+=========+===========================================+
-|``'surrogatepass'``| utf-8   | Allow encoding and decoding of surrogate  |
-|                   |         | codes in UTF-8.                           |
-+-------------------+---------+-------------------------------------------+
++-------------------+------------------------+-------------------------------------------+
+| Value             | Codec                  | Meaning                                   |
++===================+========================+===========================================+
+|``'surrogatepass'``| utf-8, utf-16, utf-32, | Allow encoding and decoding of surrogate  |
+|                   | utf-16-be, utf-16-le,  | codes in all the Unicode encoding schemes.|
+|                   | utf-32-be, utf-32-le   |                                           |
++-------------------+------------------------+-------------------------------------------+
 
 .. versionadded:: 3.1
    The ``'surrogateescape'`` and ``'surrogatepass'`` error handlers.
 
+.. versionchanged:: 3.4
+   The ``'surrogatepass'`` error handlers now works with utf-16\* and utf-32\* codecs.
+
 The set of allowed values can be extended via :meth:`register_error`.
 
 
@@ -1167,6 +1172,12 @@ particular, the following variants typically exist:
 | utf_8_sig       |                                | all languages                  |
 +-----------------+--------------------------------+--------------------------------+
 
+.. versionchanged:: 3.4
+   The utf-16\* and utf-32\* encoders no longer allow surrogate code points
+   (U+D800--U+DFFF) to be encoded.  The utf-32\* decoders no longer decode
+   byte sequences that correspond to surrogate code points.
+
+
 Python Specific Encodings
 -------------------------
 