author: Serhiy Storchaka <storchaka@gmail.com> 2013-11-19 09:32:41 (GMT)
committer: Serhiy Storchaka <storchaka@gmail.com> 2013-11-19 09:32:41 (GMT)
commit: 58cf607d13c178f41aed05458296b68e985c5fff
tree: d9a39a30200eef16fec17f0ed934186e8e864149 /Doc/library/codecs.rst
parent: a938bcfe952975cd117994acfef3712d61221f20
Issue #12892: The utf-16* and utf-32* codecs now reject (lone) surrogates.
The utf-16* and utf-32* encoders no longer allow surrogate code points (U+D800-U+DFFF) to be encoded. The utf-32* decoders no longer decode byte sequences that correspond to surrogate code points. The surrogatepass error handler now works with the utf-16* and utf-32* codecs. Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
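
The behavior described above can be checked with a short script. This is a minimal sketch, assuming Python 3.4 or later; the sample code point U+DC80 and the names LONE, CODECS, rejected, round_trips, raw and decoder_rejects are chosen here for illustration and do not come from the patch.

```python
# Assumes Python 3.4 or later, where this change applies.
LONE = "\udc80"  # a lone low surrogate, picked arbitrarily for illustration
CODECS = ("utf-16", "utf-16-le", "utf-16-be",
          "utf-32", "utf-32-le", "utf-32-be")

# 1. With the default strict handler, the encoders reject the surrogate.
rejected = []
for name in CODECS:
    try:
        LONE.encode(name)
    except UnicodeEncodeError:
        rejected.append(name)

# 2. 'surrogatepass' lets the surrogate through in both directions.
round_trips = all(
    LONE.encode(name, "surrogatepass").decode(name, "surrogatepass") == LONE
    for name in CODECS
)

# 3. The strict UTF-32 decoder refuses bytes that spell a surrogate.
raw = LONE.encode("utf-32-be", "surrogatepass")
try:
    raw.decode("utf-32-be")
    decoder_rejects = False
except UnicodeDecodeError:
    decoder_rejects = True

print(rejected, round_trips, decoder_rejects)
```

Passing 'surrogatepass' explicitly on both the encode and the decode side is what keeps the round trip lossless; with the default handler either step raises.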
Diffstat (limited to 'Doc/library/codecs.rst')
-rw-r--r-- Doc/library/codecs.rst 25
1 files changed, 18 insertions, 7 deletions
diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst
index 0925e82..358fde7 100644
--- a/Doc/library/codecs.rst
+++ b/Doc/library/codecs.rst
@@ -365,18 +365,23 @@ and implemented by all standard Python codecs:
|                         | in :pep:`383`.                                |
+-------------------------+-----------------------------------------------+
-In addition, the following error handlers are specific to a single codec:
+In addition, the following error handlers are specific to Unicode encoding
+schemes:
-+-------------------+---------+-------------------------------------------+
-| Value             | Codec   | Meaning                                   |
-+===================+=========+===========================================+
-|``'surrogatepass'``| utf-8   | Allow encoding and decoding of surrogate  |
-|                   |         | codes in UTF-8.                           |
-+-------------------+---------+-------------------------------------------+
++-------------------+------------------------+-------------------------------------------+
+| Value             | Codec                  | Meaning                                   |
++===================+========================+===========================================+
+|``'surrogatepass'``| utf-8, utf-16, utf-32, | Allow encoding and decoding of surrogate  |
+|                   | utf-16-be, utf-16-le,  | codes in all the Unicode encoding schemes.|
+|                   | utf-32-be, utf-32-le   |                                           |
++-------------------+------------------------+-------------------------------------------+
.. versionadded:: 3.1
The ``'surrogateescape'`` and ``'surrogatepass'`` error handlers.
+.. versionchanged:: 3.4
+   The ``'surrogatepass'`` error handler now works with utf-16\* and utf-32\* codecs.
+
The set of allowed values can be extended via :meth:`register_error`.
@@ -1167,6 +1172,12 @@ particular, the following variants typically exist:
| utf_8_sig       |                                | all languages                  |
+-----------------+--------------------------------+--------------------------------+
+.. versionchanged:: 3.4
+   The utf-16\* and utf-32\* encoders no longer allow surrogate code points
+   (U+D800--U+DFFF) to be encoded. The utf-32\* decoders no longer decode
+   byte sequences that correspond to surrogate code points.
+
+
Python Specific Encodings
-------------------------