Backport r57105 and r57145 from the py3k branch: UTF-32 codecs.

author: Walter Dörwald <walter@livinglogic.de> 2007-08-17 16:41:28 (GMT)
committer: Walter Dörwald <walter@livinglogic.de> 2007-08-17 16:41:28 (GMT)
commit: 6e390806495cf30c836615996b94e5ffa258cbef (patch)
tree: eef913ca3061a114ff6d301a042408d4d3243ecc /Doc
parent: 437e6a3b1588ece44abbb4d65f74f9a841638e1d (diff)
download: cpython-6e390806495cf30c836615996b94e5ffa258cbef.zip
cpython-6e390806495cf30c836615996b94e5ffa258cbef.tar.gz
cpython-6e390806495cf30c836615996b94e5ffa258cbef.tar.bz2
2 files changed, 79 insertions, 0 deletions
diff --git a/Doc/c-api/concrete.rst b/Doc/c-api/concrete.rst
index 2bc11fa..2707d81 100644
--- a/Doc/c-api/concrete.rst
+++ b/Doc/c-api/concrete.rst
@@ -1301,6 +1301,79 @@ These are the UTF-8 codec APIs:
    object.  Error handling is "strict".  Return *NULL* if an exception was raised
    by the codec.
 
+These are the UTF-32 codec APIs:
+
+.. % --- UTF-32 Codecs ------------------------------------------------------ */
+
+
+.. cfunction:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
+
+   Decode *size* bytes from a UTF-32 encoded buffer string and return the
+   corresponding Unicode object.  *errors* (if non-*NULL*) defines the error
+   handling. It defaults to "strict".
+
+   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
+   order::
+
+      *byteorder == -1: little endian
+      *byteorder == 0:  native order
+      *byteorder == 1:  big endian
+
+   If *\*byteorder* is ``0`` (native order) and the first four bytes of the
+   input are a byte order mark (BOM), the decoder switches to the byte order the
+   BOM indicates and does not copy the BOM into the resulting Unicode string.
+   After completion, *\*byteorder* is set to the byte order at the end of input.
+
+   In a narrow build, code points outside the BMP are decoded as surrogate pairs.
+
+   If *byteorder* is *NULL*, the codec starts in native order mode.
+
+   Return *NULL* if an exception was raised by the codec.
+
+   .. versionadded:: 2.6
+
+
+.. cfunction:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
+
+   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF32`. If
+   *consumed* is not *NULL*, :cfunc:`PyUnicode_DecodeUTF32Stateful` will not treat
+   trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
+   by four) as an error. Those bytes will not be decoded and the number of bytes
+   that have been decoded will be stored in *consumed*.
+
+   .. versionadded:: 2.6
+
+
+.. cfunction:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
+
+   Return a Python bytes object holding the UTF-32 encoded value of the Unicode
+   data in *s*.  The value of *byteorder* selects the byte order of the
+   output::
+
+      byteorder == -1: little endian
+      byteorder == 0:  native byte order (writes a BOM)
+      byteorder == 1:  big endian
+
+   If *byteorder* is ``0``, the output string always starts with the Unicode BOM
+   (U+FEFF). In the other two modes, no BOM is prepended.
+
+   If *Py_UNICODE_WIDE* is not defined, each surrogate pair is output
+   as a single code point.
+
+   Return *NULL* if an exception was raised by the codec.
+
+   .. versionadded:: 2.6
+
+
+.. cfunction:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
+
+   Return a Python string using the UTF-32 encoding in native byte order. The
+   string always starts with a BOM.  Error handling is "strict".  Return
+   *NULL* if an exception was raised by the codec.
+
+   .. versionadded:: 2.6
+
+
 These are the UTF-16 codec APIs:
 
 .. % --- UTF-16 Codecs ------------------------------------------------------ */
diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst
index e86999e..867fbae 100644
--- a/Doc/library/codecs.rst
+++ b/Doc/library/codecs.rst
@@ -1045,6 +1045,12 @@ particular, the following variants typically exist:
 | shift_jisx0213  | shiftjisx0213, sjisx0213,      | Japanese                       |
 |                 | s_jisx0213                     |                                |
 +-----------------+--------------------------------+--------------------------------+
+| utf_32          | U32, utf32                     | all languages                  |
++-----------------+--------------------------------+--------------------------------+
+| utf_32_be       | UTF-32BE                       | all languages                  |
++-----------------+--------------------------------+--------------------------------+
+| utf_32_le       | UTF-32LE                       | all languages                  |
++-----------------+--------------------------------+--------------------------------+
 | utf_16          | U16, utf16                     | all languages                  |
 +-----------------+--------------------------------+--------------------------------+
 | utf_16_be       | UTF-16BE                       | all languages (BMP only)       |
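
Illustration (not part of the commit): the codec names added to Doc/library/codecs.rst can be exercised from Python roughly as follows. This is a minimal sketch, assuming a Python 3 interpreter where string literals are Unicode; the variable names are made up for the example, and the incremental decoder is used only as a Python-level analogue of the trailing-bytes behaviour documented for PyUnicode_DecodeUTF32Stateful.

```python
import codecs

# Hypothetical sample: one BMP character and one character outside the BMP.
text = "a\U0001F600"

# "utf_32" encodes in native byte order and prepends a BOM;
# the explicit-endian variants write no BOM.
with_bom = text.encode("utf_32")
big = text.encode("utf_32_be")
little = text.encode("utf_32_le")

# The BOM is one of the two UTF-32 byte order marks, depending on the platform.
starts_with_bom = with_bom[:4] in (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)

# Decoding with "utf_32" reads the BOM, follows the byte order it announces,
# and leaves the BOM out of the result.
decoded = (codecs.BOM_UTF32_BE + big).decode("utf_32")

# An incremental decoder keeps an incomplete trailing sequence buffered
# instead of raising an error, then finishes once the remaining bytes arrive.
decoder = codecs.getincrementaldecoder("utf_32_be")()
first = decoder.decode(big[:6])              # 4 bytes decoded, 2 held back
rest = decoder.decode(big[6:], final=True)   # completes the second character

print(starts_with_bom, decoded == text, first + rest == text)
```

Every code point takes exactly four bytes in each variant, so the BOM-prefixed form is four bytes longer than the other two.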