1 files changed, 706 insertions, 170 deletions
diff --git a/Doc/c-api/unicode.rst b/Doc/c-api/unicode.rst
index d1b57d9..2ac51df 100644
--- a/Doc/c-api/unicode.rst
+++ b/Doc/c-api/unicode.rst
@@ -6,38 +6,72 @@ Unicode Objects and Codecs
 --------------------------
 
 .. sectionauthor:: Marc-André Lemburg <mal@lemburg.com>
+.. sectionauthor:: Georg Brandl <georg@python.org>
 
 Unicode Objects
 ^^^^^^^^^^^^^^^
 
+Since the implementation of :pep:`393` in Python 3.3, Unicode objects internally
+use a variety of representations, in order to allow handling the complete range
+of Unicode characters while staying memory efficient.  There are special cases
+for strings where all code points are below 128, 256, or 65536; otherwise, code
+points must be below 1114112 (which is the full Unicode range).
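The size rule above can be modelled in standalone C, which makes the thresholds concrete without needing ``Python.h``.  This is only a sketch: ``storage_bytes_per_char`` is a hypothetical helper, not part of the C API. It also does not model the separate compact-ASCII layout used for strings whose code points are all below 128.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the C API): bytes needed per code point
   for a string whose largest code point is `maxchar`, following the
   thresholds described above. */
static int storage_bytes_per_char(uint32_t maxchar)
{
    assert(maxchar <= 0x10FFFF);   /* must lie within the Unicode range */
    if (maxchar < 256)             /* includes the pure-ASCII case */
        return 1;
    if (maxchar < 65536)
        return 2;
    return 4;
}
```

For instance, a string whose largest code point is U+00E9 needs one byte per code point. A string containing U+20AC needs two, and one containing U+1F600 needs four.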
+
+:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
+in the Unicode object.  The :c:type:`Py_UNICODE*` representation is deprecated
+and inefficient; it should be avoided in performance- or memory-sensitive
+situations.
+
+Due to the transition between the old APIs and the new APIs, Unicode objects
+can internally be in two states depending on how they were created:
+
+* "canonical" Unicode objects are all objects created by a non-deprecated
+  Unicode API.  They use the most efficient representation allowed by the
+  implementation.
+
+* "legacy" Unicode objects have been created through one of the deprecated
+  APIs (typically :c:func:`PyUnicode_FromUnicode`) and only have the
+  :c:type:`Py_UNICODE*` representation; you will have to call
+  :c:func:`PyUnicode_READY` on them before calling any other API.
+
+
 Unicode Type
 """"""""""""
 
 These are the basic Unicode object types used for the Unicode implementation in
 Python:
 
+.. c:type:: Py_UCS4
+            Py_UCS2
+            Py_UCS1
+
+   These types are typedefs for unsigned integer types wide enough to contain
+   characters of 32 bits, 16 bits and 8 bits, respectively.  When dealing with
+   single Unicode characters, use :c:type:`Py_UCS4`.
+
+   .. versionadded:: 3.3
+
 
 .. c:type:: Py_UNICODE
 
-   This type represents the storage type which is used by Python internally as
-   basis for holding Unicode ordinals.  Python's default builds use a 16-bit type
-   for :c:type:`Py_UNICODE` and store Unicode values internally as UCS2. It is also
-   possible to build a UCS4 version of Python (most recent Linux distributions come
-   with UCS4 builds of Python). These builds then use a 32-bit type for
-   :c:type:`Py_UNICODE` and store Unicode data internally as UCS4. On platforms
-   where :c:type:`wchar_t` is available and compatible with the chosen Python
-   Unicode build variant, :c:type:`Py_UNICODE` is a typedef alias for
-   :c:type:`wchar_t` to enhance native platform compatibility. On all other
-   platforms, :c:type:`Py_UNICODE` is a typedef alias for either :c:type:`unsigned
-   short` (UCS2) or :c:type:`unsigned long` (UCS4).
+   This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
+   depending on the platform.
 
-Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
-this in mind when writing extensions or interfaces.
+   .. versionchanged:: 3.3
+      In previous versions, this was a 16-bit type or a 32-bit type depending on
+      whether you selected a "narrow" or "wide" Unicode version of Python at
+      build time.
 
 
-.. c:type:: PyUnicodeObject
+.. c:type:: PyASCIIObject
+            PyCompactUnicodeObject
+            PyUnicodeObject
 
-   This subtype of :c:type:`PyObject` represents a Python Unicode object.
+   These subtypes of :c:type:`PyObject` represent a Python Unicode object.  In
+   almost all cases, they shouldn't be used directly, since all API functions
+   that deal with Unicode objects take and return :c:type:`PyObject` pointers.
+
+   .. versionadded:: 3.3
 
 
 .. c:var:: PyTypeObject PyUnicode_Type
@@ -45,10 +79,10 @@ this in mind when writing extensions or interfaces.
    This instance of :c:type:`PyTypeObject` represents the Python Unicode type.  It
    is exposed to Python code as ``str``.
 
+
 The following APIs are really C macros and can be used to do fast checks and to
 access internal read-only data of Unicode objects:
 
-
 .. c:function:: int PyUnicode_Check(PyObject *o)
 
    Return true if the object *o* is a Unicode object or an instance of a Unicode
@@ -61,28 +95,106 @@ access internal read-only data of Unicode objects:
    subtype.
 
 
-.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
+.. c:function:: int PyUnicode_READY(PyObject *o)
 
-   Return the size of the object.  *o* has to be a :c:type:`PyUnicodeObject` (not
-   checked).
+   Ensure the string object *o* is in the "canonical" representation.  This is
+   required before using any of the access macros described below.
 
+   .. XXX expand on when it is not required
 
-.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
+   Returns 0 on success and -1 with an exception set on failure, which in
+   particular happens if memory allocation fails.
 
-   Return the size of the object's internal buffer in bytes.  *o* has to be a
-   :c:type:`PyUnicodeObject` (not checked).
+   .. versionadded:: 3.3
 
 
-.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
+.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)
+
+   Return the length of the Unicode string, in code points.  *o* has to be a
+   Unicode object in the "canonical" representation (not checked).
+
+   .. versionadded:: 3.3
+
+
+.. c:function:: Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o)
+                Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o)
+                Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o)
+
+   Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
+   integer types for direct character access.  No check is made that the
+   canonical representation actually has that character size; use
+   :c:func:`PyUnicode_KIND` to select the right macro.  Make sure
+   :c:func:`PyUnicode_READY` has been called before accessing this.
+
+   .. versionadded:: 3.3
+
+
+.. c:macro:: PyUnicode_WCHAR_KIND
+             PyUnicode_1BYTE_KIND
+             PyUnicode_2BYTE_KIND
+             PyUnicode_4BYTE_KIND
+
+   Return values of the :c:func:`PyUnicode_KIND` macro.
+
+   .. versionadded:: 3.3
+
+
+.. c:function:: int PyUnicode_KIND(PyObject *o)
+
+   Return one of the PyUnicode kind constants (see above) that indicates how many
+   bytes per character this Unicode object uses to store its data.  *o* has to
+   be a Unicode object in the "canonical" representation (not checked).
+
+   .. XXX document "0" return value?
+
+   .. versionadded:: 3.3
+
+
+.. c:function:: void* PyUnicode_DATA(PyObject *o)
+
+   Return a void pointer to the raw Unicode buffer.  *o* has to be a Unicode
+   object in the "canonical" representation (not checked).
+
+   .. versionadded:: 3.3
+
+
+.. c:function:: void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, \
+                                     Py_UCS4 value)
+
+   Write into a canonical representation *data* (as obtained with
+   :c:func:`PyUnicode_DATA`).  This macro does not do any sanity checks and is
+   intended for usage in loops.  The caller should cache the *kind* value and
+   *data* pointer as obtained from other macro calls.  *index* is the index in
+   the string (starts at 0) and *value* is the new code point value which should
+   be written to that location.
+
+   .. versionadded:: 3.3
+
+
+.. c:function:: Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index)
+
+   Read a code point from a canonical representation *data* (as obtained with
+   :c:func:`PyUnicode_DATA`).  No checks or ready calls are performed.
+
+   .. versionadded:: 3.3
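The access macros are meant to be used together. The caller fetches the kind and data pointer once, then reads or writes by index.  The sketch below imitates that dispatch on plain C arrays so that it compiles without ``Python.h``.  ``read_cp``, ``write_cp``, ``max_code_point`` and the ``KIND_*`` constants are hypothetical stand-ins for :c:func:`PyUnicode_READ`, :c:func:`PyUnicode_WRITE` and the kind macros. They are not the actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kind constants: here the kind value is
   simply the number of bytes used per code point. */
enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Imitates PyUnicode_WRITE: store `value` at `index`, without checks. */
static void write_cp(int kind, void *data, size_t index, uint32_t value)
{
    switch (kind) {
    case KIND_1BYTE:
        ((uint8_t *)data)[index] = (uint8_t)value;
        break;
    case KIND_2BYTE:
        ((uint16_t *)data)[index] = (uint16_t)value;
        break;
    default:
        assert(kind == KIND_4BYTE);
        ((uint32_t *)data)[index] = value;
        break;
    }
}

/* Imitates PyUnicode_READ: fetch the code point stored at `index`. */
static uint32_t read_cp(int kind, const void *data, size_t index)
{
    switch (kind) {
    case KIND_1BYTE:
        return ((const uint8_t *)data)[index];
    case KIND_2BYTE:
        return ((const uint16_t *)data)[index];
    default:
        assert(kind == KIND_4BYTE);
        return ((const uint32_t *)data)[index];
    }
}

/* Typical loop: kind and data are looked up once by the caller and then
   reused for every index. */
static uint32_t max_code_point(int kind, const void *data, size_t length)
{
    uint32_t max = 0;
    for (size_t i = 0; i < length; i++) {
        uint32_t ch = read_cp(kind, data, i);
        if (ch > max)
            max = ch;
    }
    return max;
}
```

Real extension code would follow the same steps. It calls :c:func:`PyUnicode_READY` first and obtains *kind* and *data* with :c:func:`PyUnicode_KIND` and :c:func:`PyUnicode_DATA`. It then uses :c:func:`PyUnicode_READ` inside the loop.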
+
+
+.. c:function:: Py_UCS4 PyUnicode_READ_CHAR(PyObject *o, Py_ssize_t index)
+
+   Read a character from a Unicode object *o*, which must be in the "canonical"
+   representation.  This is less efficient than :c:func:`PyUnicode_READ` if you
+   do multiple consecutive reads.
+
+   .. versionadded:: 3.3
 
-   Return a pointer to the internal :c:type:`Py_UNICODE` buffer of the object.  *o*
-   has to be a :c:type:`PyUnicodeObject` (not checked).
 
+.. c:function:: Py_UCS4 PyUnicode_MAX_CHAR_VALUE(PyObject *o)
 
-.. c:function:: const char* PyUnicode_AS_DATA(PyObject *o)
+   Return the maximum code point that is suitable for creating another string
+   based on *o*, which must be in the "canonical" representation.  The result
+   is an upper bound determined by the internal representation of *o*, not the
+   exact maximum, but it is much cheaper to obtain than iterating over the string.
 
-   Return a pointer to the internal buffer of the object. *o* has to be a
-   :c:type:`PyUnicodeObject` (not checked).
+   .. versionadded:: 3.3
 
 
 .. c:function:: int PyUnicode_ClearFreeList()
@@ -90,6 +202,46 @@ access internal read-only data of Unicode objects:
    Clear the free list. Return the total number of freed items.
 
 
+.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
+
+   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
+   code units (this includes surrogate pairs as 2 units).  *o* has to be a
+   Unicode object (not checked).
+
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style Unicode API, please migrate to using
+      :c:func:`PyUnicode_GET_LENGTH`.
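The two lengths differ only when :c:type:`Py_UNICODE` is 16 bits wide, for example where :c:type:`wchar_t` is 16 bits, as on Windows.  On such platforms, code points above U+FFFF take a surrogate pair.  A minimal sketch under that assumption, using a hypothetical helper ``utf16_code_units``:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 16-bit code units needed for `length`
   code points.  Code points above U+FFFF need a surrogate pair (2 units). */
static size_t utf16_code_units(const uint32_t *cps, size_t length)
{
    size_t units = 0;
    for (size_t i = 0; i < length; i++) {
        assert(cps[i] <= 0x10FFFF);
        units += (cps[i] > 0xFFFF) ? 2 : 1;
    }
    return units;
}
```

Consider the three code points ``A``, U+1F600 and ``B``.  :c:func:`PyUnicode_GET_LENGTH` reports 3, while a 16-bit :c:type:`Py_UNICODE` representation holds 4 code units.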
+
+
+.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
+
+   Return the size of the deprecated :c:type:`Py_UNICODE` representation in
+   bytes.  *o* has to be a Unicode object (not checked).
+
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style Unicode API, please migrate to using
+      :c:func:`PyUnicode_GET_LENGTH`.
+
+
+.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
+                const char* PyUnicode_AS_DATA(PyObject *o)
+
+   Return a pointer to a :c:type:`Py_UNICODE` representation of the object.  The
+   ``AS_DATA`` form casts the pointer to :c:type:`const char *`.  *o* has to be
+   a Unicode object (not checked).
+
+   .. versionchanged:: 3.3
+      This macro is now inefficient -- because in many cases the
+      :c:type:`Py_UNICODE` representation does not exist and needs to be created
+      -- and can fail (return *NULL* with an exception set).  Try to port the
+      code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
+      :c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.
+
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style Unicode API, please migrate to using the
+      :c:func:`PyUnicode_nBYTE_DATA` family of macros.
+
+
 Unicode Character Properties
 """"""""""""""""""""""""""""
 
@@ -166,16 +318,25 @@ These APIs can be used for fast direct character conversions:
 
    Return the character *ch* converted to lower case.
 
+   .. deprecated:: 3.3
+      This function uses simple case mappings.
+
 
 .. c:function:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)
 
    Return the character *ch* converted to upper case.
 
+   .. deprecated:: 3.3
+      This function uses simple case mappings.
+
 
 .. c:function:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)
 
    Return the character *ch* converted to title case.
 
+   .. deprecated:: 3.3
+      This function uses simple case mappings.
+
 
 .. c:function:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)
 
@@ -195,31 +356,66 @@ These APIs can be used for fast direct character conversions:
    possible.  This macro does not raise exceptions.
 
 
-Plain Py_UNICODE
-""""""""""""""""
+These APIs can be used to work with surrogates:
+
+.. c:macro:: Py_UNICODE_IS_SURROGATE(ch)
+
+   Check if *ch* is a surrogate (``0xD800 <= ch <= 0xDFFF``).
+
+.. c:macro:: Py_UNICODE_IS_HIGH_SURROGATE(ch)
+
+   Check if *ch* is a high surrogate (``0xD800 <= ch <= 0xDBFF``).
+
+.. c:macro:: Py_UNICODE_IS_LOW_SURROGATE(ch)
+
+   Check if *ch* is a low surrogate (``0xDC00 <= ch <= 0xDFFF``).
+
+.. c:macro:: Py_UNICODE_JOIN_SURROGATES(high, low)
+
+   Join two surrogate characters and return a single :c:type:`Py_UCS4` value.
+   *high* and *low* are respectively the leading and trailing surrogates in a
+   surrogate pair.
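These ranges and the joining step follow the standard UTF-16 scheme.  Each surrogate carries 10 bits of the code point, and the result is offset by 0x10000.  The plain C sketch below restates that arithmetic.  The function names are hypothetical, and the code illustrates the formula rather than reproducing the macros' actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Standalone versions of the range checks documented above. */
static int is_high_surrogate(uint32_t ch) { return 0xD800 <= ch && ch <= 0xDBFF; }
static int is_low_surrogate(uint32_t ch)  { return 0xDC00 <= ch && ch <= 0xDFFF; }

/* Combine a leading (high) and trailing (low) surrogate into one code point
   in U+10000..U+10FFFF: 10 bits from each, plus 0x10000. */
static uint32_t join_surrogates(uint32_t high, uint32_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000 + (((high & 0x03FF) << 10) | (low & 0x03FF));
}
```

For example, the pair U+D83D, U+DE00 joins to U+1F600.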
+
+
+Creating and accessing Unicode strings
+""""""""""""""""""""""""""""""""""""""
 
 To create Unicode objects and access their basic sequence properties, use these
 APIs:
 
+.. c:function:: PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar)
 
-.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
+   Create a new Unicode object.  *maxchar* should be the true maximum code point
+   to be placed in the string.  As an approximation, it can be rounded up to the
+   nearest value in the sequence 127, 255, 65535, 1114111.
 
-   Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
-   may be *NULL* which causes the contents to be undefined. It is the user's
-   responsibility to fill in the needed data.  The buffer is copied into the new
-   object. If the buffer is not *NULL*, the return value might be a shared object.
-   Therefore, modification of the resulting Unicode object is only allowed when *u*
-   is *NULL*.
+   This is the recommended way to allocate a new Unicode object.  Objects
+   created using this function are not resizable.
+
+   .. versionadded:: 3.3
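Sometimes only an estimate of the maximum code point is available.  In that case, it can be rounded up to the next value in the sequence above before calling :c:func:`PyUnicode_New`.  A sketch of that rounding, with ``round_up_maxchar`` as a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round an estimated maximum code point up to the
   next value in the sequence 127, 255, 65535, 1114111. */
static uint32_t round_up_maxchar(uint32_t maxchar)
{
    static const uint32_t limits[] = { 127, 255, 65535, 1114111 };
    for (size_t i = 0; i < sizeof limits / sizeof limits[0]; i++) {
        if (maxchar <= limits[i])
            return limits[i];
    }
    assert(!"code point beyond U+10FFFF");
    return 1114111;
}
```

Each value in the sequence is the upper limit of one storage class.  Rounding up to it therefore selects the same internal representation as the exact maximum would.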
+
+
+.. c:function:: PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, \
+                                                    Py_ssize_t size)
+
+   Create a new Unicode object with the given *kind* (possible values are
+   :c:macro:`PyUnicode_1BYTE_KIND` etc., as returned by
+   :c:func:`PyUnicode_KIND`).  The *buffer* must point to an array of *size*
+   units of 1, 2 or 4 bytes per character, as given by the kind.
+
+   .. versionadded:: 3.3
 
 
 .. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)
 
-   Create a Unicode object from the char buffer *u*.  The bytes will be interpreted
-   as being UTF-8 encoded.  *u* may also be *NULL* which
-   causes the contents to be undefined. It is the user's responsibility to fill in
-   the needed data.  The buffer is copied into the new object. If the buffer is not
-   *NULL*, the return value might be a shared object. Therefore, modification of
-   the resulting Unicode object is only allowed when *u* is *NULL*.
+   Create a Unicode object from the char buffer *u*.  The bytes will be
+   interpreted as being UTF-8 encoded.  The buffer is copied into the new
+   object. If the buffer is not *NULL*, the return value might be a shared
+   object, i.e. modification of the data is not allowed.
+
+   If *u* is *NULL*, this function behaves like :c:func:`PyUnicode_FromUnicode`
+   with the buffer set to *NULL*.  This usage is deprecated in favor of
+   :c:func:`PyUnicode_New`.
 
 
 .. c:function:: PyObject *PyUnicode_FromString(const char *u)
@@ -260,18 +456,27 @@ APIs:
    | :attr:`%ld`       | long                | Exactly equivalent to          |
    |                   |                     | ``printf("%ld")``.             |
    +-------------------+---------------------+--------------------------------+
+   | :attr:`%li`       | long                | Exactly equivalent to          |
+   |                   |                     | ``printf("%li")``.             |
+   +-------------------+---------------------+--------------------------------+
    | :attr:`%lu`       | unsigned long       | Exactly equivalent to          |
    |                   |                     | ``printf("%lu")``.             |
    +-------------------+---------------------+--------------------------------+
    | :attr:`%lld`      | long long           | Exactly equivalent to          |
    |                   |                     | ``printf("%lld")``.            |
    +-------------------+---------------------+--------------------------------+
+   | :attr:`%lli`      | long long           | Exactly equivalent to          |
+   |                   |                     | ``printf("%lli")``.            |
+   +-------------------+---------------------+--------------------------------+
    | :attr:`%llu`      | unsigned long long  | Exactly equivalent to          |
    |                   |                     | ``printf("%llu")``.            |
    +-------------------+---------------------+--------------------------------+
    | :attr:`%zd`       | Py_ssize_t          | Exactly equivalent to          |
    |                   |                     | ``printf("%zd")``.             |
    +-------------------+---------------------+--------------------------------+
+   | :attr:`%zi`       | Py_ssize_t          | Exactly equivalent to          |
+   |                   |                     | ``printf("%zi")``.             |
+   +-------------------+---------------------+--------------------------------+
    | :attr:`%zu`       | size_t              | Exactly equivalent to          |
    |                   |                     | ``printf("%zu")``.             |
    +-------------------+---------------------+--------------------------------+
@@ -322,27 +527,178 @@ APIs:
    .. versionchanged:: 3.2
       Support for ``"%lld"`` and ``"%llu"`` added.
 
+   .. versionchanged:: 3.3
+      Support for ``"%li"``, ``"%lli"`` and ``"%zi"`` added.
+
 
 .. c:function:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)
 
    Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
    arguments.
 
+
+.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, \
+                               const char *encoding, const char *errors)
+
+   Coerce an encoded object *obj* to a Unicode object and return a reference with
+   incremented refcount.
+
+   :class:`bytes`, :class:`bytearray` and other char buffer compatible objects
+   are decoded according to the given *encoding* and using the error handling
+   defined by *errors*. Both can be *NULL* to have the interface use the default
+   values (see the next section for details).
+
+   All other objects, including Unicode objects, cause a :exc:`TypeError` to be
+   set.
+
+   The API returns *NULL* if there was an error.  The caller is responsible for
+   decref'ing the returned objects.
+
+
+.. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)
+
+   Return the length of the Unicode object, in code points.
+
+   .. versionadded:: 3.3
+
+
+.. c:function:: int PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, \
+                        PyObject *from, Py_ssize_t from_start, Py_ssize_t how_many)
+
+   Copy characters from one Unicode object into another.  This function performs
+   character conversion when necessary and falls back to :c:func:`memcpy` if
+   possible.  Returns ``-1`` and sets an exception on error, otherwise returns
+   ``0``.
+
+   .. versionadded:: 3.3
+
+
+.. c:function:: Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \
+                        Py_ssize_t length, Py_UCS4 fill_char)
+
+   Fill a string with a character: write *fill_char* into
+   ``unicode[start:start+length]``.
+
+   Fail if *fill_char* is bigger than the string's maximum character, or if the
+   string has more than 1 reference.
+
+   Return the number of written characters, or return ``-1`` and raise an
+   exception on error.
+
+   .. versionadded:: 3.3
+
+
+.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
+                                        Py_UCS4 character)
+
+   Write a character to a string.  The string must have been created through
+   :c:func:`PyUnicode_New`.  Since Unicode strings are supposed to be immutable,
+   the string must not be shared and must not have been hashed yet.
+
+   This function checks that *unicode* is a Unicode object, that the index is
+   not out of bounds, and that the object can be modified safely (i.e. that
+   its reference count is one), in contrast to the macro version
+   :c:func:`PyUnicode_WRITE_CHAR`.
+
+   .. versionadded:: 3.3
+
+
+.. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)
+
+   Read a character from a string.  This function checks that *unicode* is a
+   Unicode object and the index is not out of bounds, in contrast to the macro
+   version :c:func:`PyUnicode_READ_CHAR`.
+
+   .. versionadded:: 3.3
+
+
+.. c:function:: PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, \
+                                              Py_ssize_t end)
+
+   Return a substring of *str*, from character index *start* (included) to
+   character index *end* (excluded).  Negative indices are not supported.
+
+   .. versionadded:: 3.3
+
+
+.. c:function:: Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, \
+                                          Py_ssize_t buflen, int copy_null)
+
+   Copy the string *u* into a UCS4 buffer, including a null character, if
+   *copy_null* is set.  Returns *NULL* and sets an exception on error (in
+   particular, a :exc:`ValueError` if *buflen* is smaller than the length of
+   *u*).  *buffer* is returned on success.
+
+   .. versionadded:: 3.3
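The caller-supplied buffer must be large enough for the copy.  The standalone sketch below models that contract with a Latin-1 source string.  ``latin1_to_ucs4`` is hypothetical.  The sketch also assumes that the terminator must fit in *buffer* when *copy_null* is set; the text above only mentions the length of *u*.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the buffer contract described above: widen `len`
   Latin-1 code points into a caller-supplied UCS4 buffer, optionally
   followed by a null code point.  Returns `buffer` on success and NULL
   when the buffer is too small (where the real function would also set
   an exception). */
static uint32_t *latin1_to_ucs4(const uint8_t *src, size_t len,
                                uint32_t *buffer, size_t buflen,
                                int copy_null)
{
    assert(src != NULL && buffer != NULL);
    /* Assumption of this sketch: the terminator must fit as well. */
    size_t needed = len + (copy_null ? 1 : 0);
    if (buflen < needed)
        return NULL;
    for (size_t i = 0; i < len; i++)
        buffer[i] = src[i];          /* each byte is one code point */
    if (copy_null)
        buffer[len] = 0;
    return buffer;
}
```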
+
+
+.. c:function:: Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u)
+
+   Copy the string *u* into a new UCS4 buffer that is allocated using
+   :c:func:`PyMem_Malloc`.  If this fails, *NULL* is returned with a
+   :exc:`MemoryError` set.
+
+   .. versionadded:: 3.3
+
+
+Deprecated Py_UNICODE APIs
+""""""""""""""""""""""""""
+
+.. deprecated-removed:: 3.3 4.0
+
+These API functions are deprecated since the implementation of :pep:`393`.
+Extension modules can continue using them, as they will not be removed in Python
+3.x, but their authors should be aware that using them can now incur
+performance and memory costs.
+
+
+.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
+
+   Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
+   may be *NULL* which causes the contents to be undefined. It is the user's
+   responsibility to fill in the needed data.  The buffer is copied into the new
+   object.
+
+   If the buffer is not *NULL*, the return value might be a shared object.
+   Therefore, modification of the resulting Unicode object is only allowed when
+   *u* is *NULL*.
+
+   If the buffer is *NULL*, :c:func:`PyUnicode_READY` must be called once the
+   string content has been filled before using any of the access macros such as
+   :c:func:`PyUnicode_KIND`.
+
+   Please migrate to using :c:func:`PyUnicode_FromKindAndData` or
+   :c:func:`PyUnicode_New`.
+
+
+.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
+
+   Return a read-only pointer to the Unicode object's internal
+   :c:type:`Py_UNICODE` buffer, or *NULL* on error. This will create the
+   :c:type:`Py_UNICODE*` representation of the object if it is not yet
+   available. Note that the resulting :c:type:`Py_UNICODE` string may contain
+   embedded null characters, which would cause the string to be truncated when
+   used in most C functions.
+
+   Please migrate to using :c:func:`PyUnicode_AsUCS4`,
+   :c:func:`PyUnicode_Substring`, :c:func:`PyUnicode_ReadChar` or similar new
+   APIs.
+
+
 .. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size)
 
    Create a Unicode object by replacing all decimal digits in
    :c:type:`Py_UNICODE` buffer of the given *size* by ASCII digits 0--9
-   according to their decimal value.  Return *NULL* if an exception
-   occurs.
+   according to their decimal value.  Return *NULL* if an exception occurs.
 
 
-.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
+.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
 
-   Return a read-only pointer to the Unicode object's internal
-   :c:type:`Py_UNICODE` buffer, *NULL* if *unicode* is not a Unicode object.
-   Note that the resulting :c:type:`Py_UNICODE*` string may contain embedded
-   null characters, which would cause the string to be truncated when used in
-   most C functions.
+   Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:type:`Py_UNICODE`
+   array length in *size*. Note that the resulting :c:type:`Py_UNICODE*` string
+   may contain embedded null characters, which would cause the string to be
+   truncated when used in most C functions.
+
+   .. versionadded:: 3.3
 
 
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
@@ -350,44 +706,76 @@ APIs:
    Create a copy of a Unicode string ending with a nul character. Return *NULL*
    and raise a :exc:`MemoryError` exception on memory allocation failure,
    otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free
-   the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may contain
-   embedded null characters, which would cause the string to be truncated when
-   used in most C functions.
+   the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
+   contain embedded null characters, which would cause the string to be
+   truncated when used in most C functions.
 
    .. versionadded:: 3.2
 
+   Please migrate to using :c:func:`PyUnicode_AsUCS4Copy` or similar new APIs.
+
 
 .. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
 
-   Return the length of the Unicode object.
+   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
+   code units (this includes surrogate pairs as 2 units).
 
+   Please migrate to using :c:func:`PyUnicode_GetLength`.
 
-.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)
 
-   Coerce an encoded object *obj* to an Unicode object and return a reference with
-   incremented refcount.
+.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)
 
-   :class:`bytes`, :class:`bytearray` and other char buffer compatible objects
-   are decoded according to the given *encoding* and using the error handling
-   defined by *errors*. Both can be *NULL* to have the interface use the default
-   values (see the next section for details).
+   Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
+   throughout the interpreter whenever coercion to Unicode is needed.
 
-   All other objects, including Unicode objects, cause a :exc:`TypeError` to be
-   set.
 
-   The API returns *NULL* if there was an error.  The caller is responsible for
-   decref'ing the returned objects.
+Locale Encoding
+"""""""""""""""
 
+The current locale encoding can be used to decode text from the operating
+system.
 
-.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)
+.. c:function:: PyObject* PyUnicode_DecodeLocaleAndSize(const char *str, Py_ssize_t len, int surrogateescape)
+
+   Decode a string from the current locale encoding. The decoder is strict if
+   *surrogateescape* is equal to zero, otherwise it uses the
+   ``'surrogateescape'`` error handler (:pep:`383`) to escape undecodable
+   bytes. If a byte sequence can be decoded as a surrogate character and
+   *surrogateescape* is not equal to zero, the byte sequence is escaped using
+   the ``'surrogateescape'`` error handler instead of being decoded.  *str*
+   must end with a null character but cannot contain embedded null characters.
+
+   .. seealso::
+
+      Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` to decode a string from
+      :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
+      Python startup).
+
+   .. versionadded:: 3.3
+
+
+.. c:function:: PyObject* PyUnicode_DecodeLocale(const char *str, int surrogateescape)
+
+   Similar to :c:func:`PyUnicode_DecodeLocaleAndSize`, but computes the string
+   length using :c:func:`strlen`.
+
+   .. versionadded:: 3.3
 
-   Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
-   throughout the interpreter whenever coercion to Unicode is needed.
 
-If the platform supports :c:type:`wchar_t` and provides a header file wchar.h,
-Python can interface directly to this type using the following functions.
-Support is optimized if Python's own :c:type:`Py_UNICODE` type is identical to
-the system's :c:type:`wchar_t`.
+.. c:function:: PyObject* PyUnicode_EncodeLocale(PyObject *unicode, int surrogateescape)
+
+   Encode a Unicode object to the current locale encoding. The encoder is
+   strict if *surrogateescape* is equal to zero, otherwise it uses the
+   ``'surrogateescape'`` error handler (:pep:`383`). Return a :class:`bytes`
+   object. *unicode* cannot contain embedded null characters.
+
+   .. seealso::
+
+      Use :c:func:`PyUnicode_EncodeFSDefault` to encode a string to
+      :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
+      Python startup).
+
+   .. versionadded:: 3.3
 
 
 File System Encoding
@@ -430,6 +818,13 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
    If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
    locale encoding.
 
+   .. seealso::
+
+      :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
+      locale encoding and cannot be modified later. If you need to decode a
+      string from the current locale encoding, use
+      :c:func:`PyUnicode_DecodeLocaleAndSize`.
+
    .. versionchanged:: 3.2
       Use ``'strict'`` error handler on Windows.
 
@@ -458,6 +853,13 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
    If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
    locale encoding.
 
+   .. seealso::
+
+      :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
+      locale encoding and cannot be modified later. If you need to encode a
+      string to the current locale encoding, use
+      :c:func:`PyUnicode_EncodeLocale`.
+
    .. versionadded:: 3.2
 
 
@@ -479,9 +881,9 @@ wchar_t Support
    Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*.  At most
    *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
    0-termination character).  Return the number of :c:type:`wchar_t` characters
-   copied or -1 in case of an error.  Note that the resulting :c:type:`wchar_t`
+   copied or -1 in case of an error.  Note that the resulting :c:type:`wchar_t*`
    string may or may not be 0-terminated.  It is the responsibility of the caller
-   to make sure that the :c:type:`wchar_t` string is 0-terminated in case this is
+   to make sure that the :c:type:`wchar_t*` string is 0-terminated in case this is
    required by the application. Also, note that the :c:type:`wchar_t*` string
    might contain null characters, which would cause the string to be truncated
    when used with most C functions.
@@ -497,12 +899,32 @@ wchar_t Support
    Returns a buffer allocated by :c:func:`PyMem_Alloc` (use
    :c:func:`PyMem_Free` to free it) on success. On error, returns *NULL*,
    *\*size* is undefined and raises a :exc:`MemoryError`. Note that the
-   resulting :c:type:`wchar_t*` string might contain null characters, which
+   resulting :c:type:`wchar_t*` string might contain null characters, which
    would cause the string to be truncated when used with most C functions.
 
    .. versionadded:: 3.2
 
 
+UCS4 Support
+""""""""""""
+
+.. versionadded:: 3.3
+
+.. XXX are these meant to be public?
+
+.. c:function:: size_t Py_UCS4_strlen(const Py_UCS4 *u)
+                Py_UCS4* Py_UCS4_strcpy(Py_UCS4 *s1, const Py_UCS4 *s2)
+                Py_UCS4* Py_UCS4_strncpy(Py_UCS4 *s1, const Py_UCS4 *s2, size_t n)
+                Py_UCS4* Py_UCS4_strcat(Py_UCS4 *s1, const Py_UCS4 *s2)
+                int Py_UCS4_strcmp(const Py_UCS4 *s1, const Py_UCS4 *s2)
+                int Py_UCS4_strncmp(const Py_UCS4 *s1, const Py_UCS4 *s2, size_t n)
+                Py_UCS4* Py_UCS4_strchr(const Py_UCS4 *s, Py_UCS4 c)
+                Py_UCS4* Py_UCS4_strrchr(const Py_UCS4 *s, Py_UCS4 c)
+
+   These utility functions work on strings of :c:type:`Py_UCS4` characters and
+   otherwise behave like the C standard library functions of the same name
+   without the ``Py_UCS4_`` prefix.
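These strings are terminated by a zero :c:type:`Py_UCS4` value rather than a zero byte, so the usual ``char``-based functions cannot be applied to them.  The sketch below shows what the length and search functions do.  It uses hypothetical ``ucs4_strlen`` and ``ucs4_strchr`` helpers, with ``uint32_t`` in place of :c:type:`Py_UCS4`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counterpart of strlen(): count code points before the
   terminating zero value. */
static size_t ucs4_strlen(const uint32_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical counterpart of strchr(): return a pointer to the first
   occurrence of `c`, or NULL.  As with strchr(), searching for 0 finds
   the terminator. */
static const uint32_t *ucs4_strchr(const uint32_t *s, uint32_t c)
{
    for (;; s++) {
        if (*s == c)
            return s;
        if (*s == 0)
            return NULL;
    }
}
```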
+
+
 .. _builtincodecs:
 
 Built-in Codecs
@@ -537,7 +959,8 @@ Generic Codecs
 These are the generic codec APIs:
 
 
-.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)
+.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, \
+                              const char *encoding, const char *errors)
 
    Create a Unicode object by decoding *size* bytes of the encoded string *s*.
    *encoding* and *errors* have the same meaning as the parameters of the same name
@@ -546,7 +969,18 @@ These are the generic codec APIs:
    the codec.
 
 
-.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)
+.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, \
+                              const char *encoding, const char *errors)
+
+   Encode a Unicode object and return the result as a Python bytes object.
+   *encoding* and *errors* have the same meaning as the parameters of the same
+   name in the Unicode :meth:`encode` method. The codec to be used is looked up
+   using the Python codec registry. Return *NULL* if an exception was raised by
+   the codec.
+
+
+.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, \
+                              const char *encoding, const char *errors)
 
    Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* and return a Python
    bytes object.  *encoding* and *errors* have the same meaning as the
@@ -554,14 +988,9 @@ These are the generic codec APIs:
    to be used is looked up using the Python codec registry.  Return *NULL* if an
    exception was raised by the codec.
 
-
-.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)
-
-   Encode a Unicode object and return the result as Python bytes object.
-   *encoding* and *errors* have the same meaning as the parameters of the same
-   name in the Unicode :meth:`encode` method. The codec to be used is looked up
-   using the Python codec registry. Return *NULL* if an exception was raised by
-   the codec.
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
+      :c:func:`PyUnicode_AsEncodedString`.
 
 
 UTF-8 Codecs
@@ -576,7 +1005,8 @@ These are the UTF-8 codec APIs:
    *s*. Return *NULL* if an exception was raised by the codec.
 
 
-.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
+.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, \
+                              const char *errors, Py_ssize_t *consumed)
 
    If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF8`. If
    *consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be
@@ -584,18 +1014,45 @@ These are the UTF-8 codec APIs:
    that have been decoded will be stored in *consumed*.
 
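To illustrate what ends up in *consumed*, the sketch below computes how many leading bytes of a buffer end on a complete UTF-8 sequence, leaving a truncated multi-byte sequence at the end for a later call. ``utf8_complete_prefix`` is a hypothetical helper, not CPython's decoder; it only inspects the final sequence and does not validate the bytes it counts as consumed.

```c
#include <stddef.h>

/* Hypothetical helper modelling *consumed*: return how many leading bytes of
   buf end on a complete UTF-8 sequence.  A truncated multi-byte sequence at
   the end is excluded so a caller can retry it when more data arrives.
   Invalid bytes are not checked; they are counted as consumed and left for
   a real decoder to reject. */
static size_t utf8_complete_prefix(const unsigned char *buf, size_t size)
{
    size_t i = size;
    size_t back = 0;   /* continuation bytes (10xxxxxx) seen at the end */

    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return size;   /* no lead byte at all: not a truncation */

    unsigned char lead = buf[i - 1];
    size_t need;       /* total sequence length announced by the lead byte */
    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return size;   /* invalid lead byte: an error, not a truncation */

    /* back + 1 bytes of the final sequence are present in buf. */
    return (back + 1 < need) ? i - 1 : size;
}
```

For the bytes ``b"a\xe2\x82"`` (``a`` followed by the first two bytes of U+20AC EURO SIGN) it returns 1: only the ``a`` is complete, and the two trailing bytes would wait for the rest of the euro sign.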
 
+.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
+
+   Encode a Unicode object using UTF-8 and return the result as a Python
+   bytes object.  Error handling is "strict".  Return *NULL* if an exception
+   was raised by the codec.
+
+
+.. c:function:: char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
+
+   Return a pointer to the default encoding (UTF-8) of the Unicode object, and
+   store the size of the encoded representation (in bytes) in *size*.  *size*
+   can be *NULL*; in this case no size will be stored.
+
+   In the case of an error, *NULL* is returned with an exception set and no
+   *size* is stored.
+
+   This caches the UTF-8 representation of the string in the Unicode object, and
+   subsequent calls will return a pointer to the same buffer.  The caller is not
+   responsible for deallocating the buffer.
+
+   .. versionadded:: 3.3
+
+
+.. c:function:: char* PyUnicode_AsUTF8(PyObject *unicode)
+
+   As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size.
+
+   .. versionadded:: 3.3
+
+
 .. c:function:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
 
    Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* using UTF-8 and
    return a Python bytes object.  Return *NULL* if an exception was raised by
    the codec.
 
-
-.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
-
-   Encode a Unicode object using UTF-8 and return the result as Python bytes
-   object.  Error handling is "strict".  Return *NULL* if an exception was
-   raised by the codec.
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
+      :c:func:`PyUnicode_AsUTF8String` or :c:func:`PyUnicode_AsUTF8AndSize`.
 
 
 UTF-32 Codecs
@@ -604,7 +1061,8 @@ UTF-32 Codecs
 These are the UTF-32 codec APIs:
 
 
-.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
+.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, \
+                              const char *errors, int *byteorder)
 
    Decode *size* bytes from a UTF-32 encoded buffer string and return the
    corresponding Unicode object.  *errors* (if non-*NULL*) defines the error
@@ -625,14 +1083,13 @@ These are the UTF-32 codec APIs:
    After completion, *\*byteorder* is set to the current byte order at the end
    of input data.
 
-   In a narrow build codepoints outside the BMP will be decoded as surrogate pairs.
-
    If *byteorder* is *NULL*, the codec starts in native order mode.
 
    Return *NULL* if an exception was raised by the codec.
 
 
-.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
+.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, \
+                              const char *errors, int *byteorder, Py_ssize_t *consumed)
 
    If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF32`. If
    *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
@@ -641,7 +1098,15 @@ These are the UTF-32 codec APIs:
    that have been decoded will be stored in *consumed*.
 
 
-.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
+.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
+
+   Return a Python bytes object holding the UTF-32 encoding of *unicode* in
+   native byte order.  It always starts with a BOM.  Error handling is
+   "strict".  Return *NULL* if an exception was raised by the codec.
+
+
+.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, \
+                              const char *errors, int byteorder)
 
    Return a Python bytes object holding the UTF-32 encoded value of the Unicode
    data in *s*.  Output is written according to the following byte order::
@@ -658,12 +1123,9 @@ These are the UTF-32 codec APIs:
 
    Return *NULL* if an exception was raised by the codec.
 
-
-.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
-
-   Return a Python byte string using the UTF-32 encoding in native byte
-   order. The string always starts with a BOM mark.  Error handling is "strict".
-   Return *NULL* if an exception was raised by the codec.
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
+      :c:func:`PyUnicode_AsUTF32String`.
 
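The *byteorder* values used by these codecs are ``-1`` (little endian), ``0`` (native order, preceded by a BOM, U+FEFF) and ``1`` (big endian). The sketch below lays out code points that way; ``put_u32`` and ``utf32_encode`` are hypothetical helpers, and unlike a real codec they do not reject surrogates or values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Store v as four bytes, least significant first if little is nonzero. */
static void put_u32(unsigned char *p, uint32_t v, int little)
{
    for (int k = 0; k < 4; k++) {
        int shift = little ? 8 * k : 8 * (3 - k);
        p[k] = (unsigned char)(v >> shift);
    }
}

/* Hypothetical UTF-32 writer following the *byteorder* convention.
   out must have room for 4 * (n + 1) bytes; returns the bytes written. */
static size_t utf32_encode(const uint32_t *cps, size_t n, int byteorder,
                           unsigned char *out)
{
    size_t pos = 0;
    int little;

    if (byteorder == 0) {
        const uint16_t probe = 1;
        little = (*(const unsigned char *)&probe == 1);  /* native order */
        put_u32(out, 0xFEFF, little);                    /* BOM comes first */
        pos = 4;
    } else {
        little = (byteorder < 0);
    }
    for (size_t i = 0; i < n; i++, pos += 4)
        put_u32(out + pos, cps[i], little);
    return pos;
}
```

Only mode ``0`` prepends a BOM; with ``-1`` or ``1`` the byte order is fixed by the caller and no BOM is written.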
 
 UTF-16 Codecs
@@ -672,7 +1134,8 @@ UTF-16 Codecs
 These are the UTF-16 codec APIs:
 
 
-.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
+.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, \
+                              const char *errors, int *byteorder)
 
    Decode *size* bytes from a UTF-16 encoded buffer string and return the
    corresponding Unicode object.  *errors* (if non-*NULL*) defines the error
@@ -699,7 +1162,8 @@ These are the UTF-16 codec APIs:
    Return *NULL* if an exception was raised by the codec.
 
 
-.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
+.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, \
+                              const char *errors, int *byteorder, Py_ssize_t *consumed)
 
    If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF16`. If
    *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
@@ -708,7 +1172,15 @@ These are the UTF-16 codec APIs:
    number of bytes that have been decoded will be stored in *consumed*.
 
 
-.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
+.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
+
+   Return a Python bytes object holding the UTF-16 encoding of *unicode* in
+   native byte order.  It always starts with a BOM.  Error handling is
+   "strict".  Return *NULL* if an exception was raised by the codec.
+
+
+.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, \
+                              const char *errors, int byteorder)
 
    Return a Python bytes object holding the UTF-16 encoded value of the Unicode
    data in *s*.  Output is written according to the following byte order::
@@ -726,12 +1198,9 @@ These are the UTF-16 codec APIs:
 
    Return *NULL* if an exception was raised by the codec.
 
-
-.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
-
-   Return a Python byte string using the UTF-16 encoding in native byte
-   order. The string always starts with a BOM mark.  Error handling is "strict".
-   Return *NULL* if an exception was raised by the codec.
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
+      :c:func:`PyUnicode_AsUTF16String`.
 
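Unlike UTF-32, UTF-16 needs two 16-bit code units (a surrogate pair) for every character above U+FFFF, so the encoded size is not simply twice the number of characters. A sketch of the standard split, using a hypothetical helper ``utf16_units`` that assumes *cp* is at most U+10FFFF and is not itself a surrogate:

```c
#include <stdint.h>

/* Split one code point into UTF-16 code units; returns how many (1 or 2).
   Assumes cp <= 0x10FFFF and cp is not in the surrogate range. */
static int utf16_units(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                       /* BMP: one unit */
        return 1;
    }
    cp -= 0x10000;                                   /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));        /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));      /* low surrogate */
    return 2;
}
```

For example, U+1F600 becomes the pair ``0xD83D 0xDE00``, i.e. four bytes in the encoded output.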
 
 UTF-7 Codecs
@@ -746,7 +1215,8 @@ These are the UTF-7 codec APIs:
    *s*.  Return *NULL* if an exception was raised by the codec.
 
 
-.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
+.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, \
+                              const char *errors, Py_ssize_t *consumed)
 
    If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF7`.  If
    *consumed* is not *NULL*, trailing incomplete UTF-7 base-64 sections will not
@@ -754,7 +1224,8 @@ These are the UTF-7 codec APIs:
    bytes that have been decoded will be stored in *consumed*.
 
 
-.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors)
+.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, \
+                              int base64SetO, int base64WhiteSpace, const char *errors)
 
    Encode the :c:type:`Py_UNICODE` buffer of the given size using UTF-7 and
    return a Python bytes object.  Return *NULL* if an exception was raised by
@@ -765,6 +1236,11 @@ These are the UTF-7 codec APIs:
    nonzero, whitespace will be encoded in base-64.  Both are set to zero for the
    Python "utf-7" codec.
 
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style :c:type:`Py_UNICODE` API.
+
+   .. XXX replace with what?
+
 
 Unicode-Escape Codecs
 """""""""""""""""""""
@@ -772,24 +1248,29 @@ Unicode-Escape Codecs
 These are the "Unicode Escape" codec APIs:
 
 
-.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
+.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, \
+                              Py_ssize_t size, const char *errors)
 
    Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
    string *s*.  Return *NULL* if an exception was raised by the codec.
 
 
+.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
+
+   Encode a Unicode object using Unicode-Escape and return the result as a
+   Python bytes object.  Error handling is "strict".  Return *NULL* if an
+   exception was raised by the codec.
+
+
 .. c:function:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
 
    Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Unicode-Escape and
-   return a Python string object.  Return *NULL* if an exception was raised by the
+   return a Python bytes object.  Return *NULL* if an exception was raised by the
    codec.
 
-
-.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
-
-   Encode a Unicode object using Unicode-Escape and return the result as Python
-   string object.  Error handling is "strict". Return *NULL* if an exception was
-   raised by the codec.
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
+      :c:func:`PyUnicode_AsUnicodeEscapeString`.
 
 
 Raw-Unicode-Escape Codecs
@@ -798,19 +1279,13 @@ Raw-Unicode-Escape Codecs
 These are the "Raw Unicode Escape" codec APIs:
 
 
-.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
+.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, \
+                              Py_ssize_t size, const char *errors)
 
    Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
    encoded string *s*.  Return *NULL* if an exception was raised by the codec.
 
 
-.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
-
-   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
-   and return a Python string object.  Return *NULL* if an exception was raised by
-   the codec.
-
-
 .. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
 
    Encode a Unicode object using Raw-Unicode-Escape and return the result as
@@ -818,6 +1293,18 @@ These are the "Raw Unicode Escape" codec APIs:
    was raised by the codec.
 
 
+.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, \
+                              Py_ssize_t size, const char *errors)
+
+   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
+   and return a Python bytes object.  Return *NULL* if an exception was raised by
+   the codec.
+
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
+      :c:func:`PyUnicode_AsRawUnicodeEscapeString`.
+
+
 Latin-1 Codecs
 """"""""""""""
 
@@ -831,18 +1318,22 @@ ordinals and only these are accepted by the codecs during encoding.
    *s*.  Return *NULL* if an exception was raised by the codec.
 
 
+.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
+
+   Encode a Unicode object using Latin-1 and return the result as a Python
+   bytes object.  Error handling is "strict".  Return *NULL* if an exception
+   was raised by the codec.
+
+
 .. c:function:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
 
    Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Latin-1 and
    return a Python bytes object.  Return *NULL* if an exception was raised by
    the codec.
 
-
-.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
-
-   Encode a Unicode object using Latin-1 and return the result as Python bytes
-   object.  Error handling is "strict".  Return *NULL* if an exception was
-   raised by the codec.
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
+      :c:func:`PyUnicode_AsLatin1String`.
 
 
 ASCII Codecs
@@ -858,18 +1349,22 @@ codes generate errors.
    *s*.  Return *NULL* if an exception was raised by the codec.
 
 
+.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
+
+   Encode a Unicode object using ASCII and return the result as a Python
+   bytes object.  Error handling is "strict".  Return *NULL* if an exception
+   was raised by the codec.
+
+
 .. c:function:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
 
    Encode the :c:type:`Py_UNICODE` buffer of the given *size* using ASCII and
    return a Python bytes object.  Return *NULL* if an exception was raised by
    the codec.
 
-
-.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
-
-   Encode a Unicode object using ASCII and return the result as Python bytes
-   object.  Error handling is "strict".  Return *NULL* if an exception was
-   raised by the codec.
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
+      :c:func:`PyUnicode_AsASCIIString`.
 
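With "strict" error handling, the Latin-1 and ASCII encoders differ only in the range they accept: Latin-1 covers ordinals below 256, ASCII those below 128. The hypothetical ``encode_below`` helper sketches that range check on an array of code points; where a real codec would raise :exc:`UnicodeEncodeError`, it returns the index of the first character it cannot encode.

```c
#include <stddef.h>
#include <stdint.h>

/* Write each code point to out as one byte if it is below limit
   (256 for Latin-1, 128 for ASCII).  Returns -1 on success, or the index
   of the first code point that "strict" handling would reject. */
static ptrdiff_t encode_below(const uint32_t *cps, size_t n, uint32_t limit,
                              unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        if (cps[i] >= limit)
            return (ptrdiff_t)i;         /* unencodable character */
        out[i] = (unsigned char)cps[i];  /* one byte per character */
    }
    return -1;
}
```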
 
 Character Map Codecs
@@ -898,7 +1393,8 @@ characters to different code points.
 
 These are the mapping codec APIs:
 
-.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)
+.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, \
+                              PyObject *mapping, const char *errors)
 
    Create a Unicode object by decoding *size* bytes of the encoded string *s* using
    the given *mapping* object.  Return *NULL* if an exception was raised by the
@@ -908,13 +1404,6 @@ These are the mapping codec APIs:
    treated as "undefined mapping".
 
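Decoding with a mapping amounts to a per-byte table lookup. The sketch below simplifies *mapping* to a hypothetical 256-entry array of code points in which ``0xFFFE`` marks an undefined mapping, and treats a *NULL* table as Latin-1, as CPython does for a *NULL* *mapping*. Real mappings may also map bytes to strings, and the real codec passes undefined bytes to the *errors* handler instead of stopping.

```c
#include <stddef.h>
#include <stdint.h>

#define UNDEFINED_MAPPING 0xFFFEu  /* table value meaning "no mapping" */

/* Decode n bytes through a 256-entry table (NULL means Latin-1).
   Returns -1 on success, or the index of the first byte whose mapping is
   undefined, which is where a real codec would consult *errors*. */
static ptrdiff_t charmap_decode(const unsigned char *s, size_t n,
                                const uint32_t *table, uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = table ? table[s[i]] : s[i];
        if (cp == UNDEFINED_MAPPING)
            return (ptrdiff_t)i;
        out[i] = cp;
    }
    return -1;
}
```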
 
-.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)
-
-   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
-   *mapping* object and return a Python string object. Return *NULL* if an
-   exception was raised by the codec.
-
-
 .. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
 
    Encode a Unicode object using the given *mapping* object and return the result
@@ -924,7 +1413,8 @@ These are the mapping codec APIs:
-The following codec API is special in that maps Unicode to Unicode.
+The following codec API is special in that it maps Unicode to Unicode.
 
 
-.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)
+.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
+                              PyObject *table, const char *errors)
 
    Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
    character mapping *table* to it and return the resulting Unicode object.  Return
@@ -937,6 +1427,22 @@ The following codec API is special in that maps Unicode to Unicode.
    and sequences work well.  Unmapped character ordinals (ones which cause a
    :exc:`LookupError`) are left untouched and are copied as-is.
 
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style :c:type:`Py_UNICODE` API.
+
+   .. XXX replace with what?
+
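The rule that unmapped characters are copied as-is can be sketched with a hypothetical table of (from, to) code-point pairs and a linear lookup. This is a simplification: the real function accepts any object supporting ``__getitem__`` as *table*, and a mapping to ``None`` deletes the character, which this sketch does not model.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a hypothetical translation table. */
struct ucs4_map { uint32_t from, to; };

/* Translate n code points.  Characters without a table entry (the case
   that raises LookupError in the real API) are copied unchanged. */
static void translate(const uint32_t *in, size_t n,
                      const struct ucs4_map *table, size_t ntable,
                      uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i];                      /* default: left untouched */
        for (size_t j = 0; j < ntable; j++) {
            if (table[j].from == in[i]) {
                out[i] = table[j].to;
                break;
            }
        }
    }
}
```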
+
+.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
+                              PyObject *mapping, const char *errors)
+
+   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
+   *mapping* object and return a Python bytes object. Return *NULL* if an
+   exception was raised by the codec.
+
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
+      :c:func:`PyUnicode_AsCharmapString`.
 
 
 MBCS codecs for Windows
@@ -953,7 +1459,8 @@ the user settings on the machine running the codec.
    Return *NULL* if an exception was raised by the codec.
 
 
-.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)
+.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, \
+                              const char *errors, int *consumed)
 
    If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeMBCS`. If
    *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
@@ -961,18 +1468,31 @@ the user settings on the machine running the codec.
    in *consumed*.
 
 
+.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
+
+   Encode a Unicode object using MBCS and return the result as a Python
+   bytes object.  Error handling is "strict".  Return *NULL* if an exception
+   was raised by the codec.
+
+
+.. c:function:: PyObject* PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *errors)
+
+   Encode the Unicode object using the specified code page and return a Python
+   bytes object.  Return *NULL* if an exception was raised by the codec.  Use
+   the :c:data:`CP_ACP` code page to get the MBCS encoder.
+
+   .. versionadded:: 3.3
+
+
 .. c:function:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
 
    Encode the :c:type:`Py_UNICODE` buffer of the given *size* using MBCS and return
    a Python bytes object.  Return *NULL* if an exception was raised by the
    codec.
 
-
-.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
-
-   Encode a Unicode object using MBCS and return the result as Python bytes
-   object.  Error handling is "strict".  Return *NULL* if an exception was
-   raised by the codec.
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
+      :c:func:`PyUnicode_AsMBCSString` or :c:func:`PyUnicode_EncodeCodePage`.
 
 
 Methods & Slots
@@ -1011,7 +1531,8 @@ They all return *NULL* or ``-1`` if an exception occurs.
    characters are not included in the resulting strings.
 
 
-.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)
+.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, \
+                              const char *errors)
 
    Translate a string by applying a character mapping table to it and return the
    resulting Unicode object.
@@ -1033,14 +1554,16 @@ They all return *NULL* or ``-1`` if an exception occurs.
    Unicode string.
 
 
-.. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
+.. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \
+                        Py_ssize_t start, Py_ssize_t end, int direction)
 
    Return 1 if *substr* matches ``str[start:end]`` at the given tail end
    (*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match),
    0 otherwise. Return ``-1`` if an error occurred.
 
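The *direction* argument of :c:func:`PyUnicode_Tailmatch` selects which end of ``str[start:end]`` is compared. A sketch on plain code-point arrays, with a hypothetical ``tailmatch`` helper that assumes ``0 <= start <= end <= len(str)`` (the real function adjusts out-of-range indices the way slicing does):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if sub matches str[start:end] at its head (direction == -1) or
   its tail (direction == 1), else 0.  Assumes 0 <= start <= end <= len. */
static int tailmatch(const uint32_t *str, size_t start, size_t end,
                     const uint32_t *sub, size_t sublen, int direction)
{
    if (end - start < sublen)
        return 0;                    /* substring longer than the slice */
    size_t offset = (direction < 0) ? start : end - sublen;
    return memcmp(str + offset, sub, sublen * sizeof *sub) == 0;
}
```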
 
-.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
+.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, \
+                               Py_ssize_t start, Py_ssize_t end, int direction)
 
    Return the first position of *substr* in ``str[start:end]`` using the given
    *direction* (*direction* == 1 means to do a forward search, *direction* == -1 a
@@ -1049,13 +1572,27 @@ They all return *NULL* or ``-1`` if an exception occurs.
    occurred and an exception has been set.
 
 
-.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)
+.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, \
+                               Py_ssize_t start, Py_ssize_t end, int direction)
+
+   Return the first position of the character *ch* in ``str[start:end]`` using
+   the given *direction* (*direction* == 1 means to do a forward search,
+   *direction* == -1 a backward search).  The return value is the index of the
+   first match; a value of ``-1`` indicates that no match was found, and ``-2``
+   indicates that an error occurred and an exception has been set.
+
+   .. versionadded:: 3.3
+
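The return convention of :c:func:`PyUnicode_FindChar` separates "not found" (``-1``) from "error" (``-2``). The hypothetical ``find_char`` below models only the search itself on a code-point array with in-range *start* and *end*; it cannot fail, so it never returns ``-2``, whereas callers of the real function must treat ``-2`` as an error with an exception set.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the first (direction == 1) or last (direction == -1) occurrence
   of ch in s[start:end], or -1 if there is none.  Assumes start <= end. */
static ptrdiff_t find_char(const uint32_t *s, ptrdiff_t start, ptrdiff_t end,
                           uint32_t ch, int direction)
{
    if (direction > 0) {
        for (ptrdiff_t i = start; i < end; i++)      /* forward search */
            if (s[i] == ch)
                return i;
    } else {
        for (ptrdiff_t i = end - 1; i >= start; i--) /* backward search */
            if (s[i] == ch)
                return i;
    }
    return -1;
}
```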
+
+.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, \
+                               Py_ssize_t start, Py_ssize_t end)
 
    Return the number of non-overlapping occurrences of *substr* in
    ``str[start:end]``.  Return ``-1`` if an error occurred.
 
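*Non-overlapping* means that counting resumes after the end of each match, so ``"aa"`` occurs twice in ``"aaaa"``, not three times. A sketch on code-point arrays using a hypothetical ``count_sub`` helper that requires a non-empty *substr*:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count non-overlapping occurrences of sub (sublen > 0) in s[0:n]. */
static size_t count_sub(const uint32_t *s, size_t n,
                        const uint32_t *sub, size_t sublen)
{
    size_t count = 0;
    size_t i = 0;
    while (sublen <= n && i <= n - sublen) {
        if (memcmp(s + i, sub, sublen * sizeof *sub) == 0) {
            count++;
            i += sublen;          /* skip past the match: no overlaps */
        } else {
            i++;
        }
    }
    return count;
}
```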
 
-.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)
+.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, \
+                              PyObject *replstr, Py_ssize_t maxcount)
 
    Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
    return the resulting Unicode object. *maxcount* == -1 means replace all
@@ -1076,7 +1613,7 @@ They all return *NULL* or ``-1`` if an exception occurs.
    ISO-8859-1 if it contains non-ASCII characters".
 
 
-.. c:function:: int PyUnicode_RichCompare(PyObject *left,  PyObject *right,  int op)
+.. c:function:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
 
    Rich compare two unicode strings and return one of the following:
 
@@ -1103,8 +1640,8 @@ They all return *NULL* or ``-1`` if an exception occurs.
    Check whether *element* is contained in *container* and return true or false
    accordingly.
 
-   *element* has to coerce to a one element Unicode string. ``-1`` is returned if
-   there was an error.
+   *element* has to coerce to a one-element Unicode string. ``-1`` is returned
+   if there was an error.
 
 
 .. c:function:: void PyUnicode_InternInPlace(PyObject **string)
@@ -1123,7 +1660,6 @@ They all return *NULL* or ``-1`` if an exception occurs.
 .. c:function:: PyObject* PyUnicode_InternFromString(const char *v)
 
    A combination of :c:func:`PyUnicode_FromString` and
-   :c:func:`PyUnicode_InternInPlace`, returning either a new unicode string object
-   that has been interned, or a new ("owned") reference to an earlier interned
-   string object with the same value.
-
+   :c:func:`PyUnicode_InternInPlace`, returning either a new unicode string
+   object that has been interned, or a new ("owned") reference to an earlier
+   interned string object with the same value.