diff options
author | Inada Naoki <songofacandy@gmail.com> | 2022-05-12 05:48:38 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-05-12 05:48:38 (GMT) |
commit | f9c9354a7a173eaca2aa19e667b5cf12167b7fed (patch) | |
tree | eb0fdd3219f53c973f1a7dbbcb9f8b0e0babdf36 /Doc/c-api/unicode.rst | |
parent | 68fec31364e96d122aae0571c14683b4ddb0ebd0 (diff) | |
download | cpython-f9c9354a7a173eaca2aa19e667b5cf12167b7fed.zip cpython-f9c9354a7a173eaca2aa19e667b5cf12167b7fed.tar.gz cpython-f9c9354a7a173eaca2aa19e667b5cf12167b7fed.tar.bz2 |
gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537)
Diffstat (limited to 'Doc/c-api/unicode.rst')
-rw-r--r-- | Doc/c-api/unicode.rst | 177 |
1 files changed, 21 insertions, 156 deletions
diff --git a/Doc/c-api/unicode.rst b/Doc/c-api/unicode.rst index 792a469..8fab3b7 100644 --- a/Doc/c-api/unicode.rst +++ b/Doc/c-api/unicode.rst @@ -17,26 +17,12 @@ of Unicode characters while staying memory efficient. There are special cases for strings where all code points are below 128, 256, or 65536; otherwise, code points must be below 1114112 (which is the full Unicode range). -:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached -in the Unicode object. The :c:type:`Py_UNICODE*` representation is deprecated -and inefficient. - -Due to the transition between the old APIs and the new APIs, Unicode objects -can internally be in two states depending on how they were created: - -* "canonical" Unicode objects are all objects created by a non-deprecated - Unicode API. They use the most efficient representation allowed by the - implementation. - -* "legacy" Unicode objects have been created through one of the deprecated - APIs (typically :c:func:`PyUnicode_FromUnicode`) and only bear the - :c:type:`Py_UNICODE*` representation; you will have to call - :c:func:`PyUnicode_READY` on them before calling any other API. +UTF-8 representation is created on demand and cached in the Unicode object. .. note:: - The "legacy" Unicode object will be removed in Python 3.12 with deprecated - APIs. All Unicode objects will be "canonical" since then. See :pep:`623` - for more information. + The :c:type:`Py_UNICODE` representation has been removed since Python 3.12 + with deprecated APIs. + See :pep:`623` for more information. Unicode Type @@ -101,18 +87,12 @@ access to internal read-only data of Unicode objects: .. c:function:: int PyUnicode_READY(PyObject *o) - Ensure the string object *o* is in the "canonical" representation. This is - required before using any of the access macros described below. - - .. XXX expand on when it is not required - - Returns ``0`` on success and ``-1`` with an exception set on failure, which in - particular happens if memory allocation fails. + Returns ``0``. This API is kept only for backward compatibility. .. versionadded:: 3.3 - .. deprecated-removed:: 3.10 3.12 - This API will be removed with :c:func:`PyUnicode_FromUnicode`. + .. deprecated:: 3.10 + This API do nothing since Python 3.12. Please remove code using this function. .. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o) @@ -130,14 +110,12 @@ access to internal read-only data of Unicode objects: Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4 integer types for direct character access. No checks are performed if the canonical representation has the correct character size; use - :c:func:`PyUnicode_KIND` to select the right function. Make sure - :c:func:`PyUnicode_READY` has been called before accessing this. + :c:func:`PyUnicode_KIND` to select the right function. .. versionadded:: 3.3 -.. c:macro:: PyUnicode_WCHAR_KIND - PyUnicode_1BYTE_KIND +.. c:macro:: PyUnicode_1BYTE_KIND PyUnicode_2BYTE_KIND PyUnicode_4BYTE_KIND @@ -145,8 +123,8 @@ access to internal read-only data of Unicode objects: .. versionadded:: 3.3 - .. deprecated-removed:: 3.10 3.12 - ``PyUnicode_WCHAR_KIND`` is deprecated. + .. versionchanged:: 3.12 + ``PyUnicode_WCHAR_KIND`` has been removed. .. c:function:: int PyUnicode_KIND(PyObject *o) @@ -155,8 +133,6 @@ access to internal read-only data of Unicode objects: bytes per character this Unicode object uses to store its data. *o* has to be a Unicode object in the "canonical" representation (not checked). - .. XXX document "0" return value? - .. versionadded:: 3.3 @@ -208,49 +184,6 @@ access to internal read-only data of Unicode objects: .. versionadded:: 3.3 -.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o) - - Return the size of the deprecated :c:type:`Py_UNICODE` representation, in - code units (this includes surrogate pairs as 2 units). *o* has to be a - Unicode object (not checked). - - .. deprecated-removed:: 3.3 3.12 - Part of the old-style Unicode API, please migrate to using - :c:func:`PyUnicode_GET_LENGTH`. - - -.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o) - - Return the size of the deprecated :c:type:`Py_UNICODE` representation in - bytes. *o* has to be a Unicode object (not checked). - - .. deprecated-removed:: 3.3 3.12 - Part of the old-style Unicode API, please migrate to using - :c:func:`PyUnicode_GET_LENGTH`. - - -.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o) - const char* PyUnicode_AS_DATA(PyObject *o) - - Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The - returned buffer is always terminated with an extra null code point. It - may also contain embedded null code points, which would cause the string - to be truncated when used in most C functions. The ``AS_DATA`` form - casts the pointer to :c:type:`const char *`. The *o* argument has to be - a Unicode object (not checked). - - .. versionchanged:: 3.3 - This function is now inefficient -- because in many cases the - :c:type:`Py_UNICODE` representation does not exist and needs to be created - -- and can fail (return ``NULL`` with an exception set). Try to port the - code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use - :c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`. - - .. deprecated-removed:: 3.3 3.12 - Part of the old-style Unicode API, please migrate to using the - :c:func:`PyUnicode_nBYTE_DATA` family of macros. - - .. c:function:: int PyUnicode_IsIdentifier(PyObject *o) Return ``1`` if the string is a valid identifier according to the language @@ -436,12 +369,17 @@ APIs: Create a Unicode object from the char buffer *u*. The bytes will be interpreted as being UTF-8 encoded. The buffer is copied into the new - object. If the buffer is not ``NULL``, the return value might be a shared - object, i.e. modification of the data is not allowed. + object. + The return value might be a shared object, i.e. modification of the data is + not allowed. - If *u* is ``NULL``, this function behaves like :c:func:`PyUnicode_FromUnicode` - with the buffer set to ``NULL``. This usage is deprecated in favor of - :c:func:`PyUnicode_New`, and will be removed in Python 3.12. + This function raises :exc:`SystemError` when: + + * *size* < 0, + * *u* is ``NULL`` and *size* > 0 + + .. versionchanged:: 3.12 + *u* == ``NULL`` with *size* > 0 is not allowed anymore. .. c:function:: PyObject *PyUnicode_FromString(const char *u) @@ -680,79 +618,6 @@ APIs: .. versionadded:: 3.3 -Deprecated Py_UNICODE APIs -"""""""""""""""""""""""""" - -.. deprecated-removed:: 3.3 3.12 - -These API functions are deprecated with the implementation of :pep:`393`. -Extension modules can continue using them, as they will not be removed in Python -3.x, but need to be aware that their use can now cause performance and memory hits. - - -.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size) - - Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u* - may be ``NULL`` which causes the contents to be undefined. It is the user's - responsibility to fill in the needed data. The buffer is copied into the new - object. - - If the buffer is not ``NULL``, the return value might be a shared object. - Therefore, modification of the resulting Unicode object is only allowed when - *u* is ``NULL``. - - If the buffer is ``NULL``, :c:func:`PyUnicode_READY` must be called once the - string content has been filled before using any of the access macros such as - :c:func:`PyUnicode_KIND`. - - .. deprecated-removed:: 3.3 3.12 - Part of the old-style Unicode API, please migrate to using - :c:func:`PyUnicode_FromKindAndData`, :c:func:`PyUnicode_FromWideChar`, or - :c:func:`PyUnicode_New`. - - -.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode) - - Return a read-only pointer to the Unicode object's internal - :c:type:`Py_UNICODE` buffer, or ``NULL`` on error. This will create the - :c:type:`Py_UNICODE*` representation of the object if it is not yet - available. The buffer is always terminated with an extra null code point. - Note that the resulting :c:type:`Py_UNICODE` string may also contain - embedded null code points, which would cause the string to be truncated when - used in most C functions. - - .. deprecated-removed:: 3.3 3.12 - Part of the old-style Unicode API, please migrate to using - :c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`, - :c:func:`PyUnicode_ReadChar` or similar new APIs. - - -.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size) - - Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE` - array length (excluding the extra null terminator) in *size*. - Note that the resulting :c:type:`Py_UNICODE*` string - may contain embedded null code points, which would cause the string to be - truncated when used in most C functions. - - .. versionadded:: 3.3 - - .. deprecated-removed:: 3.3 3.12 - Part of the old-style Unicode API, please migrate to using - :c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`, - :c:func:`PyUnicode_ReadChar` or similar new APIs. - - -.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode) - - Return the size of the deprecated :c:type:`Py_UNICODE` representation, in - code units (this includes surrogate pairs as 2 units). - - .. deprecated-removed:: 3.3 3.12 - Part of the old-style Unicode API, please migrate to using - :c:func:`PyUnicode_GET_LENGTH`. - - .. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj) Copy an instance of a Unicode subtype to a new true Unicode object if |