path: root/Doc/c-api/unicode.rst
author: Inada Naoki <songofacandy@gmail.com> 2022-05-12 05:48:38 (GMT)
committer: GitHub <noreply@github.com> 2022-05-12 05:48:38 (GMT)
commit: f9c9354a7a173eaca2aa19e667b5cf12167b7fed
tree: eb0fdd3219f53c973f1a7dbbcb9f8b0e0babdf36 /Doc/c-api/unicode.rst
parent: 68fec31364e96d122aae0571c14683b4ddb0ebd0
gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537)
Diffstat (limited to 'Doc/c-api/unicode.rst')
-rw-r--r-- Doc/c-api/unicode.rst | 177
1 files changed, 21 insertions, 156 deletions
diff --git a/Doc/c-api/unicode.rst b/Doc/c-api/unicode.rst
index 792a469..8fab3b7 100644
--- a/Doc/c-api/unicode.rst
+++ b/Doc/c-api/unicode.rst
@@ -17,26 +17,12 @@ of Unicode characters while staying memory efficient. There are special cases
for strings where all code points are below 128, 256, or 65536; otherwise, code
points must be below 1114112 (which is the full Unicode range).
-:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
-in the Unicode object. The :c:type:`Py_UNICODE*` representation is deprecated
-and inefficient.
-
-Due to the transition between the old APIs and the new APIs, Unicode objects
-can internally be in two states depending on how they were created:
-
-* "canonical" Unicode objects are all objects created by a non-deprecated
- Unicode API. They use the most efficient representation allowed by the
- implementation.
-
-* "legacy" Unicode objects have been created through one of the deprecated
- APIs (typically :c:func:`PyUnicode_FromUnicode`) and only bear the
- :c:type:`Py_UNICODE*` representation; you will have to call
- :c:func:`PyUnicode_READY` on them before calling any other API.
+The UTF-8 representation is created on demand and cached in the Unicode object.
.. note::
- The "legacy" Unicode object will be removed in Python 3.12 with deprecated
- APIs. All Unicode objects will be "canonical" since then. See :pep:`623`
- for more information.
+ The :c:type:`Py_UNICODE` representation was removed in Python 3.12,
+ together with the deprecated APIs.
+ See :pep:`623` for more information.
Unicode Type
@@ -101,18 +87,12 @@ access to internal read-only data of Unicode objects:
.. c:function:: int PyUnicode_READY(PyObject *o)
- Ensure the string object *o* is in the "canonical" representation. This is
- required before using any of the access macros described below.
-
- .. XXX expand on when it is not required
-
- Returns ``0`` on success and ``-1`` with an exception set on failure, which in
- particular happens if memory allocation fails.
+ Returns ``0``. This API is kept only for backward compatibility.
.. versionadded:: 3.3
- .. deprecated-removed:: 3.10 3.12
- This API will be removed with :c:func:`PyUnicode_FromUnicode`.
+ .. deprecated:: 3.10
+ This API does nothing since Python 3.12. Please remove code using this function.
.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)
@@ -130,14 +110,12 @@ access to internal read-only data of Unicode objects:
Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
integer types for direct character access. No checks are performed if the
canonical representation has the correct character size; use
- :c:func:`PyUnicode_KIND` to select the right function. Make sure
- :c:func:`PyUnicode_READY` has been called before accessing this.
+ :c:func:`PyUnicode_KIND` to select the right function.
.. versionadded:: 3.3
-.. c:macro:: PyUnicode_WCHAR_KIND
- PyUnicode_1BYTE_KIND
+.. c:macro:: PyUnicode_1BYTE_KIND
PyUnicode_2BYTE_KIND
PyUnicode_4BYTE_KIND
@@ -145,8 +123,8 @@ access to internal read-only data of Unicode objects:
.. versionadded:: 3.3
- .. deprecated-removed:: 3.10 3.12
- ``PyUnicode_WCHAR_KIND`` is deprecated.
+ .. versionchanged:: 3.12
+ ``PyUnicode_WCHAR_KIND`` has been removed.
.. c:function:: int PyUnicode_KIND(PyObject *o)
@@ -155,8 +133,6 @@ access to internal read-only data of Unicode objects:
bytes per character this Unicode object uses to store its data. *o* has to
be a Unicode object in the "canonical" representation (not checked).
- .. XXX document "0" return value?
-
.. versionadded:: 3.3
@@ -208,49 +184,6 @@ access to internal read-only data of Unicode objects:
.. versionadded:: 3.3
-.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
-
- Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
- code units (this includes surrogate pairs as 2 units). *o* has to be a
- Unicode object (not checked).
-
- .. deprecated-removed:: 3.3 3.12
- Part of the old-style Unicode API, please migrate to using
- :c:func:`PyUnicode_GET_LENGTH`.
-
-
-.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
-
- Return the size of the deprecated :c:type:`Py_UNICODE` representation in
- bytes. *o* has to be a Unicode object (not checked).
-
- .. deprecated-removed:: 3.3 3.12
- Part of the old-style Unicode API, please migrate to using
- :c:func:`PyUnicode_GET_LENGTH`.
-
-
-.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
- const char* PyUnicode_AS_DATA(PyObject *o)
-
- Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The
- returned buffer is always terminated with an extra null code point. It
- may also contain embedded null code points, which would cause the string
- to be truncated when used in most C functions. The ``AS_DATA`` form
- casts the pointer to :c:type:`const char *`. The *o* argument has to be
- a Unicode object (not checked).
-
- .. versionchanged:: 3.3
- This function is now inefficient -- because in many cases the
- :c:type:`Py_UNICODE` representation does not exist and needs to be created
- -- and can fail (return ``NULL`` with an exception set). Try to port the
- code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
- :c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.
-
- .. deprecated-removed:: 3.3 3.12
- Part of the old-style Unicode API, please migrate to using the
- :c:func:`PyUnicode_nBYTE_DATA` family of macros.
-
-
.. c:function:: int PyUnicode_IsIdentifier(PyObject *o)
Return ``1`` if the string is a valid identifier according to the language
@@ -436,12 +369,17 @@ APIs:
Create a Unicode object from the char buffer *u*. The bytes will be
interpreted as being UTF-8 encoded. The buffer is copied into the new
- object. If the buffer is not ``NULL``, the return value might be a shared
- object, i.e. modification of the data is not allowed.
+ object.
+ The return value might be a shared object, i.e. modification of the data is
+ not allowed.
- If *u* is ``NULL``, this function behaves like :c:func:`PyUnicode_FromUnicode`
- with the buffer set to ``NULL``. This usage is deprecated in favor of
- :c:func:`PyUnicode_New`, and will be removed in Python 3.12.
+ This function raises :exc:`SystemError` when:
+
+ * *size* < 0,
+ * *u* is ``NULL`` and *size* > 0
+
+ .. versionchanged:: 3.12
+ *u* == ``NULL`` with *size* > 0 is not allowed anymore.
.. c:function:: PyObject *PyUnicode_FromString(const char *u)
@@ -680,79 +618,6 @@ APIs:
.. versionadded:: 3.3
-Deprecated Py_UNICODE APIs
-""""""""""""""""""""""""""
-
-.. deprecated-removed:: 3.3 3.12
-
-These API functions are deprecated with the implementation of :pep:`393`.
-Extension modules can continue using them, as they will not be removed in Python
-3.x, but need to be aware that their use can now cause performance and memory hits.
-
-
-.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
-
- Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
- may be ``NULL`` which causes the contents to be undefined. It is the user's
- responsibility to fill in the needed data. The buffer is copied into the new
- object.
-
- If the buffer is not ``NULL``, the return value might be a shared object.
- Therefore, modification of the resulting Unicode object is only allowed when
- *u* is ``NULL``.
-
- If the buffer is ``NULL``, :c:func:`PyUnicode_READY` must be called once the
- string content has been filled before using any of the access macros such as
- :c:func:`PyUnicode_KIND`.
-
- .. deprecated-removed:: 3.3 3.12
- Part of the old-style Unicode API, please migrate to using
- :c:func:`PyUnicode_FromKindAndData`, :c:func:`PyUnicode_FromWideChar`, or
- :c:func:`PyUnicode_New`.
-
-
-.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
-
- Return a read-only pointer to the Unicode object's internal
- :c:type:`Py_UNICODE` buffer, or ``NULL`` on error. This will create the
- :c:type:`Py_UNICODE*` representation of the object if it is not yet
- available. The buffer is always terminated with an extra null code point.
- Note that the resulting :c:type:`Py_UNICODE` string may also contain
- embedded null code points, which would cause the string to be truncated when
- used in most C functions.
-
- .. deprecated-removed:: 3.3 3.12
- Part of the old-style Unicode API, please migrate to using
- :c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`,
- :c:func:`PyUnicode_ReadChar` or similar new APIs.
-
-
-.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
-
- Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE`
- array length (excluding the extra null terminator) in *size*.
- Note that the resulting :c:type:`Py_UNICODE*` string
- may contain embedded null code points, which would cause the string to be
- truncated when used in most C functions.
-
- .. versionadded:: 3.3
-
- .. deprecated-removed:: 3.3 3.12
- Part of the old-style Unicode API, please migrate to using
- :c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`,
- :c:func:`PyUnicode_ReadChar` or similar new APIs.
-
-
-.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
-
- Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
- code units (this includes surrogate pairs as 2 units).
-
- .. deprecated-removed:: 3.3 3.12
- Part of the old-style Unicode API, please migrate to using
- :c:func:`PyUnicode_GET_LENGTH`.
-
-
.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)
Copy an instance of a Unicode subtype to a new true Unicode object if