path: root/Include/cpython/unicodeobject.h
Commit message | Author | Age | Files | Lines
* gh-142217: Deprecate the private _Py_Identifier C API (#142221) | Victor Stinner | 2025-12-12 | 1 | -1/+1
  Deprecate functions:
  * _PyObject_CallMethodId()
  * _PyObject_GetAttrId()
  * _PyUnicode_FromId()
* gh-131510: Use PyUnstable_Unicode_GET_CACHED_HASH() (GH-141520) | Victor Stinner | 2025-11-14 | 1 | -1/+0
  Replace code that directly accesses PyASCIIObject.hash with PyUnstable_Unicode_GET_CACHED_HASH().
  Remove redundant "assert(PyUnicode_Check(op))" from PyUnstable_Unicode_GET_CACHED_HASH(), _PyASCIIObject_CAST() already implements the check.
* gh-92536: Fix comment about number of unicode string types (#136439) | Arseniy Terekhin | 2025-07-08 | 1 | -1/+1
* gh-127545: Add _Py_ALIGNED_DEF(N, T) and use it for PyObject (GH-135209) | Petr Viktorin | 2025-06-11 | 1 | -61/+59
  * Replace _Py_ALIGN_AS(V) by _Py_ALIGNED_DEF(N, T)

    This is now a common façade for the various `_Alignas` alternatives, which behave in interesting ways -- see the source comment.

    The new macro (and MSVC's `__declspec(align)`) should not be used on a variable/member declaration that includes a struct declaration. A workaround is to separate the struct definition. Do that for `PyASCIIObject.state`.

  * Specify minimum PyGC_Head and PyObject alignment

    As documented in InternalDocs/garbage_collector.md, the garbage collector stores flags in the least significant two bits of the _gc_prev pointer in struct PyGC_Head. Consequently, this pointer is only capable of storing a location that's aligned to a 4-byte boundary. Encode this requirement using _Py_ALIGNED_DEF.

    This patch fixes a segfault in m68k, which was previously investigated by Adrian Glaubitz here:
    https://lists.debian.org/debian-68k/2024/11/msg00020.html
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087600

    Original patch (using the GCC-only Py_ALIGNED) by Finn Thain.

  Co-authored-by: Finn Thain <fthain@linux-m68k.org>
  Co-authored-by: Victor Stinner <vstinner@python.org>
  Co-authored-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
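The alignment requirement described in the commit above can be illustrated with a small, self-contained sketch. It does not use CPython's PyGC_Head or _Py_ALIGNED_DEF; the `item` type and the `pack`, `unpack_ptr` and `unpack_flags` helpers are hypothetical names, and standard C11 `alignas` stands in for the CPython macro. The point is only that two flag bits fit in a pointer when the pointee is at least 4-byte aligned.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical type (not PyGC_Head). Without alignas, a struct holding
   only bytes could sit at any address; alignas(4) guarantees that the
   two least significant bits of its address are zero. */
typedef struct item {
    alignas(4) unsigned char data[2];
} item;

#define FLAG_MASK ((uintptr_t)3)

/* Store up to two flag bits in the low bits of the pointer value. */
static uintptr_t pack(item *p, unsigned flags)
{
    assert(((uintptr_t)p & FLAG_MASK) == 0);  /* requires 4-byte alignment */
    assert(flags <= 3);
    return (uintptr_t)p | (uintptr_t)flags;
}

/* Recover the original pointer by clearing the flag bits. */
static item *unpack_ptr(uintptr_t tagged)
{
    return (item *)(tagged & ~FLAG_MASK);
}

/* Recover the flag bits. */
static unsigned unpack_flags(uintptr_t tagged)
{
    return (unsigned)(tagged & FLAG_MASK);
}
```

If the pointee were only 2-byte aligned, as can happen on m68k, the first assertion in `pack` would fail for some addresses, which is the kind of situation the alignment declaration is meant to rule out.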
* gh-134891: Add PyUnstable_Unicode_GET_CACHED_HASH (GH-134892) | Petr Viktorin | 2025-06-06 | 1 | -0/+11
* gh-133968: Add PyUnicodeWriter_WriteASCII() function (#133973) | Victor Stinner | 2025-05-29 | 1 | -0/+4
  Replace most PyUnicodeWriter_WriteUTF8() calls with PyUnicodeWriter_WriteASCII().

  Unrelated change to please the linter: remove an unused import in test_ctypes.

  Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
  Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
* gh-128972: Add `_Py_ALIGN_AS` and revert `PyASCIIObject` memory layout. (GH-133085) | Petr Viktorin | 2025-05-02 | 1 | -10/+20
  Add `_Py_ALIGN_AS` as per C API WG vote: https://github.com/capi-workgroup/decisions/issues/61
  This patch only adds it to free-threaded builds; the `#ifdef Py_GIL_DISABLED` can be removed in the future.

  Use this to revert `PyASCIIObject` memory layout for non-free-threaded builds. The long-term plan is to deprecate the entire struct; until that happens it's better to keep it unchanged, as a courtesy to people who rely on it despite it not being stable ABI.
* gh-130790: Remove references about unicode's readiness from comments (#130801) | Sergey Miryanov | 2025-03-03 | 1 | -5/+4
* gh-46236: PyUnicode docs improvements (GH-129966) | Petr Viktorin | 2025-02-28 | 1 | -2/+2
  Move deprecated PyUnicode API docs to new section

  Move Py_UNICODE to a new "Deprecated API" section.
  Formally soft-deprecate PyUnicode_READY, and move it
  Document and soft-deprecate PyUnicode_IS_READY, and move it
  Document PyUnicode_IS_ASCII, PyUnicode_CHECK_INTERNED
  PyUnicode_New docs: Clarify requirements for "fresh" strings
  PyUnicodeWriter_DecodeUTF8Stateful: Link "error-handlers"

  Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* gh-128863: Deprecate the private _PyUnicodeWriter API (#129245) | Victor Stinner | 2025-02-20 | 1 | -32/+28
  Deprecate private C API functions:
  * _PyUnicodeWriter_Init()
  * _PyUnicodeWriter_Finish()
  * _PyUnicodeWriter_Dealloc()
  * _PyUnicodeWriter_WriteChar()
  * _PyUnicodeWriter_WriteStr()
  * _PyUnicodeWriter_WriteSubstring()
  * _PyUnicodeWriter_WriteASCIIString()
  * _PyUnicodeWriter_WriteLatin1String()

  These functions are not deprecated in the internal C API (if the Py_BUILD_CORE macro is defined).
* gh-89188: Implement PyUnicode_KIND() as a function (#129412) | Victor Stinner | 2025-01-30 | 1 | -2/+6
  Implement PyUnicode_KIND() and PyUnicode_DATA() as functions, in addition to the macros with the same names. The macros rely on C bit fields, which have a compiler-specific layout.
* gh-128863: Deprecate private C API functions (#128864) | Victor Stinner | 2025-01-22 | 1 | -2/+6
  Deprecate private C API functions:
  * _PyBytes_Join()
  * _PyDict_GetItemStringWithError()
  * _PyDict_Pop()
  * _PyThreadState_UncheckedGet()
  * _PyUnicode_AsString()
  * _Py_HashPointer()
  * _Py_fopen_obj()

  Replace _Py_HashPointer() with Py_HashPointer().
  Remove references to deprecated functions.
* gh-128137: Update PyASCIIObject to handle interned field with the atomic operation (gh-128196) | Donghee Na | 2025-01-05 | 1 | -7/+13
* gh-119182: Add PyUnicodeWriter_WriteUCS4() function (#120849) | Victor Stinner | 2024-06-24 | 1 | -0/+4
* gh-119182: Add PyUnicodeWriter_DecodeUTF8Stateful() (#120639) | Victor Stinner | 2024-06-21 | 1 | -0/+10
  Add PyUnicodeWriter_WriteWideChar() and PyUnicodeWriter_DecodeUTF8Stateful() functions.

  Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* gh-119182: Add PyUnicodeWriter C API (#119184) | Victor Stinner | 2024-06-17 | 1 | -2/+35
* gh-112026: Restore removed private C API (#112115) | Victor Stinner | 2023-11-15 | 1 | -0/+131
  Restore removed private C API functions, macros and structures which have no simple replacement for now:
  * _PyDict_GetItem_KnownHash()
  * _PyDict_NewPresized()
  * _PyHASH_BITS
  * _PyHASH_IMAG
  * _PyHASH_INF
  * _PyHASH_MODULUS
  * _PyHASH_MULTIPLIER
  * _PyLong_Copy()
  * _PyLong_FromDigits()
  * _PyLong_New()
  * _PyLong_Sign()
  * _PyObject_CallMethodId()
  * _PyObject_CallMethodNoArgs()
  * _PyObject_CallMethodOneArg()
  * _PyObject_CallOneArg()
  * _PyObject_EXTRA_INIT
  * _PyObject_FastCallDict()
  * _PyObject_GetAttrId()
  * _PyObject_Vectorcall()
  * _PyObject_VectorcallMethod()
  * _PyStack_AsDict()
  * _PyThread_CurrentFrames()
  * _PyUnicodeWriter structure
  * _PyUnicodeWriter_Dealloc()
  * _PyUnicodeWriter_Finish()
  * _PyUnicodeWriter_Init()
  * _PyUnicodeWriter_Prepare()
  * _PyUnicodeWriter_PrepareKind()
  * _PyUnicodeWriter_WriteASCIIString()
  * _PyUnicodeWriter_WriteChar()
  * _PyUnicodeWriter_WriteLatin1String()
  * _PyUnicodeWriter_WriteStr()
  * _PyUnicodeWriter_WriteSubstring()
  * _PyUnicode_AsString()
  * _PyUnicode_FromId()
  * _PyVectorcall_Function()
  * _Py_HashDouble()
  * _Py_HashPointer()
  * _Py_IDENTIFIER()
  * _Py_c_abs()
  * _Py_c_diff()
  * _Py_c_neg()
  * _Py_c_pow()
  * _Py_c_prod()
  * _Py_c_quot()
  * _Py_c_sum()
  * _Py_static_string()
  * _Py_static_string_init()
* gh-111089: Revert PyUnicode_AsUTF8() changes (#111833) | Victor Stinner | 2023-11-07 | 1 | -0/+16
  * Revert "gh-111089: Use PyUnicode_AsUTF8() in Argument Clinic (#111585)"
    This reverts commit d9b606b3d04fc56fb0bcc479d7d6c14562edb5e2.
  * Revert "gh-111089: Use PyUnicode_AsUTF8() in getargs.c (#111620)"
    This reverts commit cde1071b2a72e8261ca66053ef61431b7f3a81fd.
  * Revert "gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (#111091)"
    This reverts commit d731579bfb9a497cfb0076cb6b221058a20088fe.
  * Revert "gh-111089: Add PyUnicode_AsUTF8() to the limited C API (#111121)"
    This reverts commit d8f32be5b6a736dc2fc9dca3f1bf176c82fc9b44.
  * Revert "gh-111089: Use PyUnicode_AsUTF8() in sqlite3 (#111122)"
    This reverts commit 37e4e20eaa8f27ada926d49e5971fecf0477ad26.
* gh-111089: Add PyUnicode_AsUTF8() to the limited C API (#111121) | Victor Stinner | 2023-10-20 | 1 | -13/+0
  Add PyUnicode_AsUTF8() function to the limited C API.

  multiprocessing posixshmem now uses PyUnicode_AsUTF8() instead of PyUnicode_AsUTF8AndSize(): the extension is built with the limited C API. The function now raises an exception if the filename contains an embedded null character instead of silently truncating the filename.
* gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (#111091) | Victor Stinner | 2023-10-20 | 1 | -10/+10
  * PyUnicode_AsUTF8() now raises an exception if the string contains embedded null characters.
  * Update related C API tests (test_capi.test_unicode).
  * type_new_set_doc() uses PyUnicode_AsUTF8AndSize() to silently truncate doc containing null bytes.

  Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
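The embedded-NUL problem mentioned in the commit above comes from C string functions stopping at the first NUL byte, so a buffer with a known length can hold more data than `strlen()` reports. The sketch below shows one way to detect that case in plain C. `has_embedded_nul` is a hypothetical helper written for illustration; it is not a CPython function and makes no claim about how CPython performs its check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Python C API): return 1 if the
   first `size` bytes of `buf` contain a NUL byte, otherwise 0. A caller
   that hands `buf` to a NUL-terminated-string API would silently lose
   everything after that byte. */
static int has_embedded_nul(const char *buf, size_t size)
{
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}
```

For example, the five bytes `"ab\0cd"` are seen as just `"ab"` by `strlen()`, and `has_embedded_nul("ab\0cd", 5)` reports the problem so the caller can raise an error instead of truncating.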
* gh-106931: Intern Statically Allocated Strings Globally (gh-107272) | Eric Snow | 2023-07-27 | 1 | -1/+3
  We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict.

  Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter.
* gh-106320: Remove private _PyUnicode C API (#107185) | Victor Stinner | 2023-07-24 | 1 | -36/+0
  Move private _PyUnicode functions to the internal C API (pycore_unicodeobject.h):
  * _PyUnicode_IsCaseIgnorable()
  * _PyUnicode_IsCased()
  * _PyUnicode_IsXidContinue()
  * _PyUnicode_IsXidStart()
  * _PyUnicode_ToFoldedFull()
  * _PyUnicode_ToLowerFull()
  * _PyUnicode_ToTitleFull()
  * _PyUnicode_ToUpperFull()

  No longer export these functions.
* gh-106320: Remove private _PyUnicode_AsString() alias (#107021) | Victor Stinner | 2023-07-22 | 1 | -5/+0
  Remove private _PyUnicode_AsString() alias to PyUnicode_AsUTF8(). It was kept for backward compatibility with Python 3.0 - 3.2.

  PyUnicode_AsUTF8() has been available since Python 3.3. The PyUnicode_AsUTF8String() function can be used to keep compatibility with Python 3.2 and older.
* gh-106320: Remove _PyUnicode_TransformDecimalAndSpaceToASCII() (#106398) | Victor Stinner | 2023-07-04 | 1 | -37/+0
  Remove private _PyUnicode_TransformDecimalAndSpaceToASCII() and other private _PyUnicode C API functions: move them to the internal C API (pycore_unicodeobject.h). No longer export most of these functions.

  Replace _testcapi.unicode_transformdecimalandspacetoascii() with _testinternal._PyUnicode_TransformDecimalAndSpaceToASCII().
* gh-106320: Remove private _PyUnicode codecs C API functions (#106385) | Victor Stinner | 2023-07-04 | 1 | -106/+0
  Remove private _PyUnicode codecs C API functions: move them to the internal C API (pycore_unicodeobject.h). No longer export most of these functions.
* gh-106320: Remove more private _PyUnicode C API functions (#106382) | Victor Stinner | 2023-07-03 | 1 | -69/+0
  Remove more private _PyUnicode C API functions: move them to the internal C API (pycore_unicodeobject.h). No longer export most pycore_unicodeobject.h functions.
* gh-106320: Move _PyUnicodeWriter to the internal C API (#106342) | Victor Stinner | 2023-07-03 | 1 | -139/+0
  Move also _PyUnicode_FormatAdvancedWriter().

  CJK codecs and multibytecodec.c now define the Py_BUILD_CORE_MODULE macro.
* gh-105156: Cleanup usage of old Py_UNICODE type (#105158) | Victor Stinner | 2023-06-01 | 1 | -5/+3
  * refcounts.dat:
    * Remove Py_UNICODE functions.
    * Replace Py_UNICODE argument type with wchar_t.
  * _PyUnicode_ToLowercase(), _PyUnicode_ToUppercase(), _PyUnicode_ToTitlecase() are no longer deprecated in comments. It's no longer needed since they now use Py_UCS4 type, rather than the deprecated Py_UNICODE type.
  * gdb: Remove unused char_width() method.
* gh-105156: Deprecate the old Py_UNICODE type in C API (#105157) | Victor Stinner | 2023-06-01 | 1 | -2/+2
  Deprecate the old Py_UNICODE and PY_UNICODE_TYPE types in the C API: use wchar_t instead.

  Replace Py_UNICODE with wchar_t in multiple C files.

  Co-authored-by: Inada Naoki <songofacandy@gmail.com>
* gh-84436: Implement Immortal Objects (gh-19474) | Eddie Elizondo | 2023-04-22 | 1 | -4/+13
  This is the implementation of PEP 683.

  Motivation:

  The PR introduces the ability to immortalize instances in CPython, which bypasses reference counting. Tagging objects as immortal allows us to skip certain operations when we know that the object will be around for the entire execution of the runtime.

  Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this brings the ability of having truly immutable objects that are useful in other contexts such as immutable data sharing between sub-interpreters.
* PyUnicode_KIND() uses _Py_RVALUE() (#100060) | Victor Stinner | 2022-12-06 | 1 | -1/+1
  The PyUnicode_KIND() macro is modified to use _Py_RVALUE(), so it can no longer be used as an l-value.
* gh-89653: PEP 670: Convert macros to functions (#99843) | Victor Stinner | 2022-11-28 | 1 | -2/+0
  Convert macros to static inline functions to avoid macro pitfalls, like duplication of side effects:
  * DK_ENTRIES()
  * DK_UNICODE_ENTRIES()
  * PyCode_GetNumFree()
  * PyFloat_AS_DOUBLE()
  * PyInstanceMethod_GET_FUNCTION()
  * PyMemoryView_GET_BASE()
  * PyMemoryView_GET_BUFFER()
  * PyMethod_GET_FUNCTION()
  * PyMethod_GET_SELF()
  * PySet_GET_SIZE()
  * _PyHeapType_GET_MEMBERS()

  Changes:
  * PyCode_GetNumFree() casts PyCodeObject.co_nfreevars from int to Py_ssize_t to be future proof, and because Py_ssize_t is commonly used in the C API.
  * PyCode_GetNumFree() doesn't cast its argument: the replaced macro already required the exact type PyCodeObject*.
  * Add assertions in some functions using "CAST" macros to check the argument types when Python is built with assertions (debug build).
  * Remove an outdated comment in unicodeobject.h.
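The "duplication of side effects" pitfall named in the commit above can be reproduced with a tiny example. Everything in this sketch is hypothetical and unrelated to the CPython macros listed: `SQUARE_MACRO`, `square_func`, `next_value` and the two counting helpers exist only to show that a macro expands its argument expression textually, while a static inline function evaluates it once.

```c
#include <assert.h>

/* Hypothetical macro: the argument expression appears twice in the
   expansion, so any side effect in it happens twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Equivalent static inline function: the argument is evaluated once,
   before the call. */
static inline int square_func(int x)
{
    return x * x;
}

static int calls = 0;

/* Function with a visible side effect: it counts its own calls. */
static int next_value(void)
{
    calls++;
    return 3;
}

/* Number of times next_value() runs when passed through the macro. */
static int evaluations_with_macro(void)
{
    calls = 0;
    (void)SQUARE_MACRO(next_value());
    return calls;
}

/* Number of times next_value() runs when passed to the function. */
static int evaluations_with_function(void)
{
    calls = 0;
    (void)square_func(next_value());
    return calls;
}
```

Both forms compute the same value for a side-effect-free argument; they differ only in how many times the argument expression runs.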
* gh-99706: unicodeobject: Fix padding in `PyASCIIObject.state` (GH-99707) | David Hewitt | 2022-11-24 | 1 | -1/+1
* gh-98783: Fix crashes when `str` subclasses are used in `_PyUnicode_Equal` (#98806) | Nikita Sobolev | 2022-10-30 | 1 | -1/+1
* gh-89653: PEP 670: Macros always cast arguments in cpython/ (#93766) | Victor Stinner | 2022-06-13 | 1 | -46/+14
  Header files in Include/cpython/ are only included if the Py_LIMITED_API macro is not defined.
* gh-92536: Mark PyUnicode_READY() argument as unused (#93011) | Wenzel Jakob | 2022-05-23 | 1 | -2/+2
* gh-89653: Add assertions on PyUnicode_READ() index (#92883) | Victor Stinner | 2022-05-17 | 1 | -1/+9
  Add assertions on the index argument of PyUnicode_READ(), PyUnicode_READ_CHAR() and PyUnicode_WRITE() functions.
* gh-89653: PEP 670: Fix PyUnicode_READ() cast (#92872) | Victor Stinner | 2022-05-17 | 1 | -1/+2
  _Py_CAST() cannot be used with a constant type: use _Py_STATIC_CAST() instead.
* gh-92781: Avoid mixing declarations and code in C API (#92783) | Victor Stinner | 2022-05-15 | 1 | -2/+5
  Avoid mixing declarations and code in the C API to fix the compiler warning: "ISO C90 forbids mixed declarations and code" [-Werror=declaration-after-statement].
* gh-85858: Remove PyUnicode_InternImmortal() function (#92579) | Victor Stinner | 2022-05-13 | 1 | -10/+3
  Remove the PyUnicode_InternImmortal() function and the SSTATE_INTERNED_IMMORTAL macro.

  The PyUnicode_InternImmortal() function is still exported in the stable ABI. The function is removed from the API.

  PyASCIIObject.state.interned size is now a single bit, rather than 2 bits.

  Keep SSTATE_NOT_INTERNED and SSTATE_INTERNED_MORTAL macros for backward compatibility, but no longer use them internally since the interned member is now a single bit and so can only have two values (interned or not interned).

  Update stats of _PyUnicode_ClearInterned().
* gh-89653: Use int type for Unicode kind (#92704) | Victor Stinner | 2022-05-13 | 1 | -2/+2
  Use the same type as the kind parameter of PyUnicode_FromKindAndData() (public C API): int.
* gh-89653: PEP 670: Convert PyUnicode_KIND() macro to function (#92705) | Victor Stinner | 2022-05-13 | 1 | -1/+15
  In the limited C API version 3.12, PyUnicode_KIND() is now implemented as a static inline function. Keep the macro for the regular C API and for the limited C API version 3.11 and older to prevent introducing new compiler warnings.

  Update _decimal.c and stringlib/eq.h for PyUnicode_KIND().
* gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537) | Inada Naoki | 2022-05-12 | 1 | -219/+14
* gh-89653: PEP 670: unicodeobject.h uses _Py_CAST() (#92696) | Victor Stinner | 2022-05-11 | 1 | -6/+12
  Use _Py_CAST() and _Py_STATIC_CAST() in macros wrapping static inline functions of unicodeobject.h.

  Also change the kind type from unsigned int to int: the same parameter type as PyUnicode_FromKindAndData().

  The limited API version 3.11 no longer casts arguments to expected types.
* gh-89653: Add assertions to unicodeobject.h functions (#92692) | Victor Stinner | 2022-05-11 | 1 | -2/+8
* gh-89653: PEP 670: Convert unicodeobject.h macros to functions (#92648) | Victor Stinner | 2022-05-11 | 1 | -48/+66
  Convert the following Unicode macros to static inline functions.

  Surrogate functions:
  * Py_UNICODE_IS_SURROGATE()
  * Py_UNICODE_IS_HIGH_SURROGATE()
  * Py_UNICODE_IS_LOW_SURROGATE()
  * Py_UNICODE_HIGH_SURROGATE()
  * Py_UNICODE_LOW_SURROGATE()
  * Py_UNICODE_JOIN_SURROGATES()

  "Is" functions:
  * Py_UNICODE_ISALNUM()
  * Py_UNICODE_ISSPACE()

  In the implementation of these functions, the character type is now well defined to Py_UCS4.
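For readers unfamiliar with what the surrogate helpers in the commit above compute, here is a standalone sketch of the standard UTF-16 surrogate arithmetic written as static inline functions over a fixed 32-bit character type. The names (`ucs4`, `is_high_surrogate`, `join_surrogates`, and so on) are hypothetical stand-ins chosen for this example; the code follows the UTF-16 encoding rules and is not copied from unicodeobject.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ucs4;  /* stand-in for a 32-bit code point type */

/* High (lead) surrogates occupy U+D800..U+DBFF. */
static inline int is_high_surrogate(ucs4 ch)
{
    return 0xD800 <= ch && ch <= 0xDBFF;
}

/* Low (trail) surrogates occupy U+DC00..U+DFFF. */
static inline int is_low_surrogate(ucs4 ch)
{
    return 0xDC00 <= ch && ch <= 0xDFFF;
}

/* Combine a surrogate pair into one code point above U+FFFF. */
static inline ucs4 join_surrogates(ucs4 high, ucs4 low)
{
    assert(is_high_surrogate(high));
    assert(is_low_surrogate(low));
    return 0x10000 + (((high & 0x03FF) << 10) | (low & 0x03FF));
}

/* High surrogate for a code point in U+10000..U+10FFFF. */
static inline ucs4 high_surrogate(ucs4 ch)
{
    assert(0x10000 <= ch && ch <= 0x10FFFF);
    return 0xD800 - (0x10000 >> 10) + (ch >> 10);
}

/* Low surrogate for a code point in U+10000..U+10FFFF. */
static inline ucs4 low_surrogate(ucs4 ch)
{
    assert(0x10000 <= ch && ch <= 0x10FFFF);
    return 0xDC00 + (ch & 0x3FF);
}
```

Because the parameter type is fixed by the function signature, a caller passing a narrower or signed character value gets it converted once at the call site, which is the kind of well-defined typing the commit message refers to.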
* gh-91321: Add _Py_NULL macro (#92253) | Victor Stinner | 2022-05-03 | 1 | -3/+3
  Fix C++ compiler warnings: "zero as null pointer constant" (clang -Wzero-as-null-pointer-constant).

  * Add the _Py_NULL macro used by static inline functions to use nullptr in C++.
  * Replace NULL with nullptr in _testcppext.cpp.
* gh-91320: Fix more old-style cast warnings in C++ (#92247) | Victor Stinner | 2022-05-03 | 1 | -15/+16
  Use _Py_CAST(), _Py_STATIC_CAST() and _PyASCIIObject_CAST() in static inline functions to fix C++ compiler warnings: "use of old-style cast" (clang -Wold-style-cast).

  test_cppext now builds the C++ test extension with -Wold-style-cast.
* gh-92135: Rename _Py_reinterpret_cast() to _Py_CAST() (#92230) | Victor Stinner | 2022-05-03 | 1 | -3/+3
  Rename also _Py_static_cast() to _Py_STATIC_CAST().
* gh-91320: Add _Py_reinterpret_cast() macro (#91959) | Victor Stinner | 2022-04-27 | 1 | -3/+6
  Fix C++ compiler warnings about "old-style cast" (g++ -Wold-style-cast) in the Python C API. Use C++ reinterpret_cast<> and static_cast<> casts when the Python C API is used in C++.

  Example of fixed warning:

      Include/object.h:107:43: error: use of old-style cast to ‘PyObject*’ {aka ‘struct _object*’} [-Werror=old-style-cast]
      #define _PyObject_CAST(op) ((PyObject*)(op))

  Add _Py_reinterpret_cast() and _Py_static_cast() macros.