summaryrefslogtreecommitdiffstats
path: root/Include/internal/pycore_unicodeobject.h
Commit message (Collapse)AuthorAgeFilesLines
* [3.13] gh-133767: Fix use-after-free in the unicode-escape decoder with an ↵Serhiy Storchaka2025-05-201-0/+13
| | | | | | | | | | | | | | | | | error handler (GH-129648) (GH-133944) If the error handler is used, a new bytes object is created to set as the object attribute of UnicodeDecodeError, and that bytes object then replaces the original data. A pointer to the decoded data will became invalid after destroying that temporary bytes object. So we need other way to return the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal(). _PyBytes_DecodeEscape() does not have such issue, because it does not use the error handlers registry, but it should be changed for compatibility with _PyUnicode_DecodeUnicodeEscapeInternal(). (cherry picked from commit 9f69a58623bd01349a18ba0c7a9cb1dad6a51e8e) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* [3.13] gh-113993: Allow interned strings to be mortal, and fix related ↵Petr Viktorin2024-06-241-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | issues (GH-120520) (GH-120945) * Add an InternalDocs file describing how interning should work and how to use it. * Add internal functions to *explicitly* request what kind of interning is done: - `_PyUnicode_InternMortal` - `_PyUnicode_InternImmortal` - `_PyUnicode_InternStatic` * Switch uses of `PyUnicode_InternInPlace` to those. * Disallow using `_Py_SetImmortal` on strings directly. You should use `_PyUnicode_InternImmortal` instead: - Strings should be interned before immortalization, otherwise you're possibly interning a immortalizing copy. - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in backports, as they are now part of public API and version-specific ABI. * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery. * Make sure the statically allocated string singletons are unique. This means these sets are now disjoint: - `_Py_ID` - `_Py_STR` (including the empty string) - one-character latin-1 singletons Now, when you intern a singleton, that exact singleton will be interned. * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic). * Intern `_Py_STR` singletons at startup. * For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup. * Beef up the tests. Cover internal details (marked with `@cpython_only`). * Add lots of assertions Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
* GH-115802: JIT "small" code for Windows (GH-115964)Brandt Bucher2024-02-291-2/+2
|
* gh-111924: Use PyMutex for Runtime-global Locks. (gh-112207)Sam Gross2023-12-071-1/+2
| | | | | This replaces some usages of PyThread_type_lock with PyMutex, which does not require memory allocation to initialize. This simplifies some of the runtime initialization and is also one step towards avoiding changing the default raw memory allocator during initialize/finalization, which can be non-thread-safe in some circumstances.
* gh-112026: Restore removed private C API (#112115)Victor Stinner2023-11-151-121/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Restore removed private C API functions, macros and structures which have no simple replacement for now: * _PyDict_GetItem_KnownHash() * _PyDict_NewPresized() * _PyHASH_BITS * _PyHASH_IMAG * _PyHASH_INF * _PyHASH_MODULUS * _PyHASH_MULTIPLIER * _PyLong_Copy() * _PyLong_FromDigits() * _PyLong_New() * _PyLong_Sign() * _PyObject_CallMethodId() * _PyObject_CallMethodNoArgs() * _PyObject_CallMethodOneArg() * _PyObject_CallOneArg() * _PyObject_EXTRA_INIT * _PyObject_FastCallDict() * _PyObject_GetAttrId() * _PyObject_Vectorcall() * _PyObject_VectorcallMethod() * _PyStack_AsDict() * _PyThread_CurrentFrames() * _PyUnicodeWriter structure * _PyUnicodeWriter_Dealloc() * _PyUnicodeWriter_Finish() * _PyUnicodeWriter_Init() * _PyUnicodeWriter_Prepare() * _PyUnicodeWriter_PrepareKind() * _PyUnicodeWriter_WriteASCIIString() * _PyUnicodeWriter_WriteChar() * _PyUnicodeWriter_WriteLatin1String() * _PyUnicodeWriter_WriteStr() * _PyUnicodeWriter_WriteSubstring() * _PyUnicode_AsString() * _PyUnicode_FromId() * _PyVectorcall_Function() * _Py_HashDouble() * _Py_HashPointer() * _Py_IDENTIFIER() * _Py_c_abs() * _Py_c_diff() * _Py_c_neg() * _Py_c_pow() * _Py_c_prod() * _Py_c_quot() * _Py_c_sum() * _Py_static_string() * _Py_static_string_init()
* Add private _PyUnicode_AsUTF8NoNUL() function (GH-111957)Serhiy Storchaka2023-11-101-0/+4
| | | | Like PyUnicode_AsUTF8(), but check for embedded null characters.
* gh-106320: Remove private _Py_Identifier API (#108593)Victor Stinner2023-08-291-0/+1
| | | | | | | | | | | | | | | Remove the private _Py_Identifier type and related private functions from the public C API: * _PyObject_GetAttrId() * _PyObject_LookupSpecialId() * _PyObject_SetAttrId() * _PyType_LookupId() * _Py_IDENTIFIER() * _Py_static_string() * _Py_static_string_init() Move them to the internal C API: add a new pycore_identifier.h header file. No longer export these functions.
* gh-107211: No longer export internal functions (7) (#108425)Victor Stinner2023-08-241-34/+45
| | | | | | | | | No longer export _PyUnicode_FromId() internal C API function. Change comment style to "// comment" and add comment explaining why other functions have to be exported. Update Tools/build/generate_token.py to update Include/internal/pycore_token.h comments.
* gh-107211: No longer export internal functions (3) (#107215)Victor Stinner2023-07-251-3/+5
| | | | | | | | | | | | | | | | | | No longer export these 14 internal C API functions: * _PySys_Audit() * _PySys_SetAttr() * _PyTraceBack_FromFrame() * _PyTraceBack_Print_Indented() * _PyUnicode_FormatAdvancedWriter() * _PyUnicode_ScanIdentifier() * _PyWarnings_Init() * _Py_DumpASCII() * _Py_DumpDecimal() * _Py_DumpHexadecimal() * _Py_DumpTraceback() * _Py_DumpTracebackThreads() * _Py_WriteIndent() * _Py_WriteIndentedMargin()
* gh-106320: Remove private _PyUnicode C API (#107185)Victor Stinner2023-07-241-0/+13
| | | | | | | | | | | | | | | Move private _PyUnicode functions to the internal C API (pycore_unicodeobject.h): * _PyUnicode_IsCaseIgnorable() * _PyUnicode_IsCased() * _PyUnicode_IsXidContinue() * _PyUnicode_IsXidStart() * _PyUnicode_ToFoldedFull() * _PyUnicode_ToLowerFull() * _PyUnicode_ToTitleFull() * _PyUnicode_ToUpperFull() No longer export these functions.
* gh-106320: Remove _PyUnicode_TransformDecimalAndSpaceToASCII() (#106398)Victor Stinner2023-07-041-2/+37
| | | | | | | | Remove private _PyUnicode_TransformDecimalAndSpaceToASCII() and other private _PyUnicode C API functions: move them to the internal C API (pycore_unicodeobject.h). No longer most of these functions. Replace _testcapi.unicode_transformdecimalandspacetoascii() with _testinternal._PyUnicode_TransformDecimalAndSpaceToASCII().
* gh-106320: Remove private _PyUnicode codecs C API functions (#106385)Victor Stinner2023-07-041-0/+100
| | | | | Remove private _PyUnicode codecs C API functions: move them to the internal C API (pycore_unicodeobject.h). No longer export most of these functions.
* gh-106320: Remove more private _PyUnicode C API functions (#106382)Victor Stinner2023-07-031-1/+69
| | | | | | Remove more private _PyUnicode C API functions: move them to the internal C API (pycore_unicodeobject.h). No longer export most pycore_unicodeobject.h functions.
* gh-106320: Move _PyUnicodeWriter to the internal C API (#106342)Victor Stinner2023-07-031-2/+143
| | | | | | Move also _PyUnicode_FormatAdvancedWriter(). CJK codecs and multibytecodec.c now define the Py_BUILD_CORE_MODULE macro.
* gh-84436: Implement Immortal Objects (gh-19474)Eddie Elizondo2023-04-221-0/+1
| | | | | | | | | This is the implementation of PEP683 Motivation: The PR introduces the ability to immortalize instances in CPython which bypasses reference counting. Tagging objects as immortal allows up to skip certain operations when we know that the object will be around for the entire execution of the runtime. Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this brings the ability of having truly immutable objects that are useful in other contexts such as immutable data sharing between sub-interpreters.
* gh-100227: Move the Dict of Interned Strings to PyInterpreterState (gh-102339)Eric Snow2023-03-281-0/+1
| | | | | We can revisit the options for keeping it global later, if desired. For now the approach seems quite complex, so we've gone with the simpler isolation solution in the meantime. https://github.com/python/cpython/issues/100227
* gh-100227: Revert gh-102925 "gh-100227: Make the Global Interned Dict Safe ↵Eric Snow2023-03-271-1/+0
| | | | | | | for Isolated Interpreters" (gh-103063) This reverts commit 87be8d9. This approach to keeping the interned strings safe is turning out to be too complex for my taste (due to obmalloc isolation). For now I'm going with the simpler solution, making the dict per-interpreter. We can revisit that later if we want a sharing solution.
* gh-100227: Make the Global Interned Dict Safe for Isolated Interpreters ↵Eric Snow2023-03-231-0/+1
| | | | | | | | | (gh-102925) This is effectively two changes. The first (the bulk of the change) is where we add _Py_AddToGlobalDict() (and _PyRuntime.cached_objects.main_tstate, etc.). The second (much smaller) change is where we update PyUnicode_InternInPlace() to use _Py_AddToGlobalDict() instead of calling PyDict_SetDefault() directly. Basically, _Py_AddToGlobalDict() is a wrapper around PyDict_SetDefault() that should be used whenever we need to add a value to a runtime-global dict object (in the few cases where we are leaving the container global rather than moving it to PyInterpreterState, e.g. the interned strings dict). _Py_AddToGlobalDict() does all the necessary work to make sure the target global dict is shared safely between isolated interpreters. This is especially important as we move the obmalloc state to each interpreter (gh-101660), as well as, potentially, the GIL (PEP 684). https://github.com/python/cpython/issues/100227
* gh-81057: Move More Globals to _PyRuntimeState (gh-100092)Eric Snow2022-12-071-0/+3
| | | https://github.com/python/cpython/issues/81057
* gh-81057: Move More Globals in Core Code to _PyRuntimeState (gh-99516)Eric Snow2022-11-161-0/+4
| | | https://github.com/python/cpython/issues/81057
* GH-96458: Statically initialize utf8 representation of static strings (#96481)Kumar Aditya2022-09-031-1/+0
|
* gh-90667: Add specializations of Py_DECREF when types are known (GH-30872)Dennis Sweeney2022-04-191-0/+1
|
* gh-91576: Speed up iteration of strings (#91574)Kumar Aditya2022-04-181-0/+1
|
* bpo-47084: Clear Unicode cached representations on finalization (GH-32032)Jeremy Kloth2022-03-221-0/+1
|
* bpo-46881: Statically allocate and initialize the latin1 characters. (GH-31616)Kumar Aditya2022-03-091-3/+0
|
* bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized ↵Eric Snow2022-02-081-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | global objects. (gh-30928) We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules. The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings). https://bugs.python.org/issue46541#msg411799 explains the rationale for this change. The core of the change is in: * (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros * Include/internal/pycore_runtime_init.h - added the static initializers for the global strings * Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState * Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config. The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *. The following are not changed (yet): * stop using _Py_IDENTIFIER() in the stdlib modules * (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API * (maybe) intern the strings during runtime init https://bugs.python.org/issue46541
* bpo-46417: Clear Unicode static types at exit (GH-30806)Victor Stinner2022-01-221-0/+1
| | | | | | | | | | | Add _PyUnicode_FiniTypes() function, called by finalize_interp_types(). It clears these static types: * EncodingMapType * PyFieldNameIter_Type * PyFormatterIter_Type _PyStaticType_Dealloc() now does nothing if tp_subclasses is not NULL.
* bpo-46303: Move fileutils.h private functions to internal C API (GH-30484)Victor Stinner2022-01-111-0/+2
| | | | | | | | | | Move almost all private functions of Include/cpython/fileutils.h to the internal C API Include/internal/pycore_fileutils.h. Only keep _Py_fopen_obj() in Include/cpython/fileutils.h, since it's used by _testcapi which must not use the internal C API. Move EncodeLocaleEx() and DecodeLocaleEx() functions from _testcapi to _testinternalcapi, since the C API moved to the internal C API.
* bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" ↵Victor Stinner2022-01-061-11/+1
| | | | | | | | | | | (GH-30422) This reverts commit ea251806b8dffff11b30d2182af1e589caf88acf. Keep "assert(interned == NULL);" in _PyUnicode_Fini(), but only for the main interpreter. Keep _PyUnicode_ClearInterned() changes avoiding the creation of a temporary Python list object.
* bpo-46008: Make runtime-global object/type lifecycle functions and state ↵Eric Snow2021-12-091-0/+71
consistent. (gh-29998) This change is strictly renames and moving code around. It helps in the following ways: * ensures type-related init functions focus strictly on one of the three aspects (state, objects, types) * passes in PyInterpreterState * to all those functions, simplifying work on moving types/objects/state to the interpreter * consistent naming conventions help make what's going on more clear * keeping API related to a type in the corresponding header file makes it more obvious where to look for it https://bugs.python.org/issue46008