summaryrefslogtreecommitdiffstats
path: root/Objects/unicodeobject.c
Commit message (Collapse)AuthorAgeFilesLines
* [3.13] gh-113993: Allow interned strings to be mortal, and fix related ↵Petr Viktorin2024-06-241-115/+429
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | issues (GH-120520) (GH-120945) * Add an InternalDocs file describing how interning should work and how to use it. * Add internal functions to *explicitly* request what kind of interning is done: - `_PyUnicode_InternMortal` - `_PyUnicode_InternImmortal` - `_PyUnicode_InternStatic` * Switch uses of `PyUnicode_InternInPlace` to those. * Disallow using `_Py_SetImmortal` on strings directly. You should use `_PyUnicode_InternImmortal` instead: - Strings should be interned before immortalization, otherwise you're possibly interning a immortalizing copy. - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in backports, as they are now part of public API and version-specific ABI. * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery. * Make sure the statically allocated string singletons are unique. This means these sets are now disjoint: - `_Py_ID` - `_Py_STR` (including the empty string) - one-character latin-1 singletons Now, when you intern a singleton, that exact singleton will be interned. * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic). * Intern `_Py_STR` singletons at startup. * For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup. * Beef up the tests. Cover internal details (marked with `@cpython_only`). * Add lots of assertions Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
* [3.13] gh-117398: Use Per-Interpreter State for the _datetime Static Types ↵Miss Islington (bot)2024-06-031-3/+3
| | | | | | | | | | | (gh-120009) We make use of the same mechanism that we use for the static builtin types. This required a few tweaks. This change is the final piece needed to make _datetime support multiple interpreters. I've updated the module slot accordingly. (cherry picked from commit 105f22ea46ac16866e6df18ebae2a8ba422b7f45, AKA gh-119929) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
* [3.13] gh-117657: Fix data races report by TSAN unicode-hash (gh-119907) ↵Miss Islington (bot)2024-06-031-8/+11
| | | | | | | | (gh-119963) gh-117657: Fix data races report by TSAN unicode-hash (gh-119907) (cherry picked from commit 0594a27e5f1d87d59fa8a761dd8ca9df4e42816d) Co-authored-by: Donghee Na <donghee.na@python.org>
* [3.13] gh-111999: Fix the signature of str.format_map() (GH-119540) (#119543)Miss Islington (bot)2024-05-251-1/+1
| | | | | (cherry picked from commit 08e65430aafa1047029e6f132a5f748c415bda14) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* [3.13] gh-119247: Add macros to use PySequence_Fast safely in free-threaded ↵Miss Islington (bot)2024-05-221-3/+5
| | | | | | | | | | | build (GH-119315) (#119419) Add `Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST` and `Py_END_CRITICAL_SECTION_SEQUENCE_FAST` macros and update `str.join` to use them. Also add a regression test that would crash reliably without this patch. (cherry picked from commit baf347d91643a83483bae110092750d39471e0c2) Co-authored-by: Josh {*()} Rosenberg <26495692+MojoVampire@users.noreply.github.com>
* gh-116322: Add Py_mod_gil module slot (#116882)Brett Simmers2024-05-031-0/+1
| | | | | | | | | | | | | | This PR adds the ability to enable the GIL if it was disabled at interpreter startup, and modifies the multi-phase module initialization path to enable the GIL when loading a module, unless that module's spec includes a slot indicating it can run safely without the GIL. PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148. A warning will be issued up to once per interpreter for the first GIL-using module that is loaded. If `-v` is given, a shorter message will be printed to stderr every time a GIL-using module is loaded (including the first one that issues a warning).
* gh-116738: Make `_codecs` module thread-safe (#117530)Brett Simmers2024-05-021-1/+5
| | | | | | | | | | | | | | | The module itself is a thin wrapper around calls to functions in `Python/codecs.c`, so that's where the meaningful changes happened: - Move codecs-related state that lives on `PyInterpreterState` to a struct declared in `pycore_codecs.h`. - In free-threaded builds, add a mutex to `codecs_state` to synchronize operations on `search_path`. Because `search_path_mutex` is used as a normal mutex and not a critical section, we must be extremely careful with operations called while holding it. - The codec registry is explicitly initialized as part of `_PyUnicode_InitEncodings` to simplify thread-safety.
* gh-117709: Add vectorcall support for str() with positional-only arguments ↵Erlend E. Aasland2024-04-111-0/+51
| | | | | | | | (#117746) Fall back to tp_call() for cases when arguments are passed by name. Co-authored-by: Donghee Na <donghee.na@python.org> Co-authored-by: Victor Stinner <vstinner@python.org>
* gh-117642: Fix PEP 737 implementation (GH-117643)Serhiy Storchaka2024-04-081-4/+3
| | | | | * Fix implementation of %#T and %#N (they were implemented as %T# and %N#). * Restore tests removed in gh-116417.
* gh-117439: Make refleak checking thread-safe without the GIL (#117469)Sam Gross2024-04-081-1/+1
| | | | | This keeps track of the per-thread total reference count operations in PyThreadState in the free-threaded builds. The count is merged into the interpreter's total when the thread exits.
* gh-117431: Adapt str.find and friends to Argument Clinic (#117468)Erlend E. Aasland2024-04-031-209/+144
| | | | | | | | | | This change gives a significant speedup, as the METH_FASTCALL calling convention is now used. The following methods are adapted: - str.count - str.find - str.index - str.rfind - str.rindex
* gh-117431: Fix str.endswith docstring (#117499)Erlend E. Aasland2024-04-031-4/+12
| | | | | The first parameter is named 'suffix', not 'prefix'. Regression introduced by commit 444156ed
* gh-117431: Adapt str.startswith and str.endswith to Argument Clinic (#117466)Erlend E. Aasland2024-04-031-46/+42
| | | | This change gives a significant speedup, as the METH_FASTCALL calling convention is now used.
* gh-113964: Don't prevent new threads until all non-daemon threads exit (#116677)Sam Gross2024-03-191-1/+1
| | | | | | | | | | Starting in Python 3.12, we prevented calling fork() and starting new threads during interpreter finalization (shutdown). This has led to a number of regressions and flaky tests. We should not prevent starting new threads (or `fork()`) until all non-daemon threads exit and finalization starts in earnest. This changes the checks to use `_PyInterpreterState_GetFinalizing(interp)`, which is set immediately before terminating non-daemon threads.
* gh-111696, PEP 737: Add %T and %N to PyUnicode_FromFormat() (#116839)Victor Stinner2024-03-141-0/+58
|
* gh-112066: Use `PyDict_SetDefaultRef` in place of `PyDict_SetDefault`. (#112211)Sam Gross2024-02-071-5/+7
| | | | | This changes a number of internal usages of `PyDict_SetDefault` to use `PyDict_SetDefaultRef`. Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
* gh-114569: Use PyMem_* APIs for non-PyObjects in unicodeobject.c (#114690)Erlend E. Aasland2024-01-291-6/+6
|
* gh-111971: Make _PyUnicode_FromId thread-safe in --disable-gil (gh-113489)Donghee Na2023-12-261-3/+7
|
* gh-110383: Improve accuracy of str.split() and str.rsplit() docstrings (#113355)Erlend E. Aasland2023-12-211-2/+4
| | | | Clarify split direction in the docstring body, instead of in the 'maxsplit' param docstring.
* gh-111924: Use PyMutex for Runtime-global Locks. (gh-112207)Sam Gross2023-12-071-2/+2
| | | | | This replaces some usages of PyThread_type_lock with PyMutex, which does not require memory allocation to initialize. This simplifies some of the runtime initialization and is also one step towards avoiding changing the default raw memory allocator during initialize/finalization, which can be non-thread-safe in some circumstances.
* gh-111972: Make Unicode name C APIcapsule initialization thread-safe (#112249)Kirill Podoprigora2023-11-301-11/+21
|
* gh-111999: Add signatures and improve docstrings for builtins (GH-112000)Serhiy Storchaka2023-11-131-5/+7
|
* Add private _PyUnicode_AsUTF8NoNUL() function (GH-111957)Serhiy Storchaka2023-11-101-0/+12
| | | | Like PyUnicode_AsUTF8(), but check for embedded null characters.
* gh-111089: Revert PyUnicode_AsUTF8() changes (#111833)Victor Stinner2023-11-071-7/+1
| | | | | | | | | | | | | | | | | | | | | * Revert "gh-111089: Use PyUnicode_AsUTF8() in Argument Clinic (#111585)" This reverts commit d9b606b3d04fc56fb0bcc479d7d6c14562edb5e2. * Revert "gh-111089: Use PyUnicode_AsUTF8() in getargs.c (#111620)" This reverts commit cde1071b2a72e8261ca66053ef61431b7f3a81fd. * Revert "gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (#111091)" This reverts commit d731579bfb9a497cfb0076cb6b221058a20088fe. * Revert "gh-111089: Add PyUnicode_AsUTF8() to the limited C API (#111121)" This reverts commit d8f32be5b6a736dc2fc9dca3f1bf176c82fc9b44. * Revert "gh-111089: Use PyUnicode_AsUTF8() in sqlite3 (#111122)" This reverts commit 37e4e20eaa8f27ada926d49e5971fecf0477ad26.
* gh-110481: Implement biased reference counting (gh-110764)Sam Gross2023-10-301-1/+1
|
* gh-111089: PyUnicode_AsUTF8AndSize() sets size on error (#111106)Victor Stinner2023-10-201-1/+8
| | | | On error, PyUnicode_AsUTF8AndSize() now sets the size argument to -1, to avoid undefined value.
* gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (#111091)Victor Stinner2023-10-201-1/+7
| | | | | | | | | * PyUnicode_AsUTF8() now raises an exception if the string contains embedded null characters. * Update related C API tests (test_capi.test_unicode). * type_new_set_doc() uses PyUnicode_AsUTF8AndSize() to silently truncate doc containing null bytes. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* gh-110289: C API: Add PyUnicode_EqualToUTF8() and ↵Serhiy Storchaka2023-10-111-0/+76
| | | | PyUnicode_EqualToUTF8AndSize() functions (GH-110297)
* gh-110079: Remove extern "C" { ...} in C code (#110080)Victor Stinner2023-09-291-10/+0
|
* gh-109693: Remove pycore_atomic_funcs.h (#109694)Sam Gross2023-09-211-4/+3
| | | _PyUnicode_FromId() now uses pyatomic.h functions instead.
* gh-108915: Removes extra backslashes in str.split docstring (#109044)Daniel Weiss2023-09-071-2/+2
|
* gh-107913: Fix possible losses of OSError error codes (GH-107930)Serhiy Storchaka2023-08-261-1/+1
| | | | | | Functions like PyErr_SetFromErrno() and SetFromWindowsErr() should be called immediately after using the C API which sets errno or the Windows error code.
* gh-108444: Replace _PyLong_AsInt() with PyLong_AsInt() (#108459)Victor Stinner2023-08-241-1/+1
| | | | | | Change generated by the command: sed -i -e 's!_PyLong_AsInt!PyLong_AsInt!g' \ $(find -name "*.c" -o -name "*.h")
* gh-106320: Remove private _PyEval function (#108433)Victor Stinner2023-08-241-0/+2
| | | | | | | | | | | | | | Move private _PyEval functions to the internal C API (pycore_ceval.h): * _PyEval_GetBuiltin() * _PyEval_GetBuiltinId() * _PyEval_GetSwitchInterval() * _PyEval_MakePendingCalls() * _PyEval_SetProfile() * _PyEval_SetSwitchInterval() * _PyEval_SetTrace() No longer export most of these functions.
* GH-84436: Skip refcounting for known immortals (GH-107605)Brandt Bucher2023-08-041-21/+12
|
* gh-106931: Intern Statically Allocated Strings Globally (gh-107272)Eric Snow2023-07-271-3/+69
| | | | | We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict. Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter.
* gh-105699: Fix a Crasher Related to a Deprecated Global Variable (gh-106923)Eric Snow2023-07-211-4/+7
| | | There was a slight race in _Py_ClearFileSystemEncoding() (when called from _Py_SetFileSystemEncoding()), between freeing the value and setting the variable to NULL, which occasionally caused crashes when multiple isolated interpreters were used. (Notably, I saw at least 10 different, seemingly unrelated spooky-action-at-a-distance, ways this crashed. Yay, free threading!) We avoid the problem by only setting the global variables with the main interpreter (i.e. runtime init).
* gh-105699: Fix an Interned Strings Crasher (gh-106930)Eric Snow2023-07-211-1/+12
| | | | | | | A static (process-global) str object must only have its "interned" state cleared when no longer interned in any interpreters. They are the only ones that can be shared by interpreters so we don't have to worry about any other str objects. We trigger clearing the state with the main interpreter, since no other interpreters may exist at that point and _PyUnicode_ClearInterned() is only called during interpreter finalization. We do not address here the fact that a string will only be interned in the first interpreter that interns it. In any subsequent interpreters str.state.interned is already set so _PyUnicode_InternInPlace() will skip it. That needs to be addressed separately from fixing the crasher.
* gh-106487: Allow the 'count' argument of `str.replace` to be a keyword (#106488)Hugo van Kemenade2023-07-101-2/+2
|
* gh-106320: Remove private _PyErr C API functions (#106356)Victor Stinner2023-07-031-0/+1
| | | | Remove private _PyErr C API functions: move them to the internal C API (pycore_pyerrors.h).
* gh-104922: remove PY_SSIZE_T_CLEAN (#106315)Inada Naoki2023-07-021-1/+0
|
* gh-106320: Remove _PyInterpreterState_Get() alias (#106321)Victor Stinner2023-07-011-1/+1
| | | | Replace calls to the (removed) slow _PyInterpreterState_Get() with fast inlined _PyInterpreterState_GET() function.
* Remove private _PyCodec_Lookup() function (#106269)Victor Stinner2023-06-301-1/+2
| | | | | | | | | | | | | | | Remove the following private functions of the C API: * _PyCodecInfo_GetIncrementalDecoder() * _PyCodecInfo_GetIncrementalEncoder() * _PyCodec_DecodeText() * _PyCodec_EncodeText() * _PyCodec_Forget() * _PyCodec_Lookup() * _PyCodec_LookupTextEncoding() Move these functions to a new pycore_codecs.h internal header file. These functions are no longer exported.
* gh-105375: Improve error handling in PyUnicode_BuildEncodingMap() (#105491)Erlend E. Aasland2023-06-111-12/+17
| | | Bail on first error to prevent exceptions from possibly being overwritten.
* gh-105156: Deprecate the old Py_UNICODE type in C API (#105157)Victor Stinner2023-06-011-4/+4
| | | | | | | | Deprecate the old Py_UNICODE and PY_UNICODE_TYPE types in the C API: use wchar_t instead. Replace Py_UNICODE with wchar_t in multiple C files. Co-authored-by: Inada Naoki <songofacandy@gmail.com>
* Fix compiler warning in unicodeobject.c (#105050)Inada Naoki2023-05-291-1/+1
|
* gh-98836: Extend PyUnicode_FromFormat() (GH-98838)Serhiy Storchaka2023-05-211-108/+218
| | | | | | | | | * Support for conversion specifiers o (octal) and X (uppercase hexadecimal). * Support for length modifiers j (intmax_t) and t (ptrdiff_t). * Length modifiers are now applied to all integer conversions. * Support for wchar_t C strings (%ls and %lV). * Support for variable width and precision (*). * Support for flag - (left alignment).
* gh-104018: remove unused format "z" handling in string formatfloat() (#104107)John Belmonte2023-05-071-2/+0
| | | This is a cleanup overlooked in PR #104033.
* gh-99113: Add Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (gh-104205)Eric Snow2023-05-051-0/+6
| | | Here we are doing no more than adding the value for Py_mod_multiple_interpreters and using it for stdlib modules. We will start checking for it in gh-104206 (once PyInterpreterState.ceval.own_gil is added in gh-104204).
* gh-94673: Properly Initialize and Finalize Static Builtin Types for Each ↵Eric Snow2023-05-021-10/+6
| | | | | Interpreter (gh-104072) Until now, we haven't been initializing nor finalizing the per-interpreter state properly.