| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
gh-58124: Avoid CP_UTF8 in UnicodeDecodeError (GH-137415)
Fix name of the Python encoding in Unicode errors of the code page
codec: use "cp65000" and "cp65001" instead of "CP_UTF7" and "CP_UTF8"
which are not valid Python code names.
(cherry picked from commit ce1b747ff68754635b7b12870dfc527184ee3b39)
Co-authored-by: Victor Stinner <vstinner@python.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
error handler (GH-129648) (GH-133944)
If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().
_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58623bd01349a18ba0c7a9cb1dad6a51e8e)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(gh-133039) (gh-133126)
* gh-132070: Use _PyObject_IsUniquelyReferenced in unicodeobject (gh-133039)
---------
(cherry picked from commit 75cbb8d89e7e92ccaba5c615c72459f241dca8b1)
Co-authored-by: Donghee Na <donghee.na@python.org>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* Add _PyObject_IsUniquelyReferenced
---------
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(GH-130127)
We had the definition of what makes a character "printable" documented in three places, giving two different definitions.
The definition in the comment on `_PyUnicode_IsPrintable` was inverted; correct that.
With that correction, the two definitions turn out to be equivalent -- but to confirm that, you have to go look up, or happen to know, that those are the only five "Other" categories and only three "Separator" categories in the Unicode character database. That makes it hard for the reader to tell whether they really are the same, or if there's some subtle difference in the intended semantics.
Fix that by cutting the C API docs' and the C comment's copies of the subtle details, in favor of referring to the Python-level docs. That ensures it's explicit that these are all meant to agree, and also lets us concentrate improvements to the wording in one place.
Speaking of which, borrow some ideas from the C comment, along with other tweaks, to hopefully add a bit more clarity to that one newly-centralized copy in the docs.
Also add a thorough test that the implementation agrees with this definition.
Author: Greg Price <gnprice@gmail.com>
Co-authored-by: Greg Price <gnprice@gmail.com>
(cherry picked from commit 3402e133ef26736296c07992266a82b181a5d532)
|
| |
|
|
|
|
|
|
| |
`Objects/unicodeobject::_copy_characters` (GH-127876) (#128458)
gh-127903: Fix a crash on debug builds when calling `Objects/unicodeobject::_copy_characters`` (GH-127876)
(cherry picked from commit 46cb6340d7bad955edfc0a20f6a52dabc03b0932)
Co-authored-by: Alexander Shadchin <shadchin@yandex-team.com>
|
| |
|
|
| |
(#128021) (#128417)
|
| |
|
|
|
| |
(#127823)
(cherry picked from commit 30aeb00d367d0cc9e5a7603371636cddea09f1c0)
|
| |
|
|
|
|
|
| |
Fix Unicode encode_wstr_utf8() (#127420)
Raise RuntimeError instead of RuntimeWarning.
Co-authored-by: Victor Stinner <vstinner@python.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(gh-124865) (gh-125709) (GH-125204)
* gh-116510: Fix a Crash Due to Shared Immortal Interned Strings (gh-124865)
Fix a crash caused by immortal interned strings being shared between
sub-interpreters that use basic single-phase init. In that case, the string
can be used by an interpreter that outlives the interpreter that created and
interned it. For interpreters that share obmalloc state, also share the
interned dict with the main interpreter.
This is an un-revert of gh-124646 that then addresses the Py_TRACE_REFS
failures identified by gh-124785.
(cherry picked from commit f2cb39947093feda3ff85b8dc820922cc5e5f954)
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
* [3.13] gh-125286: Share the Main Refchain With Legacy Interpreters (gh-125709)
They used to be shared, before 3.12. Returning to sharing them resolves a failure on Py_TRACE_REFS builds.
---------
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
interned strings (gh-124646)" (gh-124807) (#124812)
gh-124785: Revert "gh-116510: Fix crash due to shared immortal interned strings (gh-124646)" (gh-124807)
Revert "gh-116510: Fix crash due to shared immortal interned strings. (gh-124646)"
This reverts commit 98b2ed7e239c807f379cd2bf864f372d79064aac.
(cherry picked from commit 7bdfabe2d1ec353ecdc75a5aec41cce83e572391)
Co-authored-by: T. Wouters <thomas@python.org>
|
| |
|
|
|
|
|
|
| |
(gh-124646) (#124648)
gh-116510: Fix crash due to shared immortal interned strings. (gh-124646)
(cherry picked from commit 98b2ed7e239c807f379cd2bf864f372d79064aac)
Co-authored-by: Neil Schemenauer <nas-github@arctrix.com>
|
| |
|
|
|
|
| |
Fixes GH-122888
(cherry picked from commit 53ebb6232a8ebc03827cf2251bfc67f1886ffd70)
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
|
| |
|
|
|
|
|
| |
(GH-122347)
(cherry picked from commit bb09ba679223666e01f8da780f97888a29d07131)
Co-authored-by: Petr Viktorin <encukou@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
immortalizing in other API (GH-121364) (GH-121854)
* Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs
* Document immortality in some functions that take `const char *`
This is PyUnicode_InternFromString;
PyDict_SetItemString, PyObject_SetAttrString;
PyObject_DelAttrString; PyUnicode_InternFromString;
and the PyModule_Add convenience functions.
Always point out a non-immortalizing alternative.
* Don't immortalize user-provided attr names in _ctypes
(cherry picked from commit b4aedb23ae7954fb58084dda16cd41786819a8cf)
|
| |
|
|
|
|
|
|
|
|
|
| |
_Py_IsImmortal (GH-121358) (GH-121851)
gh-113993: For string interning, do not rely on (or assert) _Py_IsImmortal (GH-121358)
Older stable ABI extensions are allowed to make immortal objects mortal.
Instead, use `_PyUnicode_STATE` (`interned` and `statically_allocated`).
(cherry picked from commit 956270d08d5c23f59937e2f29f8e0b7f63d68afd)
Co-authored-by: Petr Viktorin <encukou@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
issues (GH-120520) (GH-120945)
* Add an InternalDocs file describing how interning should work and how to use it.
* Add internal functions to *explicitly* request what kind of interning is done:
- `_PyUnicode_InternMortal`
- `_PyUnicode_InternImmortal`
- `_PyUnicode_InternStatic`
* Switch uses of `PyUnicode_InternInPlace` to those.
* Disallow using `_Py_SetImmortal` on strings directly.
You should use `_PyUnicode_InternImmortal` instead:
- Strings should be interned before immortalization, otherwise you're possibly
interning a immortalizing copy.
- `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
`SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
backports, as they are now part of public API and version-specific ABI.
* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
- `_Py_ID`
- `_Py_STR` (including the empty string)
- one-character latin-1 singletons
Now, when you intern a singleton, that exact singleton will be interned.
* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
* Intern `_Py_STR` singletons at startup.
* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
* Beef up the tests. Cover internal details (marked with `@cpython_only`).
* Add lots of assertions
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
(gh-120009)
We make use of the same mechanism that we use for the static builtin types. This required a few tweaks.
This change is the final piece needed to make _datetime support multiple interpreters. I've updated the module slot accordingly.
(cherry picked from commit 105f22ea46ac16866e6df18ebae2a8ba422b7f45, AKA gh-119929)
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
|
| |
|
|
|
|
|
|
| |
(gh-119963)
gh-117657: Fix data races report by TSAN unicode-hash (gh-119907)
(cherry picked from commit 0594a27e5f1d87d59fa8a761dd8ca9df4e42816d)
Co-authored-by: Donghee Na <donghee.na@python.org>
|
| |
|
|
|
| |
(cherry picked from commit 08e65430aafa1047029e6f132a5f748c415bda14)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
build (GH-119315) (#119419)
Add `Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST` and
`Py_END_CRITICAL_SECTION_SEQUENCE_FAST` macros and update `str.join` to use
them. Also add a regression test that would crash reliably without this
patch.
(cherry picked from commit baf347d91643a83483bae110092750d39471e0c2)
Co-authored-by: Josh {*()} Rosenberg <26495692+MojoVampire@users.noreply.github.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR adds the ability to enable the GIL if it was disabled at
interpreter startup, and modifies the multi-phase module initialization
path to enable the GIL when loading a module, unless that module's spec
includes a slot indicating it can run safely without the GIL.
PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went
with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148.
A warning will be issued up to once per interpreter for the first
GIL-using module that is loaded. If `-v` is given, a shorter message
will be printed to stderr every time a GIL-using module is loaded
(including the first one that issues a warning).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The module itself is a thin wrapper around calls to functions in
`Python/codecs.c`, so that's where the meaningful changes happened:
- Move codecs-related state that lives on `PyInterpreterState` to a
struct declared in `pycore_codecs.h`.
- In free-threaded builds, add a mutex to `codecs_state` to synchronize
operations on `search_path`. Because `search_path_mutex` is used as a
normal mutex and not a critical section, we must be extremely careful
with operations called while holding it.
- The codec registry is explicitly initialized as part of
`_PyUnicode_InitEncodings` to simplify thread-safety.
|
| |
|
|
|
|
|
|
| |
(#117746)
Fall back to tp_call() for cases when arguments are passed by name.
Co-authored-by: Donghee Na <donghee.na@python.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
|
| |
|
|
|
| |
* Fix implementation of %#T and %#N (they were implemented as %T# and
%N#).
* Restore tests removed in gh-116417.
|
| |
|
|
|
| |
This keeps track of the per-thread total reference count operations in
PyThreadState in the free-threaded builds. The count is merged into the
interpreter's total when the thread exits.
|
| |
|
|
|
|
|
|
|
|
| |
This change gives a significant speedup, as the METH_FASTCALL calling
convention is now used. The following methods are adapted:
- str.count
- str.find
- str.index
- str.rfind
- str.rindex
|
| |
|
|
|
| |
The first parameter is named 'suffix', not 'prefix'.
Regression introduced by commit 444156ed
|
| |
|
|
| |
This change gives a significant speedup, as the METH_FASTCALL calling
convention is now used.
|
| |
|
|
|
|
|
|
|
|
| |
Starting in Python 3.12, we prevented calling fork() and starting new threads
during interpreter finalization (shutdown). This has led to a number of
regressions and flaky tests. We should not prevent starting new threads
(or `fork()`) until all non-daemon threads exit and finalization starts in
earnest.
This changes the checks to use `_PyInterpreterState_GetFinalizing(interp)`,
which is set immediately before terminating non-daemon threads.
|
| | |
|
| |
|
|
|
| |
This changes a number of internal usages of `PyDict_SetDefault` to use `PyDict_SetDefaultRef`.
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
|
| | |
|
| | |
|
| |
|
|
| |
Clarify split direction in the docstring body,
instead of in the 'maxsplit' param docstring.
|
| |
|
|
|
| |
This replaces some usages of PyThread_type_lock with PyMutex, which does not require memory allocation to initialize.
This simplifies some of the runtime initialization and is also one step towards avoiding changing the default raw memory allocator during initialize/finalization, which can be non-thread-safe in some circumstances.
|
| | |
|
| | |
|
| |
|
|
| |
Like PyUnicode_AsUTF8(), but check for embedded null characters.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Revert "gh-111089: Use PyUnicode_AsUTF8() in Argument Clinic (#111585)"
This reverts commit d9b606b3d04fc56fb0bcc479d7d6c14562edb5e2.
* Revert "gh-111089: Use PyUnicode_AsUTF8() in getargs.c (#111620)"
This reverts commit cde1071b2a72e8261ca66053ef61431b7f3a81fd.
* Revert "gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (#111091)"
This reverts commit d731579bfb9a497cfb0076cb6b221058a20088fe.
* Revert "gh-111089: Add PyUnicode_AsUTF8() to the limited C API (#111121)"
This reverts commit d8f32be5b6a736dc2fc9dca3f1bf176c82fc9b44.
* Revert "gh-111089: Use PyUnicode_AsUTF8() in sqlite3 (#111122)"
This reverts commit 37e4e20eaa8f27ada926d49e5971fecf0477ad26.
|
| | |
|
| |
|
|
| |
On error, PyUnicode_AsUTF8AndSize() now sets the size argument to -1,
to avoid undefined value.
|
| |
|
|
|
|
|
|
|
| |
* PyUnicode_AsUTF8() now raises an exception if the string contains
embedded null characters.
* Update related C API tests (test_capi.test_unicode).
* type_new_set_doc() uses PyUnicode_AsUTF8AndSize() to silently
truncate doc containing null bytes.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
| |
|
|
| |
PyUnicode_EqualToUTF8AndSize() functions (GH-110297)
|
| | |
|
| |
|
| |
_PyUnicode_FromId() now uses pyatomic.h functions instead.
|
| | |
|
| |
|
|
|
|
| |
Functions like PyErr_SetFromErrno() and SetFromWindowsErr() should be
called immediately after using the C API which sets errno or the Windows
error code.
|
| |
|
|
|
|
| |
Change generated by the command:
sed -i -e 's!_PyLong_AsInt!PyLong_AsInt!g' \
$(find -name "*.c" -o -name "*.h")
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move private _PyEval functions to the internal C API
(pycore_ceval.h):
* _PyEval_GetBuiltin()
* _PyEval_GetBuiltinId()
* _PyEval_GetSwitchInterval()
* _PyEval_MakePendingCalls()
* _PyEval_SetProfile()
* _PyEval_SetSwitchInterval()
* _PyEval_SetTrace()
No longer export most of these functions.
|
| | |
|