summaryrefslogtreecommitdiffstats
path: root/Objects/unicodeobject.c
Commit message (Collapse)AuthorAgeFilesLines
...
* gh-94673: Ensure Builtin Static Types are Readied Properly (gh-103940)Eric Snow2023-04-271-4/+0
| | | There were cases where we do unnecessary work for builtin static types. This also simplifies some work necessary for a per-interpreter GIL.
* gh-84436: Implement Immortal Objects (gh-19474)Eddie Elizondo2023-04-221-35/+66
| | | | | | | | | This is the implementation of PEP683 Motivation: The PR introduces the ability to immortalize instances in CPython which bypasses reference counting. Tagging objects as immortal allows up to skip certain operations when we know that the object will be around for the entire execution of the runtime. Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this brings the ability of having truly immutable objects that are useful in other contexts such as immutable data sharing between sub-interpreters.
* gh-100227: Move the Dict of Interned Strings to PyInterpreterState (gh-102339)Eric Snow2023-03-281-42/+56
| | | | | We can revisit the options for keeping it global later, if desired. For now the approach seems quite complex, so we've gone with the simpler isolation solution in the meantime. https://github.com/python/cpython/issues/100227
* gh-100227: Revert gh-102925 "gh-100227: Make the Global Interned Dict Safe ↵Eric Snow2023-03-271-4/+9
| | | | | | | for Isolated Interpreters" (gh-103063) This reverts commit 87be8d9. This approach to keeping the interned strings safe is turning out to be too complex for my taste (due to obmalloc isolation). For now I'm going with the simpler solution, making the dict per-interpreter. We can revisit that later if we want a sharing solution.
* gh-100227: Make the Global Interned Dict Safe for Isolated Interpreters ↵Eric Snow2023-03-231-9/+4
| | | | | | | | | (gh-102925) This is effectively two changes. The first (the bulk of the change) is where we add _Py_AddToGlobalDict() (and _PyRuntime.cached_objects.main_tstate, etc.). The second (much smaller) change is where we update PyUnicode_InternInPlace() to use _Py_AddToGlobalDict() instead of calling PyDict_SetDefault() directly. Basically, _Py_AddToGlobalDict() is a wrapper around PyDict_SetDefault() that should be used whenever we need to add a value to a runtime-global dict object (in the few cases where we are leaving the container global rather than moving it to PyInterpreterState, e.g. the interned strings dict). _Py_AddToGlobalDict() does all the necessary work to make sure the target global dict is shared safely between isolated interpreters. This is especially important as we move the obmalloc state to each interpreter (gh-101660), as well as, potentially, the GIL (PEP 684). https://github.com/python/cpython/issues/100227
* GH-100227: cleanup initialization of global interned dict (#102682)Kumar Aditya2023-03-141-8/+10
|
* gh-102304: Consolidate Direct Usage of _Py_RefTotal (gh-102514)Eric Snow2023-03-081-5/+2
| | | | | This simplifies further changes to _Py_RefTotal (e.g. make it atomic or move it to PyInterpreterState). https://github.com/python/cpython/issues/102304
* gh-101765: unicodeobject: use Py_XDECREF correctly (#102283)Jelle Zijlstra2023-02-261-1/+1
|
* gh-101765: Fix refcount issues in list and unicode pickling (#102265)Jelle Zijlstra2023-02-261-1/+3
| | | Followup from #101769.
* gh-101765: Fix SystemError / segmentation fault in iter `__reduce__` when ↵Ionite2023-02-241-3/+8
| | | | internal access of `builtins.__dict__` exhausts the iterator (#101769)
* gh-90111: Minor Cleanup for Runtime-Global Objects (gh-100254)Eric Snow2022-12-141-2/+2
| | | | | | | | * move _PyRuntime.global_objects.interned to _PyRuntime.cached_objects.interned_strings (and use _Py_CACHED_OBJECT()) * rename _PyRuntime.global_objects to _PyRuntime.static_objects (This also relates to gh-96075.) https://github.com/python/cpython/issues/90111
* gh-81057: Move More Globals to _PyRuntimeState (gh-100092)Eric Snow2022-12-071-3/+5
| | | https://github.com/python/cpython/issues/81057
* bpo-15999: Accept arbitrary values for boolean parameters. (#15609)Serhiy Storchaka2022-12-031-2/+2
| | | builtins and extension module functions and methods that expect boolean values for parameters now accept any Python object rather than just a bool or int type. This is more consistent with how native Python code itself behaves.
* gh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data (GH-99613)Serhiy Storchaka2022-12-011-0/+3
| | | | Previously *consumed was not set in this case.
* gh-99537: Use Py_SETREF() function in C code (#99657)Victor Stinner2022-11-221-4/+2
| | | | | | | | | | | | | | | Fix potential race condition in code patterns: * Replace "Py_DECREF(var); var = new;" with "Py_SETREF(var, new);" * Replace "Py_XDECREF(var); var = new;" with "Py_XSETREF(var, new);" * Replace "Py_CLEAR(var); var = new;" with "Py_XSETREF(var, new);" Other changes: * Replace "old = var; var = new; Py_DECREF(var)" with "Py_SETREF(var, new);" * Replace "old = var; var = new; Py_XDECREF(var)" with "Py_XSETREF(var, new);" * And remove the "old" variable.
* gh-81057: Move More Globals in Core Code to _PyRuntimeState (gh-99516)Eric Snow2022-11-161-4/+15
| | | https://github.com/python/cpython/issues/81057
* gh-99300: Use Py_NewRef() in Objects/ directory (#99351)Victor Stinner2022-11-101-26/+13
| | | | Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and Py_XNewRef() in C files of the Objects/ directory.
* gh-90868: Adjust the Generated Objects (gh-99223)Eric Snow2022-11-081-1/+1
| | | | | | | | | | | We do the following: * move the generated _PyUnicode_InitStaticStrings() to its own file * move the generated _PyStaticObjects_CheckRefcnt() to its own file * include pycore_global_objects.h in extension modules instead of pycore_runtime_init.h These changes help us avoid including things that aren't needed. https://github.com/python/cpython/issues/90868
* gh-98783: Fix crashes when `str` subclasses are used in `_PyUnicode_Equal` ↵Nikita Sobolev2022-10-301-2/+2
| | | | (#98806)
* gh-98393: os module reject bytes-like, only accept bytes (#98394)Victor Stinner2022-10-181-30/+8
| | | | | The os module and the PyUnicode_FSDecoder() function no longer accept bytes-like paths, like bytearray and memoryview types: only the exact bytes type is accepted for bytes strings.
* gh-97982: Factorize PyUnicode_Count() and unicode_count() code (#98025)Nikita Sobolev2022-10-121-60/+26
| | | | Add unicode_count_impl() to factorize PyUnicode_Count() and unicode_count() code.
* gh-97982: Remove asciilib_count() (#98164)Victor Stinner2022-10-111-14/+5
| | | | | asciilib_count() is the same than ucs1lib_count(): the code is not specialized for ASCII strings, so it's not worth it to have a separated function. Remove asciilib_count() function.
* GH-96458: Statically initialize utf8 representation of static strings (#96481)Kumar Aditya2022-09-031-33/+0
|
* GH-96075: move interned dict under runtime state (GH-96077)Kumar Aditya2022-08-221-14/+25
|
* gh-95504: Fix negative numbers in PyUnicode_FromFormat (GH-95848)Petr Viktorin2022-08-101-6/+19
| | | Co-authored-by: philg314 <110174000+philg314@users.noreply.github.com>
* gh-95781: More strict format string checking in PyUnicode_FromFormatV() ↵Serhiy Storchaka2022-08-081-23/+10
| | | | | | | | | (GH-95784) An unrecognized format character in PyUnicode_FromFormat() and PyUnicode_FromFormatV() now sets a SystemError. In previous versions it caused all the rest of the format string to be copied as-is to the result string, and any extra arguments discarded.
* gh-91146: More reduce allocation size of list from str.split/rsplit (gh-95493)Dong-hee Na2022-08-011-9/+22
| | | Co-authored-by: Inada Naoki <songofacandy@gmail.com>
* gh-91146: Reduce allocation size of list from str.split()/rsplit() (gh-95473)Dong-hee Na2022-07-311-19/+20
|
* Fix Unicode doc and replace use of macro with PyMem_New function (GH-94088)Pamela Fox2022-07-281-1/+1
|
* gh-94673: Add _PyStaticType_InitBuiltin() (#95152)Eric Snow2022-07-251-3/+3
| | | | | | | | | | | | This is the first of several precursors to storing tp_subclasses (and tp_weaklist) on the interpreter state for static builtin types. We do the following: * add `_PyStaticType_InitBuiltin()` * add `_Py_TPFLAGS_STATIC_BUILTIN` * set it on all static builtin types in `_PyStaticType_InitBuiltin()` * shuffle some code around to be able to use _PyStaticType_InitBuiltin() * rename `_PyStructSequence_InitType()` to `_PyStructSequence_InitBuiltinWithFlags()` * add `_PyStructSequence_InitBuiltin()`.
* GH-90699: Intern statically allocated strings (GH-93597)Kumar Aditya2022-07-081-0/+9
| | | This is similar to how strings are interned for deepfreeze.
* bpo-40514: Drop EXPERIMENTAL_ISOLATED_SUBINTERPRETERS (gh-93185)Eric Snow2022-05-271-17/+0
| | | | | | | This was added for bpo-40514 (gh-84694) to test out a per-interpreter GIL. However, it has since proven unnecessary to keep the experiment in the repo. (It can be done as a branch in a fork like normal.) So here we are removing: * the configure option * the macro * the code enabled by the macro
* GH-93207: Remove HAVE_STDARG_PROTOTYPES configure check for stdarg.h (#93215)Kumar Aditya2022-05-271-4/+0
|
* gh-91924: Optimize unicode_check_encoding_errors() (#93200)Victor Stinner2022-05-261-2/+16
| | | | | | Avoid _PyCodec_Lookup() and PyCodec_LookupError() for most common built-in encodings and error handlers to avoid creating a temporary Unicode string object, whereas these encodings and error handlers are known to be valid.
* gh-85858: Remove PyUnicode_InternImmortal() function (#92579)Victor Stinner2022-05-131-52/+17
| | | | | | | | | | | | | | | | | Remove the PyUnicode_InternImmortal() function and the SSTATE_INTERNED_IMMORTAL macro. The PyUnicode_InternImmortal() function is still exported in the stable ABI. The function is removed from the API. PyASCIIObject.state.interned size is now a single bit, rather than 2 bits. Keep SSTATE_NOT_INTERNED and SSTATE_INTERNED_MORTAL macros for backward compatibility, but no longer use them internally since the interned member is now a single bit and so can only have two values (interned or not interned). Update stats of _PyUnicode_ClearInterned().
* gh-89653: Use int type for Unicode kind (#92704)Victor Stinner2022-05-131-28/+28
| | | | Use the same type that PyUnicode_FromKindAndData() kind parameter type (public C API): int.
* gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537)Inada Naoki2022-05-121-1025/+91
|
* gh-91320: Use _PyCFunction_CAST() (#92251)Victor Stinner2022-05-031-1/+1
| | | | | | | | | | Replace "(PyCFunction)(void(*)(void))func" cast with _PyCFunction_CAST(func). Change generated by the command: sed -i -e \ 's!(PyCFunction)(void(\*)(void)) *\([A-Za-z0-9_]\+\)!_PyCFunction_CAST(\1)!g' \ $(find -name "*.c")
* bpo-36819: Fix crashes in built-in encoders with weird error handlers (GH-28593)Serhiy Storchaka2022-05-021-21/+39
| | | | | | | If the error handler returns position less or equal than the starting position of non-encodable characters, most of built-in encoders didn't properly re-size the output buffer. This led to out-of-bounds writes, and segfaults.
* gh-81548: Deprecate octal escape sequences with value larger than 0o377 ↵Serhiy Storchaka2022-04-301-5/+24
| | | | (GH-91668)
* gh-90667: Add specializations of Py_DECREF when types are known (GH-30872)Dennis Sweeney2022-04-191-2/+8
|
* bpo-46712: share more global strings in deepfreeze (gh-32152)Kumar Aditya2022-04-191-0/+1
| | | (for gh-90868)
* gh-91102: Use Argument Clinic for EncodingMap (#31725)Oleg Iarygin2022-04-181-47/+23
| | | Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
* gh-91576: Speed up iteration of strings (#91574)Kumar Aditya2022-04-181-6/+45
|
* gh-91421: Use constant value check during runtime (GH-91422)Tobias Stoeckmann2022-04-131-1/+1
| | | | | | | | | | | | | The left-hand side expression of the if-check can be converted to a constant by the compiler, but the addition on the right-hand side is performed during runtime. Move the addition from the right-hand side to the left-hand side by turning it into a subtraction there. Since the values are known to be large enough to not turn negative, this is a safe operation. Prevents a very unlikely integer overflow on 32 bit systems. Fixes GH-91421.
* bpo-45995: add "z" format specifer to coerce negative 0 to zero (GH-30049)John Belmonte2022-04-111-4/+4
| | | | | | | | Add "z" format specifier to coerce negative 0 to zero. See https://github.com/python/cpython/issues/90153 (originally https://bugs.python.org/issue45995) for discussion. This covers `str.format()` and f-strings. Old-style string interpolation is not supported. Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
* Fix bad grammar and import docstring for split/rsplit (GH-32381)Raymond Hettinger2022-04-081-9/+16
|
* bpo-47182: Fix crash by named unicode characters after interpreter ↵Christian Heimes2022-03-311-0/+3
| | | | | reinitialization (GH-32212) Automerge-Triggered-By: GH:tiran
* bpo-47164: Add _PyASCIIObject_CAST() macro (GH-32191)Victor Stinner2022-03-311-30/+27
| | | | | | | | | | | | Add macros to cast objects to PyASCIIObject*, PyCompactUnicodeObject* and PyUnicodeObject*: _PyASCIIObject_CAST(), _PyCompactUnicodeObject_CAST() and _PyUnicodeObject_CAST(). Using these new macros make the code more readable and check their argument with: assert(PyUnicode_Check(op)). Remove redundant assert(PyUnicode_Check(op)) in macros using directly or indirectly these new CAST macros. Replacing existing casts with these macros.
* bpo-47070: Add _PyBytes_Repeat() (GH-31999)Pieter Eendebak2022-03-281-9/+3
| | | Use it where appropriate: the repeat functions of `array.array`, `bytes`, `bytearray`, and `str`.