summaryrefslogtreecommitdiffstats
path: root/Objects/unicodeobject.c
Commit message (Collapse)AuthorAgeFilesLines
* gh-99300: Use Py_NewRef() in Objects/ directory (#99351)Victor Stinner2022-11-101-26/+13
| | | | Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and Py_XNewRef() in C files of the Objects/ directory.
* gh-90868: Adjust the Generated Objects (gh-99223)Eric Snow2022-11-081-1/+1
| | | | | | | | | | | We do the following: * move the generated _PyUnicode_InitStaticStrings() to its own file * move the generated _PyStaticObjects_CheckRefcnt() to its own file * include pycore_global_objects.h in extension modules instead of pycore_runtime_init.h These changes help us avoid including things that aren't needed. https://github.com/python/cpython/issues/90868
* gh-98783: Fix crashes when `str` subclasses are used in `_PyUnicode_Equal` ↵Nikita Sobolev2022-10-301-2/+2
| | | | (#98806)
* gh-98393: os module reject bytes-like, only accept bytes (#98394)Victor Stinner2022-10-181-30/+8
| | | | | The os module and the PyUnicode_FSDecoder() function no longer accept bytes-like paths, like bytearray and memoryview types: only the exact bytes type is accepted for bytes strings.
* gh-97982: Factorize PyUnicode_Count() and unicode_count() code (#98025)Nikita Sobolev2022-10-121-60/+26
| | | | Add unicode_count_impl() to factorize PyUnicode_Count() and unicode_count() code.
* gh-97982: Remove asciilib_count() (#98164)Victor Stinner2022-10-111-14/+5
| | | | | asciilib_count() is the same than ucs1lib_count(): the code is not specialized for ASCII strings, so it's not worth it to have a separated function. Remove asciilib_count() function.
* GH-96458: Statically initialize utf8 representation of static strings (#96481)Kumar Aditya2022-09-031-33/+0
|
* GH-96075: move interned dict under runtime state (GH-96077)Kumar Aditya2022-08-221-14/+25
|
* gh-95504: Fix negative numbers in PyUnicode_FromFormat (GH-95848)Petr Viktorin2022-08-101-6/+19
| | | Co-authored-by: philg314 <110174000+philg314@users.noreply.github.com>
* gh-95781: More strict format string checking in PyUnicode_FromFormatV() ↵Serhiy Storchaka2022-08-081-23/+10
| | | | | | | | | (GH-95784) An unrecognized format character in PyUnicode_FromFormat() and PyUnicode_FromFormatV() now sets a SystemError. In previous versions it caused all the rest of the format string to be copied as-is to the result string, and any extra arguments discarded.
* gh-91146: More reduce allocation size of list from str.split/rsplit (gh-95493)Dong-hee Na2022-08-011-9/+22
| | | Co-authored-by: Inada Naoki <songofacandy@gmail.com>
* gh-91146: Reduce allocation size of list from str.split()/rsplit() (gh-95473)Dong-hee Na2022-07-311-19/+20
|
* Fix Unicode doc and replace use of macro with PyMem_New function (GH-94088)Pamela Fox2022-07-281-1/+1
|
* gh-94673: Add _PyStaticType_InitBuiltin() (#95152)Eric Snow2022-07-251-3/+3
| | | | | | | | | | | | This is the first of several precursors to storing tp_subclasses (and tp_weaklist) on the interpreter state for static builtin types. We do the following: * add `_PyStaticType_InitBuiltin()` * add `_Py_TPFLAGS_STATIC_BUILTIN` * set it on all static builtin types in `_PyStaticType_InitBuiltin()` * shuffle some code around to be able to use _PyStaticType_InitBuiltin() * rename `_PyStructSequence_InitType()` to `_PyStructSequence_InitBuiltinWithFlags()` * add `_PyStructSequence_InitBuiltin()`.
* GH-90699: Intern statically allocated strings (GH-93597)Kumar Aditya2022-07-081-0/+9
| | | This is similar to how strings are interned for deepfreeze.
* bpo-40514: Drop EXPERIMENTAL_ISOLATED_SUBINTERPRETERS (gh-93185)Eric Snow2022-05-271-17/+0
| | | | | | | This was added for bpo-40514 (gh-84694) to test out a per-interpreter GIL. However, it has since proven unnecessary to keep the experiment in the repo. (It can be done as a branch in a fork like normal.) So here we are removing: * the configure option * the macro * the code enabled by the macro
* GH-93207: Remove HAVE_STDARG_PROTOTYPES configure check for stdarg.h (#93215)Kumar Aditya2022-05-271-4/+0
|
* gh-91924: Optimize unicode_check_encoding_errors() (#93200)Victor Stinner2022-05-261-2/+16
| | | | | | Avoid _PyCodec_Lookup() and PyCodec_LookupError() for most common built-in encodings and error handlers to avoid creating a temporary Unicode string object, whereas these encodings and error handlers are known to be valid.
* gh-85858: Remove PyUnicode_InternImmortal() function (#92579)Victor Stinner2022-05-131-52/+17
| | | | | | | | | | | | | | | | | Remove the PyUnicode_InternImmortal() function and the SSTATE_INTERNED_IMMORTAL macro. The PyUnicode_InternImmortal() function is still exported in the stable ABI. The function is removed from the API. PyASCIIObject.state.interned size is now a single bit, rather than 2 bits. Keep SSTATE_NOT_INTERNED and SSTATE_INTERNED_MORTAL macros for backward compatibility, but no longer use them internally since the interned member is now a single bit and so can only have two values (interned or not interned). Update stats of _PyUnicode_ClearInterned().
* gh-89653: Use int type for Unicode kind (#92704)Victor Stinner2022-05-131-28/+28
| | | | Use the same type that PyUnicode_FromKindAndData() kind parameter type (public C API): int.
* gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537)Inada Naoki2022-05-121-1025/+91
|
* gh-91320: Use _PyCFunction_CAST() (#92251)Victor Stinner2022-05-031-1/+1
| | | | | | | | | | Replace "(PyCFunction)(void(*)(void))func" cast with _PyCFunction_CAST(func). Change generated by the command: sed -i -e \ 's!(PyCFunction)(void(\*)(void)) *\([A-Za-z0-9_]\+\)!_PyCFunction_CAST(\1)!g' \ $(find -name "*.c")
* bpo-36819: Fix crashes in built-in encoders with weird error handlers (GH-28593)Serhiy Storchaka2022-05-021-21/+39
| | | | | | | If the error handler returns position less or equal than the starting position of non-encodable characters, most of built-in encoders didn't properly re-size the output buffer. This led to out-of-bounds writes, and segfaults.
* gh-81548: Deprecate octal escape sequences with value larger than 0o377 ↵Serhiy Storchaka2022-04-301-5/+24
| | | | (GH-91668)
* gh-90667: Add specializations of Py_DECREF when types are known (GH-30872)Dennis Sweeney2022-04-191-2/+8
|
* bpo-46712: share more global strings in deepfreeze (gh-32152)Kumar Aditya2022-04-191-0/+1
| | | (for gh-90868)
* gh-91102: Use Argument Clinic for EncodingMap (#31725)Oleg Iarygin2022-04-181-47/+23
| | | Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
* gh-91576: Speed up iteration of strings (#91574)Kumar Aditya2022-04-181-6/+45
|
* gh-91421: Use constant value check during runtime (GH-91422)Tobias Stoeckmann2022-04-131-1/+1
| | | | | | | | | | | | | The left-hand side expression of the if-check can be converted to a constant by the compiler, but the addition on the right-hand side is performed during runtime. Move the addition from the right-hand side to the left-hand side by turning it into a subtraction there. Since the values are known to be large enough to not turn negative, this is a safe operation. Prevents a very unlikely integer overflow on 32 bit systems. Fixes GH-91421.
* bpo-45995: add "z" format specifer to coerce negative 0 to zero (GH-30049)John Belmonte2022-04-111-4/+4
| | | | | | | | Add "z" format specifier to coerce negative 0 to zero. See https://github.com/python/cpython/issues/90153 (originally https://bugs.python.org/issue45995) for discussion. This covers `str.format()` and f-strings. Old-style string interpolation is not supported. Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
* Fix bad grammar and import docstring for split/rsplit (GH-32381)Raymond Hettinger2022-04-081-9/+16
|
* bpo-47182: Fix crash by named unicode characters after interpreter ↵Christian Heimes2022-03-311-0/+3
| | | | | reinitialization (GH-32212) Automerge-Triggered-By: GH:tiran
* bpo-47164: Add _PyASCIIObject_CAST() macro (GH-32191)Victor Stinner2022-03-311-30/+27
| | | | | | | | | | | | Add macros to cast objects to PyASCIIObject*, PyCompactUnicodeObject* and PyUnicodeObject*: _PyASCIIObject_CAST(), _PyCompactUnicodeObject_CAST() and _PyUnicodeObject_CAST(). Using these new macros make the code more readable and check their argument with: assert(PyUnicode_Check(op)). Remove redundant assert(PyUnicode_Check(op)) in macros using directly or indirectly these new CAST macros. Replacing existing casts with these macros.
* bpo-47070: Add _PyBytes_Repeat() (GH-31999)Pieter Eendebak2022-03-281-9/+3
| | | Use it where appropriate: the repeat functions of `array.array`, `bytes`, `bytearray`, and `str`.
* bpo-47084: Clear Unicode cached representations on finalization (GH-32032)Jeremy Kloth2022-03-221-0/+44
|
* bpo-46920: Remove disabled debug code added decades ago and likely ↵Oleg Iarygin2022-03-141-13/+0
| | | | unnecessary (GH-31812)
* bpo-46881: Fix refleak from GH-31616 (GH-31805)Jelle Zijlstra2022-03-111-2/+4
|
* bpo-46881: Statically allocate and initialize the latin1 characters. (GH-31616)Kumar Aditya2022-03-091-50/+14
|
* bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized ↵Eric Snow2022-02-081-33/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | global objects. (gh-30928) We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules. The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings). https://bugs.python.org/issue46541#msg411799 explains the rationale for this change. The core of the change is in: * (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros * Include/internal/pycore_runtime_init.h - added the static initializers for the global strings * Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState * Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config. The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *. The following are not changed (yet): * stop using _Py_IDENTIFIER() in the stdlib modules * (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API * (maybe) intern the strings during runtime init https://bugs.python.org/issue46541
* bpo-46670: Test if a macro is defined, not its value (GH-31178)Victor Stinner2022-02-071-3/+3
| | | | | | | | * audioop.c: #ifdef WORDS_BIGENDIAN * ctypes.h: #ifdef USING_MALLOC_CLOSURE_DOT_C * _ctypes/malloc_closure.c: #ifdef HAVE_FFI_CLOSURE_ALLOC and #ifdef USING_APPLE_OS_LIBFFI * pytime.c: #ifdef __APPLE__ * unicodeobject.c: #ifdef HAVE_NON_UNICODE_WCHAR_T_REPRESENTATION
* bpo-46417: Clear Unicode static types at exit (GH-30806)Victor Stinner2022-01-221-10/+19
| | | | | | | | | | | Add _PyUnicode_FiniTypes() function, called by finalize_interp_types(). It clears these static types: * EncodingMapType * PyFieldNameIter_Type * PyFormatterIter_Type _PyStaticType_Dealloc() now does nothing if tp_subclasses is not NULL.
* bpo-46417: Add missing types of _PyTypes_InitTypes() (GH-30749)Victor Stinner2022-01-211-1/+1
| | | | | | | | | Add types removed by mistake by the commit adding _PyTypes_FiniTypes(). Move also PyBool_Type at the end, since it depends on PyLong_Type. PyBytes_Type and PyUnicode_Type no longer depend explicitly on PyBaseObject_Type: it's the default of PyType_Ready().
* bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" ↵Victor Stinner2022-01-061-19/+47
| | | | | | | | | | | (GH-30422) This reverts commit ea251806b8dffff11b30d2182af1e589caf88acf. Keep "assert(interned == NULL);" in _PyUnicode_Fini(), but only for the main interpreter. Keep _PyUnicode_ClearInterned() changes avoiding the creation of a temporary Python list object.
* bpo-46008: Make runtime-global object/type lifecycle functions and state ↵Eric Snow2021-12-091-19/+35
| | | | | | | | | | | | consistent. (gh-29998) This change is strictly renames and moving code around. It helps in the following ways: * ensures type-related init functions focus strictly on one of the three aspects (state, objects, types) * passes in PyInterpreterState * to all those functions, simplifying work on moving types/objects/state to the interpreter * consistent naming conventions help make what's going on more clear * keeping API related to a type in the corresponding header file makes it more obvious where to look for it https://bugs.python.org/issue46008
* bpo-45885: Specialize COMPARE_OP (GH-29734)Dennis Sweeney2021-12-031-0/+14
| | | | | | | * Add COMPARE_OP_ADAPTIVE adaptive instruction. * Add COMPARE_OP_FLOAT_JUMP, COMPARE_OP_INT_JUMP and COMPARE_OP_STR_JUMP specialized instructions. * Introduce and use _PyUnicode_Equal
* bpo-35134: Add Include/cpython/longobject.h (GH-29044)Victor Stinner2021-10-191-0/+1
| | | | | | | | | | Move Include/longobject.h non-limited API to a new Include/cpython/longobject.h header file. Move the following definitions to the internal C API: * _PyLong_DigitValue * _PyLong_FormatAdvancedWriter() * _PyLong_FormatWriter()
* bpo-45467: Fix IncrementalDecoder and StreamReader in the ↵Serhiy Storchaka2021-10-141-20/+44
| | | | | | | | | "raw-unicode-escape" codec (GH-28944) They support now splitting escape sequences between input chunks. Add the third parameter "final" in codecs.raw_unicode_escape_decode(). It is True by default to match the former behavior.
* bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" ↵Serhiy Storchaka2021-10-141-12/+37
| | | | | | | | | codec (GH-28939) They support now splitting escape sequences between input chunks. Add the third parameter "final" in codecs.unicode_escape_decode(). It is True by default to match the former behavior.
* Fix typos in the Objects directory (GH-28766)Christian Clauss2021-10-061-2/+2
|
* bpo-45061: Revert unicode_is_singleton() change (GH-28516)Victor Stinner2021-09-221-2/+4
| | | Don't use a loop over 256 items, only checks for a single singleton.