summaryrefslogtreecommitdiffstats
path: root/Objects/unicodeobject.c
Commit message (Collapse)AuthorAgeFilesLines
* [3.11] bpo-40514: Drop EXPERIMENTAL_ISOLATED_SUBINTERPRETERS (gh-93185) ↵Eric Snow2022-05-281-18/+0
| | | | | | | | | | | | | (GH-93306) (cherry picked from commit caa279d6fd5f151e57f891cd4f6ba51b532501c6) This was added for bpo-40514 (gh-84694) to test out a per-interpreter GIL. However, it has since proven unnecessary to keep the experiment in the repo. (It can be done as a branch in a fork like normal.) So here we are removing: * the configure option * the macro * the code enabled by the macro Automerge-Triggered-By: GH:ericsnowcurrently
* gh-91320: Use _PyCFunction_CAST() (#92251)Victor Stinner2022-05-031-1/+1
| | | | | | | | | | Replace "(PyCFunction)(void(*)(void))func" cast with _PyCFunction_CAST(func). Change generated by the command: sed -i -e \ 's!(PyCFunction)(void(\*)(void)) *\([A-Za-z0-9_]\+\)!_PyCFunction_CAST(\1)!g' \ $(find -name "*.c")
* bpo-36819: Fix crashes in built-in encoders with weird error handlers (GH-28593)Serhiy Storchaka2022-05-021-21/+39
| | | | | | | If the error handler returns position less or equal than the starting position of non-encodable characters, most of built-in encoders didn't properly re-size the output buffer. This led to out-of-bounds writes, and segfaults.
* gh-81548: Deprecate octal escape sequences with value larger than 0o377 ↵Serhiy Storchaka2022-04-301-5/+24
| | | | (GH-91668)
* gh-90667: Add specializations of Py_DECREF when types are known (GH-30872)Dennis Sweeney2022-04-191-2/+8
|
* bpo-46712: share more global strings in deepfreeze (gh-32152)Kumar Aditya2022-04-191-0/+1
| | | (for gh-90868)
* gh-91102: Use Argument Clinic for EncodingMap (#31725)Oleg Iarygin2022-04-181-47/+23
| | | Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
* gh-91576: Speed up iteration of strings (#91574)Kumar Aditya2022-04-181-6/+45
|
* gh-91421: Use constant value check during runtime (GH-91422)Tobias Stoeckmann2022-04-131-1/+1
| | | | | | | | | | | | | The left-hand side expression of the if-check can be converted to a constant by the compiler, but the addition on the right-hand side is performed during runtime. Move the addition from the right-hand side to the left-hand side by turning it into a subtraction there. Since the values are known to be large enough to not turn negative, this is a safe operation. Prevents a very unlikely integer overflow on 32 bit systems. Fixes GH-91421.
* bpo-45995: add "z" format specifer to coerce negative 0 to zero (GH-30049)John Belmonte2022-04-111-4/+4
| | | | | | | | Add "z" format specifier to coerce negative 0 to zero. See https://github.com/python/cpython/issues/90153 (originally https://bugs.python.org/issue45995) for discussion. This covers `str.format()` and f-strings. Old-style string interpolation is not supported. Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
* Fix bad grammar and import docstring for split/rsplit (GH-32381)Raymond Hettinger2022-04-081-9/+16
|
* bpo-47182: Fix crash by named unicode characters after interpreter ↵Christian Heimes2022-03-311-0/+3
| | | | | reinitialization (GH-32212) Automerge-Triggered-By: GH:tiran
* bpo-47164: Add _PyASCIIObject_CAST() macro (GH-32191)Victor Stinner2022-03-311-30/+27
| | | | | | | | | | | | Add macros to cast objects to PyASCIIObject*, PyCompactUnicodeObject* and PyUnicodeObject*: _PyASCIIObject_CAST(), _PyCompactUnicodeObject_CAST() and _PyUnicodeObject_CAST(). Using these new macros make the code more readable and check their argument with: assert(PyUnicode_Check(op)). Remove redundant assert(PyUnicode_Check(op)) in macros using directly or indirectly these new CAST macros. Replacing existing casts with these macros.
* bpo-47070: Add _PyBytes_Repeat() (GH-31999)Pieter Eendebak2022-03-281-9/+3
| | | Use it where appropriate: the repeat functions of `array.array`, `bytes`, `bytearray`, and `str`.
* bpo-47084: Clear Unicode cached representations on finalization (GH-32032)Jeremy Kloth2022-03-221-0/+44
|
* bpo-46920: Remove disabled debug code added decades ago and likely ↵Oleg Iarygin2022-03-141-13/+0
| | | | unnecessary (GH-31812)
* bpo-46881: Fix refleak from GH-31616 (GH-31805)Jelle Zijlstra2022-03-111-2/+4
|
* bpo-46881: Statically allocate and initialize the latin1 characters. (GH-31616)Kumar Aditya2022-03-091-50/+14
|
* bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized ↵Eric Snow2022-02-081-33/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | global objects. (gh-30928) We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules. The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings). https://bugs.python.org/issue46541#msg411799 explains the rationale for this change. The core of the change is in: * (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros * Include/internal/pycore_runtime_init.h - added the static initializers for the global strings * Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState * Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config. The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *. The following are not changed (yet): * stop using _Py_IDENTIFIER() in the stdlib modules * (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API * (maybe) intern the strings during runtime init https://bugs.python.org/issue46541
* bpo-46670: Test if a macro is defined, not its value (GH-31178)Victor Stinner2022-02-071-3/+3
| | | | | | | | * audioop.c: #ifdef WORDS_BIGENDIAN * ctypes.h: #ifdef USING_MALLOC_CLOSURE_DOT_C * _ctypes/malloc_closure.c: #ifdef HAVE_FFI_CLOSURE_ALLOC and #ifdef USING_APPLE_OS_LIBFFI * pytime.c: #ifdef __APPLE__ * unicodeobject.c: #ifdef HAVE_NON_UNICODE_WCHAR_T_REPRESENTATION
* bpo-46417: Clear Unicode static types at exit (GH-30806)Victor Stinner2022-01-221-10/+19
| | | | | | | | | | | Add _PyUnicode_FiniTypes() function, called by finalize_interp_types(). It clears these static types: * EncodingMapType * PyFieldNameIter_Type * PyFormatterIter_Type _PyStaticType_Dealloc() now does nothing if tp_subclasses is not NULL.
* bpo-46417: Add missing types of _PyTypes_InitTypes() (GH-30749)Victor Stinner2022-01-211-1/+1
| | | | | | | | | Add types removed by mistake by the commit adding _PyTypes_FiniTypes(). Move also PyBool_Type at the end, since it depends on PyLong_Type. PyBytes_Type and PyUnicode_Type no longer depend explicitly on PyBaseObject_Type: it's the default of PyType_Ready().
* bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" ↵Victor Stinner2022-01-061-19/+47
| | | | | | | | | | | (GH-30422) This reverts commit ea251806b8dffff11b30d2182af1e589caf88acf. Keep "assert(interned == NULL);" in _PyUnicode_Fini(), but only for the main interpreter. Keep _PyUnicode_ClearInterned() changes avoiding the creation of a temporary Python list object.
* bpo-46008: Make runtime-global object/type lifecycle functions and state ↵Eric Snow2021-12-091-19/+35
| | | | | | | | | | | | consistent. (gh-29998) This change is strictly renames and moving code around. It helps in the following ways: * ensures type-related init functions focus strictly on one of the three aspects (state, objects, types) * passes in PyInterpreterState * to all those functions, simplifying work on moving types/objects/state to the interpreter * consistent naming conventions help make what's going on more clear * keeping API related to a type in the corresponding header file makes it more obvious where to look for it https://bugs.python.org/issue46008
* bpo-45885: Specialize COMPARE_OP (GH-29734)Dennis Sweeney2021-12-031-0/+14
| | | | | | | * Add COMPARE_OP_ADAPTIVE adaptive instruction. * Add COMPARE_OP_FLOAT_JUMP, COMPARE_OP_INT_JUMP and COMPARE_OP_STR_JUMP specialized instructions. * Introduce and use _PyUnicode_Equal
* bpo-35134: Add Include/cpython/longobject.h (GH-29044)Victor Stinner2021-10-191-0/+1
| | | | | | | | | | Move Include/longobject.h non-limited API to a new Include/cpython/longobject.h header file. Move the following definitions to the internal C API: * _PyLong_DigitValue * _PyLong_FormatAdvancedWriter() * _PyLong_FormatWriter()
* bpo-45467: Fix IncrementalDecoder and StreamReader in the ↵Serhiy Storchaka2021-10-141-20/+44
| | | | | | | | | "raw-unicode-escape" codec (GH-28944) They support now splitting escape sequences between input chunks. Add the third parameter "final" in codecs.raw_unicode_escape_decode(). It is True by default to match the former behavior.
* bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" ↵Serhiy Storchaka2021-10-141-12/+37
| | | | | | | | | codec (GH-28939) They support now splitting escape sequences between input chunks. Add the third parameter "final" in codecs.unicode_escape_decode(). It is True by default to match the former behavior.
* Fix typos in the Objects directory (GH-28766)Christian Clauss2021-10-061-2/+2
|
* bpo-45061: Revert unicode_is_singleton() change (GH-28516)Victor Stinner2021-09-221-2/+4
| | | Don't use a loop over 256 items, only checks for a single singleton.
* [codemod] Fix non-matching bracket pairs (GH-28473)Mohamad Mansour2021-09-211-1/+1
| | | | | Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Co-authored-by: Łukasz Langa <lukasz@langa.pl>
* bpo-45061: Detect refcount bug on empty string singleton (GH-28504)Victor Stinner2021-09-211-18/+37
| | | | | | | Detect refcount bugs in C extensions when the empty Unicode string singleton is destroyed by mistake. * Move forward declarations to the top of unicodeobject.c. * Simplifiy unicode_is_singleton().
* bpo-44110: Improve string's __getitem__ error message (GH-26042)Miguel Brito2021-06-271-1/+2
|
* Add more const modifiers. (GH-26691)Serhiy Storchaka2021-06-121-1/+1
|
* bpo-44029: Remove Py_UNICODE APIs (GH-25881)Inada Naoki2021-05-071-285/+0
| | | | | | | | | | | | Remove deprecated `Py_UNICODE` APIs: `PyUnicode_Encode`, `PyUnicode_EncodeUTF7`, `PyUnicode_EncodeUTF8`, `PyUnicode_EncodeUTF16`, `PyUnicode_EncodeUTF32`, `PyUnicode_EncodeLatin1`, `PyUnicode_EncodeMBCS`, `PyUnicode_EncodeDecimal`, `PyUnicode_EncodeRawUnicodeEscape`, `PyUnicode_EncodeCharmap`, `PyUnicode_EncodeUnicodeEscape`, `PyUnicode_TransformDecimalToASCII`, `PyUnicode_TranslateCharmap`, `PyUnicodeEncodeError_Create`, `PyUnicodeTranslateError_Create`. See :pep:`393` and :pep:`624` for reference.
* bpo-43667: Fix broken Unicode encoding in non-UTF locales on Solaris (GH-25096)Jakub Kulík2021-04-301-0/+40
|
* bpo-43687: Py_Initialize() creates singletons earlier (GH-25147)Victor Stinner2021-04-021-24/+28
| | | | | Reorganize pycore_interp_init() to initialize singletons before the the first PyType_Ready() call. Fix an issue when Python is configured using --without-doc-strings.
* bpo-43179: Generalise alignment for optimised string routines (GH-24624)Jessica Clarke2021-03-311-16/+6
| | | | | | | | | | | | | | | | | | | | | | | | | * Remove m68k-specific hack from ascii_decode On m68k, alignments of primitives is more relaxed, with 4-byte and 8-byte types only requiring 2-byte alignment, thus using sizeof(size_t) does not work. Instead, use the portable alternative. Note that this is a minimal fix that only relaxes the assertion and the condition for when to use the optimised version remains overly strict. Such issues will be fixed tree-wide in the next commit. NB: In C11 we could use _Alignof(size_t) instead, but for compatibility we use autoconf. * Optimise string routines for architectures with non-natural alignment C only requires that sizeof(x) is a multiple of alignof(x), not that the two are equal. Thus anywhere where we optimise based on alignment we should be using alignof(x) not sizeof(x). This is more annoying than it would be in C11 where we could just use _Alignof(x) (and alignof(x) in C++11), but since we still require only C99 we must plumb the information all the way from autoconf through the various typedefs and defines.
* bpo-35883: Py_DecodeLocale() escapes invalid Unicode characters (GH-24843)Victor Stinner2021-03-171-4/+5
| | | | | | | | | | Python no longer fails at startup with a fatal error if a command line argument contains an invalid Unicode character. The Py_DecodeLocale() function now escapes byte sequences which would be decoded as Unicode characters outside the [U+0000; U+10ffff] range. Use MAX_UNICODE constant in unicodeobject.c.
* bpo-42128: Structural Pattern Matching (PEP 634) (GH-22917)Brandt Bucher2021-02-261-1/+2
| | | | | Co-authored-by: Guido van Rossum <guido@python.org> Co-authored-by: Talin <viridia@gmail.com> Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
* bpo-43268: Pass interp rather than tstate to internal functions (GH-24580)Victor Stinner2021-02-191-10/+10
| | | | | | | | | | | | | | | Pass the current interpreter (interp) rather than the current Python thread state (tstate) to internal functions which only use the interpreter. Modified functions: * _PyXXX_Fini() and _PyXXX_ClearFreeList() functions * _PyEval_SignalAsyncExc(), make_pending_calls() * _PySys_GetObject(), sys_set_object(), sys_set_object_id(), sys_set_object_str() * should_audit(), set_flags_from_config(), make_flags() * _PyAtExit_Call() * init_stdio_encoding() * etc.
* bpo-43268: _Py_IsMainInterpreter() now expects interp (GH-24577)Victor Stinner2021-02-191-1/+1
| | | | The _Py_IsMainInterpreter() function now expects interp rather than tstate.
* Fix compiler warnings regarding loss of data (GH-23983)Pablo Galindo2020-12-291-1/+1
|
* bpo-42745: finalize_interp_types() calls _PyType_Fini() (GH-23953)Victor Stinner2020-12-261-4/+3
| | | | | Call _PyType_Fini() in subinterpreters. Fix reference leaks in subinterpreters.
* bpo-40521: Per-interpreter interned strings (GH-20085)Victor Stinner2020-12-261-61/+28
| | | | | | | | | | | Make the Unicode dictionary of interned strings compatible with subinterpreters. Remove the INTERN_NAME_STRINGS macro in typeobject.c: names are always now interned (even if EXPERIMENTAL_ISOLATED_SUBINTERPRETERS macro is defined). _PyUnicode_ClearInterned() now uses PyDict_Next() to no longer allocate memory, to ensure that the interned dictionary is cleared.
* bpo-39465: Fix _PyUnicode_FromId() for subinterpreters (GH-20058)Victor Stinner2020-12-251-23/+62
| | | | | | | | | | | | | | | | | Make _PyUnicode_FromId() function compatible with subinterpreters. Each interpreter now has an array of identifier objects (interned strings decoded from UTF-8). * Add PyInterpreterState.unicode.identifiers: array of identifiers objects. * Add _PyRuntimeState.unicode_ids used to allocate unique indexes to _Py_Identifier. * Rewrite the _Py_Identifier structure. Microbenchmark on _PyUnicode_FromId(&PyId_a) with _Py_IDENTIFIER(a): [ref] 2.42 ns +- 0.00 ns -> [atomic] 3.39 ns +- 0.00 ns: 1.40x slower This change adds 1 ns per _PyUnicode_FromId() call in average.
* bpo-42431: Fix outdated bytes comments (GH-23458)Serhiy Storchaka2020-12-031-0/+1
| | | | Also move definitions of internal macros F_LJUST etc to private header.
* bpo-42519: Replace PyObject_MALLOC() with PyObject_Malloc() (GH-23587)Victor Stinner2020-12-011-21/+21
| | | | | | | | | No longer use deprecated aliases to functions: * Replace PyObject_MALLOC() with PyObject_Malloc() * Replace PyObject_REALLOC() with PyObject_Realloc() * Replace PyObject_FREE() with PyObject_Free() * Replace PyObject_Del() with PyObject_Free() * Replace PyObject_DEL() with PyObject_Free()
* bpo-42519: Replace PyMem_MALLOC() with PyMem_Malloc() (GH-23586)Victor Stinner2020-12-011-12/+12
| | | | | | | | | | | No longer use deprecated aliases to functions: * Replace PyMem_MALLOC() with PyMem_Malloc() * Replace PyMem_REALLOC() with PyMem_Realloc() * Replace PyMem_FREE() with PyMem_Free() * Replace PyMem_Del() with PyMem_Free() * Replace PyMem_DEL() with PyMem_Free() Modify also the PyMem_DEL() macro to use directly PyMem_Free().
* bpo-40998: Address compiler warnings found by ubsan (GH-20929)Christian Heimes2020-11-181-1/+5
| | | | | Signed-off-by: Christian Heimes <christian@python.org> Automerge-Triggered-By: GH:tiran