path: root/Objects/unicodeobject.c
Commit message | Author | Age | Files | Lines
...
* [codemod] Fix non-matching bracket pairs (GH-28473) | Mohamad Mansour | 2021-09-21 | 1 | -1/+1
|   Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
|   Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|   Co-authored-by: Łukasz Langa <lukasz@langa.pl>
* bpo-45061: Detect refcount bug on empty string singleton (GH-28504) | Victor Stinner | 2021-09-21 | 1 | -18/+37
|   Detect refcount bugs in C extensions when the empty Unicode string
|   singleton is destroyed by mistake.
|   * Move forward declarations to the top of unicodeobject.c.
|   * Simplify unicode_is_singleton().
* bpo-44110: Improve string's __getitem__ error message (GH-26042) | Miguel Brito | 2021-06-27 | 1 | -1/+2
|
* Add more const modifiers. (GH-26691) | Serhiy Storchaka | 2021-06-12 | 1 | -1/+1
|
* bpo-44029: Remove Py_UNICODE APIs (GH-25881) | Inada Naoki | 2021-05-07 | 1 | -285/+0
|   Remove deprecated `Py_UNICODE` APIs: `PyUnicode_Encode`,
|   `PyUnicode_EncodeUTF7`, `PyUnicode_EncodeUTF8`,
|   `PyUnicode_EncodeUTF16`, `PyUnicode_EncodeUTF32`,
|   `PyUnicode_EncodeLatin1`, `PyUnicode_EncodeMBCS`,
|   `PyUnicode_EncodeDecimal`, `PyUnicode_EncodeRawUnicodeEscape`,
|   `PyUnicode_EncodeCharmap`, `PyUnicode_EncodeUnicodeEscape`,
|   `PyUnicode_TransformDecimalToASCII`, `PyUnicode_TranslateCharmap`,
|   `PyUnicodeEncodeError_Create`, `PyUnicodeTranslateError_Create`.
|
|   See :pep:`393` and :pep:`624` for reference.
* bpo-43667: Fix broken Unicode encoding in non-UTF locales on Solaris (GH-25096) | Jakub Kulík | 2021-04-30 | 1 | -0/+40
|
* bpo-43687: Py_Initialize() creates singletons earlier (GH-25147) | Victor Stinner | 2021-04-02 | 1 | -24/+28
|   Reorganize pycore_interp_init() to initialize singletons before the
|   first PyType_Ready() call. Fix an issue when Python is configured
|   using --without-doc-strings.
* bpo-43179: Generalise alignment for optimised string routines (GH-24624) | Jessica Clarke | 2021-03-31 | 1 | -16/+6
|   * Remove m68k-specific hack from ascii_decode
|
|     On m68k, the alignment of primitives is more relaxed, with 4-byte
|     and 8-byte types only requiring 2-byte alignment, thus using
|     sizeof(size_t) does not work. Instead, use the portable alternative.
|
|     Note that this is a minimal fix that only relaxes the assertion;
|     the condition for when to use the optimised version remains overly
|     strict. Such issues will be fixed tree-wide in the next commit.
|
|     NB: In C11 we could use _Alignof(size_t) instead, but for
|     compatibility we use autoconf.
|
|   * Optimise string routines for architectures with non-natural alignment
|
|     C only requires that sizeof(x) is a multiple of alignof(x), not that
|     the two are equal. Thus anywhere we optimise based on alignment we
|     should be using alignof(x), not sizeof(x). This is more annoying than
|     it would be in C11, where we could just use _Alignof(x) (and
|     alignof(x) in C++11), but since we still require only C99 we must
|     plumb the information all the way from autoconf through the various
|     typedefs and defines.
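The sizeof/alignof distinction described in the commit above can be sketched in standalone C11. This is only an illustration under stated assumptions, not the code from the commit: the helper name `is_aligned_for_size_t` is hypothetical, and it uses C11 `_Alignof` directly, whereas the commit says the value was plumbed through autoconf for C99 compatibility.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C guarantees that sizeof(T) is a multiple of _Alignof(T), not that the
   two are equal (on m68k, per the commit message, they differ). */
_Static_assert(sizeof(size_t) % _Alignof(size_t) == 0,
               "sizeof must be a multiple of alignof");

/* Hypothetical helper: true if p may be read as a size_t.
   The check uses the alignment requirement, not the object size. */
static int
is_aligned_for_size_t(const void *p)
{
    assert(p != NULL);
    return ((uintptr_t)p % _Alignof(size_t)) == 0;
}
```

Testing against `sizeof(size_t)` instead would still be correct, but it would reject pointers that are validly aligned on platforms where the alignment is smaller than the size, which is the overly strict condition the commit message mentions.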
* bpo-35883: Py_DecodeLocale() escapes invalid Unicode characters (GH-24843) | Victor Stinner | 2021-03-17 | 1 | -4/+5
|   Python no longer fails at startup with a fatal error if a command
|   line argument contains an invalid Unicode character.
|
|   The Py_DecodeLocale() function now escapes byte sequences which would
|   be decoded as Unicode characters outside the [U+0000; U+10ffff] range.
|
|   Use MAX_UNICODE constant in unicodeobject.c.
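The range test implied by the commit above can be sketched as follows. The `MAX_UNICODE` name comes from the commit message and its value follows from the stated [U+0000; U+10ffff] range; the helper `in_unicode_range` is hypothetical and is not claimed to match the function CPython actually uses.

```c
#include <stdint.h>

/* Upper bound of the Unicode code space, per the range stated above. */
#define MAX_UNICODE 0x10ffff

/* Hypothetical helper: true if ch lies inside [U+0000; U+10ffff].
   A decoder could escape the original bytes when this returns 0. */
static int
in_unicode_range(uint32_t ch)
{
    return ch <= MAX_UNICODE;
}
```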
* bpo-42128: Structural Pattern Matching (PEP 634) (GH-22917) | Brandt Bucher | 2021-02-26 | 1 | -1/+2
|   Co-authored-by: Guido van Rossum <guido@python.org>
|   Co-authored-by: Talin <viridia@gmail.com>
|   Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
* bpo-43268: Pass interp rather than tstate to internal functions (GH-24580) | Victor Stinner | 2021-02-19 | 1 | -10/+10
|   Pass the current interpreter (interp) rather than the current Python
|   thread state (tstate) to internal functions which only use the
|   interpreter.
|
|   Modified functions:
|   * _PyXXX_Fini() and _PyXXX_ClearFreeList() functions
|   * _PyEval_SignalAsyncExc(), make_pending_calls()
|   * _PySys_GetObject(), sys_set_object(), sys_set_object_id(),
|     sys_set_object_str()
|   * should_audit(), set_flags_from_config(), make_flags()
|   * _PyAtExit_Call()
|   * init_stdio_encoding()
|   * etc.
* bpo-43268: _Py_IsMainInterpreter() now expects interp (GH-24577) | Victor Stinner | 2021-02-19 | 1 | -1/+1
|   The _Py_IsMainInterpreter() function now expects interp rather than
|   tstate.
* Fix compiler warnings regarding loss of data (GH-23983) | Pablo Galindo | 2020-12-29 | 1 | -1/+1
|
* bpo-42745: finalize_interp_types() calls _PyType_Fini() (GH-23953) | Victor Stinner | 2020-12-26 | 1 | -4/+3
|   Call _PyType_Fini() in subinterpreters. Fix reference leaks in
|   subinterpreters.
* bpo-40521: Per-interpreter interned strings (GH-20085) | Victor Stinner | 2020-12-26 | 1 | -61/+28
|   Make the Unicode dictionary of interned strings compatible with
|   subinterpreters.
|
|   Remove the INTERN_NAME_STRINGS macro in typeobject.c: names are now
|   always interned (even if the EXPERIMENTAL_ISOLATED_SUBINTERPRETERS
|   macro is defined).
|
|   _PyUnicode_ClearInterned() now uses PyDict_Next() so that it no
|   longer allocates memory, to ensure that the interned dictionary is
|   cleared.
* bpo-39465: Fix _PyUnicode_FromId() for subinterpreters (GH-20058) | Victor Stinner | 2020-12-25 | 1 | -23/+62
|   Make the _PyUnicode_FromId() function compatible with subinterpreters.
|   Each interpreter now has an array of identifier objects (interned
|   strings decoded from UTF-8).
|
|   * Add PyInterpreterState.unicode.identifiers: array of identifier
|     objects.
|   * Add _PyRuntimeState.unicode_ids used to allocate unique indexes
|     to _Py_Identifier.
|   * Rewrite the _Py_Identifier structure.
|
|   Microbenchmark on _PyUnicode_FromId(&PyId_a) with _Py_IDENTIFIER(a):
|
|   [ref] 2.42 ns +- 0.00 ns -> [atomic] 3.39 ns +- 0.00 ns: 1.40x slower
|
|   This change adds 1 ns per _PyUnicode_FromId() call on average.
* bpo-42431: Fix outdated bytes comments (GH-23458) | Serhiy Storchaka | 2020-12-03 | 1 | -0/+1
|   Also move definitions of internal macros F_LJUST etc. to a private
|   header.
* bpo-42519: Replace PyObject_MALLOC() with PyObject_Malloc() (GH-23587) | Victor Stinner | 2020-12-01 | 1 | -21/+21
|   No longer use deprecated aliases to functions:
|   * Replace PyObject_MALLOC() with PyObject_Malloc()
|   * Replace PyObject_REALLOC() with PyObject_Realloc()
|   * Replace PyObject_FREE() with PyObject_Free()
|   * Replace PyObject_Del() with PyObject_Free()
|   * Replace PyObject_DEL() with PyObject_Free()
* bpo-42519: Replace PyMem_MALLOC() with PyMem_Malloc() (GH-23586) | Victor Stinner | 2020-12-01 | 1 | -12/+12
|   No longer use deprecated aliases to functions:
|   * Replace PyMem_MALLOC() with PyMem_Malloc()
|   * Replace PyMem_REALLOC() with PyMem_Realloc()
|   * Replace PyMem_FREE() with PyMem_Free()
|   * Replace PyMem_Del() with PyMem_Free()
|   * Replace PyMem_DEL() with PyMem_Free()
|
|   Also modify the PyMem_DEL() macro to use PyMem_Free() directly.
* bpo-40998: Address compiler warnings found by ubsan (GH-20929) | Christian Heimes | 2020-11-18 | 1 | -1/+5
|   Signed-off-by: Christian Heimes <christian@python.org>
|   Automerge-Triggered-By: GH:tiran
* Fix typo in unicodeobject.c (GH-23180) | Ikko Ashimine | 2020-11-10 | 1 | -1/+1
|   exeeds -> exceeds
|
|   Automerge-Triggered-By: GH:Mariatta
* bpo-42157: unicodedata avoids references to UCD_Type (GH-22990) | Victor Stinner | 2020-10-26 | 1 | -2/+1
|   * UCD_Check() uses PyModule_Check()
|   * Simplify the internal _PyUnicode_Name_CAPI structure:
|     * Remove size and state members
|     * Remove state and self parameters of getcode() and getname()
|       functions
|   * Remove global_module_state
* bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (GH-22713) | Victor Stinner | 2020-10-26 | 1 | -15/+16
|   The private _PyUnicode_Name_CAPI structure of the PyCapsule API
|   unicodedata.ucnhash_CAPI moves to the internal C API. Moreover, the
|   structure gets a new state member which must be passed to the
|   getcode() and getname() functions.
|
|   * Move Include/ucnhash.h to Include/internal/pycore_ucnhash.h
|   * unicodedata module is now built with Py_BUILD_CORE_MODULE.
|   * unicodedata: move hashAPI variable into unicodedata_module_state.
* bpo-38252: Use 8-byte step to detect ASCII sequence in 64bit Windows build (GH-16334) | Ma Lin | 2020-10-18 | 1 | -17/+17
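The idea of scanning for ASCII in 8-byte steps, as named in the commit title above, can be sketched in portable C. This is a generic illustration of the technique, not the code from the commit: the `all_ascii` helper and the `ASCII_MASK_64` macro are hypothetical names, and `memcpy` is used here to sidestep alignment concerns rather than to mirror how unicodeobject.c reads its words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mask with the high bit of each of the 8 bytes set. */
#define ASCII_MASK_64 UINT64_C(0x8080808080808080)

/* Hypothetical helper: returns 1 if all n bytes at s are ASCII (< 0x80). */
static int
all_ascii(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t i = 0;
    /* Fast path: test 8 bytes per iteration. */
    for (; i + 8 <= n; i += 8) {
        uint64_t chunk;
        memcpy(&chunk, s + i, sizeof(chunk));
        if (chunk & ASCII_MASK_64) {
            return 0;   /* at least one byte has its high bit set */
        }
    }
    /* Tail: the remaining 0..7 bytes, one at a time. */
    for (; i < n; i++) {
        if (s[i] & 0x80) {
            return 0;
        }
    }
    return 1;
}
```

Because the mask only inspects the high bit of every byte, the result does not depend on byte order, so the same loop works on little-endian and big-endian hosts.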
* bpo-42065: Fix incorrectly formatted _codecs.charmap_decode error message (GH-19940) | Max Bernstein | 2020-10-17 | 1 | -1/+1
* bpo-41974: Remove complex.__float__, complex.__floordiv__, etc (GH-22593) | Serhiy Storchaka | 2020-10-09 | 1 | -1/+1
|   Remove complex special methods __int__, __float__, __floordiv__,
|   __mod__, __divmod__, __rfloordiv__, __rmod__ and __rdivmod__ which
|   always raised a TypeError.
* bpo-41909: Enable previously disabled recursion checks. (GH-22536) | Serhiy Storchaka | 2020-10-04 | 1 | -2/+0
|   Enable recursion checks which were disabled when getting __bases__
|   of non-type objects in issubclass() and isinstance() and when
|   interning strings. It fixes a stack overflow when getting __bases__
|   leads to infinite recursion.
|
|   Originally, recursion checks were disabled for PyDict_GetItem(),
|   which silences all errors, including the one raised when recursion
|   is detected, and can return an incorrect result. But now the code
|   uses PyDict_GetItemWithError() and PyDict_SetDefault() instead.
* bpo-41692: Deprecate PyUnicode_InternImmortal() (GH-22486) | Victor Stinner | 2020-10-02 | 1 | -0/+9
|   The PyUnicode_InternImmortal() function is now deprecated and will
|   be removed in Python 3.12: use PyUnicode_InternInPlace() instead.
* bpo-40521: Fix PyUnicode_InternInPlace() (GH-22376) | Victor Stinner | 2020-09-23 | 1 | -0/+4
|   Fix PyUnicode_InternInPlace() when the INTERNED_STRINGS macro is not
|   defined (when the EXPERIMENTAL_ISOLATED_SUBINTERPRETERS macro is
|   defined).
* bpo-1635741: Port _string module to multi-phase init (GH-22148) | Victor Stinner | 2020-09-08 | 1 | -9/+5
|   Port the _string extension module to the multi-phase initialization
|   API (PEP 489).
* bpo-41334: Convert constructors of str, bytes and bytearray to Argument Clinic (GH-21535) | Serhiy Storchaka | 2020-07-20 | 1 | -27/+30
* bpo-36346: Make using the legacy Unicode C API optional (GH-21437) | Serhiy Storchaka | 2020-07-10 | 1 | -23/+58
|   Add compile time option USE_UNICODE_WCHAR_CACHE. Setting it to 0
|   makes the interpreter not use the wchar_t cache and the legacy
|   Unicode C API.
* bpo-39573: Use the Py_TYPE() macro (GH-21433) | Victor Stinner | 2020-07-10 | 1 | -2/+2
|   Replace obj->ob_type with Py_TYPE(obj).
* bpo-36346: Undeprecate private function _PyUnicode_AsUnicode(). (GH-21336) | Serhiy Storchaka | 2020-07-05 | 1 | -6/+0
|
* bpo-1635741: Fix unicode_dealloc() for mortal interned string (GH-21270) | Victor Stinner | 2020-07-03 | 1 | -4/+14
|   When unicode_dealloc() is called on a mortal interned string, the
|   string reference counter is now reset to zero.
* bpo-1635741: Release Unicode interned strings at exit (GH-21269) | Victor Stinner | 2020-07-01 | 1 | -32/+28
|   * PyUnicode_InternInPlace() now ensures that interned strings are
|     ready.
|   * Add _PyUnicode_ClearInterned().
|   * Py_Finalize() now releases Unicode interned strings: call
|     _PyUnicode_ClearInterned().
* bpo-36346: Raise DeprecationWarning when creating legacy Unicode (GH-20933) | Inada Naoki | 2020-06-30 | 1 | -3/+20
|
* bpo-36346: Prepare for removing the legacy Unicode C API (AC only). (GH-21223) | Serhiy Storchaka | 2020-06-30 | 1 | -0/+74
|
* bpo-41123: Remove PyUnicode_AsUnicodeCopy (GH-21209) | Inada Naoki | 2020-06-30 | 1 | -33/+0
|
* bpo-37999: Simplify the conversion code for %c, %d, %x, etc. (GH-20437) | Serhiy Storchaka | 2020-06-29 | 1 | -23/+11
|   Since PyLong_AsLong() no longer uses __int__, an explicit call to
|   PyNumber_Index() before it is no longer needed.
* bpo-41123: Remove PyUnicode_GetMax() (GH-21192) | Inada Naoki | 2020-06-29 | 1 | -14/+0
|
* bpo-41123: Remove Py_UNICODE_str* functions (GH-21164) | Inada Naoki | 2020-06-27 | 1 | -88/+0
|   They are undocumented and have been deprecated since Python 3.3.
* bpo-40521: Optimize PyBytes_FromStringAndSize(str, 0) (GH-21142) | Victor Stinner | 2020-06-25 | 1 | -24/+35
|   Always create the empty bytes string singleton.
|
|   Optimize PyBytes_FromStringAndSize(str, 0): it no longer has to check
|   if the empty string singleton was created or not, it is always
|   available.
|
|   Add functions:
|   * _PyBytes_Init()
|   * bytes_get_empty(), bytes_new_empty()
|   * bytes_create_empty_string_singleton()
|   * unicode_create_empty_string_singleton()
|
|   _Py_unicode_state: rename empty structure member to empty_string.
* bpo-40521: Make Unicode latin1 singletons per interpreter (GH-21101) | Victor Stinner | 2020-06-24 | 1 | -42/+32
|   Each interpreter now has its own Unicode latin1 singletons.
|
|   Remove "ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS" and
|   "ifdef LATIN1_SINGLETONS": always enable latin1 singletons.
|
|   Optimize unicode_result_ready(): only attempt to get a latin1
|   singleton for PyUnicode_1BYTE_KIND.
* bpo-40521: Optimize PyUnicode_New(0, maxchar) (GH-21099) | Victor Stinner | 2020-06-23 | 1 | -55/+25
|   Functions of unicodeobject.c, like PyUnicode_New(), no longer check
|   if the empty Unicode singleton has been initialized or not. Consider
|   that it is always initialized. The Unicode API must not be used
|   before _PyUnicode_Init() or after _PyUnicode_Fini().
* bpo-40521: Make empty Unicode string per interpreter (GH-21096) | Victor Stinner | 2020-06-23 | 1 | -73/+117
|   Each interpreter now has its own empty Unicode string singleton.
* bpo-36346: Add Py_DEPRECATED to deprecated unicode APIs (GH-20878) | Inada Naoki | 2020-06-17 | 1 | -0/+23
|   Co-authored-by: Kyle Stanley <aeros167@gmail.com>
|   Co-authored-by: Victor Stinner <vstinner@python.org>
* bpo-40989: PyObject_INIT() becomes an alias to PyObject_Init() (GH-20901) | Victor Stinner | 2020-06-15 | 1 | -6/+7
|   The PyObject_INIT() and PyObject_INIT_VAR() macros become aliases to,
|   respectively, PyObject_Init() and PyObject_InitVar() functions.
|
|   Rename _PyObject_INIT() and _PyObject_INIT_VAR() static inline
|   functions to, respectively, _PyObject_Init() and _PyObject_InitVar(),
|   and move them to pycore_object.h. Remove their return value: their
|   return type becomes void.
|
|   The _datetime module is now built with the Py_BUILD_CORE_MODULE macro
|   defined.
|
|   Remove an outdated comment on _Py_tracemalloc_config.
* bpo-40943: Replace PY_FORMAT_SIZE_T with "z" (GH-20781) | Victor Stinner | 2020-06-10 | 1 | -35/+33
|   PEP 353, written in 2005, introduced PY_FORMAT_SIZE_T. Python no
|   longer supports macOS 10.4 and Visual Studio 2010, but requires more
|   recent macOS and Visual Studio versions. In 2020 with Python 3.10, it
|   is now safe to use "%zu" directly to format size_t and "%zi" to
|   format Py_ssize_t.
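The direct use of the "z" length modifier described in the commit above can be sketched in plain C99. The helper name `format_size` is hypothetical, and the example covers only `size_t`, since `Py_ssize_t` is declared by Python's headers and is not available in a standalone program; the signed counterpart would use "%zi" or "%zd" in the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a size_t with the standard "z" length
   modifier instead of a platform-specific macro.
   Returns the value returned by snprintf(). */
static int
format_size(char *buf, size_t buflen, size_t value)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%zu bytes", value);
}
```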
* bpo-40881: Fix unicode_release_interned() (GH-20699) | Victor Stinner | 2020-06-07 | 1 | -2/+2
|   Use Py_SET_REFCNT() in unicode_release_interned().