summaryrefslogtreecommitdiffstats
path: root/Objects/unicodeobject.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Fix typo in unicodeobject.c (GH-23180)Ikko Ashimine2020-11-101-1/+1
| | | | | exeeds -> exceeds Automerge-Triggered-By: GH:Mariatta
* bpo-42157: unicodedata avoids references to UCD_Type (GH-22990)Victor Stinner2020-10-261-2/+1
| | | | | | | | | | * UCD_Check() uses PyModule_Check() * Simplify the internal _PyUnicode_Name_CAPI structure: * Remove size and state members * Remove state and self parameters of getcode() and getname() functions * Remove global_module_state
* bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (GH-22713)Victor Stinner2020-10-261-15/+16
| | | | | | | | | | The private _PyUnicode_Name_CAPI structure of the PyCapsule API unicodedata.ucnhash_CAPI moves to the internal C API. Moreover, the structure gets a new state member which must be passed to the getcode() and getname() functions. * Move Include/ucnhash.h to Include/internal/pycore_ucnhash.h * unicodedata module is now built with Py_BUILD_CORE_MODULE. * unicodedata: move hashAPI variable into unicodedata_module_state.
* bpo-38252: Use 8-byte step to detect ASCII sequence in 64bit Windows build ↵Ma Lin2020-10-181-17/+17
| | | | (GH-16334)
* bpo-42065: Fix incorrectly formatted _codecs.charmap_decode error message ↵Max Bernstein2020-10-171-1/+1
| | | | (GH-19940)
* bpo-41974: Remove complex.__float__, complex.__floordiv__, etc (GH-22593)Serhiy Storchaka2020-10-091-1/+1
| | | | | | Remove complex special methods __int__, __float__, __floordiv__, __mod__, __divmod__, __rfloordiv__, __rmod__ and __rdivmod__ which always raised a TypeError.
* bpo-41909: Enable previously disabled recursion checks. (GH-22536)Serhiy Storchaka2020-10-041-2/+0
| | | | | | | | | | | Enable recursion checks which were disabled when get __bases__ of non-type objects in issubclass() and isinstance() and when intern strings. It fixes a stack overflow when getting __bases__ leads to infinite recursion. Originally recursion checks was disabled for PyDict_GetItem() which silences all errors including the one raised in case of detected recursion and can return incorrect result. But now the code uses PyDict_GetItemWithError() and PyDict_SetDefault() instead.
* bpo-41692: Deprecate PyUnicode_InternImmortal() (GH-22486)Victor Stinner2020-10-021-0/+9
| | | | The PyUnicode_InternImmortal() function is now deprecated and will be removed in Python 3.12: use PyUnicode_InternInPlace() instead.
* bpo-40521: Fix PyUnicode_InternInPlace() (GH-22376)Victor Stinner2020-09-231-0/+4
| | | | | Fix PyUnicode_InternInPlace() when the INTERNED_STRINGS macro is not defined (when the EXPERIMENTAL_ISOLATED_SUBINTERPRETERS macro is defined).
* bpo-1635741: Port _string module to multi-phase init (GH-22148)Victor Stinner2020-09-081-9/+5
| | | | Port the _string extension module to the multi-phase initialization API (PEP 489).
* bpo-41334: Convert constructors of str, bytes and bytearray to Argument ↵Serhiy Storchaka2020-07-201-27/+30
| | | | Clinic (GH-21535)
* bpo-36346: Make using the legacy Unicode C API optional (GH-21437)Serhiy Storchaka2020-07-101-23/+58
| | | | Add compile time option USE_UNICODE_WCHAR_CACHE. Setting it to 0 makes the interpreter not using the wchar_t cache and the legacy Unicode C API.
* bpo-39573: Use the Py_TYPE() macro (GH-21433)Victor Stinner2020-07-101-2/+2
| | | Replace obj->ob_type with Py_TYPE(obj).
* bpo-36346: Undeprecate private function _PyUnicode_AsUnicode(). (GH-21336)Serhiy Storchaka2020-07-051-6/+0
|
* bpo-1635741: Fix unicode_dealloc() for mortal interned string (GH-21270)Victor Stinner2020-07-031-4/+14
| | | | When unicode_dealloc() is called on a mortal interned string, the string reference counter is now reset at zero.
* bpo-1635741: Release Unicode interned strings at exit (GH-21269)Victor Stinner2020-07-011-32/+28
| | | | | | | * PyUnicode_InternInPlace() now ensures that interned strings are ready. * Add _PyUnicode_ClearInterned(). * Py_Finalize() now releases Unicode interned strings: call _PyUnicode_ClearInterned().
* bpo-36346: Raise DeprecationWarning when creating legacy Unicode (GH-20933)Inada Naoki2020-06-301-3/+20
|
* bpo-36346: Prepare for removing the legacy Unicode C API (AC only). (GH-21223)Serhiy Storchaka2020-06-301-0/+74
|
* bpo-41123: Remove PyUnicode_AsUnicodeCopy (GH-21209)Inada Naoki2020-06-301-33/+0
|
* bpo-37999: Simplify the conversion code for %c, %d, %x, etc. (GH-20437)Serhiy Storchaka2020-06-291-23/+11
| | | | Since PyLong_AsLong() no longer use __int__, explicit call of PyNumber_Index() before it is no longer needed.
* bpo-41123: Remove PyUnicode_GetMax() (GH-21192)Inada Naoki2020-06-291-14/+0
|
* bpo-41123: Remove Py_UNICODE_str* functions (GH-21164)Inada Naoki2020-06-271-88/+0
| | | They are undocumented and deprecated since Python 3.3.
* bpo-40521: Optimize PyBytes_FromStringAndSize(str, 0) (GH-21142)Victor Stinner2020-06-251-24/+35
| | | | | | | | | | | | | | | | Always create the empty bytes string singleton. Optimize PyBytes_FromStringAndSize(str, 0): it no longer has to check if the empty string singleton was created or not, it is always available. Add functions: * _PyBytes_Init() * bytes_get_empty(), bytes_new_empty() * bytes_create_empty_string_singleton() * unicode_create_empty_string_singleton() _Py_unicode_state: rename empty structure member to empty_string.
* bpo-40521: Make Unicode latin1 singletons per interpreter (GH-21101)Victor Stinner2020-06-241-42/+32
| | | | | | | | | Each interpreter now has its own Unicode latin1 singletons. Remove "ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS" and "ifdef LATIN1_SINGLETONS": always enable latin1 singletons. Optimize unicode_result_ready(): only attempt to get a latin1 singleton for PyUnicode_1BYTE_KIND.
* bpo-40521: Optimize PyUnicode_New(0, maxchar) (GH-21099)Victor Stinner2020-06-231-55/+25
| | | | | | Functions of unicodeobject.c, like PyUnicode_New(), no longer check if the empty Unicode singleton has been initialized or not. Consider that it is always initialized. The Unicode API must not be used before _PyUnicode_Init() or after _PyUnicode_Fini().
* bpo-40521: Make empty Unicode string per interpreter (GH-21096)Victor Stinner2020-06-231-73/+117
| | | Each interpreter now has its own empty Unicode string singleton.
* bpo-36346: Add Py_DEPRECATED to deprecated unicode APIs (GH-20878)Inada Naoki2020-06-171-0/+23
| | | | Co-authored-by: Kyle Stanley <aeros167@gmail.com> Co-authored-by: Victor Stinner <vstinner@python.org>
* bpo-40989: PyObject_INIT() becomes an alias to PyObject_Init() (GH-20901)Victor Stinner2020-06-151-6/+7
| | | | | | | | | | | | | | The PyObject_INIT() and PyObject_INIT_VAR() macros become aliases to, respectively, PyObject_Init() and PyObject_InitVar() functions. Rename _PyObject_INIT() and _PyObject_INIT_VAR() static inline functions to, respectively, _PyObject_Init() and _PyObject_InitVar(), and move them to pycore_object.h. Remove their return value: their return type becomes void. The _datetime module is now built with the Py_BUILD_CORE_MODULE macro defined. Remove an outdated comment on _Py_tracemalloc_config.
* bpo-40943: Replace PY_FORMAT_SIZE_T with "z" (GH-20781)Victor Stinner2020-06-101-35/+33
| | | | | | | The PEP 353, written in 2005, introduced PY_FORMAT_SIZE_T. Python no longer supports macOS 10.4 and Visual Studio 2010, but requires more recent macOS and Visual Studio versions. In 2020 with Python 3.10, it is now safe to use directly "%zu" to format size_t and "%zi" to format Py_ssize_t.
* bpo-40881: Fix unicode_release_interned() (GH-20699)Victor Stinner2020-06-071-2/+2
| | | Use Py_SET_REFCNT() in unicode_release_interned().
* bpo-39465: Cleanup _PyUnicode_FromId() code (GH-20595)Victor Stinner2020-06-021-10/+16
| | | Work on a local variable before filling _Py_Identifier members.
* bpo-40792: Make the result of PyNumber_Index() always having exact type int. ↵Serhiy Storchaka2020-05-281-1/+1
| | | | | | | | | | | | (GH-20443) Previously, the result could have been an instance of a subclass of int. Also revert bpo-26202 and make attributes start, stop and step of the range object having exact type int. Add private function _PyNumber_Index() which preserves the old behavior of PyNumber_Index() for performance to use it in the conversion functions like PyLong_AsLong().
* bpo-40521: Add PyInterpreterState.unicode (GH-20081)Victor Stinner2020-05-131-31/+33
| | | | | | | Move PyInterpreterState.fs_codec into a new PyInterpreterState.unicode structure. Give a name to the fs_codec structure and use this structure in unicodeobject.c.
* bpo-39465: Remove _PyUnicode_ClearStaticStrings() from C API (GH-20078)Victor Stinner2020-05-131-3/+3
| | | | Remove the _PyUnicode_ClearStaticStrings() function from the C API. Make the function fully private (declare it with "static").
* bpo-40596: Fix str.isidentifier() for non-canonicalized strings containing ↵Serhiy Storchaka2020-05-121-4/+22
| | | | non-BMP characters on Windows. (GH-20053)
* bpo-40593: Improve syntax errors for invalid characters in source code. ↵Serhiy Storchaka2020-05-121-23/+41
| | | | (GH-20033)
* bpo-40521: Disable Unicode caches in isolated subinterpreters (GH-19933)Victor Stinner2020-05-051-15/+63
| | | | | | | When Python is built in the experimental isolated subinterpreters mode, disable Unicode singletons and Unicode interned strings since they are shared by all interpreters. Temporary workaround until these caches are made per-interpreter.
* bpo-39939: Add str.removeprefix and str.removesuffix (GH-18939)sweeneyde2020-04-221-0/+57
| | | | | Added str.removeprefix and str.removesuffix methods and corresponding bytes, bytearray, and collections.UserString methods to remove affixes from a string if present. See PEP 616 for a full description.
* bpo-40268: Remove a few pycore_pystate.h includes (GH-19510)Victor Stinner2020-04-141-2/+3
|
* bpo-40268: Rename _PyInterpreterState_GET_UNSAFE() (GH-19509)Victor Stinner2020-04-141-4/+4
| | | | | | | Rename _PyInterpreterState_GET_UNSAFE() to _PyInterpreterState_GET() for consistency with _PyThreadState_GET() and to have a shorter name (help to fit into 80 columns). Add also "assert(tstate != NULL);" to the function.
* bpo-40268: Add _PyInterpreterState_GetConfig() (GH-19492)Victor Stinner2020-04-131-7/+9
| | | | | | | | Don't access PyInterpreterState.config member directly anymore, but use new functions: * _PyInterpreterState_GetConfig() * _PyInterpreterState_SetConfig() * _Py_GetConfig()
* bpo-39943: Add the const qualifier to pointers on non-mutable PyBytes data. ↵Serhiy Storchaka2020-04-121-3/+3
| | | | (GH-19472)
* bpo-39943: Add the const qualifier to pointers on non-mutable PyUnicode ↵Serhiy Storchaka2020-04-111-133/+162
| | | | data. (GH-19345)
* bpo-40170: Add _PyIndex_Check() internal function (GH-19426)Victor Stinner2020-04-081-1/+2
| | | | | | | | | Add _PyIndex_Check() function to the internal C API: fast inlined verson of PyIndex_Check(). Add Include/internal/pycore_abstract.h header file. Replace PyIndex_Check() with _PyIndex_Check() in C files of Objects and Python subdirectories.
* bpo-37388: Don't check encoding/errors during finalization (GH-19409)Victor Stinner2020-04-071-0/+6
| | | | | | | | | str.encode() and str.decode() no longer check the encoding and errors in development mode or in debug mode during Python finalization. The codecs machinery can no longer work on very late calls to str.encode() and str.decode(). This change should help to call _PyObject_Dump() to debug during late Python finalization.
* bpo-40130: _PyUnicode_AsKind() should not be exported. (GH-19265)Serhiy Storchaka2020-04-011-49/+46
| | | | | Make it a static function, and pass known attributes (kind, data, length) instead of the PyUnicode object.
* Revert "bpo-39087: Add _PyUnicode_GetUTF8Buffer()" (GH-18985)Inada Naoki2020-03-141-35/+0
| | | | | | | * Revert "bpo-39087: Add _PyUnicode_GetUTF8Buffer() (GH-17659)" This reverts commit c7ad974d341d3edb6b9d2a2dcae4d3d4794ada6b. * Update unicodeobject.h
* bpo-39087: Add _PyUnicode_GetUTF8Buffer() (GH-17659)Inada Naoki2020-03-141-0/+35
| | | Co-authored-by: Victor Stinner <vstinner@python.org>
* bpo-39573: Finish converting to new Py_IS_TYPE() macro (GH-18601)Andy Lester2020-03-041-2/+2
|
* bpo-39087: Optimize PyUnicode_AsUTF8AndSize() (GH-18327)Inada Naoki2020-02-271-25/+73
| | | Avoid using temporary bytes object.