summaryrefslogtreecommitdiffstats
path: root/Objects/unicodeobject.c
Commit message (Collapse)AuthorAgeFilesLines
* bpo-35883: Py_DecodeLocale() escapes invalid Unicode characters (GH-24843) ↵Miss Islington (bot)2021-03-291-4/+5
| | | | | | | | | | | | | | | | | (GH-24906) Python no longer fails at startup with a fatal error if a command line argument contains an invalid Unicode character. The Py_DecodeLocale() function now escapes byte sequences which would be decoded as Unicode characters outside the [U+0000; U+10ffff] range. Use MAX_UNICODE constant in unicodeobject.c. (cherry picked from commit 9976834f807ea63ca51bc4f89be457d734148682) Co-authored-by: Victor Stinner <vstinner@python.org> Co-authored-by: Victor Stinner <vstinner@python.org>
* bpo-42065: Fix incorrectly formatted _codecs.charmap_decode error message ↵Miss Skeleton (bot)2020-10-181-1/+1
| | | | | | | (GH-19940) (cherry picked from commit 3635388f52b42e5280229104747962117104c453) Co-authored-by: Max Bernstein <tekknolagi@users.noreply.github.com>
* [3.8] bpo-41909: Enable previously disabled recursion checks. (GH-22536) ↵Serhiy Storchaka2020-10-041-2/+0
| | | | | | | | | | | | | | | (GH-22551) Enable recursion checks which were disabled when get __bases__ of non-type objects in issubclass() and isinstance() and when intern strings. It fixes a stack overflow when getting __bases__ leads to infinite recursion. Originally recursion checks was disabled for PyDict_GetItem() which silences all errors including the one raised in case of detected recursion and can return incorrect result. But now the code uses PyDict_GetItemWithError() and PyDict_SetDefault() instead. (cherry picked from commit 9ece9cd65cdeb0a1f6e60475bbd0219161c348ac)
* bpo-39605: Remove a cast that causes a warning. (GH-18473)Miss Islington (bot)2020-02-121-1/+1
| | | | | (cherry picked from commit 95905ce0f41fd42eb1ef60ddb83f057401c3d52f) Co-authored-by: Benjamin Peterson <benjamin@python.org>
* closes bpo-39605: Fix some casts to not cast away const. (GH-18453)Miss Islington (bot)2020-02-121-15/+15
| | | | | | | | | | | | | | | | | | gcc -Wcast-qual turns up a number of instances of casting away constness of pointers. Some of these can be safely modified, by either: Adding the const to the type cast, as in: - return _PyUnicode_FromUCS1((unsigned char*)s, size); + return _PyUnicode_FromUCS1((const unsigned char*)s, size); or, Removing the cast entirely, because it's not necessary (but probably was at one time), as in: - PyDTrace_FUNCTION_ENTRY((char *)filename, (char *)funcname, lineno); + PyDTrace_FUNCTION_ENTRY(filename, funcname, lineno); These changes will not change code, but they will make it much easier to check for errors in consts (cherry picked from commit e6be9b59a911626d6597fe148c32f0342bd2bd24) Co-authored-by: Andy Lester <andy@petdance.com>
* [3.8] bpo-36389: Backport debug enhancements from master (GH-16796)Victor Stinner2019-10-151-42/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * bpo-36389: _PyObject_CheckConsistency() available in release mode (GH-16612) bpo-36389, bpo-38376: The _PyObject_CheckConsistency() function is now also available in release mode. For example, it can be used to debug a crash in the visit_decref() function of the GC. Modify the following functions to also work in release mode: * _PyDict_CheckConsistency() * _PyObject_CheckConsistency() * _PyType_CheckConsistency() * _PyUnicode_CheckConsistency() Other changes: * _PyMem_IsPtrFreed(ptr) now also returns 1 if ptr is NULL (equals to 0). * _PyBytesWriter_CheckConsistency() now returns 1 and is only used with assert(). * Reorder _PyObject_Dump() to write safe fields first, and only attempt to render repr() at the end. (cherry picked from commit 6876257eaabdb30f27ebcbd7d2557278ce2e5705) * bpo-36389: Fix _PyBytesWriter in release mode (GH-16624) Fix _PyBytesWriter API when Python is built in release mode with assertions. (cherry picked from commit 60ec6efd96d95476fe5e38c491491add04f026e5) * bpo-38070: Enhance visit_decref() debug trace (GH-16631) subtract_refs() now pass the parent object to visit_decref() which pass it to _PyObject_ASSERT(). So if the "is freed" assertion fails, the parent is used in debug trace, rather than the freed object. The parent object is more likely to contain useful information. Freed objects cannot be inspected are are displayed as "<object at xxx is freed>" with no other detail. (cherry picked from commit 4d5f94b8cd20f804c7868c5395a15aa6032f874c) * Fix also a typo in PYMEM_DEADBYTE macro comment * bpo-36389: Add newline to _PyObject_AssertFailed() (GH-16629) Add a newline between the verbose object dump and the Py_FatalError() logs for readability. (cherry picked from commit 7775349895088a7ae68cecf0c74cf817f15e2c74)
* bpo-38409: Fix grammar in str.strip() docstring (GH-16682) (GH-16684)Miss Islington (bot)2019-10-091-2/+2
| | | (cherry picked from commit 09895c27cd8ff60563a794016e8c099bc897cc74)
* bpo-38236: Dump path config at first import error (GH-16300) (GH-16332)Victor Stinner2019-09-231-8/+11
| | | | | | Python now dumps path configuration if it fails to import the Python codecs of the filesystem and stdio encodings. (cherry picked from commit fcdb027234566c4d506d6d753c7d5638490fb088)
* [3.8] bpo-37206: Unrepresentable default values no longer represented as ↵Serhiy Storchaka2019-09-141-5/+5
| | | | | | | | | | None. (GH-13933) (GH-16141) In ArgumentClinic, value "NULL" should now be used only for unrepresentable default values (like in the optional third parameter of getattr). "None" should be used if None is accepted as argument and passing None has the same effect as not passing the argument at all. (cherry picked from commit 279f44678c8b84a183f9eeb85e0b086228154497) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* Fix unused variable and signed/unsigned warnings (GH-15537) (GH-15551)Miss Islington (bot)2019-08-271-0/+6
| | | | | (cherry picked from commit 0138c4ceab1e10d42d0aa962d2ae079b46da7671) Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com>
* bpo-36311: Fixes decoding multibyte characters around chunk boundaries and ↵Miss Islington (bot)2019-08-211-6/+10
| | | | | | | improves decoding performance (GH-15083) (cherry picked from commit 7ebdda0dbee7df6f0c945a7e1e623e47676e112d) Co-authored-by: Steve Dower <steve.dower@python.org>
* bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)Miss Islington (bot)2019-06-251-3/+7
| | | | | | | | | * The UTF-8 incremental decoders fails now fast if encounter a sequence that can't be handled by the error handler. * The UTF-16 incremental decoders with the surrogatepass error handler decodes now a lone low surrogate with final=False. (cherry picked from commit 894263ba80af4b7733c2df95b527e96953922656) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async ↵Jeroen Demeyer2019-05-311-6/+6
| | | | | | | | | (GH-13464) Automatically replace tp_print -> tp_vectorcall_offset tp_compare -> tp_as_async tp_reserved -> tp_as_async
* remove unnecessary tp_dealloc (GH-13647)Inada Naoki2019-05-291-7/+1
|
* bpo-36763: Implement the PEP 587 (GH-13592)Victor Stinner2019-05-271-31/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add a whole new documentation page: "Python Initialization Configuration" * PyWideStringList_Append() return type is now PyStatus, instead of int * PyInterpreterState_New() now calls PyConfig_Clear() if PyConfig_InitPythonConfig() fails. * Rename files: * Python/coreconfig.c => Python/initconfig.c * Include/cpython/coreconfig.h => Include/cpython/initconfig.h * Include/internal/: pycore_coreconfig.h => pycore_initconfig.h * Rename structures * _PyCoreConfig => PyConfig * _PyPreConfig => PyPreConfig * _PyInitError => PyStatus * _PyWstrList => PyWideStringList * Rename PyConfig fields: * use_module_search_paths => module_search_paths_set * module_search_path_env => pythonpath_env * Rename PyStatus field: _func => func * PyInterpreterState: rename core_config field to config * Rename macros and functions: * _PyCoreConfig_SetArgv() => PyConfig_SetBytesArgv() * _PyCoreConfig_SetWideArgv() => PyConfig_SetArgv() * _PyCoreConfig_DecodeLocale() => PyConfig_SetBytesString() * _PyInitError_Failed() => PyStatus_Exception() * _Py_INIT_ERROR_TYPE_xxx enums => _PyStatus_TYPE_xxx * _Py_UnixMain() => Py_BytesMain() * _Py_ExitInitError() => Py_ExitStatusException() * _Py_PreInitializeFromArgs() => Py_PreInitializeFromBytesArgs() * _Py_PreInitializeFromWideArgs() => Py_PreInitializeFromArgs() * _Py_PreInitialize() => Py_PreInitialize() * _Py_RunMain() => Py_RunMain() * _Py_InitializeFromConfig() => Py_InitializeFromConfig() * _Py_INIT_XXX() => _PyStatus_XXX() * _Py_INIT_FAILED() => _PyStatus_EXCEPTION() * Rename 'err' PyStatus variables to 'status' * Convert RUN_CODE() macro to config_run_code() static inline function * Remove functions: * _Py_InitializeFromArgs() * _Py_InitializeFromWideArgs() * _PyInterpreterState_GetCoreConfig()
* bpo-36946: Fix possible signed integer overflow when handling slices. (GH-13375)Zackery Spytz2019-05-171-1/+2
| | | | | | | The final addition (cur += step) may overflow, so use size_t for "cur". "cur" is always positive (even for negative steps), so it is safe to use size_t here. Co-Authored-By: Martin Panter <vadmium+py@gmail.com>
* bpo-36594: Fix incorrect use of %p in format strings (GH-12769)Zackery Spytz2019-05-061-3/+3
| | | In addition, fix some other minor violations of C99.
* bpo-36775: _PyCoreConfig only uses wchar_t* (GH-13062)Victor Stinner2019-05-021-72/+242
| | | | | | | | | | | | | | | | | _PyCoreConfig: Change filesystem_encoding, filesystem_errors, stdio_encoding and stdio_errors fields type from char* to wchar_t*. Changes: * PyInterpreterState: replace fscodec_initialized (int) with fs_codec structure. * Add get_error_handler_wide() and unicode_encode_utf8() helper functions. * Add error_handler parameter to unicode_encode_locale() and unicode_decode_locale(). * Remove _PyCoreConfig_SetString(). * Rename _PyCoreConfig_SetWideString() to _PyCoreConfig_SetString(). * Rename _PyCoreConfig_SetWideStringFromString() to _PyCoreConfig_DecodeLocale().
* bpo-36775: Add _PyUnicode_InitEncodings() (GH-13057)Victor Stinner2019-05-021-0/+97
| | | | | | Move get_codec_name() and initfsencoding() from pylifecycle.c to unicodeobject.c. Rename also "init" functions in pylifecycle.c.
* bpo-36775: Add _Py_FORCE_UTF8_FS_ENCODING macro (GH-13056)Victor Stinner2019-05-021-2/+2
| | | | | | | Add _Py_FORCE_UTF8_LOCALE and _Py_FORCE_UTF8_FS_ENCODING macros to avoid factorize "#if defined(__ANDROID__) || defined(__VXWORKS__)" and "#if defined(__APPLE__)". Cleanup also config_init_fs_encoding().
* bpo-36389: Add _PyObject_CheckConsistency() function (GH-12803)Victor Stinner2019-04-121-49/+44
| | | | | | Add a new _PyObject_CheckConsistency() function which can be used to help debugging. The function is available in release mode. Add a 'check_content' parameter to _PyDict_CheckConsistency().
* bpo-36549: str.capitalize now titlecases the first character instead of ↵Kingsley M2019-04-121-1/+1
| | | | uppercasing it (GH-12804)
* bpo-24214: Fixed the UTF-8 incremental decoder. (GH-12603)Serhiy Storchaka2019-03-301-0/+3
| | | | The bug occurred when the encoded surrogate character is passed to the incremental decoder in two chunks.
* bpo-36312: Fix decoders for some code pages. (GH-12369)Serhiy Storchaka2019-03-201-5/+16
|
* bpo-36356: Release Unicode interned strings on Valgrind (#12431)Victor Stinner2019-03-191-15/+42
| | | | | | | | | | | When Python is compiled with Valgrind support, release Unicode interned strings at exit in _PyUnicode_Fini(). * Rename _Py_ReleaseInternedUnicodeStrings() to unicode_release_interned() and make it private. * unicode_release_interned() is now called from _PyUnicode_Fini(): it must be called with a running Python thread state for TRASHCAN, it cannot be called from pymain_free(). * Don't display statistics on interned strings at exit anymore
* bpo-36301: Error if decoding pybuilddir.txt fails (GH-12422)Victor Stinner2019-03-191-2/+11
| | | | | | | | Python initialization now fails if decoding pybuilddir.txt configuration file fails at startup. _PyPathConfig_Calculate() now reports memory allocation failure and decoding error on decoding pybuilddir.txt content from UTF-8/surrogateescape.
* bpo-36297: remove "unicode_internal" codec (GH-12342)Inada Naoki2019-03-181-102/+0
|
* bpo-35713: Split _Py_InitializeCore into subfunctions (GH-11650)Victor Stinner2019-01-221-1/+0
| | | | | | | | | | | | | | * Split _Py_InitializeCore_impl() into subfunctions: add multiple pycore_init_xxx() functions * Preliminary sys.stderr is now set earlier to get an usable sys.stderr ealier. * Move code into _Py_Initialize_ReconfigureCore() to be able to call it from _Py_InitializeCore(). * Split _PyExc_Init(): create a new _PyBuiltins_AddExceptions() function. * Call _PyExc_Init() earlier in _Py_InitializeCore_impl() and new_interpreter() to get working exceptions earlier. * _Py_ReadyTypes() now returns _PyInitError rather than calling Py_FatalError(). * Misc code cleanup
* bpo-35713: Rework Python initialization (GH-11647)Victor Stinner2019-01-221-14/+18
| | | | | | | | | | | | | | | | | | | * The PyByteArray_Init() and PyByteArray_Fini() functions have been removed. They did nothing since Python 2.7.4 and Python 3.2.0, were excluded from the limited API (stable ABI), and were not documented. * Move "_PyXXX_Init()" and "_PyXXX_Fini()" declarations from Include/cpython/pylifecycle.h to Include/internal/pycore_pylifecycle.h. Replace "PyAPI_FUNC(TYPE)" with "extern TYPE". * _PyExc_Init() now returns an error on failure rather than calling Py_FatalError(). Move macros inside _PyExc_Init() and undefine them when done. Rewrite macros to make them look more like statement: add ";" when using them, add "do { ... } while (0)". * _PyUnicode_Init() now returns a _PyInitError error rather than call Py_FatalError(). * Move stdin check from _PySys_BeginInit() to init_sys_streams(). * _Py_ReadyTypes() now returns a _PyInitError error rather than calling Py_FatalError().
* bpo-35552: Fix reading past the end in PyUnicode_FromFormat() and ↵Serhiy Storchaka2019-01-121-3/+9
| | | | | | PyBytes_FromFormat(). (GH-11276) Format characters "%s" and "%V" in PyUnicode_FromFormat() and "%s" in PyBytes_FromFormat() no longer read memory past the limit if precision is specified.
* bpo-35560: Remove assertion from format(float, "n") (GH-11288)Xtreak2019-01-071-1/+1
| | | | | Fix an assertion error in format() in debug build for floating point formatting with "n" format, zero padding and small width. Release build is not impacted. Patch by Karthikeyan Singaravelan.
* bpo-35636: Remove redundant check in unicode_hash(). (GH-11402)animalize2019-01-021-10/+1
| | | | _Py_HashBytes() does the check for empty string.
* bpo-35444: Unify and optimize the helper for getting a builtin object. ↵Serhiy Storchaka2018-12-111-2/+3
| | | | | | | | (GH-11047) This speeds up pickling of some iterators. This fixes also error handling in pickling methods when fail to look up builtin "getattr".
* bpo-35365: Use a wchar_t* buffer in the code page decoder. (GH-10837)Serhiy Storchaka2018-12-041-60/+52
|
* bpo-35372: Fix the code page decoder for input > 2 GiB. (GH-10848)Serhiy Storchaka2018-12-031-5/+4
|
* bpo-34523, bpo-35322: Fix unicode_encode_locale() (GH-10759)Victor Stinner2018-11-281-7/+5
| | | | | | | | | Fix memory leak in PyUnicode_EncodeLocale() and PyUnicode_EncodeFSDefault() on error handling. Changes: * Fix unicode_encode_locale() error handling * Fix test_codecs.LocaleCodecTest
* bpo-33954: Fix compiler warning in _PyUnicode_FastFill() (GH-10737)Victor Stinner2018-11-271-1/+4
| | | | | | | 'data' argument of unicode_fill() is modified, so it must not be constant. Add more assertions to unicode_fill(): check the maximum character value.
* bpo-33012: Fix invalid function cast warnings with gcc 8. (GH-6749)Serhiy Storchaka2018-11-271-1/+1
| | | | | | Fix invalid function cast warnings with gcc 8 for method conventions different from METH_NOARGS, METH_O and METH_VARARGS excluding Argument Clinic generated code.
* bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623)Victor Stinner2018-11-261-98/+165
| | | | | | | | | | | | Fix str.format(), float.__format__() and complex.__format__() methods for non-ASCII decimal point when using the "n" formatter. Changes: * Rewrite _PyUnicode_InsertThousandsGrouping(): it now requires a _PyUnicodeWriter object for the buffer and a Python str object for digits. * Rename FILL() macro to unicode_fill(), convert it to static inline function, add "assert(0 <= start);" and rework its code.
* bpo-35059: Cast void* to PyObject* (GH-10650)Victor Stinner2018-11-221-3/+6
| | | Don't pass void* to Python macros: use _PyObject_CAST().
* bpo-35081: Add Include/internal/pycore_object.h (GH-10640)Victor Stinner2018-11-211-0/+1
| | | | Move _PyObject_GC_TRACK() and _PyObject_GC_UNTRACK() from Include/objimpl.h to Include/internal/pycore_object.h.
* bpo-35214: Fix OOB memory access in unicode escape parser (GH-10506)Gregory P. Smith2018-11-131-1/+1
| | | | | | Discovered using clang's MemorySanitizer when it ran python3's test_fstring test_misformed_unicode_character_name. An msan build will fail by simply executing: ./python -c 'u"\N"'
* bpo-35081: Rename internal headers (GH-10275)Victor Stinner2018-11-121-1/+1
| | | | | | | | | | | | | | Rename Include/internal/ headers: * pycore_hash.h -> pycore_pyhash.h * pycore_lifecycle.h -> pycore_pylifecycle.h * pycore_mem.h -> pycore_pymem.h * pycore_state.h -> pycore_pystate.h Add missing headers to Makefile.pre.in and PCbuild: * pycore_condvar.h. * pycore_hamt.h * pycore_pyhash.h
* bpo-35081: Add pycore_fileutils.h (GH-10371)Victor Stinner2018-11-061-0/+1
| | | | Move Py_BUILD_CORE code from Include/fileutils.h to a new Include/internal/pycore_fileutils.h file.
* bpo-35081: Add pycore_ prefix to internal header files (GH-10263)Victor Stinner2018-10-311-1/+1
| | | | | | | | | | | | | | | | | | | | * Rename Include/internal/ header files: * pyatomic.h -> pycore_atomic.h * ceval.h -> pycore_ceval.h * condvar.h -> pycore_condvar.h * context.h -> pycore_context.h * pygetopt.h -> pycore_getopt.h * gil.h -> pycore_gil.h * hamt.h -> pycore_hamt.h * hash.h -> pycore_hash.h * mem.h -> pycore_mem.h * pystate.h -> pycore_state.h * warnings.h -> pycore_warnings.h * PCbuild project, Makefile.pre.in, Modules/Setup: add the Include/internal/ directory to the search paths of header files. * Update includes. For example, replace #include "internal/mem.h" with #include "pycore_mem.h".
* bpo-9263: _PyXXX_CheckConsistency() use _PyObject_ASSERT() (GH-10108)Victor Stinner2018-10-261-36/+40
| | | | | | | | | | Use _PyObject_ASSERT() in: * _PyDict_CheckConsistency() * _PyType_CheckConsistency() * _PyUnicode_CheckConsistency() _PyObject_ASSERT() dumps the faulty object if the assertion fails to help debugging.
* bpo-30863: Rewrite PyUnicode_AsWideChar() and PyUnicode_AsWideCharString(). ↵Serhiy Storchaka2018-10-231-123/+121
| | | | | | (GH-2599) They no longer cache the wchar_t* representation of string objects.
* Add missing closing quote and trailing period in str.isidentifier() ↵Emanuele Gaifas2018-10-081-2/+2
| | | | | docstring (GH-9756) This rectifies commit ffc5a14d00db984c8e72c7b67da8a493e17e2c14.
* bpo-33014: Clarify str.isidentifier docstring (GH-6088)Sanyam Khurana2018-10-081-3/+3
| | | | | | * bpo-33014: Clarify str.isidentifier docstring * bpo-33014: Add code example in isidentifier documentation
* Revert "bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)" (GH-9187)Victor Stinner2018-09-111-52/+53
| | | This reverts commit 886483e2b9bbabf60ab769683269b873381dd5ee.