summaryrefslogtreecommitdiffstats
path: root/Objects/unicodeobject.c
Commit message (Collapse)AuthorAgeFilesLines
* bpo-38409: Fix grammar in str.strip() docstring (GH-16682) (GH-16684)Miss Islington (bot)2019-10-091-2/+2
| | | (cherry picked from commit 09895c27cd8ff60563a794016e8c099bc897cc74)
* bpo-38236: Dump path config at first import error (GH-16300) (GH-16332)Victor Stinner2019-09-231-8/+11
| | | | | | Python now dumps path configuration if it fails to import the Python codecs of the filesystem and stdio encodings. (cherry picked from commit fcdb027234566c4d506d6d753c7d5638490fb088)
* [3.8] bpo-37206: Unrepresentable default values no longer represented as ↵Serhiy Storchaka2019-09-141-5/+5
| | | | | | | | | | None. (GH-13933) (GH-16141) In ArgumentClinic, value "NULL" should now be used only for unrepresentable default values (like in the optional third parameter of getattr). "None" should be used if None is accepted as argument and passing None has the same effect as not passing the argument at all. (cherry picked from commit 279f44678c8b84a183f9eeb85e0b086228154497) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* Fix unused variable and signed/unsigned warnings (GH-15537) (GH-15551)Miss Islington (bot)2019-08-271-0/+6
| | | | | (cherry picked from commit 0138c4ceab1e10d42d0aa962d2ae079b46da7671) Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com>
* bpo-36311: Fixes decoding multibyte characters around chunk boundaries and ↵Miss Islington (bot)2019-08-211-6/+10
| | | | | | | improves decoding performance (GH-15083) (cherry picked from commit 7ebdda0dbee7df6f0c945a7e1e623e47676e112d) Co-authored-by: Steve Dower <steve.dower@python.org>
* bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)Miss Islington (bot)2019-06-251-3/+7
| | | | | | | | | * The UTF-8 incremental decoders fails now fast if encounter a sequence that can't be handled by the error handler. * The UTF-16 incremental decoders with the surrogatepass error handler decodes now a lone low surrogate with final=False. (cherry picked from commit 894263ba80af4b7733c2df95b527e96953922656) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async ↵Jeroen Demeyer2019-05-311-6/+6
| | | | | | | | | (GH-13464) Automatically replace tp_print -> tp_vectorcall_offset tp_compare -> tp_as_async tp_reserved -> tp_as_async
* remove unnecessary tp_dealloc (GH-13647)Inada Naoki2019-05-291-7/+1
|
* bpo-36763: Implement the PEP 587 (GH-13592)Victor Stinner2019-05-271-31/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add a whole new documentation page: "Python Initialization Configuration" * PyWideStringList_Append() return type is now PyStatus, instead of int * PyInterpreterState_New() now calls PyConfig_Clear() if PyConfig_InitPythonConfig() fails. * Rename files: * Python/coreconfig.c => Python/initconfig.c * Include/cpython/coreconfig.h => Include/cpython/initconfig.h * Include/internal/: pycore_coreconfig.h => pycore_initconfig.h * Rename structures * _PyCoreConfig => PyConfig * _PyPreConfig => PyPreConfig * _PyInitError => PyStatus * _PyWstrList => PyWideStringList * Rename PyConfig fields: * use_module_search_paths => module_search_paths_set * module_search_path_env => pythonpath_env * Rename PyStatus field: _func => func * PyInterpreterState: rename core_config field to config * Rename macros and functions: * _PyCoreConfig_SetArgv() => PyConfig_SetBytesArgv() * _PyCoreConfig_SetWideArgv() => PyConfig_SetArgv() * _PyCoreConfig_DecodeLocale() => PyConfig_SetBytesString() * _PyInitError_Failed() => PyStatus_Exception() * _Py_INIT_ERROR_TYPE_xxx enums => _PyStatus_TYPE_xxx * _Py_UnixMain() => Py_BytesMain() * _Py_ExitInitError() => Py_ExitStatusException() * _Py_PreInitializeFromArgs() => Py_PreInitializeFromBytesArgs() * _Py_PreInitializeFromWideArgs() => Py_PreInitializeFromArgs() * _Py_PreInitialize() => Py_PreInitialize() * _Py_RunMain() => Py_RunMain() * _Py_InitializeFromConfig() => Py_InitializeFromConfig() * _Py_INIT_XXX() => _PyStatus_XXX() * _Py_INIT_FAILED() => _PyStatus_EXCEPTION() * Rename 'err' PyStatus variables to 'status' * Convert RUN_CODE() macro to config_run_code() static inline function * Remove functions: * _Py_InitializeFromArgs() * _Py_InitializeFromWideArgs() * _PyInterpreterState_GetCoreConfig()
* bpo-36946: Fix possible signed integer overflow when handling slices. (GH-13375)Zackery Spytz2019-05-171-1/+2
| | | | | | | The final addition (cur += step) may overflow, so use size_t for "cur". "cur" is always positive (even for negative steps), so it is safe to use size_t here. Co-Authored-By: Martin Panter <vadmium+py@gmail.com>
* bpo-36594: Fix incorrect use of %p in format strings (GH-12769)Zackery Spytz2019-05-061-3/+3
| | | In addition, fix some other minor violations of C99.
* bpo-36775: _PyCoreConfig only uses wchar_t* (GH-13062)Victor Stinner2019-05-021-72/+242
| | | | | | | | | | | | | | | | | _PyCoreConfig: Change filesystem_encoding, filesystem_errors, stdio_encoding and stdio_errors fields type from char* to wchar_t*. Changes: * PyInterpreterState: replace fscodec_initialized (int) with fs_codec structure. * Add get_error_handler_wide() and unicode_encode_utf8() helper functions. * Add error_handler parameter to unicode_encode_locale() and unicode_decode_locale(). * Remove _PyCoreConfig_SetString(). * Rename _PyCoreConfig_SetWideString() to _PyCoreConfig_SetString(). * Rename _PyCoreConfig_SetWideStringFromString() to _PyCoreConfig_DecodeLocale().
* bpo-36775: Add _PyUnicode_InitEncodings() (GH-13057)Victor Stinner2019-05-021-0/+97
| | | | | | Move get_codec_name() and initfsencoding() from pylifecycle.c to unicodeobject.c. Rename also "init" functions in pylifecycle.c.
* bpo-36775: Add _Py_FORCE_UTF8_FS_ENCODING macro (GH-13056)Victor Stinner2019-05-021-2/+2
| | | | | | | Add _Py_FORCE_UTF8_LOCALE and _Py_FORCE_UTF8_FS_ENCODING macros to avoid factorize "#if defined(__ANDROID__) || defined(__VXWORKS__)" and "#if defined(__APPLE__)". Cleanup also config_init_fs_encoding().
* bpo-36389: Add _PyObject_CheckConsistency() function (GH-12803)Victor Stinner2019-04-121-49/+44
| | | | | | Add a new _PyObject_CheckConsistency() function which can be used to help debugging. The function is available in release mode. Add a 'check_content' parameter to _PyDict_CheckConsistency().
* bpo-36549: str.capitalize now titlecases the first character instead of ↵Kingsley M2019-04-121-1/+1
| | | | uppercasing it (GH-12804)
* bpo-24214: Fixed the UTF-8 incremental decoder. (GH-12603)Serhiy Storchaka2019-03-301-0/+3
| | | | The bug occurred when the encoded surrogate character is passed to the incremental decoder in two chunks.
* bpo-36312: Fix decoders for some code pages. (GH-12369)Serhiy Storchaka2019-03-201-5/+16
|
* bpo-36356: Release Unicode interned strings on Valgrind (#12431)Victor Stinner2019-03-191-15/+42
| | | | | | | | | | | When Python is compiled with Valgrind support, release Unicode interned strings at exit in _PyUnicode_Fini(). * Rename _Py_ReleaseInternedUnicodeStrings() to unicode_release_interned() and make it private. * unicode_release_interned() is now called from _PyUnicode_Fini(): it must be called with a running Python thread state for TRASHCAN, it cannot be called from pymain_free(). * Don't display statistics on interned strings at exit anymore
* bpo-36301: Error if decoding pybuilddir.txt fails (GH-12422)Victor Stinner2019-03-191-2/+11
| | | | | | | | Python initialization now fails if decoding pybuilddir.txt configuration file fails at startup. _PyPathConfig_Calculate() now reports memory allocation failure and decoding error on decoding pybuilddir.txt content from UTF-8/surrogateescape.
* bpo-36297: remove "unicode_internal" codec (GH-12342)Inada Naoki2019-03-181-102/+0
|
* bpo-35713: Split _Py_InitializeCore into subfunctions (GH-11650)Victor Stinner2019-01-221-1/+0
| | | | | | | | | | | | | | * Split _Py_InitializeCore_impl() into subfunctions: add multiple pycore_init_xxx() functions * Preliminary sys.stderr is now set earlier to get an usable sys.stderr ealier. * Move code into _Py_Initialize_ReconfigureCore() to be able to call it from _Py_InitializeCore(). * Split _PyExc_Init(): create a new _PyBuiltins_AddExceptions() function. * Call _PyExc_Init() earlier in _Py_InitializeCore_impl() and new_interpreter() to get working exceptions earlier. * _Py_ReadyTypes() now returns _PyInitError rather than calling Py_FatalError(). * Misc code cleanup
* bpo-35713: Rework Python initialization (GH-11647)Victor Stinner2019-01-221-14/+18
| | | | | | | | | | | | | | | | | | | * The PyByteArray_Init() and PyByteArray_Fini() functions have been removed. They did nothing since Python 2.7.4 and Python 3.2.0, were excluded from the limited API (stable ABI), and were not documented. * Move "_PyXXX_Init()" and "_PyXXX_Fini()" declarations from Include/cpython/pylifecycle.h to Include/internal/pycore_pylifecycle.h. Replace "PyAPI_FUNC(TYPE)" with "extern TYPE". * _PyExc_Init() now returns an error on failure rather than calling Py_FatalError(). Move macros inside _PyExc_Init() and undefine them when done. Rewrite macros to make them look more like statement: add ";" when using them, add "do { ... } while (0)". * _PyUnicode_Init() now returns a _PyInitError error rather than call Py_FatalError(). * Move stdin check from _PySys_BeginInit() to init_sys_streams(). * _Py_ReadyTypes() now returns a _PyInitError error rather than calling Py_FatalError().
* bpo-35552: Fix reading past the end in PyUnicode_FromFormat() and ↵Serhiy Storchaka2019-01-121-3/+9
| | | | | | PyBytes_FromFormat(). (GH-11276) Format characters "%s" and "%V" in PyUnicode_FromFormat() and "%s" in PyBytes_FromFormat() no longer read memory past the limit if precision is specified.
* bpo-35560: Remove assertion from format(float, "n") (GH-11288)Xtreak2019-01-071-1/+1
| | | | | Fix an assertion error in format() in debug build for floating point formatting with "n" format, zero padding and small width. Release build is not impacted. Patch by Karthikeyan Singaravelan.
* bpo-35636: Remove redundant check in unicode_hash(). (GH-11402)animalize2019-01-021-10/+1
| | | | _Py_HashBytes() does the check for empty string.
* bpo-35444: Unify and optimize the helper for getting a builtin object. ↵Serhiy Storchaka2018-12-111-2/+3
| | | | | | | | (GH-11047) This speeds up pickling of some iterators. This fixes also error handling in pickling methods when fail to look up builtin "getattr".
* bpo-35365: Use a wchar_t* buffer in the code page decoder. (GH-10837)Serhiy Storchaka2018-12-041-60/+52
|
* bpo-35372: Fix the code page decoder for input > 2 GiB. (GH-10848)Serhiy Storchaka2018-12-031-5/+4
|
* bpo-34523, bpo-35322: Fix unicode_encode_locale() (GH-10759)Victor Stinner2018-11-281-7/+5
| | | | | | | | | Fix memory leak in PyUnicode_EncodeLocale() and PyUnicode_EncodeFSDefault() on error handling. Changes: * Fix unicode_encode_locale() error handling * Fix test_codecs.LocaleCodecTest
* bpo-33954: Fix compiler warning in _PyUnicode_FastFill() (GH-10737)Victor Stinner2018-11-271-1/+4
| | | | | | | 'data' argument of unicode_fill() is modified, so it must not be constant. Add more assertions to unicode_fill(): check the maximum character value.
* bpo-33012: Fix invalid function cast warnings with gcc 8. (GH-6749)Serhiy Storchaka2018-11-271-1/+1
| | | | | | Fix invalid function cast warnings with gcc 8 for method conventions different from METH_NOARGS, METH_O and METH_VARARGS excluding Argument Clinic generated code.
* bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623)Victor Stinner2018-11-261-98/+165
| | | | | | | | | | | | Fix str.format(), float.__format__() and complex.__format__() methods for non-ASCII decimal point when using the "n" formatter. Changes: * Rewrite _PyUnicode_InsertThousandsGrouping(): it now requires a _PyUnicodeWriter object for the buffer and a Python str object for digits. * Rename FILL() macro to unicode_fill(), convert it to static inline function, add "assert(0 <= start);" and rework its code.
* bpo-35059: Cast void* to PyObject* (GH-10650)Victor Stinner2018-11-221-3/+6
| | | Don't pass void* to Python macros: use _PyObject_CAST().
* bpo-35081: Add Include/internal/pycore_object.h (GH-10640)Victor Stinner2018-11-211-0/+1
| | | | Move _PyObject_GC_TRACK() and _PyObject_GC_UNTRACK() from Include/objimpl.h to Include/internal/pycore_object.h.
* bpo-35214: Fix OOB memory access in unicode escape parser (GH-10506)Gregory P. Smith2018-11-131-1/+1
| | | | | | Discovered using clang's MemorySanitizer when it ran python3's test_fstring test_misformed_unicode_character_name. An msan build will fail by simply executing: ./python -c 'u"\N"'
* bpo-35081: Rename internal headers (GH-10275)Victor Stinner2018-11-121-1/+1
| | | | | | | | | | | | | | Rename Include/internal/ headers: * pycore_hash.h -> pycore_pyhash.h * pycore_lifecycle.h -> pycore_pylifecycle.h * pycore_mem.h -> pycore_pymem.h * pycore_state.h -> pycore_pystate.h Add missing headers to Makefile.pre.in and PCbuild: * pycore_condvar.h. * pycore_hamt.h * pycore_pyhash.h
* bpo-35081: Add pycore_fileutils.h (GH-10371)Victor Stinner2018-11-061-0/+1
| | | | Move Py_BUILD_CORE code from Include/fileutils.h to a new Include/internal/pycore_fileutils.h file.
* bpo-35081: Add pycore_ prefix to internal header files (GH-10263)Victor Stinner2018-10-311-1/+1
| | | | | | | | | | | | | | | | | | | | * Rename Include/internal/ header files: * pyatomic.h -> pycore_atomic.h * ceval.h -> pycore_ceval.h * condvar.h -> pycore_condvar.h * context.h -> pycore_context.h * pygetopt.h -> pycore_getopt.h * gil.h -> pycore_gil.h * hamt.h -> pycore_hamt.h * hash.h -> pycore_hash.h * mem.h -> pycore_mem.h * pystate.h -> pycore_state.h * warnings.h -> pycore_warnings.h * PCbuild project, Makefile.pre.in, Modules/Setup: add the Include/internal/ directory to the search paths of header files. * Update includes. For example, replace #include "internal/mem.h" with #include "pycore_mem.h".
* bpo-9263: _PyXXX_CheckConsistency() use _PyObject_ASSERT() (GH-10108)Victor Stinner2018-10-261-36/+40
| | | | | | | | | | Use _PyObject_ASSERT() in: * _PyDict_CheckConsistency() * _PyType_CheckConsistency() * _PyUnicode_CheckConsistency() _PyObject_ASSERT() dumps the faulty object if the assertion fails to help debugging.
* bpo-30863: Rewrite PyUnicode_AsWideChar() and PyUnicode_AsWideCharString(). ↵Serhiy Storchaka2018-10-231-123/+121
| | | | | | (GH-2599) They no longer cache the wchar_t* representation of string objects.
* Add missing closing quote and trailing period in str.isidentifier() ↵Emanuele Gaifas2018-10-081-2/+2
| | | | | docstring (GH-9756) This rectifies commit ffc5a14d00db984c8e72c7b67da8a493e17e2c14.
* bpo-33014: Clarify str.isidentifier docstring (GH-6088)Sanyam Khurana2018-10-081-3/+3
| | | | | | * bpo-33014: Clarify str.isidentifier docstring * bpo-33014: Add code example in isidentifier documentation
* Revert "bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)" (GH-9187)Victor Stinner2018-09-111-52/+53
| | | This reverts commit 886483e2b9bbabf60ab769683269b873381dd5ee.
* bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)Victor Stinner2018-09-071-53/+52
| | | | | | | | | * Add %T format to PyUnicode_FromFormatV(), and so to PyUnicode_FromFormat() and PyErr_Format(), to format an object type name: equivalent to "%s" with Py_TYPE(obj)->tp_name. * Replace Py_TYPE(obj)->tp_name with %T format in unicodeobject.c. * Add unit test on %T format. * Rename unicode_fromformat_write_cstr() to unicode_fromformat_write_utf8(), to make the intent more explicit.
* bpo-34523: Support surrogatepass in locale codecs (GH-8995)Victor Stinner2018-08-291-72/+101
| | | | | | | | | | | | | | | | | | | | Add support for the "surrogatepass" error handler in PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault() for the UTF-8 encoding. Changes: * _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the surrogatepass error handler (_Py_ERROR_SURROGATEPASS). * _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use the _Py_error_handler enum instead of "int surrogateescape" to pass the error handler. These functions now return -3 if the error handler is unknown. * Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() in test_codecs. * Rename get_error_handler() to _Py_GetErrorHandler() and expose it as a private function. * _freeze_importlib doesn't need config.filesystem_errors="strict" workaround anymore.
* bpo-34523: Add _PyCoreConfig.filesystem_encoding (GH-8963)Victor Stinner2018-08-291-24/+18
| | | | | | | | | | | | | | | | | | | | | | | _PyCoreConfig_Read() is now responsible to choose the filesystem encoding and error handler. Using Py_Main(), the encoding is now chosen even before calling Py_Initialize(). _PyCoreConfig.filesystem_encoding is now the reference, instead of Py_FileSystemDefaultEncoding, for the Python filesystem encoding. Changes: * Add filesystem_encoding and filesystem_errors to _PyCoreConfig * _PyCoreConfig_Read() now reads the locale encoding for the file system encoding. * PyUnicode_EncodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize() now use the interpreter configuration rather than Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors global configuration variables. * Add _Py_SetFileSystemEncoding() and _Py_ClearFileSystemEncoding() private functions to only modify Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors in coreconfig.c. * _Py_CoerceLegacyLocale() now takes an int rather than _PyCoreConfig for the warning.
* bpo-34435: Add missing NULL check to unicode_encode_ucs1(). (GH-8823)Alexey Izbyshev2018-08-191-2/+3
| | | Reported by Svace static analyzer.
* bpo-22602: Raise an exception in the UTF-7 decoder for ill-formed sequences ↵Zackery Spytz2018-08-191-0/+5
| | | | | | | starting with "+". (GH-8741) The UTF-7 decoder now raises UnicodeDecodeError for ill-formed sequences starting with "+" (as specified in RFC 2152).
* bpo-34301: Add _PyInterpreterState_Get() helper function (GH-8592)Victor Stinner2018-08-031-2/+2
| | | | sys_setcheckinterval() now uses a local variable to parse arguments, before writing into interp->check_interval.