summaryrefslogtreecommitdiffstats
path: root/Objects/unicodeobject.c
Commit message (Collapse)AuthorAgeFilesLines
* bpo-28029: Make "".replace("", s, n) returning s for any n != 0. (GH-16981)Serhiy Storchaka2019-10-301-1/+4
|
* bpo-38409: Fix grammar in str.strip() docstring (GH-16682)Zachary Ware2019-10-091-2/+2
|
* bpo-36389: _PyObject_CheckConsistency() available in release mode (GH-16612)Victor Stinner2019-10-071-42/+42
| | | | | | | | | | | | | | | | | | | | | bpo-36389, bpo-38376: The _PyObject_CheckConsistency() function is now also available in release mode. For example, it can be used to debug a crash in the visit_decref() function of the GC. Modify the following functions to also work in release mode: * _PyDict_CheckConsistency() * _PyObject_CheckConsistency() * _PyType_CheckConsistency() * _PyUnicode_CheckConsistency() Other changes: * _PyMem_IsPtrFreed(ptr) now also returns 1 if ptr is NULL (equals to 0). * _PyBytesWriter_CheckConsistency() now returns 1 and is only used with assert(). * Reorder _PyObject_Dump() to write safe fields first, and only attempt to render repr() at the end.
* bpo-38353: Cleanup includes in the internal C API (GH-16548)Victor Stinner2019-10-021-1/+2
| | | | Use forward declaration of types to avoid includes in the internal C API. Add also comment to justify other includes.
* bpo-38236: Dump path config at first import error (GH-16300)Victor Stinner2019-09-231-9/+10
| | | | Python now dumps path configuration if it fails to import the Python codecs of the filesystem and stdio encodings.
* bpo-37206: Unrepresentable default values no longer represented as None. ↵Serhiy Storchaka2019-09-141-5/+5
| | | | | | | (GH-13933) In ArgumentClinic, value "NULL" should now be used only for unrepresentable default values (like in the optional third parameter of getattr). "None" should be used if None is accepted as argument and passing None has the same effect as not passing the argument at all.
* Fix unused variable and signed/unsigned warnings (GH-15537)Raymond Hettinger2019-08-271-0/+6
|
* bpo-36311: Fixes decoding multibyte characters around chunk boundaries and ↵Steve Dower2019-08-211-6/+10
| | | | improves decoding performance (GH-15083)
* bpo-37483: add _PyObject_CallOneArg() function (#14558)Jeroen Demeyer2019-07-041-6/+4
|
* bpo-37388: Add PyUnicode_Decode(str, 0) fast-path (GH-14385)Victor Stinner2019-06-251-0/+4
| | | Add a fast-path to PyUnicode_Decode() for size equals to 0.
* bpo-37388: Development mode check encoding and errors (GH-14341)Victor Stinner2019-06-251-5/+63
| | | | | | | | | In development mode and in debug build, encoding and errors arguments are now checked on string encoding and decoding operations. Examples: open(), str.encode() and bytes.decode(). By default, for best performances, the errors argument is only checked at the first encoding/decoding error, and the encoding argument is sometimes ignored for empty strings.
* bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)Serhiy Storchaka2019-06-251-3/+7
| | | | | | | * The UTF-8 incremental decoders fails now fast if encounter a sequence that can't be handled by the error handler. * The UTF-16 incremental decoders with the surrogatepass error handler decodes now a lone low surrogate with final=False.
* bpo-37348: optimize decoding ASCII string (GH-14283)Inada Naoki2019-06-241-34/+51
| | | `_PyUnicode_Writer` is a relatively complex structure. Initializing it is significant overhead when decoding short ASCII string.
* bpo-36710: Use tstate in pylifecycle.c (GH-14249)Victor Stinner2019-06-191-1/+3
| | | | In pylifecycle.c: pass tstate argument, rather than interp argument, to functions.
* bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async ↵Jeroen Demeyer2019-05-311-6/+6
| | | | | | | | | (GH-13464) Automatically replace tp_print -> tp_vectorcall_offset tp_compare -> tp_as_async tp_reserved -> tp_as_async
* remove unnecessary tp_dealloc (GH-13647)Inada Naoki2019-05-291-7/+1
|
* bpo-36763: Implement the PEP 587 (GH-13592)Victor Stinner2019-05-271-31/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add a whole new documentation page: "Python Initialization Configuration" * PyWideStringList_Append() return type is now PyStatus, instead of int * PyInterpreterState_New() now calls PyConfig_Clear() if PyConfig_InitPythonConfig() fails. * Rename files: * Python/coreconfig.c => Python/initconfig.c * Include/cpython/coreconfig.h => Include/cpython/initconfig.h * Include/internal/: pycore_coreconfig.h => pycore_initconfig.h * Rename structures * _PyCoreConfig => PyConfig * _PyPreConfig => PyPreConfig * _PyInitError => PyStatus * _PyWstrList => PyWideStringList * Rename PyConfig fields: * use_module_search_paths => module_search_paths_set * module_search_path_env => pythonpath_env * Rename PyStatus field: _func => func * PyInterpreterState: rename core_config field to config * Rename macros and functions: * _PyCoreConfig_SetArgv() => PyConfig_SetBytesArgv() * _PyCoreConfig_SetWideArgv() => PyConfig_SetArgv() * _PyCoreConfig_DecodeLocale() => PyConfig_SetBytesString() * _PyInitError_Failed() => PyStatus_Exception() * _Py_INIT_ERROR_TYPE_xxx enums => _PyStatus_TYPE_xxx * _Py_UnixMain() => Py_BytesMain() * _Py_ExitInitError() => Py_ExitStatusException() * _Py_PreInitializeFromArgs() => Py_PreInitializeFromBytesArgs() * _Py_PreInitializeFromWideArgs() => Py_PreInitializeFromArgs() * _Py_PreInitialize() => Py_PreInitialize() * _Py_RunMain() => Py_RunMain() * _Py_InitializeFromConfig() => Py_InitializeFromConfig() * _Py_INIT_XXX() => _PyStatus_XXX() * _Py_INIT_FAILED() => _PyStatus_EXCEPTION() * Rename 'err' PyStatus variables to 'status' * Convert RUN_CODE() macro to config_run_code() static inline function * Remove functions: * _Py_InitializeFromArgs() * _Py_InitializeFromWideArgs() * _PyInterpreterState_GetCoreConfig()
* bpo-36946: Fix possible signed integer overflow when handling slices. (GH-13375)Zackery Spytz2019-05-171-1/+2
| | | | | | | The final addition (cur += step) may overflow, so use size_t for "cur". "cur" is always positive (even for negative steps), so it is safe to use size_t here. Co-Authored-By: Martin Panter <vadmium+py@gmail.com>
* bpo-36594: Fix incorrect use of %p in format strings (GH-12769)Zackery Spytz2019-05-061-3/+3
| | | In addition, fix some other minor violations of C99.
* bpo-36775: _PyCoreConfig only uses wchar_t* (GH-13062)Victor Stinner2019-05-021-72/+242
| | | | | | | | | | | | | | | | | _PyCoreConfig: Change filesystem_encoding, filesystem_errors, stdio_encoding and stdio_errors fields type from char* to wchar_t*. Changes: * PyInterpreterState: replace fscodec_initialized (int) with fs_codec structure. * Add get_error_handler_wide() and unicode_encode_utf8() helper functions. * Add error_handler parameter to unicode_encode_locale() and unicode_decode_locale(). * Remove _PyCoreConfig_SetString(). * Rename _PyCoreConfig_SetWideString() to _PyCoreConfig_SetString(). * Rename _PyCoreConfig_SetWideStringFromString() to _PyCoreConfig_DecodeLocale().
* bpo-36775: Add _PyUnicode_InitEncodings() (GH-13057)Victor Stinner2019-05-021-0/+97
| | | | | | Move get_codec_name() and initfsencoding() from pylifecycle.c to unicodeobject.c. Rename also "init" functions in pylifecycle.c.
* bpo-36775: Add _Py_FORCE_UTF8_FS_ENCODING macro (GH-13056)Victor Stinner2019-05-021-2/+2
| | | | | | | Add _Py_FORCE_UTF8_LOCALE and _Py_FORCE_UTF8_FS_ENCODING macros to avoid factorize "#if defined(__ANDROID__) || defined(__VXWORKS__)" and "#if defined(__APPLE__)". Cleanup also config_init_fs_encoding().
* bpo-36389: Add _PyObject_CheckConsistency() function (GH-12803)Victor Stinner2019-04-121-49/+44
| | | | | | Add a new _PyObject_CheckConsistency() function which can be used to help debugging. The function is available in release mode. Add a 'check_content' parameter to _PyDict_CheckConsistency().
* bpo-36549: str.capitalize now titlecases the first character instead of ↵Kingsley M2019-04-121-1/+1
| | | | uppercasing it (GH-12804)
* bpo-24214: Fixed the UTF-8 incremental decoder. (GH-12603)Serhiy Storchaka2019-03-301-0/+3
| | | | The bug occurred when the encoded surrogate character is passed to the incremental decoder in two chunks.
* bpo-36312: Fix decoders for some code pages. (GH-12369)Serhiy Storchaka2019-03-201-5/+16
|
* bpo-36356: Release Unicode interned strings on Valgrind (#12431)Victor Stinner2019-03-191-15/+42
| | | | | | | | | | | When Python is compiled with Valgrind support, release Unicode interned strings at exit in _PyUnicode_Fini(). * Rename _Py_ReleaseInternedUnicodeStrings() to unicode_release_interned() and make it private. * unicode_release_interned() is now called from _PyUnicode_Fini(): it must be called with a running Python thread state for TRASHCAN, it cannot be called from pymain_free(). * Don't display statistics on interned strings at exit anymore
* bpo-36301: Error if decoding pybuilddir.txt fails (GH-12422)Victor Stinner2019-03-191-2/+11
| | | | | | | | Python initialization now fails if decoding pybuilddir.txt configuration file fails at startup. _PyPathConfig_Calculate() now reports memory allocation failure and decoding error on decoding pybuilddir.txt content from UTF-8/surrogateescape.
* bpo-36297: remove "unicode_internal" codec (GH-12342)Inada Naoki2019-03-181-102/+0
|
* bpo-35713: Split _Py_InitializeCore into subfunctions (GH-11650)Victor Stinner2019-01-221-1/+0
| | | | | | | | | | | | | | * Split _Py_InitializeCore_impl() into subfunctions: add multiple pycore_init_xxx() functions * Preliminary sys.stderr is now set earlier to get an usable sys.stderr ealier. * Move code into _Py_Initialize_ReconfigureCore() to be able to call it from _Py_InitializeCore(). * Split _PyExc_Init(): create a new _PyBuiltins_AddExceptions() function. * Call _PyExc_Init() earlier in _Py_InitializeCore_impl() and new_interpreter() to get working exceptions earlier. * _Py_ReadyTypes() now returns _PyInitError rather than calling Py_FatalError(). * Misc code cleanup
* bpo-35713: Rework Python initialization (GH-11647)Victor Stinner2019-01-221-14/+18
| | | | | | | | | | | | | | | | | | | * The PyByteArray_Init() and PyByteArray_Fini() functions have been removed. They did nothing since Python 2.7.4 and Python 3.2.0, were excluded from the limited API (stable ABI), and were not documented. * Move "_PyXXX_Init()" and "_PyXXX_Fini()" declarations from Include/cpython/pylifecycle.h to Include/internal/pycore_pylifecycle.h. Replace "PyAPI_FUNC(TYPE)" with "extern TYPE". * _PyExc_Init() now returns an error on failure rather than calling Py_FatalError(). Move macros inside _PyExc_Init() and undefine them when done. Rewrite macros to make them look more like statement: add ";" when using them, add "do { ... } while (0)". * _PyUnicode_Init() now returns a _PyInitError error rather than call Py_FatalError(). * Move stdin check from _PySys_BeginInit() to init_sys_streams(). * _Py_ReadyTypes() now returns a _PyInitError error rather than calling Py_FatalError().
* bpo-35552: Fix reading past the end in PyUnicode_FromFormat() and ↵Serhiy Storchaka2019-01-121-3/+9
| | | | | | PyBytes_FromFormat(). (GH-11276) Format characters "%s" and "%V" in PyUnicode_FromFormat() and "%s" in PyBytes_FromFormat() no longer read memory past the limit if precision is specified.
* bpo-35560: Remove assertion from format(float, "n") (GH-11288)Xtreak2019-01-071-1/+1
| | | | | Fix an assertion error in format() in debug build for floating point formatting with "n" format, zero padding and small width. Release build is not impacted. Patch by Karthikeyan Singaravelan.
* bpo-35636: Remove redundant check in unicode_hash(). (GH-11402)animalize2019-01-021-10/+1
| | | | _Py_HashBytes() does the check for empty string.
* bpo-35444: Unify and optimize the helper for getting a builtin object. ↵Serhiy Storchaka2018-12-111-2/+3
| | | | | | | | (GH-11047) This speeds up pickling of some iterators. This fixes also error handling in pickling methods when fail to look up builtin "getattr".
* bpo-35365: Use a wchar_t* buffer in the code page decoder. (GH-10837)Serhiy Storchaka2018-12-041-60/+52
|
* bpo-35372: Fix the code page decoder for input > 2 GiB. (GH-10848)Serhiy Storchaka2018-12-031-5/+4
|
* bpo-34523, bpo-35322: Fix unicode_encode_locale() (GH-10759)Victor Stinner2018-11-281-7/+5
| | | | | | | | | Fix memory leak in PyUnicode_EncodeLocale() and PyUnicode_EncodeFSDefault() on error handling. Changes: * Fix unicode_encode_locale() error handling * Fix test_codecs.LocaleCodecTest
* bpo-33954: Fix compiler warning in _PyUnicode_FastFill() (GH-10737)Victor Stinner2018-11-271-1/+4
| | | | | | | 'data' argument of unicode_fill() is modified, so it must not be constant. Add more assertions to unicode_fill(): check the maximum character value.
* bpo-33012: Fix invalid function cast warnings with gcc 8. (GH-6749)Serhiy Storchaka2018-11-271-1/+1
| | | | | | Fix invalid function cast warnings with gcc 8 for method conventions different from METH_NOARGS, METH_O and METH_VARARGS excluding Argument Clinic generated code.
* bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623)Victor Stinner2018-11-261-98/+165
| | | | | | | | | | | | Fix str.format(), float.__format__() and complex.__format__() methods for non-ASCII decimal point when using the "n" formatter. Changes: * Rewrite _PyUnicode_InsertThousandsGrouping(): it now requires a _PyUnicodeWriter object for the buffer and a Python str object for digits. * Rename FILL() macro to unicode_fill(), convert it to static inline function, add "assert(0 <= start);" and rework its code.
* bpo-35059: Cast void* to PyObject* (GH-10650)Victor Stinner2018-11-221-3/+6
| | | Don't pass void* to Python macros: use _PyObject_CAST().
* bpo-35081: Add Include/internal/pycore_object.h (GH-10640)Victor Stinner2018-11-211-0/+1
| | | | Move _PyObject_GC_TRACK() and _PyObject_GC_UNTRACK() from Include/objimpl.h to Include/internal/pycore_object.h.
* bpo-35214: Fix OOB memory access in unicode escape parser (GH-10506)Gregory P. Smith2018-11-131-1/+1
| | | | | | Discovered using clang's MemorySanitizer when it ran python3's test_fstring test_misformed_unicode_character_name. An msan build will fail by simply executing: ./python -c 'u"\N"'
* bpo-35081: Rename internal headers (GH-10275)Victor Stinner2018-11-121-1/+1
| | | | | | | | | | | | | | Rename Include/internal/ headers: * pycore_hash.h -> pycore_pyhash.h * pycore_lifecycle.h -> pycore_pylifecycle.h * pycore_mem.h -> pycore_pymem.h * pycore_state.h -> pycore_pystate.h Add missing headers to Makefile.pre.in and PCbuild: * pycore_condvar.h. * pycore_hamt.h * pycore_pyhash.h
* bpo-35081: Add pycore_fileutils.h (GH-10371)Victor Stinner2018-11-061-0/+1
| | | | Move Py_BUILD_CORE code from Include/fileutils.h to a new Include/internal/pycore_fileutils.h file.
* bpo-35081: Add pycore_ prefix to internal header files (GH-10263)Victor Stinner2018-10-311-1/+1
| | | | | | | | | | | | | | | | | | | | * Rename Include/internal/ header files: * pyatomic.h -> pycore_atomic.h * ceval.h -> pycore_ceval.h * condvar.h -> pycore_condvar.h * context.h -> pycore_context.h * pygetopt.h -> pycore_getopt.h * gil.h -> pycore_gil.h * hamt.h -> pycore_hamt.h * hash.h -> pycore_hash.h * mem.h -> pycore_mem.h * pystate.h -> pycore_state.h * warnings.h -> pycore_warnings.h * PCbuild project, Makefile.pre.in, Modules/Setup: add the Include/internal/ directory to the search paths of header files. * Update includes. For example, replace #include "internal/mem.h" with #include "pycore_mem.h".
* bpo-9263: _PyXXX_CheckConsistency() use _PyObject_ASSERT() (GH-10108)Victor Stinner2018-10-261-36/+40
| | | | | | | | | | Use _PyObject_ASSERT() in: * _PyDict_CheckConsistency() * _PyType_CheckConsistency() * _PyUnicode_CheckConsistency() _PyObject_ASSERT() dumps the faulty object if the assertion fails to help debugging.
* bpo-30863: Rewrite PyUnicode_AsWideChar() and PyUnicode_AsWideCharString(). ↵Serhiy Storchaka2018-10-231-123/+121
| | | | | | (GH-2599) They no longer cache the wchar_t* representation of string objects.
* Add missing closing quote and trailing period in str.isidentifier() ↵Emanuele Gaifas2018-10-081-2/+2
| | | | | docstring (GH-9756) This rectifies commit ffc5a14d00db984c8e72c7b67da8a493e17e2c14.