summaryrefslogtreecommitdiffstats
path: root/Objects/unicodeobject.c
Commit message (Collapse)AuthorAgeFilesLines
...
* bpo-33012: Fix invalid function cast warnings with gcc 8. (GH-6749)Serhiy Storchaka2018-11-271-1/+1
| | | | | | Fix invalid function cast warnings with gcc 8 for method conventions different from METH_NOARGS, METH_O and METH_VARARGS excluding Argument Clinic generated code.
* bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623)Victor Stinner2018-11-261-98/+165
| | | | | | | | | | | | Fix str.format(), float.__format__() and complex.__format__() methods for non-ASCII decimal point when using the "n" formatter. Changes: * Rewrite _PyUnicode_InsertThousandsGrouping(): it now requires a _PyUnicodeWriter object for the buffer and a Python str object for digits. * Rename FILL() macro to unicode_fill(), convert it to static inline function, add "assert(0 <= start);" and rework its code.
* bpo-35059: Cast void* to PyObject* (GH-10650)Victor Stinner2018-11-221-3/+6
| | | Don't pass void* to Python macros: use _PyObject_CAST().
* bpo-35081: Add Include/internal/pycore_object.h (GH-10640)Victor Stinner2018-11-211-0/+1
| | | | Move _PyObject_GC_TRACK() and _PyObject_GC_UNTRACK() from Include/objimpl.h to Include/internal/pycore_object.h.
* bpo-35214: Fix OOB memory access in unicode escape parser (GH-10506)Gregory P. Smith2018-11-131-1/+1
| | | | | | Discovered using clang's MemorySanitizer when it ran python3's test_fstring test_misformed_unicode_character_name. An msan build will fail by simply executing: ./python -c 'u"\N"'
* bpo-35081: Rename internal headers (GH-10275)Victor Stinner2018-11-121-1/+1
| | | | | | | | | | | | | | Rename Include/internal/ headers: * pycore_hash.h -> pycore_pyhash.h * pycore_lifecycle.h -> pycore_pylifecycle.h * pycore_mem.h -> pycore_pymem.h * pycore_state.h -> pycore_pystate.h Add missing headers to Makefile.pre.in and PCbuild: * pycore_condvar.h. * pycore_hamt.h * pycore_pyhash.h
* bpo-35081: Add pycore_fileutils.h (GH-10371)Victor Stinner2018-11-061-0/+1
| | | | Move Py_BUILD_CORE code from Include/fileutils.h to a new Include/internal/pycore_fileutils.h file.
* bpo-35081: Add pycore_ prefix to internal header files (GH-10263)Victor Stinner2018-10-311-1/+1
| | | | | | | | | | | | | | | | | | | | * Rename Include/internal/ header files: * pyatomic.h -> pycore_atomic.h * ceval.h -> pycore_ceval.h * condvar.h -> pycore_condvar.h * context.h -> pycore_context.h * pygetopt.h -> pycore_getopt.h * gil.h -> pycore_gil.h * hamt.h -> pycore_hamt.h * hash.h -> pycore_hash.h * mem.h -> pycore_mem.h * pystate.h -> pycore_state.h * warnings.h -> pycore_warnings.h * PCbuild project, Makefile.pre.in, Modules/Setup: add the Include/internal/ directory to the search paths of header files. * Update includes. For example, replace #include "internal/mem.h" with #include "pycore_mem.h".
* bpo-9263: _PyXXX_CheckConsistency() use _PyObject_ASSERT() (GH-10108)Victor Stinner2018-10-261-36/+40
| | | | | | | | | | Use _PyObject_ASSERT() in: * _PyDict_CheckConsistency() * _PyType_CheckConsistency() * _PyUnicode_CheckConsistency() _PyObject_ASSERT() dumps the faulty object if the assertion fails to help debugging.
* bpo-30863: Rewrite PyUnicode_AsWideChar() and PyUnicode_AsWideCharString(). ↵Serhiy Storchaka2018-10-231-123/+121
| | | | | | (GH-2599) They no longer cache the wchar_t* representation of string objects.
* Add missing closing quote and trailing period in str.isidentifier() ↵Emanuele Gaifas2018-10-081-2/+2
| | | | | docstring (GH-9756) This rectifies commit ffc5a14d00db984c8e72c7b67da8a493e17e2c14.
* bpo-33014: Clarify str.isidentifier docstring (GH-6088)Sanyam Khurana2018-10-081-3/+3
| | | | | | * bpo-33014: Clarify str.isidentifier docstring * bpo-33014: Add code example in isidentifier documentation
* Revert "bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)" (GH-9187)Victor Stinner2018-09-111-52/+53
| | | This reverts commit 886483e2b9bbabf60ab769683269b873381dd5ee.
* bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)Victor Stinner2018-09-071-53/+52
| | | | | | | | | * Add %T format to PyUnicode_FromFormatV(), and so to PyUnicode_FromFormat() and PyErr_Format(), to format an object type name: equivalent to "%s" with Py_TYPE(obj)->tp_name. * Replace Py_TYPE(obj)->tp_name with %T format in unicodeobject.c. * Add unit test on %T format. * Rename unicode_fromformat_write_cstr() to unicode_fromformat_write_utf8(), to make the intent more explicit.
* bpo-34523: Support surrogatepass in locale codecs (GH-8995)Victor Stinner2018-08-291-72/+101
| | | | | | | | | | | | | | | | | | | | Add support for the "surrogatepass" error handler in PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault() for the UTF-8 encoding. Changes: * _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the surrogatepass error handler (_Py_ERROR_SURROGATEPASS). * _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use the _Py_error_handler enum instead of "int surrogateescape" to pass the error handler. These functions now return -3 if the error handler is unknown. * Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() in test_codecs. * Rename get_error_handler() to _Py_GetErrorHandler() and expose it as a private function. * _freeze_importlib doesn't need config.filesystem_errors="strict" workaround anymore.
* bpo-34523: Add _PyCoreConfig.filesystem_encoding (GH-8963)Victor Stinner2018-08-291-24/+18
| | | | | | | | | | | | | | | | | | | | | | | _PyCoreConfig_Read() is now responsible to choose the filesystem encoding and error handler. Using Py_Main(), the encoding is now chosen even before calling Py_Initialize(). _PyCoreConfig.filesystem_encoding is now the reference, instead of Py_FileSystemDefaultEncoding, for the Python filesystem encoding. Changes: * Add filesystem_encoding and filesystem_errors to _PyCoreConfig * _PyCoreConfig_Read() now reads the locale encoding for the file system encoding. * PyUnicode_EncodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize() now use the interpreter configuration rather than Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors global configuration variables. * Add _Py_SetFileSystemEncoding() and _Py_ClearFileSystemEncoding() private functions to only modify Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors in coreconfig.c. * _Py_CoerceLegacyLocale() now takes an int rather than _PyCoreConfig for the warning.
* bpo-34435: Add missing NULL check to unicode_encode_ucs1(). (GH-8823)Alexey Izbyshev2018-08-191-2/+3
| | | Reported by Svace static analyzer.
* bpo-22602: Raise an exception in the UTF-7 decoder for ill-formed sequences ↵Zackery Spytz2018-08-191-0/+5
| | | | | | | starting with "+". (GH-8741) The UTF-7 decoder now raises UnicodeDecodeError for ill-formed sequences starting with "+" (as specified in RFC 2152).
* bpo-34301: Add _PyInterpreterState_Get() helper function (GH-8592)Victor Stinner2018-08-031-2/+2
| | | | sys_setcheckinterval() now uses a local variable to parse arguments, before writing into interp->check_interval.
* bpo-34087: Fix buffer overflow in int(s) and similar functions (GH-8274)INADA Naoki2018-07-141-0/+2
| | | | | | `_PyUnicode_TransformDecimalAndSpaceToASCII()` missed trailing NUL char. It caused buffer overflow in `_Py_string_to_number_with_underscores()`. This bug is introduced in 9b6c60cb.
* Change tp_size to tp_basicsize in comment and realign the comments (GH-6775)Bup2018-06-191-38/+38
|
* bpo-33012: Fix invalid function cast warnings with gcc 8 for METH_NOARGS. ↵Siddhesh Poyarekar2018-04-291-4/+4
| | | | | | | | | (GH-6030) METH_NOARGS functions need only a single argument but they are cast into a PyCFunction, which takes two arguments. This triggers an invalid function cast warning in gcc8 due to the argument mismatch. Fix this by adding a dummy unused argument.
* bpo-29803: remove a redandunt op and fix a comment in unicodeobject.c (#660)Xiang Zhang2018-02-131-5/+1
|
* bpo-32827: Fix usage of _PyUnicodeWriter_Prepare() in decoding errors ↵Serhiy Storchaka2018-02-131-7/+3
| | | | handler. (GH-5636)
* bpo-32747: Remove trailing spaces in docstrings. (GH-5491)oldk2018-02-021-1/+1
|
* bpo-32583: Fix possible crashing in builtin Unicode decoders (#5325)Xiang Zhang2018-01-311-2/+20
| | | | | When using customized decode error handlers, it is possible for builtin decoders to write out-of-bounds and then crash.
* Fix wrong assert in unicodeobject (GH-5340)INADA Naoki2018-01-271-1/+1
|
* bpo-32677: Add .isascii() to str, bytes and bytearray (GH-5342)INADA Naoki2018-01-271-0/+20
|
* bpo-29240: Fix locale encodings in UTF-8 Mode (#5170)Victor Stinner2018-01-151-331/+144
| | | | | | | | | | | | | | | | | | | | | | | | | | | Modify locale.localeconv(), time.tzname, os.strerror() and other functions to ignore the UTF-8 Mode: always use the current locale encoding. Changes: * Add _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx(). On decoding or encoding error, they return the position of the error and an error message which are used to raise Unicode errors in PyUnicode_DecodeLocale() and PyUnicode_EncodeLocale(). * Replace _Py_DecodeCurrentLocale() with _Py_DecodeLocaleEx(). * PyUnicode_DecodeLocale() now uses _Py_DecodeLocaleEx() for all cases, especially for the strict error handler. * Add _Py_DecodeUTF8Ex(): return more information on decoding error and supports the strict error handler. * Rename _Py_EncodeUTF8_surrogateescape() to _Py_EncodeUTF8Ex(). * Replace _Py_EncodeCurrentLocale() with _Py_EncodeLocaleEx(). * Ignore the UTF-8 mode to encode/decode localeconv(), strerror() and time zone name. * Remove PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize() and PyUnicode_EncodeLocale() now ignore the UTF-8 mode: always use the "current" locale. * Remove _PyUnicode_DecodeCurrentLocale(), _PyUnicode_DecodeCurrentLocaleAndSize() and _PyUnicode_EncodeCurrentLocale().
* bpo-29240: Ignore UTF-8 Mode in time module (#5148)Victor Stinner2018-01-111-0/+6
| | | | | | time.strftime() must use the current LC_CTYPE encoding, not UTF-8 if the UTF-8 mode is enabled. Add _PyUnicode_DecodeCurrentLocale() function.
* bpo-29240: readline now ignores the UTF-8 Mode (#5145)Victor Stinner2018-01-101-10/+52
| | | | | | | | | | | | Add new fuctions ignoring the UTF-8 mode: * _Py_DecodeCurrentLocale() * _Py_EncodeCurrentLocale() * _PyUnicode_DecodeCurrentLocaleAndSize() * _PyUnicode_EncodeCurrentLocale() Modify the readline module to use these functions. Re-enable test_readline.test_nonascii().
* bpo-32030: Add _Py_EncodeLocaleRaw() (#4961)Victor Stinner2017-12-211-4/+21
| | | | | | | | | | | | Replace Py_EncodeLocale() with _Py_EncodeLocaleRaw() in: * _Py_wfopen() * _Py_wreadlink() * _Py_wrealpath() * _Py_wstat() * pymain_open_filename() These functions are called early during Python intialization, only the RAW memory allocator must be used.
* bpo-32030: Add _Py_EncodeUTF8_surrogateescape() (#4960)Victor Stinner2017-12-211-0/+89
| | | | | Py_EncodeLocale() now uses _Py_EncodeUTF8_surrogateescape(), instead of using temporary unicode and bytes objects. So Py_EncodeLocale() doesn't use the Python C API anymore.
* bpo-32240: Add the const qualifier to declarations of PyObject* array ↵Serhiy Storchaka2017-12-151-1/+1
| | | | arguments. (#4746)
* bpo-29240: PEP 540: Add a new UTF-8 Mode (#855)Victor Stinner2017-12-131-9/+21
| | | | | | | | | | | | | | | | | | | | | | * Add -X utf8 command line option, PYTHONUTF8 environment variable and a new sys.flags.utf8_mode flag. * If the LC_CTYPE locale is "C" at startup: enable automatically the UTF-8 mode. * Add _winapi.GetACP(). encodings._alias_mbcs() now calls _winapi.GetACP() to get the ANSI code page * locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8 mode. As a side effect, open() now uses the UTF-8 encoding by default in this mode. * Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding in the UTF-8 Mode. * Update subprocess._args_from_interpreter_flags() to handle -X utf8 * Skip some tests relying on the current locale if the UTF-8 mode is enabled. * Add test_utf8mode.py. * _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to return also the length (number of wide characters). * pymain_get_global_config() and pymain_set_global_config() now always copy flag values, rather than only copying if the new value is greater than the old value.
* bpo-31979: Remove unused align_maxchar() function (#4527)Victor Stinner2017-11-231-13/+0
|
* bpo-31979: Simplify transforming decimals to ASCII (#4336)Serhiy Storchaka2017-11-131-104/+32
| | | | | in int(), float() and complex() parsers. This also speeds up parsing non-ASCII numbers by around 20%.
* Add the const qualifier to "char *" variables that refer to literal strings. ↵Serhiy Storchaka2017-11-111-3/+3
| | | | (#4370)
* bpo-23699: Use a macro to reduce boilerplate code in rich comparison ↵stratakis2017-11-021-30/+4
| | | | functions (GH-793)
* bpo-20047: Make bytearray methods partition() and rpartition() rejecting (#4158)Serhiy Storchaka2017-10-281-2/+2
| | | separators that are not bytes-like objects.
* bpo-31825: Fixed OverflowError in the 'unicode-escape' codec (#4058)Serhiy Storchaka2017-10-201-1/+1
| | | | and in codecs.escape_decode() when decode an escaped non-ascii byte.
* bpo-31338 (#3374)Barry Warsaw2017-09-151-50/+34
| | | | | | | * Add Py_UNREACHABLE() as an alias to abort(). * Use Py_UNREACHABLE() instead of assert(0) * Convert more unreachable code to use Py_UNREACHABLE() * Document Py_UNREACHABLE() and a few other macros.
* bpo-31393: Fix the use of PyUnicode_READY(). (#3451)Serhiy Storchaka2017-09-081-2/+8
|
* bpo-30860: Consolidate stateful runtime globals. (#3397)Eric Snow2017-09-081-0/+1
| | | | | | | * group the (stateful) runtime globals into various topical structs * consolidate the topical structs under a single top-level _PyRuntimeState struct * add a check-c-globals.py script that helps identify runtime globals Other globals are excluded (see globals.txt and check-c-globals.py).
* bpo-30923: Silence fall-through warnings included in -Wextra since gcc-7.0. ↵Stefan Krah2017-08-211-2/+3
| | | | (#3157)
* bpo-22207: Add checks for possible integer overflows in unicodeobject.c. (#2623)Serhiy Storchaka2017-07-111-6/+12
| | | Based on patch by Victor Stinner.
* [security] bpo-13617: Reject embedded null characters in wchar* strings. (#2302)Serhiy Storchaka2017-06-281-0/+14
| | | | | | | Based on patch by Victor Stinner. Add private C API function _PyUnicode_AsUnicode() which is similar to PyUnicode_AsUnicode(), but checks for null characters.
* bpo-30708: Check for null characters in PyUnicode_AsWideCharString(). (#2285)Serhiy Storchaka2017-06-271-27/+22
| | | | | Raise a ValueError if the second argument is NULL and the wchar_t\* string contains null characters.
* bpo-29802: Fix reference counting in module-level struct functions (#1213)Serhiy Storchaka2017-04-201-0/+1
| | | | when pass arguments of wrong type.
* Expand the PySlice_GetIndicesEx macro. (#1023)Serhiy Storchaka2017-04-081-2/+3
|