path: root/Objects/unicodeobject.c
Commit message | Author | Age | Files | Lines
...
* Add missing closing quote and trailing period in str.isidentifier() docstring (GH-9756) | Emanuele Gaifas | 2018-10-08 | 1 | -2/+2
  This rectifies commit ffc5a14d00db984c8e72c7b67da8a493e17e2c14.
* bpo-33014: Clarify str.isidentifier docstring (GH-6088) | Sanyam Khurana | 2018-10-08 | 1 | -3/+3
  * bpo-33014: Clarify str.isidentifier docstring
  * bpo-33014: Add code example in isidentifier documentation
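As an illustration of what the clarified docstring is about, here is a minimal Python sketch. The sample strings are arbitrary choices for this example; `keyword.iskeyword()` is the standard-library check for reserved words.

```python
import keyword

# str.isidentifier() only checks the lexical form of the string, so reserved
# words such as "def" and "class" also count as identifiers.
samples = ["spam", "def", "class", "2fast", "with space"]
results = {s: (s.isidentifier(), keyword.iskeyword(s)) for s in samples}

for name, (is_ident, is_kw) in results.items():
    print(f"{name!r}: identifier={is_ident}, keyword={is_kw}")
```

Combining both checks is one way to tell whether a string can actually be used as a variable name.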
* Revert "bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)" (GH-9187) | Victor Stinner | 2018-09-11 | 1 | -52/+53
  This reverts commit 886483e2b9bbabf60ab769683269b873381dd5ee.
* bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080) | Victor Stinner | 2018-09-07 | 1 | -53/+52
  * Add %T format to PyUnicode_FromFormatV(), and so to PyUnicode_FromFormat() and PyErr_Format(), to format an object type name: equivalent to "%s" with Py_TYPE(obj)->tp_name.
  * Replace Py_TYPE(obj)->tp_name with %T format in unicodeobject.c.
  * Add unit test on %T format.
  * Rename unicode_fromformat_write_cstr() to unicode_fromformat_write_utf8(), to make the intent more explicit.
* bpo-34523: Support surrogatepass in locale codecs (GH-8995) | Victor Stinner | 2018-08-29 | 1 | -72/+101
  Add support for the "surrogatepass" error handler in PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault() for the UTF-8 encoding.
  Changes:
  * _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
  * _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use the _Py_error_handler enum instead of "int surrogateescape" to pass the error handler. These functions now return -3 if the error handler is unknown.
  * Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() in test_codecs.
  * Rename get_error_handler() to _Py_GetErrorHandler() and expose it as a private function.
  * _freeze_importlib doesn't need config.filesystem_errors="strict" workaround anymore.
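The C functions named above are private, so the following is only a Python-level sketch of what the "surrogatepass" error handler does with the UTF-8 codec. It does not call the locale codec functions touched by this commit, and the sample code point is an arbitrary choice.

```python
lone = "\udc80"  # a lone low surrogate, which strict UTF-8 refuses to encode

try:
    lone.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# surrogatepass encodes the surrogate code point itself as three bytes.
encoded = lone.encode("utf-8", "surrogatepass")
decoded = encoded.decode("utf-8", "surrogatepass")

# surrogateescape, by contrast, maps an undecodable byte to a lone surrogate.
escaped = b"\x80".decode("utf-8", "surrogateescape")

print(strict_failed, encoded, ascii(decoded), ascii(escaped))
```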
* bpo-34523: Add _PyCoreConfig.filesystem_encoding (GH-8963) | Victor Stinner | 2018-08-29 | 1 | -24/+18
  _PyCoreConfig_Read() is now responsible for choosing the filesystem encoding and error handler. Using Py_Main(), the encoding is now chosen even before calling Py_Initialize().
  _PyCoreConfig.filesystem_encoding is now the reference, instead of Py_FileSystemDefaultEncoding, for the Python filesystem encoding.
  Changes:
  * Add filesystem_encoding and filesystem_errors to _PyCoreConfig.
  * _PyCoreConfig_Read() now reads the locale encoding for the file system encoding.
  * PyUnicode_EncodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize() now use the interpreter configuration rather than Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors global configuration variables.
  * Add _Py_SetFileSystemEncoding() and _Py_ClearFileSystemEncoding() private functions to only modify Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors in coreconfig.c.
  * _Py_CoerceLegacyLocale() now takes an int rather than _PyCoreConfig for the warning.
* bpo-34435: Add missing NULL check to unicode_encode_ucs1(). (GH-8823) | Alexey Izbyshev | 2018-08-19 | 1 | -2/+3
  Reported by Svace static analyzer.
* bpo-22602: Raise an exception in the UTF-7 decoder for ill-formed sequences starting with "+". (GH-8741) | Zackery Spytz | 2018-08-19 | 1 | -0/+5
  The UTF-7 decoder now raises UnicodeDecodeError for ill-formed sequences starting with "+" (as specified in RFC 2152).
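A small Python sketch of the behaviour described above, assuming an interpreter that includes this change. The helper `try_decode` and the sample inputs are made up for the example.

```python
def try_decode(data):
    """Return the decoded text, or the exception raised by the UTF-7 decoder."""
    try:
        return data.decode("utf-7")
    except UnicodeDecodeError as exc:
        return exc

plus = try_decode(b"a+-b")      # "+-" is the escape for a literal "+"
shifted = try_decode(b"+AGE-")  # base64 for the UTF-16 code unit 0x0061
bad = try_decode(b"a+!b")       # "+" followed by a non-base64 character

print(repr(plus), repr(shifted), type(bad).__name__)
```

The first two inputs are well formed; the third starts a shift sequence with a character outside the base64 alphabet, which is the ill-formed case the commit message refers to.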
* bpo-34301: Add _PyInterpreterState_Get() helper function (GH-8592) | Victor Stinner | 2018-08-03 | 1 | -2/+2
  sys_setcheckinterval() now uses a local variable to parse arguments, before writing into interp->check_interval.
* bpo-34087: Fix buffer overflow in int(s) and similar functions (GH-8274) | INADA Naoki | 2018-07-14 | 1 | -0/+2
  `_PyUnicode_TransformDecimalAndSpaceToASCII()` missed the trailing NUL char. This caused a buffer overflow in `_Py_string_to_number_with_underscores()`.
  This bug was introduced in 9b6c60cb.
* Change tp_size to tp_basicsize in comment and realign the comments (GH-6775) | Bup | 2018-06-19 | 1 | -38/+38
* bpo-33012: Fix invalid function cast warnings with gcc 8 for METH_NOARGS. (GH-6030) | Siddhesh Poyarekar | 2018-04-29 | 1 | -4/+4
  METH_NOARGS functions need only a single argument but they are cast into a PyCFunction, which takes two arguments. This triggers an invalid function cast warning in gcc8 due to the argument mismatch. Fix this by adding a dummy unused argument.
* bpo-29803: remove a redundant op and fix a comment in unicodeobject.c (#660) | Xiang Zhang | 2018-02-13 | 1 | -5/+1
* bpo-32827: Fix usage of _PyUnicodeWriter_Prepare() in decoding errors handler. (GH-5636) | Serhiy Storchaka | 2018-02-13 | 1 | -7/+3
* bpo-32747: Remove trailing spaces in docstrings. (GH-5491) | oldk | 2018-02-02 | 1 | -1/+1
* bpo-32583: Fix possible crashing in builtin Unicode decoders (#5325) | Xiang Zhang | 2018-01-31 | 1 | -2/+20
  When using customized decode error handlers, it is possible for builtin decoders to write out-of-bounds and then crash.
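For context, a "customized decode error handler" is a callable registered with `codecs.register_error()`. The sketch below shows a well-behaved one; the handler name `demo_hex` and its replacement format are hypothetical, and the code does not attempt to reproduce the crash fixed by this commit.

```python
import codecs

def demo_hex_handler(exc):
    """Hypothetical handler: replace undecodable bytes with <xx> markers."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join(f"<{b:02x}>" for b in bad)
    # Return the replacement text and the position where decoding resumes.
    return replacement, exc.end

codecs.register_error("demo_hex", demo_hex_handler)

text = b"a\xffb".decode("utf-8", "demo_hex")
print(text)
```

Because the handler chooses both the replacement text and the resume position, the built-in decoders have to size their output buffer for whatever it returns.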
* Fix wrong assert in unicodeobject (GH-5340) | INADA Naoki | 2018-01-27 | 1 | -1/+1
* bpo-32677: Add .isascii() to str, bytes and bytearray (GH-5342) | INADA Naoki | 2018-01-27 | 1 | -0/+20
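A quick Python sketch of the method added here, using arbitrary sample values and assuming an interpreter that has `.isascii()` on all three types.

```python
checks = {
    "str ascii": "hello".isascii(),
    "str non-ascii": "h\u00e9llo".isascii(),
    "empty str": "".isascii(),
    "bytes ascii": b"hello".isascii(),
    "bytes high": b"\xff".isascii(),
    "bytearray": bytearray(b"hi").isascii(),
}

for label, value in checks.items():
    print(f"{label}: {value}")
```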
* bpo-29240: Fix locale encodings in UTF-8 Mode (#5170) | Victor Stinner | 2018-01-15 | 1 | -331/+144
  Modify locale.localeconv(), time.tzname, os.strerror() and other functions to ignore the UTF-8 Mode: always use the current locale encoding.
  Changes:
  * Add _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx(). On decoding or encoding error, they return the position of the error and an error message which are used to raise Unicode errors in PyUnicode_DecodeLocale() and PyUnicode_EncodeLocale().
  * Replace _Py_DecodeCurrentLocale() with _Py_DecodeLocaleEx().
  * PyUnicode_DecodeLocale() now uses _Py_DecodeLocaleEx() for all cases, especially for the strict error handler.
  * Add _Py_DecodeUTF8Ex(): returns more information on decoding errors and supports the strict error handler.
  * Rename _Py_EncodeUTF8_surrogateescape() to _Py_EncodeUTF8Ex().
  * Replace _Py_EncodeCurrentLocale() with _Py_EncodeLocaleEx().
  * Ignore the UTF-8 mode to encode/decode localeconv(), strerror() and time zone name.
  * PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize() and PyUnicode_EncodeLocale() now ignore the UTF-8 mode: always use the "current" locale.
  * Remove _PyUnicode_DecodeCurrentLocale(), _PyUnicode_DecodeCurrentLocaleAndSize() and _PyUnicode_EncodeCurrentLocale().
* bpo-29240: Ignore UTF-8 Mode in time module (#5148) | Victor Stinner | 2018-01-11 | 1 | -0/+6
  time.strftime() must use the current LC_CTYPE encoding, not UTF-8 if the UTF-8 mode is enabled.
  Add _PyUnicode_DecodeCurrentLocale() function.
* bpo-29240: readline now ignores the UTF-8 Mode (#5145) | Victor Stinner | 2018-01-10 | 1 | -10/+52
  Add new functions ignoring the UTF-8 mode:
  * _Py_DecodeCurrentLocale()
  * _Py_EncodeCurrentLocale()
  * _PyUnicode_DecodeCurrentLocaleAndSize()
  * _PyUnicode_EncodeCurrentLocale()
  Modify the readline module to use these functions.
  Re-enable test_readline.test_nonascii().
* bpo-32030: Add _Py_EncodeLocaleRaw() (#4961) | Victor Stinner | 2017-12-21 | 1 | -4/+21
  Replace Py_EncodeLocale() with _Py_EncodeLocaleRaw() in:
  * _Py_wfopen()
  * _Py_wreadlink()
  * _Py_wrealpath()
  * _Py_wstat()
  * pymain_open_filename()
  These functions are called early during Python initialization, so only the RAW memory allocator must be used.
* bpo-32030: Add _Py_EncodeUTF8_surrogateescape() (#4960) | Victor Stinner | 2017-12-21 | 1 | -0/+89
  Py_EncodeLocale() now uses _Py_EncodeUTF8_surrogateescape(), instead of using temporary unicode and bytes objects. So Py_EncodeLocale() doesn't use the Python C API anymore.
* bpo-32240: Add the const qualifier to declarations of PyObject* array arguments. (#4746) | Serhiy Storchaka | 2017-12-15 | 1 | -1/+1
* bpo-29240: PEP 540: Add a new UTF-8 Mode (#855) | Victor Stinner | 2017-12-13 | 1 | -9/+21
  * Add -X utf8 command line option, PYTHONUTF8 environment variable and a new sys.flags.utf8_mode flag.
  * If the LC_CTYPE locale is "C" at startup: automatically enable the UTF-8 mode.
  * Add _winapi.GetACP(). encodings._alias_mbcs() now calls _winapi.GetACP() to get the ANSI code page.
  * locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8 mode. As a side effect, open() now uses the UTF-8 encoding by default in this mode.
  * Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding in the UTF-8 Mode.
  * Update subprocess._args_from_interpreter_flags() to handle -X utf8.
  * Skip some tests relying on the current locale if the UTF-8 mode is enabled.
  * Add test_utf8mode.py.
  * _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to return also the length (number of wide characters).
  * pymain_get_global_config() and pymain_set_global_config() now always copy flag values, rather than only copying if the new value is greater than the old value.
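The command-line option and flag listed above can be observed from Python itself. The following is a minimal sketch that launches a child interpreter; the helper `utf8_mode_flag` and the probe string are names made up for this example, and it assumes `sys.executable` points at an interpreter that supports `-X utf8`.

```python
import subprocess
import sys

# One-liner run in the child interpreter: report the UTF-8 Mode flag.
probe = "import sys; print(sys.flags.utf8_mode)"

def utf8_mode_flag(*interpreter_args):
    """Run the probe in a child interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

enabled = utf8_mode_flag("-X", "utf8")
print("utf8_mode with -X utf8:", enabled)
```

Without the option, the flag depends on the environment (for example PYTHONUTF8 or the LC_CTYPE locale), so the sketch only inspects the explicit `-X utf8` case.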
* bpo-31979: Remove unused align_maxchar() function (#4527) | Victor Stinner | 2017-11-23 | 1 | -13/+0
* bpo-31979: Simplify transforming decimals to ASCII (#4336) | Serhiy Storchaka | 2017-11-13 | 1 | -104/+32
  in int(), float() and complex() parsers.
  This also speeds up parsing non-ASCII numbers by around 20%.
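To show what "non-ASCII numbers" means for these parsers, here is a short Python sketch. The digits and the padding character are arbitrary examples, and the code exercises only the visible behaviour, not the C transformation routine itself.

```python
arabic_indic = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGITS ONE, TWO, THREE

as_int = int(arabic_indic)
as_float = float("\u0661.\u0665")    # digits ONE and FIVE around an ASCII "."
padded = int("\u2003" + arabic_indic + "\u2003")  # EM SPACE on both sides

print(as_int, as_float, padded)
```

Unicode decimal digits and whitespace are accepted by the numeric parsers, which is why the input has to be mapped to ASCII before the actual number parsing runs.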
* Add the const qualifier to "char *" variables that refer to literal strings. (#4370) | Serhiy Storchaka | 2017-11-11 | 1 | -3/+3
* bpo-23699: Use a macro to reduce boilerplate code in rich comparison functions (GH-793) | stratakis | 2017-11-02 | 1 | -30/+4
* bpo-20047: Make bytearray methods partition() and rpartition() reject (#4158) | Serhiy Storchaka | 2017-10-28 | 1 | -2/+2
  separators that are not bytes-like objects.
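A brief Python sketch of the resulting behaviour, with made-up sample data: a bytes-like separator works, while a str separator is rejected.

```python
data = bytearray(b"key=value")

# A bytes-like separator is accepted and the parts come back as bytearrays.
parts = data.partition(b"=")

# A separator that is not bytes-like (here a str) is rejected.
try:
    data.partition("=")
    rejected = False
except TypeError:
    rejected = True

print(parts, rejected)
```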
* bpo-31825: Fixed OverflowError in the 'unicode-escape' codec (#4058) | Serhiy Storchaka | 2017-10-20 | 1 | -1/+1
  and in codecs.escape_decode() when decoding an escaped non-ascii byte.
* bpo-31338 (#3374) | Barry Warsaw | 2017-09-15 | 1 | -50/+34
  * Add Py_UNREACHABLE() as an alias to abort().
  * Use Py_UNREACHABLE() instead of assert(0)
  * Convert more unreachable code to use Py_UNREACHABLE()
  * Document Py_UNREACHABLE() and a few other macros.
* bpo-31393: Fix the use of PyUnicode_READY(). (#3451) | Serhiy Storchaka | 2017-09-08 | 1 | -2/+8
* bpo-30860: Consolidate stateful runtime globals. (#3397) | Eric Snow | 2017-09-08 | 1 | -0/+1
  * group the (stateful) runtime globals into various topical structs
  * consolidate the topical structs under a single top-level _PyRuntimeState struct
  * add a check-c-globals.py script that helps identify runtime globals
  Other globals are excluded (see globals.txt and check-c-globals.py).
* bpo-30923: Silence fall-through warnings included in -Wextra since gcc-7.0. (#3157) | Stefan Krah | 2017-08-21 | 1 | -2/+3
* bpo-22207: Add checks for possible integer overflows in unicodeobject.c. (#2623) | Serhiy Storchaka | 2017-07-11 | 1 | -6/+12
  Based on patch by Victor Stinner.
* [security] bpo-13617: Reject embedded null characters in wchar* strings. (#2302) | Serhiy Storchaka | 2017-06-28 | 1 | -0/+14
  Based on patch by Victor Stinner.
  Add private C API function _PyUnicode_AsUnicode() which is similar to PyUnicode_AsUnicode(), but checks for null characters.
* bpo-30708: Check for null characters in PyUnicode_AsWideCharString(). (#2285) | Serhiy Storchaka | 2017-06-27 | 1 | -27/+22
  Raise a ValueError if the second argument is NULL and the wchar_t* string contains null characters.
* bpo-29802: Fix reference counting in module-level struct functions (#1213) | Serhiy Storchaka | 2017-04-20 | 1 | -0/+1
  when passing arguments of the wrong type.
* Expand the PySlice_GetIndicesEx macro. (#1023) | Serhiy Storchaka | 2017-04-08 | 1 | -2/+3
* bpo-29549: Fixes docstring for str.index (#256) | Lisa Roach | 2017-04-05 | 1 | -2/+10
  * Updates B.index documentation.
  * Updates str.index documentation, makes it Argument Clinic compatible.
  * Removes ArgumentClinic code.
  * Finishes string.index documentation.
  * Updates string.rindex documentation.
  * Documents B.rindex.
* bpo-29865: Use PyXXX_GET_SIZE macros rather than Py_SIZE for concrete types. (#748) | Serhiy Storchaka | 2017-03-21 | 1 | -1/+2
* bpo-29116: Improve error message for concatenating str with non-str. (#710) | Serhiy Storchaka | 2017-03-19 | 1 | -1/+10
* bpo-24037: Add Argument Clinic converter `bool(accept={int})`. (#485) | Serhiy Storchaka | 2017-03-12 | 1 | -2/+2
* Use Py_RETURN_FALSE/Py_RETURN_TRUE rather than PyBool_FromLong(0)/PyBool_FromLong(1). (#567) | Serhiy Storchaka | 2017-03-08 | 1 | -25/+25
* bpo-29568: Disable any characters between two percents for escaped percent "%%" in the format string for classic string formatting. (GH-513) | Serhiy Storchaka | 2017-03-08 | 1 | -7/+8
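A Python sketch of the rule described in this entry, assuming an interpreter that includes the change. The format strings are arbitrary examples, and the exact exception type for the rejected form is deliberately not pinned down here.

```python
# "%%" with nothing between the two percent signs is an escaped percent.
escaped = "100%% sure" % ()
mixed = "%d%%" % 42

# Characters between the two percent signs (here a space flag) are no longer
# treated as part of an escaped percent, so formatting fails.
try:
    "% %" % ()
    rejected = False
except (TypeError, ValueError):
    rejected = True

print(escaped, mixed, rejected)
```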
* Fix grammar in doc string, RST markup | Martin Panter | 2017-01-24 | 1 | -2/+2
* Issue #28999: Use Py_RETURN_NONE, Py_RETURN_TRUE and Py_RETURN_FALSE wherever | Serhiy Storchaka | 2017-01-23 | 1 | -3/+1
  possible. Patch is written with Coccinelle.
* Issue #28769: The result of PyUnicode_AsUTF8AndSize() and PyUnicode_AsUTF8() | Serhiy Storchaka | 2017-01-22 | 1 | -2/+2
  is now of type "const char *" rather than "char *".
* Run Argument Clinic: METH_VARARGS=>METH_FASTCALL | Victor Stinner | 2017-01-17 | 1 | -2/+2
  Issue #29286. Run Argument Clinic to get the new faster METH_FASTCALL calling convention for functions using "boring" positional arguments.
  Manually fix _elementtree: _elementtree_XMLParser_doctype() must remain consistent with the clinic code.