summaryrefslogtreecommitdiffstats
path: root/Objects/stringlib
Commit message (Collapse)AuthorAgeFilesLines
* [3.10] bpo-43504: Remove effbot urls (GH-26308) (#92161)Thaddeus14992022-05-021-1/+2
| | | | | | * [3.10] Remove effbot urls (GH-26308). (cherry picked from commit e9f66aedf44ccc3be27975cfb070a44ce6a6bd13) Co-authored-by: E-Paine <63801254+E-Paine@users.noreply.github.com>
* bpo-36819: Fix crashes in built-in encoders with weird error handlers (GH-28593)Miss Islington (bot)2022-05-021-2/+13
| | | | | | | | | If the error handler returns position less or equal than the starting position of non-encodable characters, most of built-in encoders didn't properly re-size the output buffer. This led to out-of-bounds writes, and segfaults. (cherry picked from commit 18b07d773e09a2719e69aeaa925d5abb7ba0c068) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* bpo-43179: Generalise alignment for optimised string routines (GH-24624)Jessica Clarke2021-03-312-11/+6
| | | | | | | | | | | | | | | | | | | | | | | | | * Remove m68k-specific hack from ascii_decode On m68k, alignments of primitives is more relaxed, with 4-byte and 8-byte types only requiring 2-byte alignment, thus using sizeof(size_t) does not work. Instead, use the portable alternative. Note that this is a minimal fix that only relaxes the assertion and the condition for when to use the optimised version remains overly strict. Such issues will be fixed tree-wide in the next commit. NB: In C11 we could use _Alignof(size_t) instead, but for compatibility we use autoconf. * Optimise string routines for architectures with non-natural alignment C only requires that sizeof(x) is a multiple of alignof(x), not that the two are equal. Thus anywhere where we optimise based on alignment we should be using alignof(x) not sizeof(x). This is more annoying than it would be in C11 where we could just use _Alignof(x) (and alignof(x) in C++11), but since we still require only C99 we must plumb the information all the way from autoconf through the various typedefs and defines.
* bpo-41972: Use the two-way algorithm for string searching (GH-22904)Dennis Sweeney2021-02-282-20/+901
| | | | | Implement an enhanced variant of Crochemore and Perrin's Two-Way string searching algorithm, which reduces worst-case time from quadratic (the product of the string and pattern lengths) to linear. This applies to forward searches (like``find``, ``index``, ``replace``); the algorithm for reverse searches (like ``rfind``) is not changed. Co-authored-by: Tim Peters <tim.peters@gmail.com>
* bpo-42519: Replace PyObject_MALLOC() with PyObject_Malloc() (GH-23587)Victor Stinner2020-12-011-2/+2
| | | | | | | | | No longer use deprecated aliases to functions: * Replace PyObject_MALLOC() with PyObject_Malloc() * Replace PyObject_REALLOC() with PyObject_Realloc() * Replace PyObject_FREE() with PyObject_Free() * Replace PyObject_Del() with PyObject_Free() * Replace PyObject_DEL() with PyObject_Free()
* bpo-42519: Replace PyMem_MALLOC() with PyMem_Malloc() (GH-23586)Victor Stinner2020-12-011-1/+1
| | | | | | | | | | | No longer use deprecated aliases to functions: * Replace PyMem_MALLOC() with PyMem_Malloc() * Replace PyMem_REALLOC() with PyMem_Realloc() * Replace PyMem_FREE() with PyMem_Free() * Replace PyMem_Del() with PyMem_Free() * Replace PyMem_DEL() with PyMem_Free() Modify also the PyMem_DEL() macro to use directly PyMem_Free().
* bpo-38252: Use 8-byte step to detect ASCII sequence in 64bit Windows build ↵Ma Lin2020-10-182-25/+25
| | | | (GH-16334)
* bpo-40521: Make empty Unicode string per interpreter (GH-21096)Victor Stinner2020-06-237-10/+6
| | | Each interpreter now has its own empty Unicode string singleton.
* bpo-40521: Make bytes singletons per interpreter (GH-21074)Victor Stinner2020-06-238-17/+24
| | | | | | Each interpreter now has its own empty bytes string and single byte character singletons. Replace STRINGLIB_EMPTY macro with STRINGLIB_GET_EMPTY() macro.
* bpo-29882: Add _Py_popcount32() function (GH-20518)Victor Stinner2020-06-081-1/+1
| | | | | | * Rename pycore_byteswap.h to pycore_bitutils.h. * Move popcount_digit() to pycore_bitutils.h as _Py_popcount32(). * _Py_popcount32() uses GCC and clang builtin function if available. * Add unit tests to _Py_popcount32().
* bpo-40792: Make the result of PyNumber_Index() always having exact type int. ↵Serhiy Storchaka2020-05-281-5/+5
| | | | | | | | | | | | (GH-20443) Previously, the result could have been an instance of a subclass of int. Also revert bpo-26202 and make attributes start, stop and step of the range object having exact type int. Add private function _PyNumber_Index() which preserves the old behavior of PyNumber_Index() for performance to use it in the conversion functions like PyLong_AsLong().
* bpo-37999: No longer use __int__ in implicit integer conversions. (GH-15636)Serhiy Storchaka2020-05-261-26/+1
| | | | Only __index__ should be used to make integer conversions lossless.
* bpo-40302: UTF-32 encoder SWAB4() macro use a|b rather than a+b (GH-19572)Victor Stinner2020-04-171-1/+1
|
* bpo-40302: Add pycore_byteswap.h header file (GH-19552)Victor Stinner2020-04-171-16/+20
| | | | | | | | | | | | | | Add a new internal pycore_byteswap.h header file with the following functions: * _Py_bswap16() * _Py_bswap32() * _Py_bswap64() Use these functions in _ctypes, sha256 and sha512 modules, and also use in the UTF-32 encoder. sha256, sha512 and _ctypes modules are now built with the internal C API.
* bpo-39943: Add the const qualifier to pointers on non-mutable PyBytes data. ↵Serhiy Storchaka2020-04-121-2/+2
| | | | (GH-19472)
* bpo-39943: Add the const qualifier to pointers on non-mutable PyUnicode ↵Serhiy Storchaka2020-04-111-1/+1
| | | | data. (GH-19345)
* Update some www.unicode.org URLs to use HTTPS. (GH-18912)Benjamin Peterson2020-03-111-1/+1
|
* bpo-38249: Expand Py_UNREACHABLE() to __builtin_unreachable() in the release ↵Serhiy Storchaka2020-03-091-4/+5
| | | | | mode. (GH-16329) Co-authored-by: Victor Stinner <vstinner@python.org>
* bpo-39087: Optimize PyUnicode_AsUTF8AndSize() (GH-18327)Inada Naoki2020-02-271-18/+17
| | | Avoid using temporary bytes object.
* bpo-35081: Move bytes_methods.h to the internal C API (GH-18492)Victor Stinner2020-02-121-1/+1
| | | | | Move the bytes_methods.h header file to the internal C API as pycore_bytes_methods.h: it only contains private symbols (prefixed by "_Py"), except of the PyDoc_STRVAR_shared() macro.
* closes bpo-39605: Fix some casts to not cast away const. (GH-18453)Andy Lester2020-02-123-4/+4
| | | | | | | | | | | | | | | gcc -Wcast-qual turns up a number of instances of casting away constness of pointers. Some of these can be safely modified, by either: Adding the const to the type cast, as in: - return _PyUnicode_FromUCS1((unsigned char*)s, size); + return _PyUnicode_FromUCS1((const unsigned char*)s, size); or, Removing the cast entirely, because it's not necessary (but probably was at one time), as in: - PyDTrace_FUNCTION_ENTRY((char *)filename, (char *)funcname, lineno); + PyDTrace_FUNCTION_ENTRY(filename, funcname, lineno); These changes will not change code, but they will make it much easier to check for errors in consts
* bpo-39573: Use Py_SET_SIZE() function (GH-18402)Victor Stinner2020-02-071-1/+1
| | | | Replace direct acccess to PyVarObject.ob_size with usage of the Py_SET_SIZE() function.
* bpo-36051: Fix compiler warning. (GH-18325)Inada Naoki2020-02-031-1/+1
|
* bpo-36051: Drop GIL during large bytes.join() (GH-17757)Bruce Merry2020-01-291-17/+40
| | | | | Improve multi-threaded performance by dropping the GIL in the fast path of bytes.join. To avoid increasing overhead for small joins, it is only done if the output size exceeds a threshold.
* bpo-39372: Clean header files of declared interfaces with no implementations ↵Pablo Galindo2020-01-185-13/+0
| | | | | | | | (GH-18037) The public API symbols being removed are: _PyBytes_InsertThousandsGroupingLocale, _PyBytes_InsertThousandsGrouping, _Py_InitializeFromArgs, _Py_InitializeFromWideArgs, _PyFloat_Repr, _PyFloat_Digits, _PyFloat_DigitsInit, PyFrame_ExtendStack, _PyAIterWrapper_Type, PyNullImporter_Type, PyCmpWrapper_Type, PySortWrapper_Type, PyNoArgsFunction.
* bpo-28029: Make "".replace("", s, n) returning s for any n != 0. (GH-16981)Serhiy Storchaka2019-10-301-8/+5
|
* Doc: Fix typo in fastsearch comments (GH-14608)Valentin Haenel2019-09-111-2/+2
|
* bpo-37034: Display argument name on errors with keyword arguments with ↵Rémi Lapeyre2019-08-291-4/+4
| | | | Argument Clinic. (GH-13593)
* Fix typos in comments, docs and test names (#15018)Min ho Kim2019-07-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | * Fix typos in comments, docs and test names * Update test_pyparse.py account for change in string length * Apply suggestion: splitable -> splittable Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu> * Apply suggestion: splitable -> splittable Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu> * Apply suggestion: Dealloccte -> Deallocate Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu> * Update posixmodule checksum. * Reverse idlelib changes.
* bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)Serhiy Storchaka2019-06-251-3/+3
| | | | | | | * The UTF-8 incremental decoders fails now fast if encounter a sequence that can't be handled by the error handler. * The UTF-16 incremental decoders with the surrogatepass error handler decodes now a lone low surrogate with final=False.
* Improve exception message for str.format (GH-12675)Francisco Couzo2019-06-011-2/+7
|
* bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async ↵Jeroen Demeyer2019-05-311-4/+4
| | | | | | | | | (GH-13464) Automatically replace tp_print -> tp_vectorcall_offset tp_compare -> tp_as_async tp_reserved -> tp_as_async
* Fix couple of dead code paths (GH-7418)David Carlier2019-05-171-1/+0
|
* bpo-36775: _PyCoreConfig only uses wchar_t* (GH-13062)Victor Stinner2019-05-021-1/+1
| | | | | | | | | | | | | | | | | _PyCoreConfig: Change filesystem_encoding, filesystem_errors, stdio_encoding and stdio_errors fields type from char* to wchar_t*. Changes: * PyInterpreterState: replace fscodec_initialized (int) with fs_codec structure. * Add get_error_handler_wide() and unicode_encode_utf8() helper functions. * Add error_handler parameter to unicode_encode_locale() and unicode_decode_locale(). * Remove _PyCoreConfig_SetString(). * Rename _PyCoreConfig_SetWideString() to _PyCoreConfig_SetString(). * Rename _PyCoreConfig_SetWideStringFromString() to _PyCoreConfig_DecodeLocale().
* bpo-36127: Argument Clinic: inline parsing code for keyword parameters. ↵Serhiy Storchaka2019-03-141-4/+19
| | | | (GH-12058)
* bpo-35582: Argument Clinic: inline parsing code for positional parameters. ↵Serhiy Storchaka2019-01-111-7/+97
| | | | (GH-11313)
* bpo-23867: Argument Clinic: inline parsing code for a single positional ↵Serhiy Storchaka2018-12-251-2/+16
| | | | parameter. (GH-9689)
* bpo-33012: Fix invalid function cast warnings with gcc 8 in Argument Clinic. ↵Serhiy Storchaka2018-11-271-5/+5
| | | | | | | | (GH-6748) Fix invalid function cast warnings with gcc 8 for method conventions different from METH_NOARGS, METH_O and METH_VARARGS in Argument Clinic generated code.
* bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623)Victor Stinner2018-11-261-133/+35
| | | | | | | | | | | | Fix str.format(), float.__format__() and complex.__format__() methods for non-ASCII decimal point when using the "n" formatter. Changes: * Rewrite _PyUnicode_InsertThousandsGrouping(): it now requires a _PyUnicodeWriter object for the buffer and a Python str object for digits. * Rename FILL() macro to unicode_fill(), convert it to static inline function, add "assert(0 <= start);" and rework its code.
* bpo-34523: Support surrogatepass in locale codecs (GH-8995)Victor Stinner2018-08-291-1/+1
| | | | | | | | | | | | | | | | | | | | Add support for the "surrogatepass" error handler in PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault() for the UTF-8 encoding. Changes: * _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the surrogatepass error handler (_Py_ERROR_SURROGATEPASS). * _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use the _Py_error_handler enum instead of "int surrogateescape" to pass the error handler. These functions now return -3 if the error handler is unknown. * Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() in test_codecs. * Rename get_error_handler() to _Py_GetErrorHandler() and expose it as a private function. * _freeze_importlib doesn't need config.filesystem_errors="strict" workaround anymore.
* bpo-20180: complete AC conversion of Objects/stringlib/transmogrify.h (GH-8039)Tal Einat2018-07-062-33/+233
| | | | * converted bytes methods: expandtabs, ljust, rjust, center, zfill * updated char_convertor to properly set the C default value
* bpo-33012: Fix invalid function cast warnings with gcc 8 for METH_NOARGS. ↵Siddhesh Poyarekar2018-04-291-13/+13
| | | | | | | | | (GH-6030) METH_NOARGS functions need only a single argument but they are cast into a PyCFunction, which takes two arguments. This triggers an invalid function cast warning in gcc8 due to the argument mismatch. Fix this by adding a dummy unused argument.
* bpo-32677: Add .isascii() to str, bytes and bytearray (GH-5342)INADA Naoki2018-01-271-0/+6
|
* bpo-31338 (#3374)Barry Warsaw2017-09-151-2/+1
| | | | | | | * Add Py_UNREACHABLE() as an alias to abort(). * Use Py_UNREACHABLE() instead of assert(0) * Convert more unreachable code to use Py_UNREACHABLE() * Document Py_UNREACHABLE() and a few other macros.
* bpo-30923: Silence fall-through warnings included in -Wextra since gcc-7.0. ↵Stefan Krah2017-08-211-2/+2
| | | | (#3157)
* bpo-30978: str.format_map() now passes key lookup exceptions through. (#2790)Serhiy Storchaka2017-08-031-6/+10
| | | Previously any exception was replaced with a KeyError exception.
* bpo-24821: Fixed the slowing down to 25 times in the searching of some (#505)Serhiy Storchaka2017-03-301-6/+40
| | | | unlucky Unicode characters.
* Issue #28999: Use Py_RETURN_NONE, Py_RETURN_TRUE and Py_RETURN_FALSE whereverSerhiy Storchaka2017-01-231-4/+2
| | | | possible but Coccinelle couldn't find opportunity.
* Issue #29145: Merge 3.6.Xiang Zhang2017-01-101-1/+1
|
* Issue #28561: Clean up UTF-8 encoder: remove dead code, update comments, etc.Serhiy Storchaka2016-10-301-10/+4
| | | | Patch by Xiang Zhang.