summaryrefslogtreecommitdiffstats
path: root/Objects/stringlib
Commit message (Collapse)AuthorAgeFilesLines
* gh-139353: Rename formatter_unicode.c to unicode_formatter.c (#139723)Victor Stinner2025-10-081-97/+0
| | | | | | | | | | * Move Python/formatter_unicode.c to Objects/unicode_formatter.c. * Move Objects/stringlib/localeutil.h content into unicode_formatter.c. Remove localeutil.h. * Move _PyUnicode_InsertThousandsGrouping() to unicode_formatter.c and mark the function as static. * Rename unicode_fill() to _PyUnicode_Fill() and export it in pycore_unicodeobject.h. * Move MAX_UNICODE to pycore_unicodeobject.h as _Py_MAX_UNICODE.
* gh-129813, PEP 782: Use PyBytesWriter in utf8_encoder() (#138874)Victor Stinner2025-09-231-23/+35
| | | | Replace the private _PyBytesWriter API with the new public PyBytesWriter API in utf8_encoder() and unicode_encode_ucs1().
* GH-137623: Use an AC decorator for docstring line length enforcement (#137690)Adam Turner2025-08-181-1/+2
|
* gh-127971: fix off-by-one read beyond the end of a string during search ↵Duane Griffin2025-07-131-4/+4
| | | | (#132574)
* gh-111178: remove redundant casts for functions with correct signatures ↵Bénédikt Tran2025-04-011-2/+2
| | | | (#131673)
* gh-131525: Cache the result of tuple_hash (#131529)Michael Droettboom2025-03-271-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * gh-131525: Cache the result of tuple_hash * Fix debug builds * Add blurb * Fix formatting * Pre-compute empty tuple singleton * Mostly set the cache within tuple_alloc * Fixes for TSAN * Pre-compute empty tuple singleton * Fix for 32-bit platforms * Assert that op != NULL in _PyTuple_RESET_HASH_CACHE * Use FT_ATOMIC_STORE_SSIZE_RELAXED macro * Update Include/internal/pycore_tuple.h Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> * Fix alignment * atomic load * Update Objects/tupleobject.c Co-authored-by: Chris Eibl <138194463+chris-eibl@users.noreply.github.com> --------- Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Chris Eibl <138194463+chris-eibl@users.noreply.github.com>
* gh-111178: Fix function signatures for multiple tests (#131496)Victor Stinner2025-03-201-8/+12
|
* gh-87790: support thousands separators for formatting fractional part of ↵Sergey B Kirpichev2025-02-251-6/+21
| | | | | | | | | | | | | | floats (#125304) ```pycon >>> f"{123_456.123_456:_._f}" # Whole and fractional '123_456.123_456' >>> f"{123_456.123_456:_f}" # Integer component only '123_456.123456' >>> f"{123_456.123_456:._f}" # Fractional component only '123456.123_456' >>> f"{123_456.123_456:.4_f}" # with precision '123456.1_235' ```
* gh-122943: Add the varpos parameter in _PyArg_UnpackKeywords (GH-126564)Serhiy Storchaka2024-11-081-2/+3
| | | | Remove _PyArg_UnpackKeywordsWithVararg. Add comments for integer arguments of _PyArg_UnpackKeywords.
* gh-115754: Use Py_GetConstant(Py_CONSTANT_EMPTY_STR) (#125194)Victor Stinner2024-10-091-2/+2
| | | | | Replace PyUnicode_New(0, 0), PyUnicode_FromString("") and PyUnicode_FromStringAndSize("", 0) with Py_GetConstant(Py_CONSTANT_EMPTY_STR).
* gh-124502: Optimize unicode_eq() (#125105)Victor Stinner2024-10-081-7/+12
|
* gh-121040: Use __attribute__((fallthrough)) (#121044)Victor Stinner2024-06-271-2/+2
| | | | | | | | | | | | | Fix warnings when using -Wimplicit-fallthrough compiler flag. Annotate explicitly "fall through" switch cases with a new _Py_FALLTHROUGH macro which uses __attribute__((fallthrough)) if available. Replace "fall through" comments with _Py_FALLTHROUGH. Add _Py__has_attribute() macro. No longer define __has_attribute() macro if it's not defined. Move also _Py__has_builtin() at the top of pyport.h. Co-Authored-By: Nikita Sobolev <mail@sobolevn.me>
* gh-120196: Reuse find_max_char() for bytes objects (#120497)Ruben Vorderman2024-06-171-4/+6
|
* gh-120397: Optimize str.count() for single characters (#120398)Ruben Vorderman2024-06-131-0/+19
|
* gh-119879: str.find(): Utilize last character gap for two-way periodic ↵d.grigonis2024-06-041-28/+35
| | | | needles (#119880)
* gh-119396: Optimize unicode_repr() (#119617)Victor Stinner2024-05-281-0/+95
| | | | | | | | | | | | | | | | | | | | | | Use stringlib to specialize unicode_repr() for each string kind (UCS1, UCS2, UCS4). Benchmark: +-------------------------------------+---------+----------------------+ | Benchmark | ref | change2 | +=====================================+=========+======================+ | repr('abc') | 100 ns | 103 ns: 1.02x slower | +-------------------------------------+---------+----------------------+ | repr('a' * 100) | 369 ns | 369 ns: 1.00x slower | +-------------------------------------+---------+----------------------+ | repr(('a' + squote) * 100) | 1.21 us | 946 ns: 1.27x faster | +-------------------------------------+---------+----------------------+ | repr(('a' + nl) * 100) | 1.23 us | 907 ns: 1.36x faster | +-------------------------------------+---------+----------------------+ | repr(dquote + ('a' + squote) * 100) | 1.08 us | 858 ns: 1.25x faster | +-------------------------------------+---------+----------------------+ | Geometric mean | (ref) | 1.16x faster | +-------------------------------------+---------+----------------------+
* gh-117557: Improve error messages when a string, bytes or bytearray of ↵Serhiy Storchaka2024-05-281-7/+49
| | | | length 1 are expected (GH-117631)
* gh-117431: Adapt bytes and bytearray .find() and friends to Argument Clinic ↵Erlend E. Aasland2024-04-121-47/+0
| | | | | | | | | | | | | | (#117502) This change gives a significant speedup, as the METH_FASTCALL calling convention is now used. The following bytes and bytearray methods are adapted: - count() - find() - index() - rfind() - rindex() Co-authored-by: Inada Naoki <songofacandy@gmail.com>
* gh-110964: Remove private _PyArg functions (#110966)Victor Stinner2023-10-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the following private functions and structures to pycore_modsupport.h internal C API: * _PyArg_BadArgument() * _PyArg_CheckPositional() * _PyArg_NoKeywords() * _PyArg_NoPositional() * _PyArg_ParseStack() * _PyArg_ParseStackAndKeywords() * _PyArg_Parser structure * _PyArg_UnpackKeywords() * _PyArg_UnpackKeywordsWithVararg() * _PyArg_UnpackStack() * _Py_ANY_VARARGS() Changes: * Python/getargs.h now includes pycore_modsupport.h to export functions. * clinic.py now adds pycore_modsupport.h when one of these functions is used. * Add pycore_modsupport.h includes when a C extension uses one of these functions. * Define Py_BUILD_CORE_MODULE in C extensions which now include directly or indirectly (via code generated by Argument Clinic) pycore_modsupport.h: * _csv * _curses_panel * _dbm * _gdbm * _multiprocessing.posixshmem * _sqlite.row * _statistics * grp * resource * syslog * _testcapi: bad_get() no longer uses METH_FASTCALL calling convention but METH_VARARGS. Replace _PyArg_UnpackStack() with PyArg_ParseTuple(). * _testcapi: add PYTESTCAPI_NEED_INTERNAL_API macro which is defined by _testcapi sub-modules which need the internal C API (pycore_modsupport.h): exceptions.c, float.c, vectorcall.c, watchers.c. * Remove Include/cpython/modsupport.h header file. Include/modsupport.h no longer includes the removed header file. * Fix mypy clinic.py
* gh-107603: Argument Clinic: Only include pycore_gc.h if needed (#108726)Victor Stinner2023-08-311-5/+4
| | | | | | | | | | | | | | | | | | | | Argument Clinic now only includes pycore_gc.h if PyGC_Head is needed, and only includes pycore_runtime.h if _Py_ID() is needed. * Add 'condition' optional argument to Clinic.add_include(). * deprecate_keyword_use() includes pycore_runtime.h when using the _PyID() function. * Fix rendering of includes: comments start at the column 35. * Mark PC/clinic/_wmimodule.cpp.h and "Objects/stringlib/clinic/*.h.h" header files as generated in .gitattributes. Effects: * 42 header files generated by AC no longer include the internal C API, instead of 4 header files before. For example, Modules/clinic/_abc.c.h no longer includes the internal C API. * Fix _testclinic_depr.c.h: it now always includes pycore_runtime.h to get _Py_ID().
* gh-106320: Remove private AC converter functions (#108505)Victor Stinner2023-08-261-1/+2
| | | | | | | | | | | | | | Move these private functions to the internal C API (pycore_abstract.h): * _Py_convert_optional_to_ssize_t() * _PyNumber_Index() Argument Clinic now emits #include "pycore_abstract.h" when these functions are used. The parser of the c-analyzer tool now uses a list of files which use the limited C API, rather than a list of files using the internal C API.
* gh-108444: Argument Clinic uses PyLong_AsInt() (#108458)Victor Stinner2023-08-241-2/+2
| | | | Argument Clinic now uses the new public PyLong_AsInt(), rather than the old name _PyLong_AsInt().
* gh-106320: Add pycore_complexobject.h header file (#106339)Victor Stinner2023-07-021-0/+1
| | | | | | Add internal pycore_complexobject.h header file. Move _Py_c_xxx() functions and _PyComplex_FormatAdvancedWriter() function to this new header file.
* gh-92536: Remove PyUnicode_READY() calls (#105210)Victor Stinner2023-06-012-15/+2
| | | | Since Python 3.12, PyUnicode_READY() does nothing and always returns 0.
* gh-105156: Cleanup usage of old Py_UNICODE type (#105158)Victor Stinner2023-06-011-1/+1
| | | | | | | | | | | | * refcounts.dat: * Remove Py_UNICODE functions. * Replace Py_UNICODE argument type with wchar_t. * _PyUnicode_ToLowercase(), _PyUnicode_ToUppercase(), _PyUnicode_ToTitlecase() are no longer deprecated in comments. It's no longer needed since they now use Py_UCS4 type, rather than the deprecated Py_UNICODE type. * gdb: Remove unused char_width() method.
* gh-99537: Use Py_SETREF() function in C code (#99657)Victor Stinner2022-11-221-4/+2
| | | | | | | | | | | | | | | Fix potential race condition in code patterns: * Replace "Py_DECREF(var); var = new;" with "Py_SETREF(var, new);" * Replace "Py_XDECREF(var); var = new;" with "Py_XSETREF(var, new);" * Replace "Py_CLEAR(var); var = new;" with "Py_XSETREF(var, new);" Other changes: * Replace "old = var; var = new; Py_DECREF(var)" with "Py_SETREF(var, new);" * Replace "old = var; var = new; Py_XDECREF(var)" with "Py_XSETREF(var, new);" * And remove the "old" variable.
* gh-99300: Use Py_NewRef() in Objects/ directory (#99354)Victor Stinner2022-11-103-10/+5
| | | | Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and Py_XNewRef() in C files of the Objects/ directory.
* gh-97982: Remove asciilib_count() (#98164)Victor Stinner2022-10-111-1/+6
| | | | | asciilib_count() is the same than ucs1lib_count(): the code is not specialized for ASCII strings, so it's not worth it to have a separated function. Remove asciilib_count() function.
* gh-94808: improve comments and coverage of fastsearch.h (GH-96760)Dennis Sweeney2022-09-132-5/+6
|
* gh-90928: Improve static initialization of keywords tuple in AC (#95907)Erlend E. Aasland2022-08-131-18/+8
|
* gh-90928: Statically Initialize the Keywords Tuple in Clinic-Generated Code ↵Eric Snow2022-08-111-2/+41
| | | | | | | | | | | | | | | | (gh-95860) We only statically initialize for core code and builtin modules. Extension modules still create the tuple at runtime. We'll solve that part of interpreter isolation separately. This change includes generated code. The non-generated changes are in: * Tools/clinic/clinic.py * Python/getargs.c * Include/cpython/modsupport.h * Makefile.pre.in (re-generate global strings after running clinic) * very minor tweaks to Modules/_codecsmodule.c and Python/Python-tokenize.c All other changes are generated code (clinic, global strings).
* gh-84461: Silence some compiler warnings on WASM (GH-93978)Christian Heimes2022-06-201-1/+1
|
* gh-93033: Use wmemchr in stringlib (GH-93034)goldsteinn2022-05-248-14/+36
| | | | | Generally comparable perf for the "good" case where memchr doesn't return any collisions (false matches on lower byte) but clearly faster with collisions.
* gh-89653: Use int type for Unicode kind (#92704)Victor Stinner2022-05-131-1/+1
| | | | Use the same type that PyUnicode_FromKindAndData() kind parameter type (public C API): int.
* gh-89653: PEP 670: Convert PyUnicode_KIND() macro to function (#92705)Victor Stinner2022-05-131-3/+0
| | | | | | | | In the limited C API version 3.12, PyUnicode_KIND() is now implemented as a static inline function. Keep the macro for the regular C API and for the limited C API version 3.11 and older to prevent introducing new compiler warnings. Update _decimal.c and stringlib/eq.h for PyUnicode_KIND().
* gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537)Inada Naoki2022-05-122-40/+3
|
* gh-91320: Argument Clinic uses _PyCFunction_CAST() (#32210)Victor Stinner2022-05-031-5/+5
| | | | Replace "(PyCFunction)(void(*)(void))func" cast with _PyCFunction_CAST(func).
* bpo-36819: Fix crashes in built-in encoders with weird error handlers (GH-28593)Serhiy Storchaka2022-05-021-2/+13
| | | | | | | If the error handler returns position less or equal than the starting position of non-encodable characters, most of built-in encoders didn't properly re-size the output buffer. This led to out-of-bounds writes, and segfaults.
* bpo-46670: Define all macros for stringlib (GH-31176)Victor Stinner2022-02-078-17/+22
| | | | | bytesobject.c, bytearrayobject.c and unicodeobject.c now define all macros used by stringlib, to avoid using undefined macros. Fix "gcc -Wundef" warnings.
* bpo-35134: Add Include/cpython/floatobject.h (GH-28957)Victor Stinner2021-10-141-0/+2
| | | | | Split Include/floatobject.h into sub-files: add Include/cpython/floatobject.h and Include/internal/pycore_floatobject.h.
* bpo-41972: Tweak fastsearch.h string search algorithms (GH-27091)Dennis Sweeney2021-07-191-272/+331
|
* Remove effbot urls (GH-26308)E-Paine2021-05-221-1/+2
|
* bpo-43179: Generalise alignment for optimised string routines (GH-24624)Jessica Clarke2021-03-312-11/+6
| | | | | | | | | | | | | | | | | | | | | | | | | * Remove m68k-specific hack from ascii_decode On m68k, alignments of primitives is more relaxed, with 4-byte and 8-byte types only requiring 2-byte alignment, thus using sizeof(size_t) does not work. Instead, use the portable alternative. Note that this is a minimal fix that only relaxes the assertion and the condition for when to use the optimised version remains overly strict. Such issues will be fixed tree-wide in the next commit. NB: In C11 we could use _Alignof(size_t) instead, but for compatibility we use autoconf. * Optimise string routines for architectures with non-natural alignment C only requires that sizeof(x) is a multiple of alignof(x), not that the two are equal. Thus anywhere where we optimise based on alignment we should be using alignof(x) not sizeof(x). This is more annoying than it would be in C11 where we could just use _Alignof(x) (and alignof(x) in C++11), but since we still require only C99 we must plumb the information all the way from autoconf through the various typedefs and defines.
* bpo-41972: Use the two-way algorithm for string searching (GH-22904)Dennis Sweeney2021-02-282-20/+901
| | | | | Implement an enhanced variant of Crochemore and Perrin's Two-Way string searching algorithm, which reduces worst-case time from quadratic (the product of the string and pattern lengths) to linear. This applies to forward searches (like``find``, ``index``, ``replace``); the algorithm for reverse searches (like ``rfind``) is not changed. Co-authored-by: Tim Peters <tim.peters@gmail.com>
* bpo-42519: Replace PyObject_MALLOC() with PyObject_Malloc() (GH-23587)Victor Stinner2020-12-011-2/+2
| | | | | | | | | No longer use deprecated aliases to functions: * Replace PyObject_MALLOC() with PyObject_Malloc() * Replace PyObject_REALLOC() with PyObject_Realloc() * Replace PyObject_FREE() with PyObject_Free() * Replace PyObject_Del() with PyObject_Free() * Replace PyObject_DEL() with PyObject_Free()
* bpo-42519: Replace PyMem_MALLOC() with PyMem_Malloc() (GH-23586)Victor Stinner2020-12-011-1/+1
| | | | | | | | | | | No longer use deprecated aliases to functions: * Replace PyMem_MALLOC() with PyMem_Malloc() * Replace PyMem_REALLOC() with PyMem_Realloc() * Replace PyMem_FREE() with PyMem_Free() * Replace PyMem_Del() with PyMem_Free() * Replace PyMem_DEL() with PyMem_Free() Modify also the PyMem_DEL() macro to use directly PyMem_Free().
* bpo-38252: Use 8-byte step to detect ASCII sequence in 64bit Windows build ↵Ma Lin2020-10-182-25/+25
| | | | (GH-16334)
* bpo-40521: Make empty Unicode string per interpreter (GH-21096)Victor Stinner2020-06-237-10/+6
| | | Each interpreter now has its own empty Unicode string singleton.
* bpo-40521: Make bytes singletons per interpreter (GH-21074)Victor Stinner2020-06-238-17/+24
| | | | | | Each interpreter now has its own empty bytes string and single byte character singletons. Replace STRINGLIB_EMPTY macro with STRINGLIB_GET_EMPTY() macro.
* bpo-29882: Add _Py_popcount32() function (GH-20518)Victor Stinner2020-06-081-1/+1
| | | | | | * Rename pycore_byteswap.h to pycore_bitutils.h. * Move popcount_digit() to pycore_bitutils.h as _Py_popcount32(). * _Py_popcount32() uses GCC and clang builtin function if available. * Add unit tests to _Py_popcount32().