summaryrefslogtreecommitdiffstats
path: root/Modules/_sre
Commit message (Collapse)AuthorAgeFilesLines
* gh-112075: Add critical sections for most dict APIs (#114508)Dino Viehland2024-02-061-12/+15
| | | | | | | | | Starts adding thread safety to dict objects. Use @critical_section for APIs which are exposed via argument clinic and don't directly correlate with a public C API which needs to acquire the lock Use a _lock_held suffix for keeping changes to complicated functions simple and just wrapping them with a critical section Acquire and release the lock in an existing function where it won't be overly disruptive to the existing logic
* gh-115015: Argument Clinic: fix generated code for METH_METHOD methods ↵Erlend E. Aasland2024-02-051-3/+3
| | | | without params (#115016)
* gh-114569: Use PyMem_* APIs for most non-PyObject uses (#114574)Erlend E. Aasland2024-01-261-2/+2
| | | Fix usage in Modules, Objects, and Parser subdirectories.
* Output more details in the re tracing (GH-111357)Serhiy Storchaka2023-10-262-4/+42
|
* gh-110964: Remove private _PyArg functions (#110966)Victor Stinner2023-10-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the following private functions and structures to pycore_modsupport.h internal C API: * _PyArg_BadArgument() * _PyArg_CheckPositional() * _PyArg_NoKeywords() * _PyArg_NoPositional() * _PyArg_ParseStack() * _PyArg_ParseStackAndKeywords() * _PyArg_Parser structure * _PyArg_UnpackKeywords() * _PyArg_UnpackKeywordsWithVararg() * _PyArg_UnpackStack() * _Py_ANY_VARARGS() Changes: * Python/getargs.h now includes pycore_modsupport.h to export functions. * clinic.py now adds pycore_modsupport.h when one of these functions is used. * Add pycore_modsupport.h includes when a C extension uses one of these functions. * Define Py_BUILD_CORE_MODULE in C extensions which now include directly or indirectly (via code generated by Argument Clinic) pycore_modsupport.h: * _csv * _curses_panel * _dbm * _gdbm * _multiprocessing.posixshmem * _sqlite.row * _statistics * grp * resource * syslog * _testcapi: bad_get() no longer uses METH_FASTCALL calling convention but METH_VARARGS. Replace _PyArg_UnpackStack() with PyArg_ParseTuple(). * _testcapi: add PYTESTCAPI_NEED_INTERNAL_API macro which is defined by _testcapi sub-modules which need the internal C API (pycore_modsupport.h): exceptions.c, float.c, vectorcall.c, watchers.c. * Remove Include/cpython/modsupport.h header file. Include/modsupport.h no longer includes the removed header file. * Fix mypy clinic.py
* gh-109747: Improve errors for unsupported look-behind patterns (GH-109859)Serhiy Storchaka2023-10-142-9/+7
| | | | | | Now re.error is raised instead of OverflowError or RuntimeError for too large width of look-behind pattern. The limit is increased to 2**32-1 (was 2**31-1).
* gh-110590: Fix a bug where _sre.compile would overwrite exceptions (#110591)Nikita Sobolev2023-10-101-0/+3
| | | | TypeError would be overwritten by OverflowError if 'code' param contained non-ints.
* gh-109631: Allow interruption of short repeated regex matches (GH-109867)Serhiy Storchaka2023-09-262-2/+5
| | | | Counting for signal checking now continues in new match from the point where it ended in the previous match instead of starting from 0.
* gh-108765: Python.h no longer includes <ctype.h> (#108831)Victor Stinner2023-09-031-5/+33
| | | | | | | | | | | | | | | | | | | | | | | Remove <ctype.h> in C files which don't use it; only sre.c and _decimal.c still use it. Remove _PY_PORT_CTYPE_UTF8_ISSUE code from pyport.h: * Code added by commit b5047fd01948ab108edcc1b3c2c901d915814cfd in 2004 for MacOSX and FreeBSD. * Test removed by commit 52ddaefb6bab1a74ecffe8519c02735794ebfbe1 in 2007, since Python str type now uses locale independent functions like Py_ISALPHA() and Py_TOLOWER() and the Unicode database. Modules/_sre/sre.c replaces _PY_PORT_CTYPE_UTF8_ISSUE with new functions: sre_isalnum(), sre_tolower(), sre_toupper(). Remove unused includes: * _localemodule.c: remove <stdio.h>. * getargs.c: remove <float.h>. * dynload_win.c: remove <direct.h>, it no longer calls _getcwd() since commit fb1f68ed7cc1536482d1debd70a53c5442135fe2 (in 2001).
* gh-107603: Argument Clinic: Only include pycore_gc.h if needed (#108726)Victor Stinner2023-08-311-5/+4
| | | | | | | | | | | | | | | | | | | | Argument Clinic now only includes pycore_gc.h if PyGC_Head is needed, and only includes pycore_runtime.h if _Py_ID() is needed. * Add 'condition' optional argument to Clinic.add_include(). * deprecate_keyword_use() includes pycore_runtime.h when using the _PyID() function. * Fix rendering of includes: comments start at the column 35. * Mark PC/clinic/_wmimodule.cpp.h and "Objects/stringlib/clinic/*.h.h" header files as generated in .gitattributes. Effects: * 42 header files generated by AC no longer include the internal C API, instead of 4 header files before. For example, Modules/clinic/_abc.c.h no longer includes the internal C API. * Fix _testclinic_depr.c.h: it now always includes pycore_runtime.h to get _Py_ID().
* gh-106320: Remove private AC converter functions (#108505)Victor Stinner2023-08-261-1/+2
| | | | | | | | | | | | | | Move these private functions to the internal C API (pycore_abstract.h): * _Py_convert_optional_to_ssize_t() * _PyNumber_Index() Argument Clinic now emits #include "pycore_abstract.h" when these functions are used. The parser of the c-analyzer tool now uses a list of files which use the limited C API, rather than a list of files using the internal C API.
* gh-108444: Argument Clinic uses PyLong_AsInt() (#108458)Victor Stinner2023-08-241-6/+6
| | | | Argument Clinic now uses the new public PyLong_AsInt(), rather than the old name _PyLong_AsInt().
* gh-106320: Remove private _PyDict functions (#108449)Victor Stinner2023-08-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Move private functions to the internal C API (pycore_dict.h): * _PyDictView_Intersect() * _PyDictView_New() * _PyDict_ContainsId() * _PyDict_DelItemId() * _PyDict_DelItem_KnownHash() * _PyDict_GetItemIdWithError() * _PyDict_GetItem_KnownHash() * _PyDict_HasSplitTable() * _PyDict_NewPresized() * _PyDict_Next() * _PyDict_Pop() * _PyDict_SetItemId() * _PyDict_SetItem_KnownHash() * _PyDict_SizeOf() No longer export most of these functions. Move also the _PyDictViewObject structure to the internal C API. Move dict_getitem_knownhash() function from _testcapi to the _testinternalcapi extension. Update test_capi.test_dict for this change.
* gh-100061: Proper fix of the bug in the matching of possessive quantifiers ↵SKO2023-08-161-0/+4
| | | | | | | | (GH-102612) Restore the global Input Stream pointer after trying to match a sub-pattern. Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
* gh-106869: Use new PyMemberDef constant names (#106871)Victor Stinner2023-07-251-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Remove '#include "structmember.h"'. * If needed, add <stddef.h> to get offsetof() function. * Update Parser/asdl_c.py to regenerate Python/Python-ast.c. * Replace: * T_SHORT => Py_T_SHORT * T_INT => Py_T_INT * T_LONG => Py_T_LONG * T_FLOAT => Py_T_FLOAT * T_DOUBLE => Py_T_DOUBLE * T_STRING => Py_T_STRING * T_OBJECT => _Py_T_OBJECT * T_CHAR => Py_T_CHAR * T_BYTE => Py_T_BYTE * T_UBYTE => Py_T_UBYTE * T_USHORT => Py_T_USHORT * T_UINT => Py_T_UINT * T_ULONG => Py_T_ULONG * T_STRING_INPLACE => Py_T_STRING_INPLACE * T_BOOL => Py_T_BOOL * T_OBJECT_EX => Py_T_OBJECT_EX * T_LONGLONG => Py_T_LONGLONG * T_ULONGLONG => Py_T_ULONGLONG * T_PYSSIZET => Py_T_PYSSIZET * T_NONE => _Py_T_NONE * READONLY => Py_READONLY * PY_AUDIT_READ => Py_AUDIT_READ * READ_RESTRICTED => Py_AUDIT_READ * PY_WRITE_RESTRICTED => _Py_WRITE_RESTRICTED * RESTRICTED => (READ_RESTRICTED | _Py_WRITE_RESTRICTED)
* gh-86493: Use PyModule_Add() instead of PyModule_AddObjectRef() (GH-106860)Serhiy Storchaka2023-07-181-6/+1
|
* gh-106508: Improve debugging of the _sre module (GH-106509)Serhiy Storchaka2023-07-083-5/+31
| | | | | | | Now the VERBOSE macro can control tracing on per-pattern basis: * 0 -- disabled * 1 -- only if the DEBUG flag set * 2 -- always
* gh-106524: Fix a crash in _sre.template() (GH-106525)Radislav Chugunov2023-07-081-0/+2
| | | | | | Some items remained uninitialized if _sre.template() was called with invalid indices. Then attempt to clear them in the destructor led to dereferencing of uninitialized pointer.
* gh-104922: remove PY_SSIZE_T_CLEAN (#106315)Inada Naoki2023-07-021-2/+0
|
* gh-105687: Remove deprecated objects from `re` module (#105688)Nikita Sobolev2023-06-142-3/+1
|
* gh-92536: Remove PyUnicode_READY() calls (#105210)Victor Stinner2023-06-011-2/+0
| | | | Since Python 3.12, PyUnicode_READY() does nothing and always returns 0.
* gh-99113: Add Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (gh-104205)Eric Snow2023-05-051-0/+1
| | | Here we are doing no more than adding the value for Py_mod_multiple_interpreters and using it for stdlib modules. We will start checking for it in gh-104206 (once PyInterpreterState.ceval.own_gil is added in gh-104204).
* gh-99300: Use Py_NewRef() in Modules/ directory (#99468)Victor Stinner2022-11-141-52/+27
| | | | Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and Py_XNewRef() in test C files of the Modules/ directory.
* gh-98740: Fix validation of conditional expressions in RE (GH-98764)Serhiy Storchaka2022-11-031-27/+29
| | | | | | | | | | | In very rare circumstances the JUMP opcode could be confused with the argument of the opcode in the "then" part which doesn't end with the JUMP opcode. This led to incorrect detection of the final JUMP opcode and incorrect calculation of the size of the subexpression. NOTE: Changed return value of functions _validate_inner() and _validate_charset() in Modules/_sre/sre.c. Now they return 0 on success, -1 on failure, and 1 if the last op is JUMP (which usually is a failure). Previously they returned 1 on success and 0 on failure.
* gh-91524: Speed up the regular expression substitution (#91525)Serhiy Storchaka2022-10-234-43/+329
| | | | | | | | | Functions re.sub() and re.subn() and corresponding re.Pattern methods are now 2-3 times faster for replacement strings containing group references. Closes #91524 Primarily authored by serhiy-storchaka Serhiy Storchaka Minor-cleanups-by: Gregory P. Smith [Google] <greg@krypto.org>
* gh-97669: Create Tools/build/ directory (#97963)Victor Stinner2022-10-172-2/+2
| | | | | | | | | | | | | | | | | | | | | | | Create Tools/build/ directory. Move the following scripts from Tools/scripts/ to Tools/build/: * check_extension_modules.py * deepfreeze.py * freeze_modules.py * generate_global_objects.py * generate_levenshtein_examples.py * generate_opcode_h.py * generate_re_casefix.py * generate_sre_constants.py * generate_stdlib_module_names.py * generate_token.py * parse_html5_entities.py * smelly.py * stable_abi.py * umarshal.py * update_file.py * verify_ensurepip_wheels.py Update references to these scripts.
* gh-90928: Improve static initialization of keywords tuple in AC (#95907)Erlend E. Aasland2022-08-131-222/+92
|
* gh-90928: Statically Initialize the Keywords Tuple in Clinic-Generated Code ↵Eric Snow2022-08-111-14/+449
| | | | | | | | | | | | | | | | (gh-95860) We only statically initialize for core code and builtin modules. Extension modules still create the tuple at runtime. We'll solve that part of interpreter isolation separately. This change includes generated code. The non-generated changes are in: * Tools/clinic/clinic.py * Python/getargs.c * Include/cpython/modsupport.h * Makefile.pre.in (re-generate global strings after running clinic) * very minor tweaks to Modules/_codecsmodule.c and Python/Python-tokenize.c All other changes are generated code (clinic, global strings).
* gh-91404: Revert "bpo-23689: re module, fix memory leak when a match is ↵Gregory P. Smith2022-06-175-81/+47
| | | | | | | | | | | | terminated by a signal or allocation failure (GH-32283) (#93882) Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283)" This reverts commit 6e3eee5c11b539e9aab39cff783acf57838c355a. Manual fixups to increase the MAGIC number and to handle conflicts with a couple of changes that landed after that. Thanks for reviews by Ma Lin and Serhiy Storchaka.
* gh-93741: Add private C API _PyImport_GetModuleAttrString() (GH-93742)Serhiy Storchaka2022-06-141-11/+1
| | | | | | It combines PyImport_ImportModule() and PyObject_GetAttrString() and saves 4-6 lines of code on every use. Add also _PyImport_GetModuleAttr() which takes Python strings as arguments.
* gh-92728: Restore re.template, but deprecate it (GH-93161)Miro Hrončok2022-05-252-0/+2
| | | | | | Revert "bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)" This reverts commit b09184bf05b07b77c5ecfedd4daa846be3cbf0a9.
* gh-91320: Argument Clinic uses _PyCFunction_CAST() (#32210)Victor Stinner2022-05-031-19/+19
| | | | Replace "(PyCFunction)(void(*)(void))func" cast with _PyCFunction_CAST(func).
* gh-91583: AC: Fix regression for functions with defining_class (GH-91739)Serhiy Storchaka2022-04-301-44/+250
| | | | | Argument Clinic now generates the same efficient code as before adding the defining_class parameter.
* gh-91870: Remove unsupported SRE opcode CALL (GH-91872)Serhiy Storchaka2022-04-263-40/+37
| | | | | | | It was initially added to support atomic groups, but that support was never fully implemented, and CALL was only left in the compiler, but not interpreter and parser. ATOMIC_GROUP is now used to support atomic groups.
* gh-91616: re module, fix .fullmatch() mismatch when using Atomic Grouping or ↵Ma Lin2022-04-191-7/+7
| | | | | | | | | Possessive Quantifiers (GH-91681) These jumps should use DO_JUMP0() instead of DO_JUMP(): - JUMP_POSS_REPEAT_1 - JUMP_POSS_REPEAT_2 - JUMP_ATOMIC_GROUP
* bpo-47256: Increasing the depth of backtracking in RE (GH-32411)Ma Lin2022-04-182-44/+44
| | | | | | | | | | Limit the maximum capturing group to 2**30-1 on 64-bit platforms (it was 2**31-1). No change on 32-bit platforms (2**28-1). It allows to reduce the size of SRE(match_context): - On 32 bit platform: 36 bytes, no change. (msvc2022) - On 64 bit platform: 72 bytes -> 56 bytes. (msvc2022/gcc9.4) which leads to increasing the depth of backtracking.
* gh-91404: Use computed gotos and reduce indirection in re (#91495)Brandt Bucher2022-04-152-343/+452
|
* bpo-47152: Automatically regenerate sre_constants.h (GH-91439)Serhiy Storchaka2022-04-121-2/+2
| | | | | | | * Move the code for generating Modules/_sre/sre_constants.h from Lib/re/_constants.py into a separate script Tools/scripts/generate_sre_constants.py. * Add target `regen-sre` in the makefile. * Make target `regen-all` depending on `regen-sre`.
* bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)Serhiy Storchaka2022-04-062-2/+0
| | | They were undocumented and never working.
* bpo-47152: Move sources of the _sre module into a subdirectory (GH-32290)Serhiy Storchaka2022-04-045-0/+5947