path: root/Lib/test/test_re.py
Commit message | Author | Age | Files | Lines
* [3.12] gh-100061: Proper fix of the bug in the matching of possessive quantifiers (GH-102612) (#108003) | Serhiy Storchaka | 2023-08-16 | 1 | -2/+10
| Restore the global Input Stream pointer after trying to match a sub-pattern.
| (cherry picked from commit abd9cc52d94b8e2835322b62c29f09bb0e6fcfe9)
| Co-authored-by: SKO <41810398+uyw4687@users.noreply.github.com>
* [3.12] gh-106052: Fix bug in the matching of possessive quantifiers (GH-106515) (#107796) | Serhiy Storchaka | 2023-08-09 | 1 | -0/+12
| It did not work in the case of a subpattern containing backtracking.
| Temporarily implement possessive quantifiers as equivalent greedy quantifiers in atomic groups.
| (cherry picked from commit 7b6e34e5baeb4162815ffa4d943b09a58e3f6580)
* [3.12] Move implementation specific RE tests to separate class (GH-106563) (#106564) | Miss Islington (bot) | 2023-07-09 | 1 | -66/+69
| (cherry picked from commit 8cb6f9761e3c1cff3210697e3670b57591bf2e7a)
| Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* [3.12] gh-106510: Fix DEBUG output for atomic group (GH-106511) (GH-106548) | Miss Islington (bot) | 2023-07-08 | 1 | -1/+4
| (cherry picked from commit 74ec02e9490d8aa086aa9ad9d1d34d2ad999b5af)
| Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* [3.12] gh-106524: Fix a crash in _sre.template() (GH-106525) (GH-106544) | Miss Islington (bot) | 2023-07-08 | 1 | -0/+10
| Some items remained uninitialized if _sre.template() was called with invalid indices. The attempt to clear them in the destructor then dereferenced an uninitialized pointer.
| (cherry picked from commit 2ef1dc37f02b08536b677dd23ec51541a60effd7)
| Co-authored-by: Radislav Chugunov <52372310+chgnrdv@users.noreply.github.com>
* gh-84559: Remove the new multiprocessing warning, too disruptive. (#101551) | Gregory P. Smith | 2023-02-03 | 1 | -2/+1
| This reverts the core of #100618 while leaving relevant documentation improvements and minor refactorings in place.
* GH-84559: Deprecate fork being the multiprocessing default. (#100618) | Gregory P. Smith | 2023-02-02 | 1 | -1/+2
| This starts the process. Users who don't specify their own start method and use the default on platforms where it is 'fork' will see a DeprecationWarning upon multiprocessing.Pool() construction, upon multiprocessing.Process.start(), or upon concurrent.futures.ProcessPoolExecutor use.
| See the related issue and documentation within this change for details.
* gh-98740: Fix validation of conditional expressions in RE (GH-98764) | Serhiy Storchaka | 2022-11-03 | 1 | -0/+5
| In very rare circumstances the JUMP opcode could be confused with the argument of the opcode in the "then" part which doesn't end with the JUMP opcode. This led to incorrect detection of the final JUMP opcode and incorrect calculation of the size of the subexpression.
| NOTE: Changed return value of functions _validate_inner() and _validate_charset() in Modules/_sre/sre.c. Now they return 0 on success, -1 on failure, and 1 if the last op is JUMP (which usually is a failure). Previously they returned 1 on success and 0 on failure.
* gh-94675: Add a regression test for rjsmin re slowdown (GH-94685) | Miro Hrončok | 2022-08-03 | 1 | -1/+30
| Adds a regression test for an re slowdown observed by rjsmin. Uses multiprocessing to kill the test after SHORT_TIMEOUT.
| Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
| Co-authored-by: Christian Heimes <christian@python.org>
* gh-91404: Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or allocation failure (GH-32283) (#93882) | Gregory P. Smith | 2022-06-17 | 1 | -26/+2
| Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283)"
| This reverts commit 6e3eee5c11b539e9aab39cff783acf57838c355a.
| Manual fixups to increase the MAGIC number and to handle conflicts with a couple of changes that landed after that.
| Thanks for reviews by Ma Lin and Serhiy Storchaka.
* gh-92728: Restore re.template, but deprecate it (GH-93161) | Miro Hrončok | 2022-05-25 | 1 | -3/+27
| Revert "bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)"
| This reverts commit b09184bf05b07b77c5ecfedd4daa846be3cbf0a9.
* gh-90473: Skip tests that don't apply to Emscripten and WASI (GH-92846) | Christian Heimes | 2022-05-16 | 1 | -3/+9
* gh-91760: More strict rules for numerical group references and group names in RE (GH-91792) | Serhiy Storchaka | 2022-05-08 | 1 | -55/+24
| Only a sequence of ASCII digits is now accepted as a numerical reference.
| The group name in bytes patterns and replacement strings can now only contain ASCII letters, digits and underscores.
* gh-91760: Deprecate group names and numbers which will be invalid in future (GH-91794) | Serhiy Storchaka | 2022-04-30 | 1 | -0/+56
| Only a sequence of ASCII digits will be accepted as a numerical reference.
| The group name in bytes patterns and replacement strings will only be allowed to contain ASCII letters, digits and underscores.
* Simplify testing the warning filename (GH-91868) | Serhiy Storchaka | 2022-04-24 | 1 | -3/+3
| The context manager result has the "filename" attribute.
* RE: Add more tests for inline flag "x" and re.VERBOSE (GH-91854) | Serhiy Storchaka | 2022-04-23 | 1 | -5/+27
* gh-91700: Validate the group number in conditional expression in RE (GH-91702) | Serhiy Storchaka | 2022-04-22 | 1 | -0/+2
| In the expression (?(group)...) an appropriate re.error is now raised if the group number refers to an undefined group. Previously it raised RuntimeError: invalid SRE code.
* gh-90568: Fix exception type for \N with a named sequence in RE (GH-91665) | Serhiy Storchaka | 2022-04-22 | 1 | -0/+4
| re.error is now raised instead of TypeError.
* gh-91616: re module, fix .fullmatch() mismatch when using Atomic Grouping or Possessive Quantifiers (GH-91681) | Ma Lin | 2022-04-19 | 1 | -0/+20
| These jumps should use DO_JUMP0() instead of DO_JUMP():
| - JUMP_POSS_REPEAT_1
| - JUMP_POSS_REPEAT_2
| - JUMP_ATOMIC_GROUP
* Add more tests for group names and refs in RE (GH-91695) | Serhiy Storchaka | 2022-04-19 | 1 | -15/+41
* gh-91575: Update case-insensitive matching in re to the latest Unicode version (GH-91580) | Serhiy Storchaka | 2022-04-18 | 1 | -6/+49
* bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300) | Serhiy Storchaka | 2022-04-06 | 1 | -3/+3
| They were undocumented and never worked.
* bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283) | Ma Lin | 2022-04-03 | 1 | -2/+26
* bpo-47152: Convert the re module into a package (GH-32177) | Serhiy Storchaka | 2022-04-02 | 1 | -5/+31
| The sre_* modules are now deprecated.
* bpo-35859: Fix a few long-standing bugs in re engine (GH-12427) | Ma Lin | 2022-03-29 | 1 | -0/+69
| In rare cases, a capturing group could get the wrong result.
| Regular expression engines in Perl and Java have similar bugs. The new behavior now matches the behavior of more modern RE engines: the regex module, PHP, Ruby and Node.js.
* bpo-42885: Optimize search for regular expressions starting with "\A" or "^" (GH-32021) | Serhiy Storchaka | 2022-03-22 | 1 | -0/+15
| Affected functions are re.search(), re.split(), re.findall(), re.finditer() and re.sub().
* bpo-47081: Replace "qualifiers" with "quantifiers" in the re module documentation (GH-32028) | Serhiy Storchaka | 2022-03-22 | 1 | -5/+5
| It is a more commonly used term.
* bpo-433030: Add support of atomic grouping in regular expressions (GH-31982) | Serhiy Storchaka | 2022-03-21 | 1 | -58/+236
| * Atomic grouping: (?>...).
| * Possessive quantifiers: x++, x*+, x?+, x{m,n}+. Equivalent to (?>x+), (?>x*), (?>x?), (?>x{m,n}).
| Co-authored-by: Jeffrey C. Jacobs <timehorse@users.sourceforge.net>
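A minimal sketch of the difference between greedy, possessive and atomic matching described in this commit. It assumes Python 3.11 or later for the new syntax (the year of the commit suggests that release, but the log does not state it); the function name `possessive_demo` and the sample text are hypothetical.

```python
import re
import sys


def possessive_demo(text="aaab"):
    """Compare greedy, possessive and atomic matching on the same input."""
    if sys.version_info < (3, 11):
        # Assumption: the syntax below is only accepted by Python 3.11+.
        return None
    greedy = re.fullmatch(r"a*ab", text)       # a* gives one "a" back
    possessive = re.fullmatch(r"a*+ab", text)  # a*+ keeps every "a"
    atomic = re.fullmatch(r"(?>a*)ab", text)   # equivalent atomic group
    return (greedy is not None, possessive is not None, atomic is not None)


result = possessive_demo()
```

On an interpreter that supports the syntax, only the greedy form should match, because the possessive and atomic forms never backtrack into the run of "a" characters.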
* bpo-47066: Convert a warning about flags not at the start of the regular expression into error (GH-31994) | Serhiy Storchaka | 2022-03-19 | 1 | -56/+12
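A hedged illustration of what this change means for callers: a global inline flag placed after the start of the pattern is either rejected or merely warned about, depending on the interpreter. The helper name `compile_with_late_flag` is hypothetical, and the sketch makes no claim about which release introduced the error.

```python
import re
import warnings


def compile_with_late_flag(pattern="a(?i)b"):
    """Report whether a pattern with a late global flag still compiles."""
    with warnings.catch_warnings():
        # Older interpreters only emit a DeprecationWarning here.
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            re.compile(pattern)
        except re.error:
            return "rejected"
    return "accepted"


outcome = compile_with_late_flag()
# A scoped flag group is valid anywhere in the pattern.
scoped = re.fullmatch(r"a(?i:b)", "aB") is not None
```

Moving the flag to the start of the pattern, or using the scoped form `(?i:...)`, avoids the problem in either case.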
* bpo-39394: Improve warning message in the re module (GH-31988) | Serhiy Storchaka | 2022-03-19 | 1 | -3/+6
| A warning about inline flags not at the start of the regular expression now contains the position of the flag.
* bpo-40280: Skip more tests on Emscripten (GH-31947) | Christian Heimes | 2022-03-17 | 1 | -1/+3
| - lchmod, lchown are not fully implemented
| - skip umask tests
| - cannot fstat unlinked or renamed files yet
| - ignore musl libc issues that affect Emscripten
* bpo-43988: Use check disallow instantiation helper (GH-26392) | Erlend Egeberg Aasland | 2021-05-27 | 1 | -5/+5
* bpo-40736: Improve the error message for re.search() TypeError (GH-23312) | Zackery Spytz | 2021-05-21 | 1 | -0/+6
| Include the invalid type in the error message.
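A small sketch of how the TypeError in question can be observed. The helper name `search_error_message` is hypothetical; the exact wording of the message depends on the interpreter, so the sketch only captures it.

```python
import re


def search_error_message(value):
    """Return the TypeError text raised for a non-string search target."""
    try:
        re.search("a", value)
    except TypeError as exc:
        return str(exc)
    return None


message = search_error_message(5)
```

On interpreters that include this change, the message should also name the offending type.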
* bpo-43916: Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to selected types (GH-25748) | Erlend Egeberg Aasland | 2021-04-30 | 1 | -0/+9
| Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to the following types:
| * _dbm.dbm
| * _gdbm.gdbm
| * _multibytecodec.MultibyteCodec
| * _sre.SRE_Scanner
| * _thread._localdummy
| * _thread.lock
| * _winapi.Overlapped
| * array.arrayiterator
| * functools.KeyWrapper
| * functools._lru_list_elem
| * pyexpat.xmlparser
| * re.Match
| * re.Pattern
| * unicodedata.UCD
| * zlib.Compress
| * zlib.Decompress
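For the two re types in the list above, the visible effect is that they cannot be constructed directly. A minimal sketch, using only the public `re.compile()` entry point to obtain the type:

```python
import re

# Obtain the pattern type without relying on a specific attribute name.
pattern_type = type(re.compile("a"))

try:
    pattern_type()  # direct construction is expected to fail
except TypeError:
    direct_instantiation_blocked = True
else:
    direct_instantiation_blocked = False
```

Pattern objects are meant to come from `re.compile()`, and match objects from the matching functions.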
* bpo-43908: Make re types immutable (GH-25697) | Erlend Egeberg Aasland | 2021-04-29 | 1 | -0/+12
| Co-authored-by: Victor Stinner <vstinner@python.org>
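A hedged sketch of what immutability means in practice: assigning a new attribute on the type objects should raise TypeError. The helper `try_patch` and the attribute name `extra` are hypothetical.

```python
import re


def try_patch(cls):
    """Return True if a class attribute can be set on cls, else False."""
    try:
        cls.extra = 1  # hypothetical attribute, used only for the probe
    except TypeError:
        return False
    del cls.extra  # clean up if the assignment unexpectedly succeeded
    return True


pattern_patchable = try_patch(re.Pattern)
match_patchable = try_patch(re.Match)
```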
* bpo-38250: [Enum] single-bit flags are canonical (GH-24215) | Ethan Furman | 2021-01-25 | 1 | -3/+5
| Flag members are now divided by one-bit versus multi-bit, with multi-bit being treated as aliases. Iterating over a flag only returns the contained single-bit flags.
| Iterating, repr(), and str() show members in definition order.
| When constructing combined-member flags, any extra integer values are either discarded (CONFORM), turned into ints (EJECT) or treated as errors (STRICT). Flag classes can specify which of those three behaviors is desired:
|     >>> class Test(Flag, boundary=CONFORM):
|     ...     ONE = 1
|     ...     TWO = 2
|     ...
|     >>> Test(5)
|     <Test.ONE: 1>
| Besides the three above behaviors, there is also KEEP, which should not be used unless necessary -- for example, _convert_ specifies KEEP as there are flag sets in the stdlib that are incomplete and/or inconsistent (e.g. ssl.Options). KEEP will, as the name suggests, keep all bits; however, iterating over a flag with extra bits will only return the canonical flags contained, not the extra bits.
| Iteration is now in member definition order. If member definition order matches increasing value order, then a more efficient method of flag decomposition is used; otherwise, sort() is called on the results of that method to get definition order.
| ``re`` module: repr() has been modified to support as closely as possible its previous output; the big difference is that inverted flags cannot be output as before because the inversion operation now always returns the comparable positive result; i.e. re.A|re.I|re.M|re.S is ~(re.L|re.U|re.S|re.T|re.DEBUG); in both of the above terms, the ``value`` is 282.
| re's tests have been updated to reflect the modifications to repr().
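The CONFORM example from the commit message can be turned into a runnable sketch. It assumes an interpreter whose `enum` module exports `CONFORM` and returns `None` otherwise; the wrapper name `conform_demo` is hypothetical.

```python
try:
    from enum import Flag, CONFORM
except ImportError:  # assumption: older interpreters lack boundary support
    Flag = CONFORM = None


def conform_demo():
    """Show that CONFORM drops bits that no member defines."""
    if CONFORM is None:
        return None

    class Test(Flag, boundary=CONFORM):
        ONE = 1
        TWO = 2

    # 5 == 0b101: bit 4 is unknown and discarded, leaving ONE.
    kept_value = Test(5).value
    # Iteration yields only the single-bit members, in definition order.
    names = [member.name for member in Test(3)]
    return kept_value, names


result = conform_demo()
```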
* bpo-1635741: Convert _sre types to heap types and establish module state (PEP 384) (GH-23393) | Erlend Egeberg Aasland | 2020-11-20 | 1 | -0/+4
* bpo-40443: Remove unused imports in tests (GH-19805) | Victor Stinner | 2020-04-29 | 1 | -1/+1
* bpo-36548: Improve the repr of re flags. (GH-12715) | Serhiy Storchaka | 2019-05-31 | 1 | -0/+12
* bpo-36929: Modify io/re tests to allow for missing mod name (#13392) | Max Bernstein | 2019-05-21 | 1 | -12/+16
| For a vanishingly small number of internal types, CPython sets the tp_name slot to mod_name.type_name, either in the PyTypeObject or the PyType_Spec. There are a few minor places where this surfaces:
| * Custom repr functions for those types (some of which ignore the tp_name in favor of using a string literal, such as _io.TextIOWrapper)
| * Pickling error messages
| The test suite only tests the former. This commit modifies the test suite to allow Python implementations to omit the module prefix.
| https://bugs.python.org/issue36929
* bpo-29571: Fix test_re.test_locale_flag() (GH-12099) | Victor Stinner | 2019-02-28 | 1 | -2/+1
| Use locale.getpreferredencoding() rather than locale.getlocale() to get the locale encoding. With some locales, locale.getlocale() returns the wrong encoding.
| For example, on Fedora 29, locale.getlocale() returns ISO-8859-1 encoding for the "en_IN" locale, whereas locale.getpreferredencoding() reports the correct encoding: UTF-8.
* bpo-34294: re module, fix wrong capturing groups in rare cases. (GH-11546) | animalize | 2019-02-18 | 1 | -0/+34
| Capturing groups need to be reset between two SRE(match) calls in loops; this fixes wrong capturing groups in rare cases.
| Also add a missing index in re.rst.
* bpo-30688: Support \N{name} escapes in re patterns. (GH-5588) | Serhiy Storchaka | 2018-02-09 | 1 | -0/+36
| Co-authored-by: Jonathan Eunice <jonathan.eunice@gmail.com>
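A short sketch of the \N{name} escape in a pattern, assuming an interpreter that includes this change (Python 3.8 or later). The sample strings are illustrative only.

```python
import re

# The escape is resolved by the re parser, so a raw string works.
by_name = re.fullmatch(r"\N{EM DASH}", "\u2014") is not None

# The same escape is accepted inside a character class.
dashes = re.findall(r"[\N{EM DASH}\N{EN DASH}]", "a\u2013b\u2014c")
```

Here `dashes` should hold the en dash followed by the em dash, in the order they appear in the text.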
* bpo-32308: Replace empty matches adjacent to a previous non-empty match in re.sub(). (#4846) | Serhiy Storchaka | 2018-01-04 | 1 | -14/+9
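The behavior can be seen with the example used in the re documentation, assuming Python 3.7 or later:

```python
import re

# "x*" matches the empty string before "a" and before "b", then "x",
# then the empty string right after "x" (adjacent to a non-empty
# match), and finally the empty string at the end.
result = re.sub("x*", "-", "abxd")
print(result)  # -a-b--d-
```

The doubled hyphen comes from the empty match that directly follows the non-empty match of "x".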
* Fix improper use of re.escape() in tests. (#4814) | Serhiy Storchaka | 2017-12-12 | 1 | -1/+1
* bpo-25054, bpo-1647489: Added support of splitting on zerowidth patterns. (#4471) | Serhiy Storchaka | 2017-12-04 | 1 | -13/+31
| Also fixed searching patterns that could match an empty string.
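A brief sketch of splitting on a zero-width pattern, using the word-boundary example from the re documentation and assuming Python 3.7 or later:

```python
import re

# \b matches between characters without consuming any of them.
parts = re.split(r"\b", "Words, words, words.")
print(parts)  # ['', 'Words', ', ', 'words', ', ', 'words', '.']
```

The leading empty string appears because a word boundary also matches at position 0.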
* bpo-30349: Raise FutureWarning for nested sets and set operations (#1553) | Serhiy Storchaka | 2017-11-16 | 1 | -1/+46
| in regular expressions.
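A hedged sketch of how the warning can be observed for a possible nested set. The helper name `future_warnings_for` is hypothetical, and the exact warning text is not assumed.

```python
import re
import warnings


def future_warnings_for(pattern):
    """Compile pattern and return the FutureWarning messages it raised."""
    re.purge()  # clear the compile cache so the pattern is parsed again
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        re.compile(pattern)
    return [str(w.message) for w in caught
            if issubclass(w.category, FutureWarning)]


nested = future_warnings_for(r"[[a]")   # unescaped "[" opens the set
escaped = future_warnings_for(r"[\[a]")  # escaping avoids the warning
```

Escaping the literal bracket keeps the pattern unambiguous if nested sets gain a meaning in a future version.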
* bpo-31690: Allow the inline flags "a", "L", and "u" to be used as group flags for RE. (#3885) | Serhiy Storchaka | 2017-10-24 | 1 | -8/+14
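A minimal sketch of a group-scoped "a" flag, assuming Python 3.7 or later. The sample characters are illustrative.

```python
import re

# In a str pattern, \w is Unicode-aware by default.
unicode_default = re.fullmatch(r"\w", "é") is not None

# Inside (?a:...), \w is restricted to ASCII word characters.
ascii_scoped = re.fullmatch(r"(?a:\w)", "é") is not None

# The restriction ends with the group: the second \w is Unicode again.
mixed = re.fullmatch(r"(?a:\w)\w", "eé") is not None
```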
* bpo-30397: Add re.Pattern and re.Match. (#1646) | Serhiy Storchaka | 2017-10-04 | 1 | -2/+2
* bpo-30978: str.format_map() now passes key lookup exceptions through. (#2790) | Serhiy Storchaka | 2017-08-03 | 1 | -1/+1
| Previously any exception was replaced with a KeyError exception.
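A short sketch of the pass-through behavior, assuming Python 3.7 or later. The class name `Explosive` is hypothetical.

```python
class Explosive:
    """Mapping-like object whose lookups always fail with ValueError."""

    def __getitem__(self, key):
        raise ValueError(f"refusing to look up {key!r}")


try:
    "{name}".format_map(Explosive())
except KeyError:
    raised = "KeyError"    # behavior before this change
except ValueError:
    raised = "ValueError"  # the original exception passes through
```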