| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
Possessive Quantifiers (GH-91681)
These jumps should use DO_JUMP0() instead of DO_JUMP():
- JUMP_POSS_REPEAT_1
- JUMP_POSS_REPEAT_2
- JUMP_ATOMIC_GROUP
|
| |
|
|
|
|
| |
version (GH-91580)
|
|
|
| |
They were undocumented and never worked.
|
|
|
|
| |
or memory allocation failure (GH-32283)
|
|
|
| |
The sre_* modules are now deprecated.
|
|
|
|
|
|
|
|
| |
In rare cases, a capturing group could get the wrong result.
Regular expression engines in Perl and Java have similar bugs.
The new behavior matches that of more modern RE engines:
the regex module, PHP, Ruby and Node.js.
|
|
|
|
|
|
|
| |
(GH-32021)
Affected functions are re.search(), re.split(), re.findall(), re.finditer()
and re.sub().
|
|
|
|
|
| |
documentation (GH-32028)
It is a more commonly used term.
|
|
|
|
|
|
|
|
| |
* Atomic grouping: (?>...).
* Possessive quantifiers: x++, x*+, x?+, x{m,n}+.
Equivalent to (?>x+), (?>x*), (?>x?), (?>x{m,n}).
Co-authored-by: Jeffrey C. Jacobs <timehorse@users.sourceforge.net>
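To illustrate how these constructs differ from ordinary greedy repetition, here is a minimal sketch. It assumes Python 3.11 or later, since older interpreters reject the syntax; the helper name possessive_demo is made up for this example.

```python
import re
import sys


def possessive_demo():
    """Compare greedy, possessive and atomic matching on the string 'aaa'.

    Returns None on interpreters older than 3.11, which reject this syntax.
    """
    if sys.version_info < (3, 11):
        return None
    # Greedy a+ backtracks and gives one 'a' back so the final 'a' can match.
    greedy = re.fullmatch(r"a+a", "aaa") is not None
    # Possessive a++ keeps every 'a' it consumed, so the final 'a' cannot match.
    possessive = re.fullmatch(r"a++a", "aaa") is not None
    # The atomic group (?>a+) behaves the same way as a++.
    atomic = re.fullmatch(r"(?>a+)a", "aaa") is not None
    return greedy, possessive, atomic
```

Only the greedy form is expected to match here, because the other two never give back what they have consumed.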
|
|
|
|
| |
expression into error (GH-31994)
|
|
|
|
| |
A warning about inline flags not at the start of the regular
expression now contains the position of the flag.
|
|
|
|
|
|
| |
- lchmod, lchown are not fully implemented
- skip umask tests
- cannot fstat unlinked or renamed files yet
- ignore musl libc issues that affect Emscripten
|
| |
|
|
|
| |
Include the invalid type in the error message.
|
| |
Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to the following types:
* _dbm.dbm
* _gdbm.gdbm
* _multibytecodec.MultibyteCodec
* _sre.SRE_Scanner
* _thread._localdummy
* _thread.lock
* _winapi.Overlapped
* array.arrayiterator
* functools.KeyWrapper
* functools._lru_list_elem
* pyexpat.xmlparser
* re.Match
* re.Pattern
* unicodedata.UCD
* zlib.Compress
* zlib.Decompress
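From Python code, the visible effect for such types is that calling them directly raises TypeError. The sketch below checks this for the pattern and match types; the helper can_instantiate is a hypothetical name used only for illustration.

```python
import re


def can_instantiate(tp):
    """Return True if calling tp() with no arguments does not raise TypeError."""
    try:
        tp()
    except TypeError:
        return False
    return True


# Obtain the types from live objects rather than relying on re.Pattern/re.Match.
pattern_type = type(re.compile(""))
match_type = type(re.match("", ""))

pattern_blocked = not can_instantiate(pattern_type)
match_blocked = not can_instantiate(match_type)
```

Instances of these types are still created the usual way, through re.compile() and the matching functions.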
|
|
|
| |
Co-authored-by: Victor Stinner <vstinner@python.org>
|
| |
Flag members are now divided by one-bit versus multi-bit, with multi-bit being treated as aliases. Iterating over a flag only returns the contained single-bit flags.
Iterating, repr(), and str() show members in definition order.
When constructing combined-member flags, any extra integer values are either discarded (CONFORM), turned into ints (EJECT) or treated as errors (STRICT). Flag classes can specify which of those three behaviors is desired:
>>> class Test(Flag, boundary=CONFORM):
... ONE = 1
... TWO = 2
...
>>> Test(5)
<Test.ONE: 1>
Besides the three above behaviors, there is also KEEP, which should not be used unless necessary -- for example, _convert_ specifies KEEP as there are flag sets in the stdlib that are incomplete and/or inconsistent (e.g. ssl.Options). KEEP will, as the name suggests, keep all bits; however, iterating over a flag with extra bits will only return the canonical flags contained, not the extra bits.
Iteration is now in member definition order. If member definition order
matches increasing value order, then a more efficient method of flag
decomposition is used; otherwise, sort() is called on the results of
that method to get definition order.
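The iteration and boundary rules above can be sketched as follows. This assumes Python 3.11 or later, where the boundary names are importable from enum; the classes Perm and Loose and the function flag_demo are invented for illustration.

```python
import sys
from enum import Flag


def flag_demo():
    """Illustrate iteration and boundary handling; returns None before Python 3.11."""
    if sys.version_info < (3, 11):
        return None
    from enum import EJECT, STRICT

    class Perm(Flag, boundary=STRICT):
        X = 1
        W = 2
        R = 4
        RW = 6  # multi-bit value, treated as an alias

    class Loose(Flag, boundary=EJECT):
        ONE = 1
        TWO = 2

    # Iterating a combined flag yields only the single-bit members.
    names = [member.name for member in Perm.RW]
    # STRICT rejects values containing bits that no member defines.
    try:
        Perm(8)
        strict_error = False
    except ValueError:
        strict_error = True
    # EJECT hands back a plain int when unknown bits are present.
    ejected = Loose(5)
    return names, strict_error, ejected
```

The members are defined in increasing value order here, so definition order and value order coincide.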
``re`` module:
repr() has been modified to support as closely as possible its previous
output; the big difference is that inverted flags cannot be output as
before because the inversion operation now always returns the comparable
positive result; i.e.
re.A|re.I|re.M|re.S is ~(re.L|re.U|re.X|re.T|re.DEBUG)
in both of the above terms, the ``value`` is 282.
re's tests have been updated to reflect the modifications to repr().
|
|
|
|
| |
(PEP 384) (GH-23393)
|
| |
|
| |
|
| |
* bpo-36929: Modify io/re tests to allow for missing mod name
For a vanishingly small number of internal types, CPython sets the
tp_name slot to mod_name.type_name, either in the PyTypeObject or the
PyType_Spec. There are a few minor places where this surfaces:
* Custom repr functions for those types (some of which ignore the
tp_name in favor of using a string literal, such as _io.TextIOWrapper)
* Pickling error messages
The test suite only tests the former. This commit modifies the test
suite to allow Python implementations to omit the module prefix.
https://bugs.python.org/issue36929
|
|
|
|
|
|
|
|
|
| |
Use locale.getpreferredencoding() rather than locale.getlocale() to
get the locale encoding. With some locales, locale.getlocale()
returns the wrong encoding.
For example, on Fedora 29, locale.getlocale() returns ISO-8859-1
encoding for the "en_IN" locale, whereas
locale.getpreferredencoding() reports the correct encoding: UTF-8.
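To compare the two calls on a given system, a small sketch such as the following can be used. The function name report_encodings is hypothetical, and the values returned depend entirely on the platform and the active locale.

```python
import locale


def report_encodings():
    """Return the encoding reported by each API for the current LC_CTYPE locale."""
    # getlocale() returns a (language code, encoding) tuple; either may be None.
    _, from_getlocale = locale.getlocale(locale.LC_CTYPE)
    # do_setlocale=False queries the encoding without calling setlocale().
    from_preferred = locale.getpreferredencoding(False)
    return from_getlocale, from_preferred
```

Where the two values disagree, the second one is the one this change relies on.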
|
|
|
|
|
|
| |
Capturing groups need to be reset between two SRE(match) calls in loops; this fixes wrong capturing groups in rare cases.
Also add a missing index in re.rst.
|
|
|
| |
Co-authored-by: Jonathan Eunice <jonathan.eunice@gmail.com>
|
|
|
|
| |
re.sub(). (#4846)
|
| |
|
|
|
|
|
|
| |
(#4471)
Also fixed searching patterns that could match an empty string.
|
|
|
|
| |
in regular expressions.
|
|
|
|
| |
flags for RE. (#3885)
|
| |
|
|
|
| |
Previously any exception was replaced with a KeyError exception.
|
|
|
|
| |
Running our unit tests with `-bb` enabled triggered this failure.
|
|
|
|
|
|
| |
Warnings emitted when compiling a regular expression now always point
to the line in the user code. Previously they could point into the internals
of the re module if emitted from inside groups or conditionals.
|
|
|
|
|
| |
`re.compile(..., re.DEBUG)` now displays the compiled bytecode in
human readable form.
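A quick way to inspect that output programmatically is to capture standard output while compiling. This is a sketch only: debug_dump is a made-up helper, and the exact text printed varies between Python versions.

```python
import contextlib
import io
import re


def debug_dump(pattern):
    """Compile pattern with re.DEBUG and return whatever was printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, re.DEBUG)
    return buffer.getvalue()


dump = debug_dump(r"a+b")
```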
|
|
|
|
| |
This increased the performance of matching some patterns up to 25 times.
|
|
|
|
|
|
|
|
| |
modifiers. (#1490)
Several consecutive inline modifiers are now allowed at the start of the
pattern (e.g. '(?i)(?s)...'). In verbose mode, whitespace and comments
are now allowed before and between inline modifiers (e.g.
'(?x) (?i) (?s)...').
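The two forms described above can be tried with a short sketch like this one. The patterns and test string are illustrative only.

```python
import re

# Two inline modifiers in a row: case-insensitive and dot-matches-newline.
plain = re.compile(r"(?i)(?s)a.b")

# With (?x) first, whitespace and a comment may separate the modifiers.
verbose = re.compile(
    "(?x) (?i) # ignore case\n"
    " (?s) a . b"
)

plain_ok = plain.fullmatch("A\nB") is not None
verbose_ok = verbose.fullmatch("A\nB") is not None
```

Both patterns are meant to accept the same input; the verbose form only differs in layout.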
|
|
|
|
| |
of regular expressions.
|
|
|
|
| |
_sre.unicode_tolower(). (#1468)
|
|
|
|
|
|
| |
Compiled regular expression objects with the re.LOCALE flag no longer
depend on the locale at compile time. Only the locale at matching
time affects the result of matching.
|
|
|
|
| |
(#1000)
|
| |
|
|
|
| |
Also make minor PEP 8 coding style fixes on modified imports.
|
|
|
| |
This reverts commit ace5c0fdd9b962e6e886c29dbcea72c53f051dc4.
|
|
|
| |
This reverts commit 43f5df5bfaea5a07c913d12cb92f78f997feb371.
|
| |
|
|
|
|
|
|
|
| |
``locale.getlocale(locale.LC_CTYPE)`` and
``locale.getpreferredencoding(False)`` may give different answers
in some cases (such as the ``en_IN`` locale).
``re.LOCALE`` uses the latter, so update the test case to match.
|
|\
| |
| |
| | |
the match object. Based on patch by WGH.
|
| |\
| | |
| | |
| | | |
the match object. Based on patch by WGH.
|
| | |
| | |
| | |
| | | |
the match object. Based on patch by WGH.
|