path: root/Modules/_sre.c
Commit message | Author | Age | Files | Lines
* - Issue #3629: Fix sre "bytecode" validator for an end case. | Guido van Rossum | 2008-09-10 | 1 | -3/+4
  Reviewed by Amaury.
* Tracker issue 3487: sre "bytecode" verifier. | Guido van Rossum | 2008-08-05 | 1 | -0/+474
  This is a verifier for the binary code used by the _sre module (this is often called bytecode, though to distinguish it from Python bytecode I put it in quotes).

  I wrote this for Google App Engine, and am making the patch available as open source under the Apache 2 license. Below are the copyright statement and license, for completeness.

  # Copyright 2008 Google Inc.
  #
  # Licensed under the Apache License, Version 2.0 (the "License");
  # you may not use this file except in compliance with the License.
  # You may obtain a copy of the License at
  #
  # http://www.apache.org/licenses/LICENSE-2.0
  #
  # Unless required by applicable law or agreed to in writing, software
  # distributed under the License is distributed on an "AS IS" BASIS,
  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  # See the License for the specific language governing permissions and
  # limitations under the License.

  It's not necessary to include these copyrights and bytecode in the source file. Google has signed a contributor's agreement with the PSF already.
* This reverts r63675 based on the discussion in this thread: | Gregory P. Smith | 2008-06-09 | 1 | -3/+3
  http://mail.python.org/pipermail/python-dev/2008-June/079988.html

  Python 2.6 should stick with PyString_* in its codebase. The PyBytes_* names in the spirit of 3.0 are available via a #define only. See the email thread.
* Renamed PyString to PyBytes | Christian Heimes | 2008-05-26 | 1 | -3/+3
* Silence Coverity false alerts with CIDs #172, #183, #184 | Christian Heimes | 2008-01-18 | 1 | -1/+2
* Issue 846388. Adds a call to PyErr_CheckSignals to | Facundo Batista | 2008-01-08 | 1 | -0/+8
  SRE_MATCH so that signal handlers can be invoked during long regular expression matches. It also adds a new error return value indicating that an exception occurred in a signal handler during the match, allowing exceptions in the signal handler to propagate up to the main loop.

  Thanks Josh Hoyt and Ralf Schmitt.
* #1629: Renamed Py_Size, Py_Type and Py_Refcnt to Py_SIZE, Py_TYPE and Py_REFCNT. | Christian Heimes | 2007-12-19 | 1 | -1/+1
  Macros for b/w compatibility are available.
* Patch # 1140 (my code, approved by Effbot). | Guido van Rossum | 2007-09-10 | 1 | -17/+8
  Make sure the type of the return value of re.sub(x, y, z) is the type of y+x (i.e. unicode if either is unicode, str if they are both str) even if there are no substitutions or if x==z (which triggered various special cases in join_list()).

  Could be backported to 2.5; no need to port to 3.0.
* PEP 3123: Provide forward compatibility with Python 3.0, while keeping | Martin v. Löwis | 2007-07-21 | 1 | -1/+1
  backwards compatibility. Add Py_Refcnt, Py_Type, Py_Size, and PyVarObject_HEAD_INIT.
* Cause a PyObject_Malloc() failure to trigger a MemoryError, and then | Andrew M. Kuchling | 2006-10-04 | 1 | -2/+21
  add 'if (PyErr_Occurred())' checks to various places so that NULL is returned properly.

  2.4 backport candidate.
* Try to handle a malloc failure. I'm not entirely sure this is correct. | Neal Norwitz | 2006-08-12 | 1 | -0/+3
  There might be something else we need to do to handle the exception.

  Klocwork # 212-213
* Impl ssize_t | Neal Norwitz | 2006-06-12 | 1 | -93/+98
* Make use of METH_O and METH_NOARGS where possible. | Georg Brandl | 2006-05-29 | 1 | -3/+3
  Use Py_UnpackTuple instead of PyArg_ParseTuple where possible.
* METH_NOARGS functions do get called with two args. | Georg Brandl | 2006-05-28 | 1 | -5/+5
* Fix C function calling conventions in _sre module. | Georg Brandl | 2006-05-28 | 1 | -38/+18
* needforspeed: use PyObject_MALLOC instead of system malloc for small | Jack Diederich | 2006-05-27 | 1 | -4/+4
  allocations. Use PyMem_MALLOC for larger (1k+) chunks. 1%-2% speedup.
* C++ compiler cleanup: proper casts | Skip Montanaro | 2006-04-18 | 1 | -2/+2
* Move constructors, add some casts to make C++ compiler happy. Still a problem | Anthony Baxter | 2006-04-12 | 1 | -202/+201
  with the getstring() results in pattern_subx. Will come back to that.
* Rename sre.py -> re.py | Neal Norwitz | 2006-03-16 | 1 | -2/+4
* Thanks to Coverity, these were all reported by their Prevent tool. | Neal Norwitz | 2006-03-07 | 1 | -1/+1
  All of these (except _lsprof.c) should be backported. Particularly the hotshot change which validates sys.path. Can someone backport?
* Revert backwards-incompatible const changes. | Martin v. Löwis | 2006-02-27 | 1 | -8/+8
* _compile(): raise an exception if downcasting to SRE_CODE | Tim Peters | 2006-01-21 | 1 | -37/+40
  loses information:

  OverflowError: regular expression code size limit exceeded

  Otherwise the compiled code is gibberish, possibly leading at least to wrong results or (as reported on c.l.py) internal sre errors at match time.

  I'm not sure how to test this. SRE_CODE is a 2-byte type on my box, and it's easy to create a regexp that causes the new exception to trigger here. But it may be a 4-byte type on other boxes, and creating a regexp large enough to trigger problems there would be pretty crazy.

  Bugfix candidate.
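The range check described in this commit can be sketched in Python. This is only an illustration under stated assumptions: `pack_codes` and `CODE_BITS` are hypothetical names, and the real check is written in C inside `_compile()`.

```python
# Hypothetical sketch: refuse code values that would be truncated when stored
# in a fixed-width code unit, instead of silently producing garbage.
CODE_BITS = 16  # assumption: the code unit is a 2-byte unsigned type


def pack_codes(values, bits=CODE_BITS):
    """Return the values as a list if each one fits in `bits` bits."""
    limit = 1 << bits
    for value in values:
        if not 0 <= value < limit:
            raise OverflowError("regular expression code size limit exceeded")
    return list(values)


print(pack_codes([17, 3, 65535]))  # every value fits in 16 bits
```

With a wider code unit (for example `bits=32`) the same input would need far larger values before the error triggers, which matches the commit's remark that the failure is hard to reproduce on 4-byte builds.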
* Check return result from Py_InitModule*(). This API can fail. | Neal Norwitz | 2006-01-19 | 1 | -0/+2
  Probably should be backported.
* Add const to several API functions that take char *. | Jeremy Hylton | 2005-12-10 | 1 | -8/+8
  In C++, it's an error to pass a string literal to a char* function without a const_cast(). Rather than require every C++ extension module to put a cast around string literals, fix the API to state the const-ness.

  I focused on parts of the API where people usually pass literals: PyArg_ParseTuple() and friends, Py_BuildValue(), PyMethodDef, the type slots, etc. Predictably, there was a large set of functions that needed to be fixed as a result of these changes. The most pervasive change was to make the keyword args list passed to PyArg_ParseTupleAndKeywords() a const char *kwlist[].

  One cast was required as a result of the changes: A type object mallocs the memory for its tp_doc slot and later frees it. PyTypeObject says that tp_doc is const char *; but if the type was created by type_new(), we know it is safe to cast to char *.
* Fixing bug #1072259 in SRE. | Gustavo Niemeyer | 2004-12-02 | 1 | -7/+10
* Add docstrings for regular expression objects and methods. | Raymond Hettinger | 2004-09-24 | 1 | -8/+51
* Fixing bug #817234, which made SRE get into an infinite loop on | Gustavo Niemeyer | 2004-09-03 | 1 | -5/+3
  empty final matches with finditer(). New test cases included for this bug and for #581080.
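A minimal sketch of the kind of case this fix covers, assuming a current CPython: the pattern `.*` first consumes the whole string and then matches the empty string at the end, after which iteration has to stop.

```python
import re

# ".*" matches "asdf" first, then the empty string at the very end.
# The iterator must terminate after that empty final match.
spans = [m.span() for m in re.finditer(r".*", "asdf")]
print(spans)  # [(0, 4), (4, 4)]
```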
* Moved SunPro warning suppression into pyport.h and out of individual | Nicholas Bastin | 2004-07-15 | 1 | -4/+0
  modules and objects.
* Fixed end-of-loop code not reached warning when using SunPro C | Nicholas Bastin | 2004-06-17 | 1 | -0/+4
* Add weakref support to sockets and re pattern objects. | Raymond Hettinger | 2004-05-31 | 1 | -1/+24
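As a small illustration of what weakref support on pattern objects allows (a sketch, not taken from the commit itself), a compiled pattern can be the target of `weakref.ref`:

```python
import re
import weakref

pattern = re.compile(r"\d+")

# Pattern objects accept weak references; the reference resolves to the
# same object for as long as the pattern is alive.
ref = weakref.ref(pattern)
print(ref() is pattern)
```

Note that the `re` module keeps its own cache of compiled patterns, so a weak reference may stay alive longer than the local variable suggests.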
* - Fixing annoying warnings. | Gustavo Niemeyer | 2004-02-14 | 1 | -7/+10
* Cleaning up recursive pieces left in the reorganization. | Gustavo Niemeyer | 2003-12-13 | 1 | -119/+16
* Removing dead code. | Gustavo Niemeyer | 2003-10-18 | 1 | -11/+0
* Implemented non-recursive SRE matching. | Gustavo Niemeyer | 2003-10-17 | 1 | -439/+741
* Simplify and speedup uses of Py_BuildValue(): | Raymond Hettinger | 2003-10-12 | 1 | -5/+5
  * Py_BuildValue("(OOO)",a,b,c) --> PyTuple_Pack(3,a,b,c)
  * Py_BuildValue("()",a) --> PyTuple_New(0)
  * Py_BuildValue("O", a) --> Py_INCREF(a)
* Fixing bug described in patch #756032, where SRE reads invalid data | Gustavo Niemeyer | 2003-06-26 | 1 | -1/+1
  due to a corrupted end pointer.
* Changes to sre.c after the application of patch #726869 have increased | Andrew MacIntyre | 2003-06-09 | 1 | -5/+11
  stack usage on FreeBSD, requiring the recursion limit to be lowered further. Building with gcc 2.95 (the standard compiler on FreeBSD 4.x) is now also affected.

  The underlying issue is that FreeBSD's pthreads implementation has a hard-coded 1MB stack size for the initial (or "primary") thread, which can not be changed without rebuilding libc_r. Exhausting this stack results in a bus error.

  Building without pthreads (configure --without-threads), or linking with the port of the Linux pthreads library (aka Linuxthreads) instead of libc_r, avoids this limitation.

  On OS/2, only gcc 3.2 is affected and the stack size is controllable, so the special handling has been removed.
* Allow _sre.c to compile with Python 2.2 | Andrew M. Kuchling | 2003-04-30 | 1 | -0/+4
* - Included detailed documentation in _sre.c explaining how, when, and why | Gustavo Niemeyer | 2003-04-27 | 1 | -17/+41
  to use LASTMARK_SAVE()/LASTMARK_RESTORE(), based on the discussion in patch #712900.
  - Cleaned up LASTMARK_SAVE()/LASTMARK_RESTORE() usage, based on the established rules.
  - Moved the upper part of the just committed patch (relative to bug #725106) to outside the for() loop of BRANCH OP. There's no need to mark_save() in every loop iteration.
* Fix for part of the problem mentioned in #725149 by Greg Chapman. | Gustavo Niemeyer | 2003-04-27 | 1 | -8/+10
  This problem is related to wrong behavior in mark_save/restore(), which don't restore the mark_stack_base before restoring the marks. Greg's suggestion was to change the asserts, which happen to be the only recursive ops that can continue the loop, but the problem would happen to any operation with the same behavior. So, rather than hardcoding this into asserts, I have changed mark_save/restore() to always restore the stackbase before restoring the marks.

  Both solutions should fix these two cases, presented by Greg:

  >>> re.match('(a)(?:(?=(b)*)c)*', 'abb').groups()
  ('b', None)
  >>> re.match('(a)((?!(b)*))*', 'abb').groups()
  ('b', None, None)

  The rest of the bug and patch in #725149 must be discussed further.
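For comparison with the outputs quoted in the commit, here is a short check of the two patterns, assuming a current CPython where the marks are restored correctly. In both cases the repeated group runs zero times, so only group 1 should be set.

```python
import re

# The lookahead sets a mark for (b), but the enclosing repeat then fails on "c",
# so the mark must be rolled back and group 2 stays unset.
first = re.match(r"(a)(?:(?=(b)*)c)*", "abb").groups()

# (?!(b)*) can never succeed because (b)* always matches at least the
# empty string, so groups 2 and 3 stay unset.
second = re.match(r"(a)((?!(b)*))*", "abb").groups()

print(first)   # ('a', None)
print(second)  # ('a', None, None)
```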
* Applied patch #725106, by Greg Chapman, fixing capturing groups | Gustavo Niemeyer | 2003-04-27 | 1 | -0/+10
  within repeats of alternatives. The only change to the original patch was to convert the tests to the new test_re.py file.

  This patch fixes cases like:

  >>> re.match('((a)|b)*', 'abc').groups()
  ('b', '')

  Which is wrong (it's impossible to match the empty string), and incompatible with other regex systems, like the following examples show:

  % perl -e '"abc" =~ /^((a)|b)*/; print "$1 $2\n";'
  b a

  % echo "abc" | sed -r -e "s/^((a)|b)*/\1 \2|/"
  b a|c
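The same case can be checked against a current CPython, where the result agrees with the Perl and sed output quoted in the commit: group 1 holds the last iteration (`b`) and group 2 keeps its capture from the earlier iteration (`a`).

```python
import re

# Two iterations of the repeat: "a" (via the inner group) and then "b".
groups = re.match(r"((a)|b)*", "abc").groups()
print(groups)  # ('b', 'a')
```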
* Applying patch #726869 by Andrew I MacIntyre, reducing in _sre.c the | Gustavo Niemeyer | 2003-04-27 | 1 | -0/+9
  recursion limit for certain setups of FreeBSD and OS/2.
* Made MAX_UNTIL/MIN_UNTIL code more coherent about mark protection, | Gustavo Niemeyer | 2003-04-22 | 1 | -4/+6
  according to further discussions with Greg Chapman in patch #712900.
* More work on bug #672491 and patch #712900. | Gustavo Niemeyer | 2003-04-20 | 1 | -23/+38
  I've applied a modified version of Greg Chapman's patch. I've included the fixes without introducing the reorganization mentioned, for the sake of stability. Also, the second fix mentioned in the patch doesn't fix the mentioned problem anymore, because of the change introduced by patch #720991 (by Greg as well). The new fix wasn't complicated though, and is included as well.

  As a note, it seems that there are other places that require the "protection" of LASTMARK_SAVE()/LASTMARK_RESTORE(), and are just waiting for someone to find how to break them. Particularly, I believe that every recursion of SRE_MATCH() should be protected by these macros. I won't do that right now since I'm not completely sure about this, and we don't have much time for testing until the next release.
* - Fixed bug #672491. This change restores the behavior of lastindex/lastgroup | Gustavo Niemeyer | 2003-04-20 | 1 | -5/+4
  to be compliant with previous python versions, by backing out the changes made in revision 2.84 which affected this. The bugfix for backtracking is still maintained.
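A short sketch of the `lastindex`/`lastgroup` behaviour on a current CPython. These particular patterns are chosen here for illustration and are not quoted from the commit.

```python
import re

# (b)? is first tried and then abandoned on backtracking, so only group 1 counts.
m1 = re.match(r"(a)(b)?b", "ab")
print(m1.lastindex)

# Same shape with named groups: the last matched group is "a".
m2 = re.match(r"(?P<a>a)(?P<b>b)?b", "ab")
print(m2.lastgroup)

# For nested groups the outer group closes last, so lastindex is 1.
m3 = re.match(r"((a))", "a")
print(m3.lastindex)
```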
* Fully support 32-bit codes. Enable BIGCHARSET in UCS-4 builds. | Martin v. Löwis | 2003-04-19 | 1 | -10/+42
* SF patch #720991 by Gary Herron: | Guido van Rossum | 2003-04-14 | 1 | -0/+60
  A small fix for bug #545855 and Greg Chapman's addition of op code SRE_OP_MIN_REPEAT_ONE for eliminating recursion on simple uses of pattern '*?' on a long string.
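The kind of input this targets can be sketched as a non-greedy single-character repeat over a long string. The string length below is an arbitrary choice for illustration; on a current CPython the match completes without hitting any recursion limit.

```python
import re

# A simple '*?' repeat that has to advance one character at a time
# across a long run before the following literal can match.
text = "a" * 100000 + "b"
match = re.match(r".*?b", text)
print(match.end())
```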
* fix for SF #635398 (don't "downcast" return strings from unicode to ascii) | Fredrik Lundh | 2002-11-22 | 1 | -21/+4
* Make private functions static so we don't pollute the namespace | Neal Norwitz | 2002-11-10 | 1 | -1/+2
* Fixed sre bug "[#581080] Provoking infinite scanner loops". | Gustavo Niemeyer | 2002-11-07 | 1 | -4/+6
  This bug happened because: 1) the scanner_search and scanner_match methods were not checking the buffer limits before increasing the current pointer; and 2) SRE_SEARCH was using "if (ptr == end)" as a loop break, instead of "if (ptr >= end)".

  * Modules/_sre.c
  (SRE_SEARCH): Check for "ptr >= end" to break loops, so that we don't hang forever if a pointer passing the buffer limit is used.
  (scanner_search,scanner_match): Don't increment the current pointer if we're going to pass the buffer limit.

  * Misc/NEWS
  Mention the fix.
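Why `>=` is the safer loop guard can be shown with a toy scan loop. This is a hypothetical Python sketch, not the C code from `_sre.c`: `scan_positions` and its `step` parameter are invented for illustration. If the position can jump past the end, an equality test never fires, while `>=` still terminates.

```python
import re


def scan_positions(text, start, step):
    """Hypothetical helper: visit positions until the end of `text` is reached."""
    visited = []
    pos = start
    while True:
        # Using "pos == len(text)" here would loop forever when pos skips past
        # the end (for example length 3 with step 2: 0, 2, 4, ...).
        if pos >= len(text):
            break
        visited.append(pos)
        pos += step
    return visited


print(scan_positions("abc", 0, 2))

# The user-visible effect of the fix: iteration stops after the last match.
print([m.span() for m in re.finditer(r"\s", "a b")])
```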