summaryrefslogtreecommitdiffstats
path: root/Objects/stringobject.c
Commit message (Collapse)AuthorAgeFilesLines
* backport tim_one's patch:Anthony Baxter2002-04-301-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [Re-did unicodeobject.c - it's changed a lot since 2.1 :) Pretty confident that it's correct] Repair widespread misuse of _PyString_Resize. Since it's clear people don't understand how this function works, also beefed up the docs. The most common usage error is of this form (often spread out across gotos): if (_PyString_Resize(&s, n) < 0) { Py_DECREF(s); s = NULL; goto outtahere; } The error is that if _PyString_Resize runs out of memory, it automatically decrefs the input string object s (which also deallocates it, since its refcount must be 1 upon entry), and sets s to NULL. So if the "if" branch ever triggers, it's an error to call Py_DECREF(s): s is already NULL! A correct way to write the above is the simpler (and intended) if (_PyString_Resize(&s, n) < 0) goto outtahere; Bugfix candidate. Original patch(es): python/dist/src/Objects/fileobject.c:2.161 python/dist/src/Objects/stringobject.c:2.161 python/dist/src/Objects/unicodeobject.c:2.147
* Net result of Tim's checkins to stropmodule.c (2.78, 2.79, 2.80, 2.81),Thomas Wouters2001-05-231-28/+35
| | | | | | | | stringobject.c (2.114, 2.115) and test_strop.py (1.11, 1.12). Fixes 'replace' behaviour on systems on which 'malloc(0)' returns NULL (together with previous checkins) and re-synchs the string-operation code in stringobject.c and stropmodule.c, with the exception of 'replace', which has the old semantics in stropmodule but the new semantics in stringobjects.
* Backport MAL's checkin 2.105:Thomas Wouters2001-05-231-2/+3
| | | | Fix for bug #417030: "print '%*s' fails for unicode string"
* Net result of Guido's checkins of object.c (2.125 and 2.126), classobject.cThomas Wouters2001-05-231-1/+8
| | | | | | | (2.128) and stringobject.c (2.105), which reworks PyObject_Str() and PyObject_Repr() so strings and instances aren't special-cased, and print >> file, instance works like expected in all cases.
* Backport Tim's checkin 2.104:Thomas Wouters2001-05-231-2/+6
| | | | | | | | | | | | A different approach to the problem reported in Patch #419651: Metrowerks on Mac adds 0x itself C std says %#x and %#X conversion of 0 do not add the 0x/0X base marker. Metrowerks apparently does. Mark Favas reported the same bug under a Compaq compiler on Tru64 Unix, but no other libc broken in this respect is known (known to be OK under MSVC and gcc). So just try the damn thing at runtime and see what the platform does. Note that we've always had bugs here, but never knew it before because a relevant test case didn't exist before 2.1.
* Bug 415514 reported that e.g.Tim Peters2001-04-121-25/+24
| | | | | | | | | | | | "%#x" % 0 blew up, at heart because C sprintf supplies a base marker if and only if the value is not 0. I then fixed that, by tolerating C's inconsistency when it does %#x, and taking away that *Python* produced 0x0 when formatting 0L (the "long" flavor of 0) under %#x itself. But after talking with Guido, we agreed it would be better to supply 0x for the short int case too, despite that it's inconsistent with C, because C is inconsistent with itself and with Python's hex(0) (plus, while "%#x" % 0 didn't work before, "%#x" % 0L *did*, and returned "0x0"). Similarly for %#X conversion.
* Fix for SF bug #415514: "%#x" % 0 caused assertion failure/abort.Tim Peters2001-04-121-14/+25
| | | | | | | | | | | | | http://sourceforge.net/tracker/index.php?func=detail&aid=415514&group_id=5470&atid=105470 For short ints, Python defers to the platform C library to figure out what %#x should do. The code asserted that the platform C returned a string beginning with "0x". However, that's not true when-- and only when --the *value* being formatted is 0. Changed the code to live with C's inconsistency here. In the meantime, the problem does not arise if you format a long 0 (0L) instead. However, that's because the code *we* wrote to do %#x conversions on longs produces a leading "0x" regardless of value. That's probably wrong too: we should drop leading "0x", for consistency with C, when (& only when) formatting 0L. So I changed the long formatting code to do that too.
* _Py_ReleaseInternedStrings(): Private API function to decref andBarry Warsaw2001-02-231-0/+10
| | | | | | release the interned string dictionary. This is useful for memory use debugging because it eliminates a huge source of noise from the reports. Only defined when INTERN_STRINGS is defined.
* Show '\011', '\012', and '\015' as '\t', '\n', '\r' in strings.Ka-Ping Yee2001-01-241-6/+17
| | | | Switch from octal escapes to hex escapes for other nonprintable characters.
* Derivative of patch #102549, "simpler, faster(!) implementation of string.join".Tim Peters2001-01-191-38/+52
| | | | | | | | Also fixes two long-standing bugs (present in 2.0): 1. .join() didn't check that the result size fit in an int. 2. string.join(s) when len(s)==1 returned s[0] regardless of s[0]'s type; e.g., "".join([3]) returned 3 (overly optimistic optimization). I resisted a keen temptation to make .join() apply str() automagically.
* Added checks to prevent PyUnicode_Count() from dumping coreMarc-André Lemburg2001-01-161-11/+26
| | | | | | | | | | | | in case the parameters are out of bounds and fixes error handling for .count(), .startswith() and .endswith() for the case of mixed string/Unicode objects. This patch adds Python style index semantics to PyUnicode_Count() indices (including the special handling of negative indices). The patch is an extended version of patch #103249 submitted by Michael Hudson (mwh) on SF. It also includes new test cases.
* [ Patch #102852 ] Make % error a bit more informative by indicates theAndrew M. Kuchling2000-12-151-2/+3
| | | | index at which an unknown %-escape was found
* Jeffrey D. Collins <tokeneater@users.sourceforge.net>:Fred Drake2000-12-061-3/+3
| | | | | | Fix type of the self parameter to some string object methods. This closes patch #102670.
* Fox for SF bug #123859: %[duxXo] long formats inconsistent.Tim Peters2000-11-301-4/+1
|
* SF patch #102548, fix for bug #121013, by mwh@users.sourceforge.net.Guido van Rossum2000-11-271-1/+1
| | | | Fixes a typo that caused "".join(u"this is a test") to dump core.
* Ka-Ping Yee <ping@lfw.org>:Fred Drake2000-10-241-2/+2
| | | | | | Changes to error messages to increase consistency & clarity. This (mostly) closes SourceForge patch #101839.
* [ Bug #116174 ] using %% in cstrings sometimes fails with unicode paramsFix ↵Marc-André Lemburg2000-10-071-11/+17
| | | | | | | for the bug reported in Bug #116174: "%% %s" % u"abc" failed due to the way string formatting delegated work to the Unicode formatting function.
* Rationalize use of limits.h, moving the inclusion to Python.h.Fred Drake2000-09-261-5/+1
| | | | | | | | Add definitions of INT_MAX and LONG_MAX to pyport.h. Remove includes of limits.h and conditional definitions of INT_MAX and LONG_MAX elsewhere. This closes SourceForge patch #101659 and bug #115323.
* Derived from Martin's SF patch 110609: support unbounded ints in ↵Tim Peters2000-09-211-28/+203
| | | | | | | | | | | | | | | | %d,i,u,x,X,o formats. Note a curious extension to the std C rules: x, X and o formatting can never produce a sign character in C, so the '+' and ' ' flags are meaningless for them. But unbounded ints *can* produce a sign character under these conversions (no fixed- width bitstring is wide enough to hold all negative values in 2's-comp form). So these flags become meaningful in Python when formatting a Python long which is too big to fit in a C long. This required shuffling around existing code, which hacked x and X conversions to death when both the '#' and '0' flags were specified: the hacks weren't strong enough to deal with the simultaneous possibility of the ' ' or '+' flags too, since signs were always meaningless before for x and X conversions. Isomorphic shuffling was required in unicodeobject.c. Also added dozens of non-trivial new unbounded-int test cases to test_format.py.
* This patch adds a new Python C API called PyString_AsStringAndSize()Marc-André Lemburg2000-09-191-7/+63
| | | | | | | | | | | | | which implements the automatic conversion from Unicode to a string object using the default encoding. The new API is then put to use to have eval() and exec accept Unicode objects as code parameter. This closes bugs #110924 and #113890. As side-effect, the traditional C APIs PyString_Size() and PyString_AsString() will also accept Unicode objects as parameters.
* Fix for bug 113934. string*n and unicode*n did no overflow checking atTim Peters2000-09-091-2/+17
| | | | | | | all, either to see whether the # of chars fit in an int, or that the amount of memory needed fit in a size_t. Checking these is expensive, but the alternative is silently wrong answers (as in the bug report) or core dumps (which were easy to provoke using Unicode strings).
* REMOVED all CWI, CNRI and BeOpen copyright markings.Guido van Rossum2000-09-011-9/+0
| | | | This should match the situation in the 1.6b1 tree.
* Insure properly identifies the `interned' dictionary as leaking atBarry Warsaw2000-08-161-0/+12
| | | | | | shutdown time, but CVS log entry for revision 2.45 explains why this is so. Simply include a comment so we don't have to re-figure it out again 5 years from now.
* merge Include/my*.h into Include/pyport.hPeter Schneider-Kamp2000-07-311-1/+0
| | | | marked my*.h as obsolete
* Spelling fixes supplied by Rob W. W. Hooft. All these are fixes in eitherThomas Wouters2000-07-161-5/+5
| | | | | | | | | | comments, docstrings or error messages. I fixed two minor things in test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't"). There is a minor style issue involved: Guido seems to have preferred English grammar (behaviour, honour) in a couple places. This patch changes that to American, which is the more prominent style in the source. I prefer English myself, so if English is preferred, I'd be happy to supply a patch myself ;)
* replace PyXXX_Length calls with PyXXX_Size callsJeremy Hylton2000-07-121-1/+1
|
* Fix typo in error messageAndrew M. Kuchling2000-07-121-1/+1
|
* small updates to string_join:Jeremy Hylton2000-07-111-6/+9
| | | | | | | use PyString_AS_STRING macro on local string object when resizing string, make sure resized string will always be big enough split string containing error message across two lines add test to string_tests that causes resizing
* string_join(): Some cleaning up of reference counting. In theBarry Warsaw2000-07-111-7/+10
| | | | | | | | seqlen==1 clause, before returning item, we need to DECREF seq. In the res=PyString... failure clause, we need to goto finally to also decref seq (and the DECREF of res in finally is changed to a XDECREF). Also, we need to DECREF seq just before the PyUnicode_Join() return.
* fix two refcount bugs in new string_join implementation:Jeremy Hylton2000-07-111-6/+2
| | | | | 1. PySequence_Fast_GET_ITEM is a macro and borrows a reference 2. The seq returned from PySequence_Fast must be decref'd
* two changes to string_join:Jeremy Hylton2000-07-101-82/+42
| | | | | | implementation -- use PySequence_Fast interface to iterate over elements interface -- if instance object reports wrong length, ignore it; previous version raised an IndexError if reported length was too high
* Somebody started playing with const, so of course the outcomeTim Peters2000-07-091-8/+8
| | | | | | | | | | | | | was cascades of warnings about mismatching const decls. Overall, I think const creates lots of headaches and solves almost nothing. Added enough consts to shut up the warnings, but this did require casting away const in one spot too (another usual outcome of starting down this path): the function mymemreplace can't return const char*, but sometimes wants to return its first argument as-is, which latter must be declared const char* in order to avoid const warnings at mymemreplace's call sites. So, in the case the function wants to return the first arg, that arg's declared constness must be subverted.
* ANSI-fication of the sources.Fred Drake2000-07-091-187/+76
|
* Added new codec APIs and a new interface method .encode() whichMarc-André Lemburg2000-07-061-0/+114
| | | | | | | | | works just like the Unicode one. The C APIs match the ones in the Unicode implementation, but were extended to be able to reuse the existing Unicode codecs for string purposes too. Conversions from string to Unicode and back are done using the default encoding.
* Added new .isalpha() and .isalnum() methods to match the sameMarc-André Lemburg2000-07-051-0/+68
| | | | | ones on the Unicode objects. Note that the string versions use the (locale aware) C lib APIs isalpha() and isalnum().
* Change copyright notice - 2nd try.Guido van Rossum2000-06-301-6/+0
|
* Change copyright notice.Guido van Rossum2000-06-301-22/+7
|
* Marc-Andre Lemburg <mal@lemburg.com>:Marc-André Lemburg2000-06-301-26/+68
| | | | | | New buffer overflow checks for formatting strings. By Trent Mick.
* Fredrik Lundh <effbot@telia.com>:Fred Drake2000-06-201-8/+4
| | | | | Simplify find code; this is a performance improvement on at least some platforms.
* Marc-Andre Lemburg <mal@lemburg.com>:Marc-André Lemburg2000-06-141-0/+20
| | | | Added code so that .isXXX() testing returns 0 for emtpy strings.
* Patch from Michael Hudson: improve unclear error messageAndrew M. Kuchling2000-06-091-1/+1
|
* Michael Hudson <mwh21@cam.ac.uk>:Fred Drake2000-06-011-1/+3
| | | | | Removed PyErr_BadArgument() calls and replaced them with more useful error messages.
* Trent Mick:Guido van Rossum2000-05-081-5/+9
| | | | | | | | | | | | | | | | Fix the string methods that implement slice-like semantics with optional args (count, find, endswith, etc.) to properly handle indeces outside [INT_MIN, INT_MAX]. Previously the "i" formatter for PyArg_ParseTuple was used to get the indices. These could overflow. This patch changes the string methods to use the "O&" formatter with the slice_index() function from ceval.c which is used to do the same job for Python code slices (e.g. 'abcabcabc'[0:1000000000L]). slice_index() is renamed _PyEval_SliceIndex() and is now exported. As well, the return values for success/fail were changed to make slice_index directly usable as required by the "O&" formatter. [GvR: shouldn't a similar patch be applied to unicodeobject.c?]
* The methods islower(), isupper(), isspace(), isdigit() and istitle()Guido van Rossum2000-05-051-11/+11
| | | | | | gave bogus results for chars in the range 128-255, because their implementation was using signed characters. Fixed this by using unsigned character pointers (as opposed to using Py_CHARMASK()).
* Vladimir Marangozov's long-awaited malloc restructuring.Guido van Rossum2000-05-031-21/+19
| | | | | | | | | | For more comments, read the patches@python.org archives. For documentation read the comments in mymalloc.h and objimpl.h. (This is not exactly what Vladimir posted to the patches list; I've made a few changes, and Vladimir sent me a fix in private email for a problem that only occurs in debug mode. I'm also holding back on his change to main.c, which seems unnecessary to me.)
* Marc-Andre Lemburg:Guido van Rossum2000-04-111-11/+14
| | | | | | | | | The maxsplit functionality in .splitlines() was replaced by the keepends functionality which allows keeping the line end markers together with the string. Added support for '%r' % obj: this inserts repr(obj) rather than str(obj).
* Marc-Andre Lemburg:Guido van Rossum2000-04-101-3/+52
| | | | | | | | | | | | | * string_contains now calls PyUnicode_Contains() only when the other operand is a Unicode string (not whenever it's not a string). * New format style '%r' inserts repr(arg) instead of str(arg). * '...%s...' % u"abc" now coerces to Unicode just like string methods. Care is taken not to reevaluate already formatted arguments -- only the first Unicode object appearing in the argument mapping is looked up twice. Added test cases for this to test_unicode.py.
* On 17-Mar-2000, Marc-Andre Lemburg said:Barry Warsaw2000-03-201-2/+2
| | | | | | | | | | | | | Attached you find an update of the Unicode implementation. The patch is against the current CVS version. I would appreciate if someone with CVS checkin permissions could check the changes in. The patch contains all bugs and patches sent this week and also fixes a leak in the codecs code and a bug in the free list code for Unicode objects (which only shows up when compiling Python with Py_DEBUG; thanks to MarkH for spotting this one).
* Fix typo in replace() detected by Mark Hammond and fixed by Marc-Andre.Guido van Rossum2000-03-131-2/+4
|
* Many changes for Unicode, by Marc-Andre Lemburg.Guido van Rossum2000-03-101-162/+755
|