summaryrefslogtreecommitdiffstats
path: root/Objects/stringobject.c
Commit message (Collapse)AuthorAgeFilesLines
* SF bug [#467265] Compile errors on SuSe Linux on IBM/s390.Tim Peters2001-10-021-1/+6
| | | | | | | Unknown whether this fixes it. - stringobject.c, PyString_FromFormatV: don't assume that va_list is of a type that can be copied via an initializer. - errors.c, PyErr_Format: add a va_end() to balance the va_start().
* Merge branch changes (coercion, rich comparisons) into trunk.Guido van Rossum2001-09-271-4/+2
|
* Change string comparison so that it applies even when one (or both)Guido van Rossum2001-09-241-3/+4
| | | | | arguments are subclasses of str, as long as they don't override rich comparison.
* If interning an instance of a string subclass, intern a real string objectTim Peters2001-09-121-4/+20
| | | | | | with the same value instead. This ensures that a string (or string subclass) object's ob_sinterned pointer is always a str (or NULL), and that the dict of interned strings only has strs as keys.
* str_subtype_new, unicode_subtype_new:Tim Peters2001-09-121-5/+15
| | | | | | | | + These were leaving the hash fields at 0, which all string and unicode routines believe is a legitimate hash code. As a result, hash() applied to str and unicode subclass instances always returned 0, which in turn confused dict operations, etc. + Changed local names "new"; no point to antagonizing C++ compilers.
* More bug 460020: lots of string optimizations inhibited for stringTim Peters2001-09-121-79/+46
| | | | | | | | | subclasses, all "the usual" ones (slicing etc), plus replace, translate, ljust, rjust, center and strip. I don't know how to be sure they've all been caught. Question: Should we complain if someone tries to intern an instance of a string subclass? I hate to slow any code on those paths.
* More on SF bug [#460020] bug or feature: unicode() and subclasses.Tim Peters2001-09-111-1/+1
| | | | | Repaired str(i) to return a genuine string when i is an instance of a str subclass. New PyString_CheckExact() macro.
* Fix a memory leak in str_subtype_new(). (All the otherGuido van Rossum2001-08-311-3/+3
| | | | xxx_subtype_new() functions are OK, but I goofed up in this one. :-( )
* Make str and tuple types subclassable.Guido van Rossum2001-08-301-2/+24
|
* Two improvements suggested by Greg Stein:Barry Warsaw2001-08-271-2/+5
| | | | | | | | | PyString_FromFormatV(): In the final resize at the end, we can use PyString_AS_STRING() since we know the object is a string and can avoid the typechecking. PyString_FromFormat(): GS sez: "For safety/propriety, you should call va_end() on the vargs variable."
* PyString_FromFormatV: Massage platform %p output to match what gcc does,Tim Peters2001-08-251-0/+8
| | | | | | | at least in the first two characters. %p is ill-defined, and people will forever commit bad tests otherwise ("bad" in the sense that they fall over (at least on Windows) for lack of a leading '0x'; 5 of the 7 tests in test_repr.py failed on Windows for that reason this time around).
* PyString_FromFormat() and PyString_FromFormatV(): Largely ripped fromBarry Warsaw2001-08-241-0/+155
| | | | | | | | | | | | | | | | | | | PyErr_Format() these new C API methods can be used instead of sprintf()'s into hardcoded char* buffers. This allows us to fix many situation where long package, module, or class names get truncated in reprs. PyString_FromFormat() is the varargs variety. PyString_FromFormatV() is the va_list variety Original PyErr_Format() code was modified to allow %p and %ld expansions. Many reprs were converted to this, checkins coming soo. Not changed: complex_repr(), float_repr(), float_print(), float_str(), int_repr(). There may be other candidates not yet converted. Closes patch #454743.
* Patch #445762: Support --disable-unicodeMartin v. Löwis2001-08-171-4/+56
| | | | | | | | - Do not compile unicodeobject, unicodectype, and unicodedata if Unicode is disabled - check for Py_USING_UNICODE in all places that use Unicode functions - disables unicode literals, and the builtin functions - add the types.StringTypes list - remove Unicode literals from most tests.
* Patch #427190: Implement and use METH_NOARGS and METH_O.Martin v. Löwis2001-08-161-103/+56
|
* Merge of descr-branch back into trunk.Tim Peters2001-08-021-26/+50
|
* Add _PyUnicode_AsDefaultEncodedString to unicodeobject.h.Jeremy Hylton2001-07-301-5/+0
| | | | | | | And remove all the extern decls in the middle of .c files. Apparently, it was excluded from the header file because it is intended for internal use by the interpreter. It's still intended for internal use and documented as such in the header file.
* Reformat decl of new _PyString_Join. Add NEWS blurb about repr() speedup.Tim Peters2001-06-161-1/+2
|
* SF bug 433228: repr(list) woes when len(list) big.Tim Peters2001-06-161-0/+17
| | | | | | | | | | | | Gave Python linear-time repr() implementations for dicts, lists, strings. This means, e.g., that repr(range(50000)) is no longer 50x slower than pprint.pprint() in 2.2 <wink>. I don't consider this a bugfix candidate, as it's a performance boost. Added _PyString_Join() to the internal string API. If we want that in the public API, fine, but then it requires runtime error checks instead of asserts.
* Fix for bug #432384: Recursion in PyString_AsEncodedString?Marc-André Lemburg2001-06-121-1/+1
|
* Patch #424335: Implement string_richcompare, remove string_compare.Martin v. Löwis2001-05-241-12/+77
| | | | Use new _PyString_Eq in lookdict_string.
* This patch changes the way the string .encode() method works slightlyMarc-André Lemburg2001-05-151-20/+90
| | | | | | | | | | | | | | | | | | | | | | | | | and introduces a new method .decode(). The major change is that strg.encode() will no longer try to convert Unicode returns from the codec into a string, but instead pass along the Unicode object as-is. The same is now true for all other codec return types. The underlying C APIs were changed accordingly. Note that even though this does have the potential of breaking existing code, the chances are low since conversion from Unicode previously took place using the default encoding which is normally set to ASCII rendering this auto-conversion mechanism useless for most Unicode encodings. The good news is that you can now use .encode() and .decode() with much greater ease and that the door was opened for better accessibility of the builtin codecs. As demonstration of the new feature, the patch includes a few new codecs which allow string to string encoding and decoding (rot13, hex, zip, uu, base64). Written by Marc-Andre Lemburg. Copyright assigned to the PSF.
* Heh. I need a break. After this: stropmodule & stringobject were moreTim Peters2001-05-101-8/+6
| | | | | | out of synch than I realized, and I managed to break replace's "count" argument when it was 0. All is well again. Maybe. Bugfix candidate.
* Fudge. stropmodule and stringobject both had copies of the buggyTim Peters2001-05-101-32/+41
| | | | | | mymemXXX stuff, and they were already out of synch. Fix the remaining bugs in both and get them back in synch. Bugfix release candidate.
* SF patch #416247 2.1c1 stringobject: unused vrbl cleanup.Tim Peters2001-05-091-2/+0
| | | | Thanks to Mark Favas.
* Sheesh -- repair the dodge around "cast isn't an lvalue" complaints toTim Peters2001-05-091-0/+4
| | | | restore correct semantics.
* Mark Favas reported that gcc caught me using casts as lvalues. Dodge it.Tim Peters2001-05-091-6/+10
|
* Ack! Restore the COUNT_ALLOCS one_strings code.Tim Peters2001-05-091-1/+5
|
* My change to string_item() left an extra reference to each 1-characterTim Peters2001-05-091-4/+3
| | | | | interned string created by "string"[i]. Since they're immortal anyway, this was hard to notice, but it was still wrong <wink>.
* Intern 1-character strings as soon as they're created. As-is, they aren'tTim Peters2001-05-081-15/+12
| | | | | | | | | | | | | | | | | | | interned when created, so the cached versions generally aren't ever interned. With the patch, the Py_INCREF(t); *p = t; Py_DECREF(s); return; indirection block in PyString_InternInPlace() is never executed during a full run of the test suite, but was executed very many times before. So I'm trading more work when creating one-character strings for doing less work later. Note that the "more work" here can happen at most 256 times per program run, so it's trivial. The same reasoning accounts for the patch's simplification of string_item (the new version can call PyString_FromStringAndSize() no more than 256 times per run, so there's no point to inlining that stuff -- if we were serious about saving time here, we'd pre-initialize the characters vector so that no runtime testing at all was needed!).
* Make unicode.join() work nice with iterators. This also required a changeTim Peters2001-05-051-1/+8
| | | | | | | | to string.join(), so that when the latter figures out in midstream that it really needs unicode.join() instead, unicode.join() can actually get all the sequence elements (i.e., there's no guarantee that the sequence passed to string.join() can be iterated over *again* by unicode.join(), so string.join() must not pass on the original sequence object anymore).
* Fix for bug #417030: "print '%*s' fails for unicode string"Marc-André Lemburg2001-05-021-2/+3
|
* Add a proper implementation for the tp_str slot (returning self, ofGuido van Rossum2001-05-011-1/+8
| | | | | course), so I can get rid of the special case for strings in PyObject_Str().
* A different approach to the problem reported inTim Peters2001-04-281-33/+39
| | | | | | | | | | | Patch #419651: Metrowerks on Mac adds 0x itself C std says %#x and %#X conversion of 0 do not add the 0x/0X base marker. Metrowerks apparently does. Mark Favas reported the same bug under a Compaq compiler on Tru64 Unix, but no other libc broken in this respect is known (known to be OK under MSVC and gcc). So just try the damn thing at runtime and see what the platform does. Note that we've always had bugs here, but never knew it before because a relevant test case didn't exist before 2.1.
* Iterators phase 1. This comprises:Guido van Rossum2001-04-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | new slot tp_iter in type object, plus new flag Py_TPFLAGS_HAVE_ITER new C API PyObject_GetIter(), calls tp_iter new builtin iter(), with two forms: iter(obj), and iter(function, sentinel) new internal object types iterobject and calliterobject new exception StopIteration new opcodes for "for" loops, GET_ITER and FOR_ITER (also supported by dis.py) new magic number for .pyc files new special method for instances: __iter__() returns an iterator iteration over dictionaries: "for x in dict" iterates over the keys iteration over files: "for x in file" iterates over lines TODO: documentation test suite decide whether to use a different way to spell iter(function, sentinal) decide whether "for key in dict" is a good idea use iterators in map/filter/reduce, min/max, and elsewhere (in/not in?) speed tuning (make next() a slot tp_next???)
* Bug 415514 reported that e.g.Tim Peters2001-04-121-25/+24
| | | | | | | | | | | | "%#x" % 0 blew up, at heart because C sprintf supplies a base marker if and only if the value is not 0. I then fixed that, by tolerating C's inconsistency when it does %#x, and taking away that *Python* produced 0x0 when formatting 0L (the "long" flavor of 0) under %#x itself. But after talking with Guido, we agreed it would be better to supply 0x for the short int case too, despite that it's inconsistent with C, because C is inconsistent with itself and with Python's hex(0) (plus, while "%#x" % 0 didn't work before, "%#x" % 0L *did*, and returned "0x0"). Similarly for %#X conversion.
* Fix for SF bug #415514: "%#x" % 0 caused assertion failure/abort.Tim Peters2001-04-121-14/+25
| | | | | | | | | | | | | http://sourceforge.net/tracker/index.php?func=detail&aid=415514&group_id=5470&atid=105470 For short ints, Python defers to the platform C library to figure out what %#x should do. The code asserted that the platform C returned a string beginning with "0x". However, that's not true when-- and only when --the *value* being formatted is 0. Changed the code to live with C's inconsistency here. In the meantime, the problem does not arise if you format a long 0 (0L) instead. However, that's because the code *we* wrote to do %#x conversions on longs produces a leading "0x" regardless of value. That's probably wrong too: we should drop leading "0x", for consistency with C, when (& only when) formatting 0L. So I changed the long formatting code to do that too.
* _Py_ReleaseInternedStrings(): Private API function to decref andBarry Warsaw2001-02-231-0/+10
| | | | | | release the interned string dictionary. This is useful for memory use debugging because it eliminates a huge source of noise from the reports. Only defined when INTERN_STRINGS is defined.
* Show '\011', '\012', and '\015' as '\t', '\n', '\r' in strings.Ka-Ping Yee2001-01-241-6/+17
| | | | Switch from octal escapes to hex escapes for other nonprintable characters.
* Derivative of patch #102549, "simpler, faster(!) implementation of string.join".Tim Peters2001-01-191-38/+52
| | | | | | | | Also fixes two long-standing bugs (present in 2.0): 1. .join() didn't check that the result size fit in an int. 2. string.join(s) when len(s)==1 returned s[0] regardless of s[0]'s type; e.g., "".join([3]) returned 3 (overly optimistic optimization). I resisted a keen temptation to make .join() apply str() automagically.
* Added checks to prevent PyUnicode_Count() from dumping coreMarc-André Lemburg2001-01-161-11/+26
| | | | | | | | | | | | in case the parameters are out of bounds and fixes error handling for .count(), .startswith() and .endswith() for the case of mixed string/Unicode objects. This patch adds Python style index semantics to PyUnicode_Count() indices (including the special handling of negative indices). The patch is an extended version of patch #103249 submitted by Michael Hudson (mwh) on SF. It also includes new test cases.
* [ Patch #102852 ] Make % error a bit more informative by indicates theAndrew M. Kuchling2000-12-151-2/+3
| | | | index at which an unknown %-escape was found
* Jeffrey D. Collins <tokeneater@users.sourceforge.net>:Fred Drake2000-12-061-3/+3
| | | | | | Fix type of the self parameter to some string object methods. This closes patch #102670.
* Fox for SF bug #123859: %[duxXo] long formats inconsistent.Tim Peters2000-11-301-4/+1
|
* SF patch #102548, fix for bug #121013, by mwh@users.sourceforge.net.Guido van Rossum2000-11-271-1/+1
| | | | Fixes a typo that caused "".join(u"this is a test") to dump core.
* Ka-Ping Yee <ping@lfw.org>:Fred Drake2000-10-241-2/+2
| | | | | | Changes to error messages to increase consistency & clarity. This (mostly) closes SourceForge patch #101839.
* [ Bug #116174 ] using %% in cstrings sometimes fails with unicode paramsFix ↵Marc-André Lemburg2000-10-071-11/+17
| | | | | | | for the bug reported in Bug #116174: "%% %s" % u"abc" failed due to the way string formatting delegated work to the Unicode formatting function.
* Rationalize use of limits.h, moving the inclusion to Python.h.Fred Drake2000-09-261-5/+1
| | | | | | | | Add definitions of INT_MAX and LONG_MAX to pyport.h. Remove includes of limits.h and conditional definitions of INT_MAX and LONG_MAX elsewhere. This closes SourceForge patch #101659 and bug #115323.
* Derived from Martin's SF patch 110609: support unbounded ints in ↵Tim Peters2000-09-211-28/+203
| | | | | | | | | | | | | | | | %d,i,u,x,X,o formats. Note a curious extension to the std C rules: x, X and o formatting can never produce a sign character in C, so the '+' and ' ' flags are meaningless for them. But unbounded ints *can* produce a sign character under these conversions (no fixed- width bitstring is wide enough to hold all negative values in 2's-comp form). So these flags become meaningful in Python when formatting a Python long which is too big to fit in a C long. This required shuffling around existing code, which hacked x and X conversions to death when both the '#' and '0' flags were specified: the hacks weren't strong enough to deal with the simultaneous possibility of the ' ' or '+' flags too, since signs were always meaningless before for x and X conversions. Isomorphic shuffling was required in unicodeobject.c. Also added dozens of non-trivial new unbounded-int test cases to test_format.py.
* This patch adds a new Python C API called PyString_AsStringAndSize()Marc-André Lemburg2000-09-191-7/+63
| | | | | | | | | | | | | which implements the automatic conversion from Unicode to a string object using the default encoding. The new API is then put to use to have eval() and exec accept Unicode objects as code parameter. This closes bugs #110924 and #113890. As side-effect, the traditional C APIs PyString_Size() and PyString_AsString() will also accept Unicode objects as parameters.
* Fix for bug 113934. string*n and unicode*n did no overflow checking atTim Peters2000-09-091-2/+17
| | | | | | | all, either to see whether the # of chars fit in an int, or that the amount of memory needed fit in a size_t. Checking these is expensive, but the alternative is silently wrong answers (as in the bug report) or core dumps (which were easy to provoke using Unicode strings).