path: root/Objects/unicodeobject.c
Commit message | Author | Age | Files | Lines
* PEP 293 implementation (from SF patch http://www.python.org/sf/432401)Walter Dörwald2002-09-021-552/+1240
|
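PEP 293 makes codec error handling pluggable through named callbacks. The following is a minimal sketch on a modern Python 3 interpreter (an assumption; the commit itself targeted Python 2.3). The handler name `example.hexescape` and the `<U+XXXX>` replacement format are invented for illustration only.

```python
import codecs


def hex_escape(exc):
    """Hypothetical handler: replace unencodable characters with <U+XXXX>."""
    if not isinstance(exc, UnicodeEncodeError):
        # This sketch only handles encoding errors; re-raise anything else.
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("<U+%04X>" % ord(ch) for ch in bad)
    # Return the replacement text and the position where encoding resumes.
    return replacement, exc.end


codecs.register_error("example.hexescape", hex_escape)

encoded = "caf\u00e9".encode("ascii", errors="example.hexescape")
print(encoded)
```

The callback receives the exception object, so it can inspect the offending slice and decide both what to substitute and where the codec should continue.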
* Fix SF bug 599128, submitted by Inyeol Lee: .replace() would do theGuido van Rossum2002-08-231-3/+9
| wrong thing for a unicode subclass when there were zero string replacements. The example given in the SF bug report was only one way to trigger this; replacing a string of length >= 2 that's not found is another. The code would actually write outside allocated memory if the replacement string was longer than the search string. (I wonder how many more of these are lurking? The unicode code base is full of wonders.)
|
| Bugfix candidate; this same bug is present in 2.2.1.
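The case described here (a subclass instance, a search string that is never found, and a longer replacement) can be exercised from Python. This is a sketch assuming CPython 3, where `str` plays the role of the old `unicode` type; the `Tagged` class is hypothetical.

```python
class Tagged(str):
    """Hypothetical str subclass used only for this illustration."""


original = Tagged("spam")

# "xy" has length 2 and does not occur, so zero replacements are made,
# and the replacement string is longer than the search string.
result = original.replace("xy", "a much longer replacement")

print(result == "spam")
print(type(result) is str)
```

On CPython the unchanged result comes back as a plain `str` rather than the subclass, which is the copy-instead-of-alias behaviour the fix is about.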
* Code by Inyeol Lee, submitted to SF bug 595350, to implementGuido van Rossum2002-08-231-14/+20
| the string/unicode method .replace() with a zero-length first argument. Inyeol contributed tests for this too.
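A short sketch of what a zero-length first argument does, assuming a modern Python 3 interpreter: the replacement is inserted before every character and once more at the end.

```python
# An empty pattern matches at every position, including both ends.
dashed = "abc".replace("", "-")

# The optional count limits how many insertions are made, left to right.
limited = "abc".replace("", "-", 2)

# An empty string has exactly one empty match.
from_empty = "".replace("", "x")

print(dashed, limited, from_empty)
```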
* Fix some endcase bugs in unicode rfind()/rindex() and endswith().Guido van Rossum2002-08-201-3/+3
| | | | | | These were reported and fixed by Inyeol Lee in SF bug 595350. The endswith() bug was already fixed in 2.3, but this adds some more test cases.
* More changes of DeprecationWarning to FutureWarning.Guido van Rossum2002-08-141-1/+1
|
* Add C API PyUnicode_FromOrdinal() which exposes unichr() at C level.Marc-André Lemburg2002-08-111-1/+55
| | | | | | | u'%c' will now raise a ValueError in case the argument is an integer outside the valid range of Unicode code point ordinals. Closes SF bug #593581.
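The range check can be observed through the Python-level counterpart of this C API. The sketch below assumes Python 3, where `chr()` has taken over from `unichr()`; it deliberately avoids `%c` formatting, whose exception type may differ between versions.

```python
import sys

# The largest valid code point is accepted.
highest = chr(sys.maxunicode)

# One past the end of the Unicode range is rejected.
outcome = "no error"
try:
    chr(sys.maxunicode + 1)
except ValueError:
    outcome = "ValueError"

print(hex(ord(highest)), outcome)
```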
* Implement stage B0 of PEP 237: add warnings for operations thatGuido van Rossum2002-08-111-0/+10
| | | | | | | | | | currently return inconsistent results for ints and longs; in particular: hex/oct/%u/%o/%x/%X of negative short ints, and x<<n that either loses bits or changes sign. (No warnings for repr() of a long, though that will also change to lose the trailing 'L' eventually.) This introduces some warnings in the test suite; I'll take care of those later.
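Stage B0 only adds warnings; the end state PEP 237 aims for is that short ints and longs behave identically. A small sketch of that end state, assuming a Python 3 interpreter where the unification is complete:

```python
# Negative values format with a sign instead of a two's-complement pattern.
negative_hex = hex(-1)
negative_fmt = "%x" % -255
negative_oct = oct(-8)

# Left shifts never lose bits or flip the sign.
big = 1 << 100

# repr() of a large integer carries no trailing 'L'.
large_repr = repr(10 ** 20)

print(negative_hex, negative_fmt, negative_oct, big.bit_length(), large_repr)
```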
* Unicode replace() method with empty pattern argument should fail, likeGuido van Rossum2002-08-091-0/+5
| | | | it does for 8-bit strings.
* PyUnicode_Contains(): The memcmp() call didn't take into account theBarry Warsaw2002-08-061-1/+1
| | | | width of Py_UNICODE. Good catch, MAL.
* Committing patch #591250 which provides "str1 in str2" when str1 is aBarry Warsaw2002-08-061-17/+23
| string longer than 1 character.
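For illustration, the multi-character containment test looks like this on a modern Python 3 interpreter (an assumption; the sample strings are arbitrary):

```python
# Substring containment, not just single-character membership.
found = "nic" in "unicode"
missing = "dx" in "unicode"

# The empty string is contained in every string.
empty = "" in "unicode"

print(found, missing, empty)
```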
* tighten up the unicode object's docstring a tadSkip Montanaro2002-07-261-2/+2
|
* staticforward bites the dust.Jeremy Hylton2002-07-171-1/+1
| The staticforward define was needed to support certain broken C compilers (notably SCO ODT 3.0, perhaps early AIX as well) that botched the static keyword when it was used with a forward declaration of a static initialized structure. Standard C allows the forward declaration with static, and we've decided to stop catering to broken C compilers. (In fact, we expect that the compilers are all fixed eight years later.)
|
| I'm leaving staticforward and statichere defined in object.h as static. This is only for backwards compatibility with C extensions that might still use it.
|
| XXX I haven't updated the documentation.
* Patch #569753: Remove support for WIN16.Martin v. Löwis2002-06-301-3/+3
| | | | Rename all occurrences of MS_WIN32 to MS_WINDOWS.
* Fix typo in exception messageNeal Norwitz2002-06-131-1/+1
|
* Patch #568124: Add doc string macros.Martin v. Löwis2002-06-131-74/+74
|
* This is my nearly two year old patchMichael W. Hudson2002-06-111-2/+54
| [ 400998 ] experimental support for extended slicing on lists
|
| somewhat spruced up and better tested than it was when I wrote it.
|
| Includes docs & tests. The whatsnew section needs expanding, and arrays should support extended slices -- later.
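Extended slicing adds a step to the usual start and stop. A brief sketch on lists, assuming a modern Python 3 interpreter; the variable names are arbitrary.

```python
numbers = list(range(10))

# Reading with a step.
evens = numbers[::2]
backwards = numbers[::-1]

# Assigning to an extended slice requires a sequence of matching length.
numbers[::2] = ["a", "b", "c", "d", "e"]

# A length mismatch is rejected.
mismatch = "no error"
try:
    numbers[::2] = ["too", "short"]
except ValueError:
    mismatch = "ValueError"

# Deleting with a step removes every selected element.
del numbers[1::2]

print(evens, backwards, numbers, mismatch)
```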
* Fix a possible segfault. Found by Neal Norwitz.Marc-André Lemburg2002-05-291-1/+1
|
* Fix for bug [ 561796 ] string.find causes lazy errorMarc-André Lemburg2002-05-291-2/+2
|
* - A new type object, 'string', is added. This is a common base typeGuido van Rossum2002-05-241-1/+3
| | | | | | | for 'str' and 'unicode', and can be used instead of types.StringTypes, e.g. to test whether something is "a string": isinstance(x, string) is True for Unicode and 8-bit strings. This is an abstract base class and cannot be instantiated directly.
* Patch 549187. Improve string formatting error message.Raymond Hettinger2002-05-211-2/+2
|
* Repair widespread misuse of _PyString_Resize. Since it's clear peopleTim Peters2002-04-271-26/+8
| don't understand how this function works, also beefed up the docs. The most common usage error is of this form (often spread out across gotos):
|
|     if (_PyString_Resize(&s, n) < 0) {
|         Py_DECREF(s);
|         s = NULL;
|         goto outtahere;
|     }
|
| The error is that if _PyString_Resize runs out of memory, it automatically decrefs the input string object s (which also deallocates it, since its refcount must be 1 upon entry), and sets s to NULL. So if the "if" branch ever triggers, it's an error to call Py_DECREF(s): s is already NULL! A correct way to write the above is the simpler (and intended)
|
|     if (_PyString_Resize(&s, n) < 0)
|         goto outtahere;
|
| Bugfix candidate.
* SF patch 549375: Compromise PyUnicode_EncodeUTF8Tim Peters2002-04-271-108/+70
| This implements ideas from Marc-Andre, Martin, Guido and me on Python-Dev.
|
| "Short" Unicode strings are encoded into a "big enough" stack buffer, then exactly as much string space as they turn out to need is allocated at the end. This should have speed benefits akin to Martin's "measure once, allocate once" strategy, but without needing a distinct measuring pass.
|
| "Long" Unicode strings allocate as much heap space as they could possibly need (4 x # Unicode chars), and do a realloc at the end to return the untouched excess. Since the overallocation is likely to be substantial, this shouldn't burden the platform realloc with unusably small excess blocks.
|
| Also simplified uses of the PyString_xyz functions. Also added a release-build check that 4*size doesn't overflow a C int. Sooner or later, that's going to happen.
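The "4 x # Unicode chars" worst case follows from UTF-8 using at most four bytes per code point. A rough Python 3 sketch of that bound (an assumption: this only checks the arithmetic and says nothing about how the C encoder manages its buffers; `worst_case_utf8_size` is a hypothetical helper):

```python
def worst_case_utf8_size(text):
    """Hypothetical helper: upper bound on the UTF-8 size of text, in bytes."""
    return 4 * len(text)


# One sample character for each possible UTF-8 width.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]
widths = [len(ch.encode("utf-8")) for ch in samples]

text = "".join(samples) * 3
actual = len(text.encode("utf-8"))
bound = worst_case_utf8_size(text)

# The difference is the excess a final realloc would hand back.
print(widths, actual, bound, bound - actual)
```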
* unicode_memchr(): Squashed gratuitous int-vs-size_t mismatch (whichTim Peters2002-04-221-3/+3
| gives a compiler warning under MSVC because of the resulting signed-vs-unsigned comparison).
* Apply patch diff.txt from SF feature requestWalter Dörwald2002-04-221-58/+163
| http://www.python.org/sf/444708
|
| This adds the optional argument for str.strip to unicode.strip too and makes it possible to call str.strip with a unicode argument and unicode.strip with a str argument.
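The optional argument is treated as a set of characters to remove from the ends, not as a prefix or suffix. A short sketch, assuming modern Python 3, where `str` is the Unicode type:

```python
# Without an argument, whitespace is stripped.
plain = "  padded  ".strip()

# With an argument, any of the listed characters are stripped from both ends.
both_ends = "xxhixx".strip("x")
char_set = "abcHELLOcba".strip("abc")

# lstrip()/rstrip() accept the same kind of argument for one side only.
left_only = "www.example.com".lstrip("w.")

print(plain, both_ends, char_set, left_only)
```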
* PyUnicode_EncodeUTF8(): tightened the memory asserts a bit, and at leastTim Peters2002-04-211-12/+20
| | | | tried to catch some possible arithmetic overflows in the debug build.
* Back out 2.140.Martin v. Löwis2002-04-211-43/+55
|
* PyUnicode_EncodeUTF8: squash compiler wng. The difference of twoTim Peters2002-04-211-4/+5
| | | | | | pointers is a signed type. Changing "allocated" to a signed int makes undetected overflow more likely, but there was no overflow detection before either.
* Patch #495401: Count number of required bytes for encoding UTF-8 beforeMartin v. Löwis2002-04-201-54/+43
| | | | allocating the target buffer.
* Return the original string only if it's a real str or unicodeWalter Dörwald2002-04-151-2/+9
| | | | instance, otherwise make a copy.
* Apply the second version of SF patch http://www.python.org/sf/536241Walter Dörwald2002-04-151-3/+4
| Add a method zfill to str, unicode and UserString and change Lib/string.py accordingly.
|
| This activates the zfill version in unicodeobject.c that was commented out and implements the same in stringobject.c. It also adds the test for unicode support in Lib/string.py back in and uses repr() instead of str() (as it was before Lib/string.py 1.62)
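As a quick illustration of the method, assuming a modern Python 3 interpreter: `zfill` pads with zeros on the left up to the given width and keeps a leading sign in front of the padding.

```python
padded = "42".zfill(5)

# A leading sign stays in front of the zeros.
negative = "-42".zfill(5)
positive = "+7".zfill(4)

# Strings already at least as wide as requested are returned unchanged.
unchanged = "12345".zfill(3)

print(padded, negative, positive, unchanged)
```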
* Remove PyMalloc_*.Neil Schemenauer2002-04-121-5/+5
|
* Bug fix for UTF-8 encoding bug (buffer overrun) #541828.Marc-André Lemburg2002-04-101-39/+46
|
* Added test case for UTF-8 encoding bug #541828.Marc-André Lemburg2002-04-101-2/+2
|
* Add the 'bool' type and its values 'False' and 'True', as described inGuido van Rossum2002-04-031-72/+72
| PEP 285. Everything described in the PEP is here, and there is even some documentation. I had to fix 12 unit tests; all but one of these were printing Boolean outcomes that changed from 0/1 to False/True. (The exception is test_unicode.py, which did a type(x) == type(y) style comparison. I could've fixed that with a single line using issubtype(x, type(y)), but instead chose to be explicit about those places where a bool is expected.)
|
| Still to do: perhaps more documentation; change standard library modules to return False/True from predicates.
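The two kinds of test breakage mentioned here (printed output changing from 0/1 to False/True, and exact type comparisons failing) can be reproduced in a few lines. A sketch assuming modern Python 3:

```python
comparison = (1 == 1)

# Printed output changed from 1 to True.
printed = repr(comparison)

# bool is a subclass of int, so arithmetic and isinstance() still work.
still_an_int = isinstance(comparison, int)
total = True + True

# An exact type comparison, as test_unicode.py did, no longer matches int.
same_type_as_int = type(comparison) == type(1)

print(printed, still_an_int, total, same_type_as_int)
```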
* Fix whitespace.Walter Dörwald2002-03-251-15/+15
|
* Use pymalloc if it's enabled.Neil Schemenauer2002-03-221-5/+5
|
* Do not insert characters for unicode-escape decoders if the error modeMartin v. Löwis2002-03-211-14/+24
| | | | is "ignore". Fixes #529104.
* %#x/%#X format conversion cleanup (see patch #450267):Andrew MacIntyre2002-02-281-34/+39
| Objects/
|     stringobject.c
|     unicodeobject.c
* OS/2 EMX port changes (Objects part of patch #450267):Andrew MacIntyre2002-02-261-0/+10
| Objects/
|     fileobject.c
|     stringobject.c
|     unicodeobject.c
|
| This commit doesn't include the cleanup patches for stringobject.c and unicodeobject.c which are shown separately in the patch manager. Those patches will be regenerated and applied in a subsequent commit, so as to preserve a fallback position (this commit to those files).
* Fix to the UTF-8 encoder: it failed on 0-length input strings.Marc-André Lemburg2002-02-071-6/+17
| Fix for the UTF-8 decoder: it will now accept isolated surrogates (previously it raised an exception which caused round-trips to fail).
|
| Added new tests for UTF-8 round-trip safety (we rely on UTF-8 for marshalling Unicode objects, so we better make sure it works for all Unicode code points, including isolated surrogates).
|
| Bumped the PYC magic in a non-standard way -- please review. This was needed because the old PYC format used illegal UTF-8 sequences for isolated high surrogates which now raise an exception.
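The round-trip concern can be illustrated on a modern interpreter, with one important caveat: this sketch assumes CPython 3, whose strict UTF-8 codec rejects isolated surrogates, unlike the behaviour this commit introduced. There, the explicit `surrogatepass` error handler provides the round trip instead.

```python
import marshal

lone = "\ud800"  # an isolated high surrogate

# The strict codec refuses to encode it on Python 3.
strict_ok = True
try:
    lone.encode("utf-8")
except UnicodeEncodeError:
    strict_ok = False

# With "surrogatepass" the code point survives an encode/decode round trip.
raw = lone.encode("utf-8", "surrogatepass")
round_trip = raw.decode("utf-8", "surrogatepass")

# marshal must also round-trip every string, including this one.
marshalled = marshal.loads(marshal.dumps(lone))

print(strict_ok, raw, round_trip == lone, marshalled == lone)
```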
* Cosmetics.Marc-André Lemburg2002-02-061-6/+6
|
* Whitespace fixes.Marc-André Lemburg2002-02-061-11/+11
|
* Fix for the UTF-8 memory allocation bug and the UTF-8 encodingMarc-André Lemburg2002-02-061-23/+27
| | | | bug related to lone high surrogates.
* Fix for #489669 (Neal Norwitz): memory leak in test_descr (unicode).Guido van Rossum2001-12-061-6/+3
| This is best reproduced by
|
|     while 1:
|         class U(unicode):
|             pass
|         U(u"xxxxxx")
|
| The unicode_dealloc() code wasn't properly freeing the str and defenc fields of the Unicode object when freeing a subtype instance. Fixed this by a subtle refactoring that actually reduces the amount of code slightly.
* formatfloat(), formatint(): Conversion of sprintf() to PyOS_snprintf()Barry Warsaw2001-11-281-4/+6
| | | | for buffer overrun avoidance.
* Fix for bug #485951: repr diff between string and unicode.Marc-André Lemburg2001-11-281-1/+1
|
* Fix for bug #438164: %-formatting using Unicode objects.Marc-André Lemburg2001-11-201-0/+4
| | | | | This patch also does away with an incompatibility between Jython and CPython.
* Additional test and documentation for the unicode() changes.Marc-André Lemburg2001-10-191-2/+3
| | | | This patch should also be applied to the 2.2b1 trunk.
* SF patch #470578: Fixes to synchronize unicode() and str()Guido van Rossum2001-10-191-46/+40
| This patch implements what we have discussed on python-dev late in September: str(obj) and unicode(obj) should behave similarly, while the old behaviour is retained for unicode(obj, encoding, errors).
|
| The patch also adds a new feature with which objects can provide unicode(obj) with input data: the __unicode__ method. Currently no new tp_unicode slot is implemented; this is left as an option for the future.
|
| Note that PyUnicode_FromEncodedObject() no longer accepts Unicode objects as input. The API name already suggests that Unicode objects do not belong in the list of acceptable objects and the functionality was only needed because PyUnicode_FromEncodedObject() was being used directly by unicode(). The latter was changed in the discussed way:
|
| * unicode(obj) calls PyObject_Unicode()
| * unicode(obj, encoding, errors) calls PyUnicode_FromEncodedObject()
|
| One thing left open to discussion is whether to leave the PyUnicode_FromObject() API as a thin API extension on top of PyUnicode_FromEncodedObject() or to turn it into a (macro) alias for PyObject_Unicode() and deprecate it. Doing so would have some surprising consequences though, e.g. u"abc" + 123 would turn out as u"abc123"...
|
| [Marc-Andre didn't have time to check this in before the deadline. I hope this is OK, Marc-Andre! You can still make changes and commit them on the trunk after the branch has been made, but then please mail Barry a context diff if you want the change to be merged into the 2.2b1 release branch. GvR]
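The split between the one-argument and three-argument forms survives in Python 3, where `str` has absorbed `unicode` and `__str__` plays the role of `__unicode__`. A sketch under that assumption; the `Temperature` class is hypothetical.

```python
class Temperature:
    """Hypothetical class that supplies its own text representation."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return "%d\u00b0C" % self.degrees


# One-argument form: asks the object for its text representation.
text = str(Temperature(21))

# Three-argument form: decodes an encoded bytes object.
decoded = str(b"caf\xc3\xa9", "utf-8", "strict")

# Concatenation does not coerce numbers, so "abc" + 123 is still an error.
concat = "no error"
try:
    "abc" + 123
except TypeError:
    concat = "TypeError"

print(text, decoded, concat)
```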
* Enable GC for new-style instances. This touches lots of files, sinceGuido van Rossum2001-10-051-2/+7
| many types were subclassable but had a xxx_dealloc function that called PyObject_DEL(self) directly instead of deferring to self->ob_type->tp_free(self). It is permissible to set tp_free in the type object directly to _PyObject_Del, for non-GC types, or to _PyObject_GC_Del, for GC types. Still, PyObject_DEL was a tad faster, so I'm fearing that our pystone rating is going down again. I'm not sure if doing something like
|
|     void xxx_dealloc(PyObject *self)
|     {
|         if (PyXxxCheckExact(self))
|             PyObject_DEL(self);
|         else
|             self->ob_type->tp_free(self);
|     }
|
| is any faster than always calling the else branch, so I haven't attempted that -- however those types whose own dealloc is fancier (int, float, unicode) do use this pattern.