path: root/Objects/unicodeobject.c
Commit message | Author | Age | Files | Lines
* Backport: Double-fix of crash in Unicode freelist handling. | Jeremy Hylton | 2003-09-17 | 1 | -1/+14
* Backport: | Neal Norwitz | 2003-04-11 | 1 | -9/+9
  Fix SF bug #697220, string.strip implementation/doc mismatch

  Attempt to make all the various string/unicode *strip methods the same.
    * Doc - add doc for when functions were added
    * UserString
    * string/unicode object methods
    * string module functions

  'chars' is used for the last parameter everywhere.
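To illustrate the unified `chars` parameter mentioned in the entry above, here is a minimal sketch. It assumes a current Python 3 interpreter, not the 2.2 maintenance branch this log covers, and the sample strings are made up for the example.

```python
# Illustrative only: runs on Python 3, not on the 2.2 branch in this log.
text = "xxhello worldxx"

stripped_both = text.strip("x")    # remove 'x' from both ends
stripped_left = text.lstrip("x")   # remove 'x' from the left end only
stripped_right = text.rstrip("x")  # remove 'x' from the right end only

# The argument is treated as a set of characters, not as a prefix or suffix.
stripped_set = "abcHELLOcba".strip("abc")

print(stripped_both, stripped_left, stripped_right, stripped_set)
```

The last call shows why the parameter is named `chars`: any mix of the listed characters is removed from either end.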
* Backport: Add more missing PyErr_NoMemory() after failed memory allocs | Neal Norwitz | 2003-02-11 | 1 | -1/+1
* Backport MAL's patch for bug #659709: bogus computation of float length | Raymond Hettinger | 2003-01-02 | 1 | -10/+21
* Backport 2.173: | Neal Norwitz | 2002-11-07 | 1 | -3/+1
  Fix for bug #626172: crash using unicode latin1 single char
* Backport stringobject.c 2.194 and unicodeobject.c 2.172: | Guido van Rossum | 2002-10-11 | 1 | -2/+6
  Fix a nasty endcase reported by Armin Rigo in SF bug 618623: '%2147483647d' % -123 segfaults. This was because an integer overflow in a comparison caused the string resize to be skipped. After fixing the overflow, this could call _PyString_Resize() with a negative size, so I (1) test for that and raise MemoryError instead; (2) also added a test for negative newsize to _PyString_Resize(), raising SystemError as for all bad arguments.

  An identical bug existed in unicodeobject.c, of course.
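The overflow described in the entry above can be modelled roughly as follows. This is a sketch, not the code from stringobject.c or unicodeobject.c: the function names, the `slack` value and the wraparound model are all assumptions made for illustration (signed overflow is undefined in C; wrapping is only what commonly happens in practice).

```python
INT_MAX = 2**31 - 1


def to_int32(value):
    """Wrap an integer the way a 32-bit two's-complement int commonly does."""
    return (value + 2**31) % 2**32 - 2**31


def needs_resize_buggy(space_left, width, slack=100):
    # Hypothetical check: width + slack wraps negative when width is near
    # INT_MAX, so the comparison is false and the resize is skipped.
    return space_left < to_int32(width + slack)


def needs_resize_fixed(space_left, width, slack=100):
    # Rearranged so that no intermediate value can exceed INT_MAX;
    # an impossible request is reported instead of being passed on.
    if width > INT_MAX - slack:
        raise MemoryError("formatted string is too long")
    return space_left < width + slack


print(needs_resize_buggy(10, 2147483647))
```

With the width from the bug report, the buggy check reports that no resize is needed, while the fixed variant raises MemoryError, which matches the behaviour the entry describes.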
* Backport: | Michael W. Hudson | 2002-10-07 | 1 | -1/+55
  2002/08/11 12:23:04 lemburg Python/bltinmodule.c 2.262
  2002/08/11 12:23:04 lemburg Objects/unicodeobject.c 2.162
  2002/08/11 12:23:03 lemburg Misc/NEWS 1.461
  2002/08/11 12:23:03 lemburg Lib/test/test_unicode.py 1.65
  2002/08/11 12:23:03 lemburg Include/unicodeobject.h 2.39

  Add C API PyUnicode_FromOrdinal() which exposes unichr() at C level.

  u'%c' will now raise a ValueError in case the argument is an integer outside the valid range of Unicode code point ordinals.

  Closes SF bug #593581.
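The range check described above can be sketched in Python. The helper `from_ordinal` is a hypothetical analogue written for this example, not the C function itself, and the upper bound assumes the full Unicode range of a current Python 3 build (narrow builds of that era had a smaller limit).

```python
MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point


def from_ordinal(ordinal):
    """Hypothetical Python analogue of a range-checked ordinal conversion."""
    if not 0 <= ordinal <= MAX_CODE_POINT:
        raise ValueError("ordinal not in range(0x110000)")
    return chr(ordinal)


print(from_ordinal(0x41))
```

Rejecting the value before conversion keeps out-of-range integers from ever reaching the code that builds the string.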
* unicode_memchr(): Squashed compiler wng (signed-vs-unsigned comparison). | Tim Peters | 2002-09-24 | 1 | -1/+1
* Backport the UTF-8 codec from 2.3 and add a work-around to let the | Marc-André Lemburg | 2002-09-24 | 1 | -101/+118
  UTF-8 decoder accept broken UTF-8 sequences which encode lone high surrogates (the pre-2.2.2 versions forgot to generate the UTF-8 prefix \xed for these).

  Fixes SF bug #610783: Lone surrogates cause bad .pyc files.
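To show where the \xed prefix mentioned above comes from, here is a minimal sketch of the generic three-byte UTF-8 layout applied to a lone high surrogate. The function name is made up for the example and this is not the codec's own code; it assumes Python 3.

```python
def encode_three_byte(code_point):
    """Encode a code point in U+0800..U+FFFF using the three-byte UTF-8 layout."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("three-byte form covers U+0800..U+FFFF only")
    return bytes([
        0xE0 | (code_point >> 12),          # prefix byte; 0xED for surrogates
        0x80 | ((code_point >> 6) & 0x3F),  # first continuation byte
        0x80 | (code_point & 0x3F),         # second continuation byte
    ])


lone_high_surrogate = encode_three_byte(0xD800)
print(lone_high_surrogate)
```

Because every surrogate lies in U+D800..U+DFFF, the top four bits are always 0xD, so the prefix byte is always 0xED; output missing that byte is what the work-around has to tolerate.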
* Fix cast from backport. | Marc-André Lemburg | 2002-09-24 | 1 | -1/+1
* Backport from trunk: | Guido van Rossum | 2002-09-23 | 1 | -1/+2
  unicodeobject.c 2.169
  stringobject.c 2.189

  Fix warnings on 64-bit platforms about casts from pointers to ints. Two of these were real bugs.
* Backport 2.166 from trunk: | Guido van Rossum | 2002-09-23 | 1 | -3/+9
  Fix SF bug 599128, submitted by Inyeol Lee: .replace() would do the wrong thing for a unicode subclass when there were zero string replacements. The example given in the SF bug report was only one way to trigger this; replacing a string of length >= 2 that's not found is another. The code would actually write outside allocated memory if replacement string was longer than the search string.
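The zero-replacement case described above can be exercised with a short check. This sketch assumes CPython 3, where the text type is `str`, and uses a made-up subclass `U`; it shows the behaviour one would expect after the fix, not the original failing test.

```python
class U(str):
    """Trivial str subclass, standing in for the unicode subclass in the report."""


original = U("abc")

# "zz" has length 2 and does not occur, and the replacement is longer than
# the search string: the combination the entry singles out.
result = original.replace("zz", "longer replacement")

print(result, type(result).__name__)
```

The result compares equal to the input and, on CPython 3, is a plain `str` even though the receiver was a subclass instance.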
* Fix some endcase bugs in unicode rfind()/rindex() and endswith(). | Guido van Rossum | 2002-08-20 | 1 | -3/+3
  These were reported and fixed by Inyeol Lee in SF bug 595350. The endswith() bug is already fixed in 2.3; I'll fix the others in 2.3 next.
* backport tim_one's patch: | Anthony Baxter | 2002-04-30 | 1 | -26/+8
  Repair widespread misuse of _PyString_Resize. Since it's clear people don't understand how this function works, also beefed up the docs. The most common usage error is of this form (often spread out across gotos):

      if (_PyString_Resize(&s, n) < 0) {
          Py_DECREF(s);
          s = NULL;
          goto outtahere;
      }

  The error is that if _PyString_Resize runs out of memory, it automatically decrefs the input string object s (which also deallocates it, since its refcount must be 1 upon entry), and sets s to NULL. So if the "if" branch ever triggers, it's an error to call Py_DECREF(s): s is already NULL! A correct way to write the above is the simpler (and intended)

      if (_PyString_Resize(&s, n) < 0)
          goto outtahere;

  Bugfix candidate.

  Original patch(es):
  python/dist/src/Objects/fileobject.c:2.161
  python/dist/src/Objects/stringobject.c:2.161
  python/dist/src/Objects/unicodeobject.c:2.147
* Backport checkin: | Walter Dörwald | 2002-04-22 | 1 | -58/+163
  Apply patch diff.txt from SF feature request http://www.python.org/sf/444708

  This adds the optional argument for str.strip to unicode.strip too and makes it possible to call str.strip with a unicode argument and unicode.strip with a str argument.
* Backport the following changes: | Walter Dörwald | 2002-04-22 | 1 | -5/+13
  Misc/NEWS 1.387->1.388
  Lib/test/string_tests.py 1.10->1.11, 1.12->1.14,
  Lib/test/test_unicode.py 1.50->1.51, 1.53->1.54, 1.55->1.56
  Lib/test/test_string.py 1.15->1.16
  Lib/string.py 1.61->1.63
  Lib/test/test_userstring.py 1.5->1.6, 1.11, 1.12
  Objects/stringobject.c 2.156->2.159
  Objects/unicodeobject.c 2.137->2.139
  Doc/lib/libstdtypes.tex 1.87->1.88

  Add a method zfill to str, unicode and UserString and change Lib/string.py accordingly (see SF patch http://www.python.org/sf/536241)

  This also adds Guido's fix to test_userstring.py and the subinstance checks in test_string.py and test_unicode.py.
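As a rough illustration of what the zfill method added above does, the following sketch compares a hand-written padding routine with the built-in method. `zfill_sketch` is a hypothetical name for this example, and the code assumes Python 3 rather than the 2.2 branch.

```python
def zfill_sketch(text, width):
    """Pad text on the left with zeros to the given width, keeping a leading sign."""
    if len(text) >= width:
        return text
    padding = "0" * (width - len(text))
    if text[:1] in ("+", "-"):
        # Zeros go after the sign so the result still reads as a number.
        return text[0] + padding + text[1:]
    return padding + text


for sample in ("42", "-42", "+7", "12345", ""):
    print(repr(sample), repr(zfill_sketch(sample, 5)), repr(sample.zfill(5)))
```

Strings that are already at least `width` characters long come back unchanged.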
* Martin's fix for | Michael W. Hudson | 2002-03-18 | 1 | -14/+24
  [ 529104 ] broken error handling in unicode-escape

  I presume this will need to be fixed on the trunk, too.

  Later.
* Fix | Michael W. Hudson | 2002-03-18 | 1 | -7/+9
  [ 531306 ] ucs4 build horked.

  Classic C mistake, I think.

  Also squashed a couple of warnings in the ucs4 build.
* Move to zlib 1.1.4 on Windows (the new version that squashes the "double | Tim Peters | 2002-03-13 | 1 | -2/+2
  free" glitch).

  unicodeobject.c: squash compiler warnings.

  Noting that test_pyclbr currently fails in 2.2.1:

      test_others (__main__.PyclbrTest) ... ??? HTTP11
      FAIL
* Whitespace normalization and minor cosmetics. | Marc-André Lemburg | 2002-02-25 | 1 | -22/+24
* Fix UTF-8 encoder pointer arithmetic and restore 2.2 behaviour. | Marc-André Lemburg | 2002-02-25 | 1 | -13/+10
* Fix the problem reported in | Michael W. Hudson | 2002-02-22 | 1 | -3/+10
  [ #495401 ] Build troubles: --with-pymalloc

  in a slightly different manner to the trunk, as discussed on python-dev.
* Fix for #489669 (Neal Norwitz): memory leak in test_descr (unicode). | Guido van Rossum | 2001-12-06 | 1 | -6/+3
  This is best reproduced by

      while 1:
          class U(unicode):
              pass
          U(u"xxxxxx")

  The unicode_dealloc() code wasn't properly freeing the str and defenc fields of the Unicode object when freeing a subtype instance. Fixed this by a subtle refactoring that actually reduces the amount of code slightly.
* formatfloat(), formatint(): Conversion of sprintf() to PyOS_snprintf() | Barry Warsaw | 2001-11-28 | 1 | -4/+6
  for buffer overrun avoidance.
* Fix for bug #485951: repr diff between string and unicode. | Marc-André Lemburg | 2001-11-28 | 1 | -1/+1
* Fix for bug #438164: %-formatting using Unicode objects. | Marc-André Lemburg | 2001-11-20 | 1 | -0/+4
  This patch also does away with an incompatibility between Jython and CPython.
* Additional test and documentation for the unicode() changes. | Marc-André Lemburg | 2001-10-19 | 1 | -2/+3
  This patch should also be applied to the 2.2b1 trunk.
* SF patch #470578: Fixes to synchronize unicode() and str() | Guido van Rossum | 2001-10-19 | 1 | -46/+40
  This patch implements what we have discussed on python-dev late in September: str(obj) and unicode(obj) should behave similarly, while the old behaviour is retained for unicode(obj, encoding, errors).

  The patch also adds a new feature with which objects can provide unicode(obj) with input data: the __unicode__ method. Currently no new tp_unicode slot is implemented; this is left as an option for the future.

  Note that PyUnicode_FromEncodedObject() no longer accepts Unicode objects as input. The API name already suggests that Unicode objects do not belong in the list of acceptable objects and the functionality was only needed because PyUnicode_FromEncodedObject() was being used directly by unicode(). The latter was changed in the discussed way:

    * unicode(obj) calls PyObject_Unicode()
    * unicode(obj, encoding, errors) calls PyUnicode_FromEncodedObject()

  One thing left open to discussion is whether to leave the PyUnicode_FromObject() API as a thin API extension on top of PyUnicode_FromEncodedObject() or to turn it into a (macro) alias for PyObject_Unicode() and deprecate it. Doing so would have some surprising consequences though, e.g. u"abc" + 123 would turn out as u"abc123"...

  [Marc-Andre didn't have time to check this in before the deadline. I hope this is OK, Marc-Andre! You can still make changes and commit them on the trunk after the branch has been made, but then please mail Barry a context diff if you want the change to be merged into the 2.2b1 release branch. GvR]
* Enable GC for new-style instances. This touches lots of files, since | Guido van Rossum | 2001-10-05 | 1 | -2/+7
  many types were subclassable but had a xxx_dealloc function that called PyObject_DEL(self) directly instead of deferring to self->ob_type->tp_free(self). It is permissible to set tp_free in the type object directly to _PyObject_Del, for non-GC types, or to _PyObject_GC_Del, for GC types. Still, PyObject_DEL was a tad faster, so I'm fearing that our pystone rating is going down again. I'm not sure if doing something like

      void xxx_dealloc(PyObject *self)
      {
          if (PyXxxCheckExact(self))
              PyObject_DEL(self);
          else
              self->ob_type->tp_free(self);
      }

  is any faster than always calling the else branch, so I haven't attempted that -- however those types whose own dealloc is fancier (int, float, unicode) do use this pattern.
* Fix a bug in rendering of \\ by repr() -- it rendered as \\\ instead | Guido van Rossum | 2001-09-21 | 1 | -0/+1
  of \\.
* Fix Unicode .join() method to raise a TypeError for sequence | Marc-André Lemburg | 2001-09-20 | 1 | -1/+11
  elements which are not Unicode objects or strings. (This matches the string.join() behaviour.)

  Fix a memory leak in the .join() method which occurs in case the Unicode resize fails.

  Restore the test_unicode output.
* Implement the changes proposed in patch #413333. unicode(obj) now | Marc-André Lemburg | 2001-09-20 | 1 | -42/+55
  works just like str(obj) in that it tries __str__/tp_str on the object in case it finds that the object is not a string or buffer.
* Patch #435971: UTF-7 codec by Brian Quinlan. | Marc-André Lemburg | 2001-09-20 | 1 | -0/+300
* str_subtype_new, unicode_subtype_new: | Tim Peters | 2001-09-12 | 1 | -10/+11
  + These were leaving the hash fields at 0, which all string and unicode routines believe is a legitimate hash code. As a result, hash() applied to str and unicode subclass instances always returned 0, which in turn confused dict operations, etc.
  + Changed local names "new"; no point to antagonizing C++ compilers.
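The property that the hash fix above restores can be checked with a few lines. This is a sketch under the assumption of a current Python 3 interpreter; the subclass `S` and the sample dictionary are made up for the example.

```python
class S(str):
    """Trivial str subclass that does not override __hash__ or __eq__."""


key = S("spam")
table = {"spam": 1}

# A subclass instance should hash like the equal base-type value,
# otherwise dictionary lookups with it would miss.
same_hash = hash(key) == hash("spam")
found = table.get(key)

print(same_hash, found)
```

If every subclass instance hashed to the same constant, lookups like this one would either miss or degrade badly, which is the confusion the entry mentions.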
* More on bug 460020: disable many optimizations of unicode subclasses. | Tim Peters | 2001-09-12 | 1 | -10/+11
* Possibly the end of SF [#460020] bug or feature: unicode() and subclasses. | Tim Peters | 2001-09-11 | 1 | -4/+12
  Changed unicode(i) to return a true Unicode object when i is an instance of a unicode subclass. Added PyUnicode_CheckExact macro.
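The distinction drawn above between "a Unicode object" and "exactly a Unicode object" has a close Python-level analogue. The sketch below assumes CPython 3, where `str` plays the role of `unicode`; the helper names are hypothetical and only mirror the spirit of a Check versus CheckExact test.

```python
class U(str):
    """Trivial str subclass."""


def is_text(obj):
    # Analogue of a "Check" test: subclasses are accepted.
    return isinstance(obj, str)


def is_exact_text(obj):
    # Analogue of a "CheckExact" test: only the base type itself passes.
    return type(obj) is str


instance = U("abc")
converted = str(instance)  # conversion yields an instance of the base type

print(is_text(instance), is_exact_text(instance), is_exact_text(converted))
```

Optimizations that return the receiver unchanged are only safe when the exact test passes, which is why the two checks need to be separate.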
* PyUnicode_FromEncodedObject(): Repair memory leak in an error case. | Tim Peters | 2001-09-11 | 1 | -2/+2
* Make unicode subclassable. | Guido van Rossum | 2001-08-30 | 1 | -2/+32
* Patch #427190: Implement and use METH_NOARGS and METH_O. | Martin v. Löwis | 2001-08-16 | 1 | -116/+60
* SF patch #438013 Remove 2-byte Py_UCS2 assumptions | Tim Peters | 2001-08-09 | 1 | -76/+90
  Removed all instances of Py_UCS2 from the codebase, and so also (I hope) the last remaining reliance on the platform having an integral type with exactly 16 bits. PyUnicode_DecodeUTF16() and PyUnicode_EncodeUTF16() now read and write one byte at a time.
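Reading UTF-16 one byte at a time, as the entry above describes, means building each 16-bit code unit from two bytes with shifts instead of relying on a 16-bit integer type. The following is a minimal Python sketch of that idea; the function name and the `byteorder` parameter are assumptions for the example, not the C API.

```python
def decode_utf16_units(data, byteorder="little"):
    """Assemble 16-bit code units from a byte string, one byte at a time."""
    if len(data) % 2:
        raise ValueError("truncated UTF-16 data")
    units = []
    for i in range(0, len(data), 2):
        first, second = data[i], data[i + 1]
        if byteorder == "little":
            units.append(first | (second << 8))   # low byte comes first
        else:
            units.append((first << 8) | second)   # high byte comes first
    return units


print(decode_utf16_units("hi".encode("utf-16-le")))
```

Because the byte order is applied explicitly, the same routine works regardless of the host's native endianness or integer widths.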
* Merge of descr-branch back into trunk. | Tim Peters | 2001-08-02 | 1 | -9/+45
* Add _PyUnicode_AsDefaultEncodedString to unicodeobject.h. | Jeremy Hylton | 2001-07-30 | 1 | -14/+0
  And remove all the extern decls in the middle of .c files. Apparently, it was excluded from the header file because it is intended for internal use by the interpreter. It's still intended for internal use and documented as such in the header file.
* Fix for bug #444493: u'\U00010001' segfaults with current CVS on | Marc-André Lemburg | 2001-07-25 | 1 | -6/+21
  wide builds.
* Make the unicode-escape and the UTF-16 codecs handle surrogates | Marc-André Lemburg | 2001-07-20 | 1 | -24/+46
  correctly and thus roundtrip-safe.

  Some minor cleanups of the code.

  Added tests for the roundtrip-safety.
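Roundtrip safety for surrogates comes down to the standard UTF-16 arithmetic for splitting a supplementary code point into a surrogate pair and joining it again. The sketch below shows that arithmetic in Python; the function names are hypothetical and this is not the codec's own code.

```python
def to_surrogates(code_point):
    """Split a code point in U+10000..U+10FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high, low):
    """Join a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x10001)
print(hex(high), hex(low), hex(from_surrogates(high, low)))
```

A codec is roundtrip-safe in this sense when splitting and then joining returns the original code point for every value in the supplementary range.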
* #ifdef out generation of \U escapes unless Py_UNICODE_WIDE. This | Guido van Rossum | 2001-07-20 | 1 | -0/+2
  caused warnings with the VMS C compiler. (SF bug #442998, in part.)

  On a narrow system the current code should never be executed since ch will always be < 0x10000.

  Marc-Andre: you may end up fixing this a different way, since I believe you have plans to generate \U for surrogate pairs. I'll leave that to you.
* use Py_UNICODE_WIDE instead of USE_UCS4_STORAGE and Py_UNICODE_SIZE | Fredrik Lundh | 2001-06-27 | 1 | -4/+4
  tests.
* Encode surrogates in UTF-8 even for a wide Py_UNICODE. | Martin v. Löwis | 2001-06-27 | 1 | -7/+12
  Implement sys.maxunicode. Explicitly wrap around upper/lower computations for wide Py_UNICODE. When decoding large characters with UTF-8, represent expected test results using the \U notation.
* When decoding UTF-16, don't assume that the buffer is in native endianness | Martin v. Löwis | 2001-06-26 | 1 | -4/+4
  when checking surrogates.
* Support using UCS-4 as the Py_UNICODE type: | Martin v. Löwis | 2001-06-26 | 1 | -30/+89
  Add configure option --enable-unicode. Add config.h macros Py_USING_UNICODE, PY_UNICODE_TYPE, Py_UNICODE_SIZE, SIZEOF_WCHAR_T. Define Py_UCS2. Encode and decode large UTF-8 characters into single Py_UNICODE values for wide Unicode types; likewise for UTF-16. Remove test whether sizeof Py_UNICODE is two.
* experimental UCS-4 support: added USE_UCS4_STORAGE define to | Fredrik Lundh | 2001-06-26 | 1 | -0/+2
  unicodeobject.h, which forces sizeof(Py_UNICODE) == sizeof(Py_UCS4). (this may be good enough for platforms that don't have a 16-bit type. the UTF-16 codecs don't work, though)