path: root/Objects/unicodeobject.c
Commit message | Author | Age | Files | Lines
* Fix extra space. | Ezio Melotti | 2010-06-26 | 1 | -1/+1
* fix warning with ucs4 | Benjamin Peterson | 2010-06-12 | 1 | -1/+2
* Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash | Antoine Pitrou | 2010-06-11 | 1 | -19/+21
  the interpreter with characters outside the Basic Multilingual Plane (higher than 0x10000).
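For orientation, a minimal sketch of the kind of input Issue #8941 concerns: one character above U+FFFF encoded as big-endian UTF-32. It runs on a current Python 3 interpreter and only shows the byte layout; it does not reproduce the UCS-2 build crash. The sample character is an arbitrary choice.

```python
# One code point outside the Basic Multilingual Plane (arbitrary sample).
ch = "\U00010400"

# Big-endian UTF-32 stores the code point as four bytes, most significant first.
utf32_be = ch.encode("utf-32-be")

# Decoding must give back the same single character.
round_tripped = utf32_be.decode("utf-32-be")

# A build with 16-bit code units has to hold the same character as two units
# (a surrogate pair); UTF-16 makes that pair visible.
utf16_be = ch.encode("utf-16-be")
```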
* Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629. | Ezio Melotti | 2010-06-05 | 1 | -63/+56
  1) #8271: when a byte sequence is invalid, only the start byte and all the valid continuation bytes are now replaced by U+FFFD, instead of replacing the number of bytes specified by the start byte. See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
  2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes in behavior);
  3) Add code and tests to reject surrogates (U+D800-U+DFFF) as defined in RFC 3629, but leave it commented out since it's not backward compatible;
  4) Change the error messages "unexpected code byte" to "invalid start byte" and "invalid data" to "invalid continuation byte";
  5) Add an extensive set of tests in test_unicode;
  6) Fix test_codeccallbacks because it was failing after this change.
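Points 1, 2 and 4 of the commit above can be observed from Python code. The sketch below assumes a current Python 3 interpreter, whose UTF-8 decoder follows the same rules; the helper name decode_error_reason and the sample byte strings are illustrative, not taken from the commit.

```python
# 0xF1 announces a 4-byte sequence, but only one valid continuation byte (0x80)
# follows before the ASCII letter "A".
truncated = b"\xf1\x80A"

# The start byte plus its valid continuation byte collapse into one U+FFFD;
# the "A" that follows is kept.
replaced = truncated.decode("utf-8", "replace")


def decode_error_reason(data):
    """Return the UnicodeDecodeError reason for invalid UTF-8, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# 0xF8 would start a 5-byte sequence, which RFC 3629 no longer allows.
five_byte_reason = decode_error_reason(b"\xf8\x88\x80\x80\x80")

# 0xC3 expects a continuation byte, but 0x28 "(" is not one.
bad_continuation_reason = decode_error_reason(b"\xc3(")
```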
* Remove an unneeded variable and assignment. | Brett Cannon | 2010-05-04 | 1 | -3/+2
  Found using Clang's static analyzer.
* more _PyString_Resize error checking | Benjamin Peterson | 2010-04-03 | 1 | -4/+8
* #7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14. | Florent Xicluna | 2010-03-30 | 1 | -3/+5
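A short illustration of the #7643 change, assuming a current Python 3 interpreter where str plays the role of the 2.x unicode type: VT and FF are treated as line boundaries by splitlines(). The sample text is made up.

```python
# VT (0x0B) and FF (0x0C) separate the three words.
text = "one\x0btwo\x0cthree"

# splitlines() on a Unicode string treats both characters as line breaks.
lines = text.splitlines()
```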
* Backported PyCapsule from 3.1, and converted most uses of | Larry Hastings | 2010-03-25 | 1 | -10/+1
  CObject to PyCapsule.
* Issue #1583863: An unicode subclass can now override the __str__ method | Victor Stinner | 2010-03-22 | 1 | -1/+1
* #7649: "u'%c' % char" now behaves like "u'%s' % char" and raises a UnicodeDecodeError if 'char' is a byte string that can't be decoded using the default encoding. | Ezio Melotti | 2010-02-25 | 1 | -9/+18
* Issue #7649: Fix u'%c' % char for character in range 0x80..0xFF | Victor Stinner | 2010-02-23 | 1 | -1/+9
  => raise an UnicodeDecodeError. Patch written by Ezio Melotti.
* #7775: fixed docstring for rpartition | Ezio Melotti | 2010-01-25 | 1 | -1/+1
* Sanitize bloom filter macros | Antoine Pitrou | 2010-01-13 | 1 | -3/+13
* Fix Windows build (re r77461) | Antoine Pitrou | 2010-01-13 | 1 | -1/+1
* Issue #7622: Improve the split(), rsplit(), splitlines() and replace() | Antoine Pitrou | 2010-01-13 | 1 | -348/+72
  methods of bytes, bytearray and unicode objects by using a common implementation based on stringlib's fast search. Patch by Florent Xicluna.
* Issue #1680159: unicode coercion during an 'in' operation was masking | R. David Murray | 2009-12-14 | 1 | -2/+0
  any errors that might occur during coercion of the left operand and turning them into a TypeError with a message text that was confusing in the given context. This patch lets any errors through, as was already done during coercion of the right hand side.
* Issue #3382: Make '%F' and float.__format__('F') convert results to upper case. Much of the patch came from Mark Dickinson. | Eric Smith | 2009-11-29 | 1 | -2/+0
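The effect of Issue #3382 is only visible for values whose text contains letters, that is, infinities and NaNs. A small sketch, assuming a current Python 3 interpreter (which has the same behavior):

```python
# '%F' and the 'F' format code upper-case the textual results.
inf_upper = "%F" % float("inf")
nan_upper = format(float("nan"), "F")

# '%f' keeps them lower-case, for comparison.
inf_lower = "%f" % float("inf")

# Ordinary finite values look the same under both codes.
finite = "%F" % 1.5
```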
* Issue #7117, continued: Remove substitution of %g-style formatting for | Mark Dickinson | 2009-11-23 | 1 | -3/+0
  %f-style formatting, which used to occur at high precision. Float formatting should now be consistent between 2.7 and 3.1.
* Remove restriction on precision when formatting floats. This is the | Mark Dickinson | 2009-11-23 | 1 | -57/+21
  first step towards removing the %f -> %g switch (see issues 7117, 5859).
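The two float-formatting commits above can be checked from Python. The sketch assumes a current Python 3 interpreter, which behaves like 2.7 and 3.1 after these changes; the sample values are arbitrary, and the behavior of earlier releases is not reproduced here.

```python
# A very large value stays in %f style: no exponent, all integer digits printed.
big = "%f" % 1e50

# A precision well beyond the significant digits of a double is accepted.
precise = "%.60f" % 0.1

integer_digits = len(big.split(".")[0])
fraction_digits = len(precise.split(".")[1])
```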
* Finished removing _PyOS_double_to_string, as mentioned in issue 7117. | Eric Smith | 2009-10-26 | 1 | -14/+11
* #7116: str.join() takes an iterable. | Georg Brandl | 2009-10-14 | 1 | -2/+2
* add keyword arguments support to str/unicode encode and decode #6300 | Benjamin Peterson | 2009-09-18 | 1 | -6/+10
* Issue #6922: Fix an infinite loop when trying to decode an invalid | Georg Brandl | 2009-09-17 | 1 | -1/+1
  UTF-32 stream with a non-raising error handler like "replace" or "ignore".
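A sketch of the situation Issue #6922 describes, assuming a current Python 3 interpreter: an invalid UTF-32 unit decoded with a non-raising error handler. The byte string is a made-up example of invalid input; on a fixed interpreter both calls return promptly.

```python
# 0xFFFFFFFF is far outside the Unicode range, so this unit cannot be decoded.
bogus = b"\xff\xff\xff\xff"

# "replace" substitutes U+FFFD for the bad unit and moves on.
with_replace = bogus.decode("utf-32-le", "replace")

# "ignore" drops the bad unit and moves on.
with_ignore = bogus.decode("utf-32-le", "ignore")
```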
* Silence gcc 'comparison always false' warning | Mark Dickinson | 2009-08-28 | 1 | -1/+3
* Grow the allocated buffer in PyUnicode_EncodeUTF7 to avoid buffer overrun. | Alexandre Vassalotti | 2009-07-07 | 1 | -2/+2
  Without this change, test_unicode.UnicodeTest.test_codecs_utf7 crashes in debug mode. What happens is the unicode string u'\U000abcde' with a length of 1 encodes to the string '+2m/c3g-' of length 8. Since only 5 bytes is reserved in the buffer, a buffer overrun occurs.
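The 8-byte figure quoted in the commit above can be rebuilt by hand: UTF-7 wraps the base64 of the UTF-16BE code units between "+" and "-". The sketch assumes a current Python 3 interpreter; the variable names are illustrative.

```python
import base64

ch = "\U000abcde"

# Outside the BMP, so UTF-16 needs a surrogate pair: four bytes.
utf16_be = ch.encode("utf-16-be")

# UTF-7 uses base64 without the "=" padding.
payload = base64.b64encode(utf16_be).rstrip(b"=")

# Shift in with "+", shift out with "-".
by_hand = b"+" + payload + b"-"

# The built-in codec, for comparison.
by_codec = ch.encode("utf-7")
```

One input character therefore needs eight output bytes, more than the five that the commit message says were reserved.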
* #6224: s/JPython/Jython/, and remove one link to a module nine years old. | Georg Brandl | 2009-06-06 | 1 | -1/+1
* #5929: fix signedness warning. | Georg Brandl | 2009-05-05 | 1 | -1/+1
* Issue #4426: The UTF-7 decoder was too strict and didn't accept some legal sequences. Patch by Nick Barnes and Victor Stinner. | Antoine Pitrou | 2009-05-04 | 1 | -184/+240
* There's no %A in Python 2.x! | Walter Dörwald | 2009-05-03 | 1 | -1/+1
* Issue #5108: Handle %s like %S and %R in PyUnicode_FromFormatV(): Call | Walter Dörwald | 2009-05-03 | 1 | -48/+32
  PyUnicode_DecodeUTF8() once, remember the result and output it in a second step. This avoids problems with counting UTF-8 bytes that ignores the effect of using the replace error handler in PyUnicode_DecodeUTF8().
* Issue #5835, deprecate PyOS_ascii_formatd. | Eric Smith | 2009-04-25 | 1 | -9/+6
  If anyone wants to clean up the documentation, feel free. It's my first documentation foray, and it's not that great. Will port to py3k with a different strategy.
* Issue #532631: Apply floatformat changes to unicodeobject.c | Mark Dickinson | 2009-03-29 | 1 | -0/+9
  as well as stringobject.c.
* Issue #532631: Replace confusing fabs(x)/1e25 >= 1e25 test | Mark Dickinson | 2009-03-29 | 1 | -1/+1
  with fabs(x) >= 1e50, and fix documentation.
* There is no macro named SIZEOF_SSIZE_T. Should use SIZEOF_SIZE_T instead. | Hirokazu Yamamoto | 2009-03-21 | 1 | -1/+1
* Issue 4474: On platforms with sizeof(wchar_t) == 4 and | Mark Dickinson | 2009-03-18 | 1 | -0/+58
  sizeof(Py_UNICODE) == 2, PyUnicode_FromWideChar now converts each character outside the BMP to the appropriate surrogate pair. Thanks Victor Stinner for the patch.
  (backport of r70452 from py3k to trunk)
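The conversion named in Issue 4474 follows the standard UTF-16 surrogate arithmetic. The sketch below shows that arithmetic in Python; it is not the C code from the patch, and the function name to_surrogate_pair is hypothetical.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


# U+1D11E (MUSICAL SYMBOL G CLEF) as a sample 4-byte wchar_t value.
pair = to_surrogate_pair(0x1D11E)
```

The result can be compared with the UTF-16 encoding of the same character, which contains the same two code units.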
* Issue #5341: Fix a variety of spelling errors. | Mark Dickinson | 2009-02-21 | 1 | -1/+1
* Fix warnings GCC emits where the argument of PyErr_Format is a single variable. | Georg Brandl | 2009-02-13 | 1 | -3/+3
* fix indentation in comment | Benjamin Peterson | 2009-01-31 | 1 | -2/+2
* fix indentation; looks like all I managed to do the first time is make things uglier | Benjamin Peterson | 2009-01-31 | 1 | -2558/+2558
* fix indentation | Benjamin Peterson | 2009-01-31 | 1 | -2/+2
* completely detabify unicodeobject.c | Benjamin Peterson | 2009-01-31 | 1 | -3010/+3010
* Remove unnecessary casts related to unicode_decode_call_errorhandler. | Alexandre Vassalotti | 2008-12-27 | 1 | -27/+29
  Make the _PyUnicode_Resize macro a static function. These changes are needed to avoid breaking strict aliasing rules.
* Fix a small typo in docstring | Amaury Forgeot d'Arc | 2008-11-29 | 1 | -1/+1
* Docstring change for *partition: use same tense as other docstrings. | Andrew M. Kuchling | 2008-10-04 | 1 | -6/+6
  Hyphenate left- and right-justified. Fix 'registerd' typo
* Fixed a couple more C99 comments and one occurence of inline. | Christian Heimes | 2008-10-02 | 1 | -15/+15
* Fix varname in docstring. #3822. | Georg Brandl | 2008-09-09 | 1 | -2/+2
* Correct a crash when two successive unicode allocations fail with a MemoryError: | Amaury Forgeot d'Arc | 2008-07-31 | 1 | -1/+3
  the freelist contained half-initialized objects with freed pointers.
  The comment /* XXX UNREF/NEWREF interface should be more symmetrical */ was copied from tupleobject.c, and appears in some other places. I sign the petition.
* Security patches from Apple: prevent int overflow when allocating memory | Neal Norwitz | 2008-07-31 | 1 | -14/+47
* #2242: utf7 decoding crashes on bogus input on some Windows/MSVC versions | Antoine Pitrou | 2008-07-25 | 1 | -1/+1
* Backed out r65069, pending fixing it in Windows. | Eric Smith | 2008-07-17 | 1 | -8/+5