path: root/Lib/test/test_unicode.py
Commit message (Author, Age, Files, Lines)
* use unicode literals (Benjamin Peterson, 2010-06-07, 1 file, -3/+3)
|
* correctly overflow when indexes are too large (Benjamin Peterson, 2010-06-07, 1 file, -0/+3)
|
* Add a NEWS entry for r81758 and clarify a comment. (Ezio Melotti, 2010-06-05, 1 file, -3/+3)
|
* Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629. (Ezio Melotti, 2010-06-05, 1 file, -0/+158)
|     1) #8271: when a byte sequence is invalid, only the start byte and all the valid continuation bytes are now replaced by U+FFFD, instead of replacing the number of bytes specified by the start byte. See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
|     2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes in behavior);
|     3) Add code and tests to reject surrogates (U+D800-U+DFFF) as defined in RFC 3629, but leave it commented out since it's not backward compatible;
|     4) Change the error messages "unexpected code byte" to "invalid start byte" and "invalid data" to "invalid continuation byte";
|     5) Add an extensive set of tests in test_unicode;
|     6) Fix test_codeccallbacks because it was failing after this change.
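The decoder behavior described in points 1, 2 and 4 can be observed with a short sketch. It assumes a Python 3 interpreter, whose UTF-8 decoder follows the same RFC 3629 rules; the helper name `decode_reason` is illustrative and not part of the commit.

```python
def decode_reason(data):
    """Return the decoder's error reason for invalid UTF-8, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# 0xFF can never start a UTF-8 sequence.
start_reason = decode_reason(b"\xff")

# 0xE2 opens a 3-byte sequence, but 0x41 ("A") is not a continuation byte.
cont_reason = decode_reason(b"\xe2A")

# Truncated sequence: the start byte and its one valid continuation byte
# are replaced together by a single U+FFFD, and the "A" survives.
replaced = b"\xe2\x82A".decode("utf-8", "replace")

# 0xF8 would start a 5-byte sequence under RFC 2279; it is now invalid.
five_byte = b"\xf8\x80\x80\x80\x80".decode("utf-8", "replace")

print(start_reason, cont_reason, repr(replaced), repr(five_byte))
```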
* #8016: add the CP858 codec (approved by Benjamin). (Also add CP720 to the tests, it was missing there.) (Georg Brandl, 2010-05-24, 1 file, -4/+4)
* Fix the NEWS about my last commit: a unicode subclass can now override the __unicode__ method (and not the __str__ method). (Victor Stinner, 2010-03-22, 1 file, -2/+0)
|     Also simplify the testcase.
* Issue #1583863: A unicode subclass can now override the __str__ method (Victor Stinner, 2010-03-22, 1 file, -0/+11)
|
* Issue #7849: Now the utility ``check_warnings`` verifies that the warnings are actually raised. (Florent Xicluna, 2010-03-07, 1 file, -2/+1)
|     A new utility ``check_py3k_warnings`` deals with py3k warnings.
* Issue #7649: Fix u'%c' % char for character in range 0x80..0xFF => raise a UnicodeDecodeError. (Victor Stinner, 2010-02-23, 1 file, -0/+13)
|     Patch written by Ezio Melotti.
* use assert[Not]In where appropriate (Ezio Melotti, 2010-01-23, 1 file, -51/+51)
|
* Issue #7462: Implement the stringlib fast search algorithm for the `rfind`, `rindex`, `rsplit` and `rpartition` methods. (Antoine Pitrou, 2010-01-02, 1 file, -1/+4)
|     Patch by Florent Xicluna.
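For reference, the four right-to-left methods named in this entry behave as follows. This is a minimal sketch of their public behavior on a Python 3 `str`; it does not show the stringlib search algorithm itself, and the sample string is arbitrary.

```python
text = "a,b,c"

# rfind/rindex return the index of the last occurrence.
last_comma = text.rfind(",")
same_index = text.rindex(",")

# rfind signals "not found" with -1, whereas rindex raises ValueError.
missing = text.rfind(";")
try:
    text.rindex(";")
    rindex_raised = False
except ValueError:
    rindex_raised = True

# rsplit with maxsplit=1 splits once, starting from the right.
tail_split = text.rsplit(",", 1)

# rpartition returns (head, separator, tail) around the last separator.
parts = text.rpartition(",")

print(last_comma, same_index, missing, tail_split, parts)
```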
* Issue #1680159: unicode coercion during an 'in' operation was masking any errors that might occur during coercion of the left operand and turning them into a TypeError with a message text that was confusing in the given context. (R. David Murray, 2009-12-14, 1 file, -1/+3)
|     This patch lets any errors through, as was already done during coercion of the right hand side.
* add keyword arguments support to str/unicode encode and decode #6300 (Benjamin Peterson, 2009-09-18, 1 file, -0/+8)
|
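A brief sketch of the keyword-argument form, assuming Python 3, where `str.encode` and `bytes.decode` accept the same `encoding` and `errors` keywords; the sample strings are arbitrary.

```python
# Encode with keywords instead of positional arguments; the non-ASCII
# character is replaced by "?" because errors="replace".
encoded = "caf\u00e9".encode(encoding="ascii", errors="replace")

# Decode with keywords; errors="strict" is the default, spelled out here.
decoded = b"caf\xc3\xa9".decode(encoding="utf-8", errors="strict")

print(encoded, decoded)
```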
* convert usage of fail* to assert* (Benjamin Peterson, 2009-06-30, 1 file, -63/+63)
|
* Issue 6089: str.format raises SystemError. (Eric Smith, 2009-05-23, 1 file, -0/+4)
|
* Issue #4426: The UTF-7 decoder was too strict and didn't accept some legal sequences. (Antoine Pitrou, 2009-05-04, 1 file, -6/+15)
|     Patch by Nick Barnes and Victor Stinner.
* Unicode format tests weren't actually testing unicode. This was probably due to the original backport from py3k. (Eric Smith, 2009-03-14, 1 file, -51/+51)
* Issue 5237, Allow auto-numbered replacement fields in str.format() strings. (Eric Smith, 2009-03-14, 1 file, -3/+33)
|     For simple uses of str.format(), this makes the typing easier. Hopefully this will help in the adoption of str.format().
|
|     For example:
|         'The {} is {}'.format('sky', 'blue')
|
|     You can mix and match auto-numbering and named replacement fields:
|         'The {} is {color}'.format('sky', color='blue')
|
|     But you can't mix and match auto-numbering and specified numbering:
|         'The {0} is {}'.format('sky', 'blue')
|         ValueError: cannot switch from manual field specification to automatic field numbering
|
|     Will port to 3.1.
* #3601: test_unicode.test_raiseMemError fails in UCS4 (Antoine Pitrou, 2008-09-05, 1 file, -1/+4)
|     Reviewed by Benjamin Peterson on IRC.
* #3556: test_raiseMemError consumes an insane amount of memory (Antoine Pitrou, 2008-08-17, 1 file, -8/+3)
|
* Correct a crash when two successive unicode allocations fail with a MemoryError: the freelist contained half-initialized objects with freed pointers. (Amaury Forgeot d'Arc, 2008-07-31, 1 file, -0/+14)
|     The comment
|         /* XXX UNREF/NEWREF interface should be more symmetrical */
|     was copied from tupleobject.c, and appears in some other places. I sign the petition.
* #2242: utf7 decoding crashes on bogus input on some Windows/MSVC versions (Antoine Pitrou, 2008-07-25, 1 file, -0/+3)
|
* #1477: ur'\U0010FFFF' raised in narrow unicode builds. (Amaury Forgeot d'Arc, 2008-03-23, 1 file, -2/+15)
|     Corrected the raw-unicode-escape codec to use UTF-16 surrogates in this case, just like the unicode-escape codec.
* Patch #2167 from calvin: Remove unused imports (Christian Heimes, 2008-02-23, 1 file, -1/+1)
|
* Added code to correct combining str and unicode in ''.format(). Added test case. (Eric Smith, 2008-02-18, 1 file, -0/+9)
* Backport of PEP 3101, Advanced String Formatting, from py3k. (Eric Smith, 2008-02-17, 1 file, -0/+262)
|     Highlights:
|      - Adding PyObject_Format.
|      - Adding string.Format class.
|      - Adding __format__ for str, unicode, int, long, float, datetime.
|      - Adding builtin format.
|      - Adding ''.format and u''.format.
|      - str/unicode fixups for formatters.
|
|     The files in Objects/stringlib that implement PEP 3101 (stringdefs.h, unicodedefs.h, formatter.h, string_format.h) are identical in trunk and py3k. Any changes from here on should be made to trunk, and changes will propagate to py3k.
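The pieces listed in the highlights (the `format` builtin, the `__format__` hook and the `''.format` method) fit together roughly as sketched below. The example assumes Python 3; the `Celsius` class is hypothetical and exists only to show the hook.

```python
class Celsius:
    """Hypothetical value type that customizes formatting via __format__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Delegate the format spec to float formatting, then append a unit.
        return format(self.degrees, spec) + " C"


# str.format passes the text after ":" to the object's __format__.
label = "{:.1f}".format(Celsius(21.456))

# The builtin format() calls the same hook directly.
same = format(Celsius(21.456), ".1f")

# Positional and named fields, each with an alignment/width spec.
padded = "{0:>6}|{name:<4}|".format("ab", name="x")

print(label, same, padded)
```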
* Fix failing unicode test caused by change to ast.c at r56441 (Kurt B. Kaiser, 2007-07-18, 1 file, -3/+3)
|
* Prevent these tests from running on Win64 since they don't apply there either (Neal Norwitz, 2007-06-11, 1 file, -2/+2)
|
* Prevent expandtabs() on string and unicode objects from causing a segfault when a large width is passed on 32-bit platforms. (Neal Norwitz, 2007-06-09, 1 file, -2/+7)
|     Found by Google.
|
|     It would be good for people to review this especially carefully and verify I don't have an off-by-one error and there is no other way to cause overflow.
* Standardize on test.test_support.run_unittest() (as opposed to a mix of run_unittest() and run_suite()). (Collin Winter, 2007-04-25, 1 file, -1/+1)
|     Also, add functionality to run_unittest() that admits usage of unittest.TestLoader.loadTestsFromModule().
* Patch #1541585: fix buffer overrun when performing repr() on a unicode string in a build with wide unicode (UCS-4) support. (Neal Norwitz, 2006-08-21, 1 file, -0/+4)
|     This code could be improved, so add an XXX comment.
* Whitespace normalization. (Tim Peters, 2006-05-03, 1 file, -1/+1)
|
* Bug #1473625: stop cPickle making float dumps locale dependent in protocol 0. (Georg Brandl, 2006-04-30, 1 file, -13/+4)
|     On the way, add a decorator to test_support to facilitate running single test functions in different locales with automatic cleanup.
* Fixed bug #1459029 - unicode reprs were double-escaped. (Anthony Baxter, 2006-03-30, 1 file, -0/+16)
|
* Check in the test of patch #1400181. (Georg Brandl, 2006-01-20, 1 file, -0/+14)
|
* Bug #1379994: Fix *unicode_escape codecs to encode r'\' as r'\\' just like string codecs. (Hye-Shik Chang, 2005-12-17, 1 file, -10/+14)
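The backslash handling can be checked with a small sketch. It assumes Python 3 and only exercises the `unicode_escape` codec, one of the codecs covered by the `*unicode_escape` wildcard in this entry.

```python
backslash = "\\"  # a single backslash character

# unicode_escape doubles the backslash on encoding.
encoded = backslash.encode("unicode_escape")

# Decoding the doubled form yields the single backslash again.
round_trip = encoded.decode("unicode_escape")

print(encoded, round_trip)
```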
* Move registration of the codec search function to the module scope so it is only executed once. (Neal Norwitz, 2005-11-24, 1 file, -17/+18)
|     Otherwise the same search function is repeatedly added to the codec search path when regrtest is run with -R and leaks are reported.
* Change the %s format specifier for str objects so that it returns a unicode instance if the argument is not an instance of basestring and calling __str__ on the argument returns a unicode instance. (Neil Schemenauer, 2005-08-12, 1 file, -0/+4)
* Make subclasses of int, long, complex, float, and unicode perform type conversion using the proper magic slot (e.g., __int__()). (Brett Cannon, 2005-04-26, 1 file, -1/+63)
|     Also move conversion code out of PyNumber_*() functions in the C API into the nb_* function.
|
|     Applied patch #1109424. Thanks Walter Dörwald.
* Move test_bug1001011() to string_tests.MixinStrUnicodeTest so that it can be used for str and unicode. (Walter Dörwald, 2004-08-26, 1 file, -1/+2)
|     Drop the test for "".join([s]) is s because this is an implementation detail (and doesn't work for unicode)
* SF #989185: Drop unicode.iswide() and unicode.width() and add unicodedata.east_asian_width(). (Hye-Shik Chang, 2004-08-04, 1 file, -2/+1)
|     You can still implement your own simple width() function using it like this:
|
|         def width(u):
|             w = 0
|             for c in unicodedata.normalize('NFC', u):
|                 cwidth = unicodedata.east_asian_width(c)
|                 if cwidth in ('W', 'F'):
|                     w += 2
|                 else:
|                     w += 1
|             return w
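A condensed variant of that `width()` helper, written as a sketch for Python 3 (where `str` replaces `unicode`). The counting rule is the one from the commit message: wide (`'W'`) and fullwidth (`'F'`) characters count as two columns, everything else as one. The sample strings are illustrative.

```python
import unicodedata


def width(text):
    """Return the display width of text, counting W/F characters as 2."""
    normalized = unicodedata.normalize("NFC", text)
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in normalized
    )


ascii_width = width("abc")                   # three narrow characters
cjk_width = width("\u65e5\u672c\u8a9e")      # three wide CJK ideographs
fullwidth_a = width("\uff21")                # FULLWIDTH LATIN CAPITAL LETTER A

print(ascii_width, cjk_width, fullwidth_a)
```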
* Let u'%s' % obj try obj.__unicode__() first and fall back to obj.__str__(). (Marc-André Lemburg, 2004-07-23, 1 file, -0/+8)
|
* Reuse width/iswide tests from strings_test. (Suggested by Walter Dörwald) (Hye-Shik Chang, 2004-06-04, 1 file, -21/+2)
|
* Fix typo. (Hye-Shik Chang, 2004-06-04, 1 file, -1/+1)
|
* - SF #962502: Add two more methods for unicode type; width() and iswide() for east asian width manipulation. (Inspired by David Goodger, Reviewed by Martin v. Loewis) (Hye-Shik Chang, 2004-06-02, 1 file, -0/+20)
|   - Move _PyUnicode_TypeRecord.flags to the end of the struct so that no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
* Fix reallocation bug in unicode.translate(): The code was comparing characters instead of character pointers to determine space requirements. (Walter Dörwald, 2004-02-05, 1 file, -0/+1)
* Fix for SF bug [ 817156 ] invalid \U escape gives 0-length unistr. (Jeremy Hylton, 2003-10-06, 1 file, -0/+7)
|
* Support trailing dots in DNS names. Fixes #782510. Will backport to 2.3. (Martin v. Löwis, 2003-08-05, 1 file, -0/+4)
|
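A minimal sketch of the trailing-dot behavior, assuming it refers to the built-in `idna` codec and run under Python 3; the host name is only an example.

```python
# A fully qualified DNS name ends with a dot; the idna codec keeps it.
encoded = "python.org.".encode("idna")
decoded = encoded.decode("idna")

print(encoded, decoded)
```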
* Consider \U-escapes in raw-unicode-escape. Fixes #444514. (Martin v. Löwis, 2003-05-18, 1 file, -0/+7)
|
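The `\U` handling in the raw-unicode-escape codec can be sketched as a round trip. The example assumes Python 3; the code point U+10000 is an arbitrary choice outside the Basic Multilingual Plane.

```python
# An 8-digit \U escape in the byte string decodes to one code point.
decoded = b"\\U00010000".decode("raw-unicode-escape")

# Encoding a non-BMP character produces the \U form again.
encoded = "\U00010000".encode("raw-unicode-escape")

print(hex(ord(decoded)), encoded)
```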
* Combine the functionality of test_support.run_unittest() and test_support.run_classtests() into run_unittest() and use it wherever possible. (Walter Dörwald, 2003-05-01, 1 file, -3/+1)
|     Also don't use "from test.test_support import ...", but "from test import test_support" in a few spots.
|
|     From SF patch #662807.