path: root/Lib/test/test_unicode.py
Commit message [Author, Age, Files, Lines]
* bpo-15999: Clean up of handling boolean arguments. (GH-15610) [Serhiy Storchaka, 2019-09-01, files: 1, lines: -8/+8]
|   * Use the 'p' format unit instead of manually calling PyObject_IsTrue().
|   * Pass boolean values instead of 0/1 integers to functions that need a boolean.
|   * Convert some arguments to boolean only once.
* bpo-36502: Correct documentation of str.isspace() (GH-15019) [Greg Price, 2019-08-14, files: 1, lines: -1/+12]
|   The documented definition was much broader than the real one: there are tons of characters with general category "Other", and we don't (and shouldn't) treat most of them as whitespace.
|
|   Rewrite the definition to agree with the comment on _PyUnicode_IsWhitespace, and with the logic in makeunicodedata.py, which is what generates that function and so ultimately governs.
|
|   Add suitable breadcrumbs so that a reader who wants to pin down exactly what this definition means (what's a "bidirectional class" of "B"?) can do so. The `unicodedata` module documentation is an appropriate central place for our references to Unicode's own copious documentation, so point there.
|
|   Also add to the isspace() test a thorough check that the implementation agrees with the intended definition.
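The check described in this commit can be sketched roughly as follows. This is an illustration, not the test from the commit itself: it assumes the documented definition is "general category Zs, or bidirectional class WS, B, or S", and the helper names `isspace_by_definition` and `find_disagreements` are made up for this example.

```python
import unicodedata


def isspace_by_definition(char):
    """Apply the documented whitespace definition to a single character."""
    return (unicodedata.category(char) == "Zs"
            or unicodedata.bidirectional(char) in ("WS", "B", "S"))


def find_disagreements(limit=0x110000):
    """Return code points below `limit` where str.isspace() and the definition differ."""
    return [cp for cp in range(limit)
            if chr(cp).isspace() != isspace_by_definition(chr(cp))]


if __name__ == "__main__":
    # On CPython this is expected to print an empty list.
    print(find_disagreements())
```

An empty result from `find_disagreements()` means the implementation and the definition agree on every code point checked.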
* bpo-37476: Adding tests for asutf8 and asutf8andsize (GH-14531) [Hai Shi, 2019-07-20, files: 1, lines: -0/+28]
|
* bpo-37388: Development mode check encoding and errors (GH-14341) [Victor Stinner, 2019-06-25, files: 1, lines: -0/+62]
|   In development mode and in debug build, encoding and errors arguments are now checked on string encoding and decoding operations. Examples: open(), str.encode() and bytes.decode().
|
|   By default, for best performances, the errors argument is only checked at the first encoding/decoding error, and the encoding argument is sometimes ignored for empty strings.
* bpo-36549: str.capitalize now titlecases the first character instead of uppercasing it (GH-12804) [Kingsley M, 2019-04-12, files: 1, lines: -1/+1]
|
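The difference between titlecasing and uppercasing only shows up for characters such as digraphs. A small sketch, assuming Python 3.8 or later (where this change is in effect); the sample word is arbitrary:

```python
# U+01C6 is a single code point that has distinct upper- and titlecase forms.
DZ_SMALL = "\u01c6"  # LATIN SMALL LETTER DZ WITH CARON
DZ_TITLE = "\u01c5"  # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
DZ_UPPER = "\u01c4"  # LATIN CAPITAL LETTER DZ WITH CARON

word = DZ_SMALL + "ungla"
capitalized = word.capitalize()

# With the change, the first character is titlecased, not uppercased.
print(capitalized == DZ_TITLE + "ungla")
print(DZ_SMALL.upper() == DZ_UPPER)
```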
* bpo-36297: remove "unicode_internal" codec (GH-12342) [Inada Naoki, 2019-03-18, files: 1, lines: -22/+14]
|
* bpo-33817: Fix _PyBytes_Resize() for empty bytes object. (GH-11516) [Serhiy Storchaka, 2019-01-12, files: 1, lines: -0/+6]
|   Also add tests for PyUnicode_FromFormat() and PyBytes_FromFormat() with empty result.
* Revert "bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)" (GH-9187) [Victor Stinner, 2018-09-11, files: 1, lines: -4/+0]
|   This reverts commit 886483e2b9bbabf60ab769683269b873381dd5ee.
* bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080) [Victor Stinner, 2018-09-07, files: 1, lines: -0/+4]
|   * Add %T format to PyUnicode_FromFormatV(), and so to PyUnicode_FromFormat() and PyErr_Format(), to format an object type name: equivalent to "%s" with Py_TYPE(obj)->tp_name.
|   * Replace Py_TYPE(obj)->tp_name with %T format in unicodeobject.c.
|   * Add unit test on %T format.
|   * Rename unicode_fromformat_write_cstr() to unicode_fromformat_write_utf8(), to make the intent more explicit.
* bpo-22602: Raise an exception in the UTF-7 decoder for ill-formed sequences starting with "+". (GH-8741) [Zackery Spytz, 2018-08-19, files: 1, lines: -0/+4]
|   The UTF-7 decoder now raises UnicodeDecodeError for ill-formed sequences starting with "+" (as specified in RFC 2152).
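A rough illustration of the behavior, assuming a Python version that includes this change. The helper `try_decode_utf7` is hypothetical, and the byte strings are examples chosen here rather than the cases from the commit's test:

```python
def try_decode_utf7(data):
    """Decode UTF-7 bytes; return the exception instead of raising it."""
    try:
        return data.decode("utf-7")
    except UnicodeDecodeError as exc:
        return exc


# "+-" is the escape for a literal plus sign.
literal_plus = try_decode_utf7(b"+-")
# "+Jjo-" is a well-formed shifted sequence encoding U+263A.
smiley = try_decode_utf7(b"+Jjo-")
# "+" followed by a byte that is neither "-" nor a base64 character is ill-formed.
ill_formed = try_decode_utf7(b"+@")

print(repr(literal_plus), repr(smiley), type(ill_formed).__name__)
```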
* bpo-32677: Add .isascii() to str, bytes and bytearray (GH-5342) [INADA Naoki, 2018-01-27, files: 1, lines: -0/+5]
|
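For reference, a few sample calls to the method added here; the inputs are arbitrary examples:

```python
# isascii() is True for empty input and for input made only of code points
# (or byte values) in the range 0x00-0x7F.
samples = {
    "str_ascii": "abc".isascii(),
    "str_empty": "".isascii(),
    "str_non_ascii": "caf\xe9".isascii(),
    "bytes_ascii": b"\x7f".isascii(),
    "bytearray_non_ascii": bytearray(b"\x80").isascii(),
}

for name, value in samples.items():
    print(name, value)
```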
* bpo-31979: Simplify transforming decimals to ASCII (#4336) [Serhiy Storchaka, 2017-11-13, files: 1, lines: -5/+8]
|   in int(), float() and complex() parsers.
|
|   This also speeds up parsing non-ASCII numbers by around 20%.
* bpo-30978: str.format_map() now passes key lookup exceptions through. (#2790) [Serhiy Storchaka, 2017-08-03, files: 1, lines: -0/+7]
|   Previously any exception was replaced with a KeyError exception.
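A minimal sketch of the new behavior, assuming Python 3.7 or later. `BadMapping` and `lookup_error_type` are illustrative names, not code taken from the commit:

```python
class BadMapping:
    """A mapping whose key lookup fails with something other than KeyError."""

    def __getitem__(self, key):
        return 1 / 0  # raises ZeroDivisionError


def lookup_error_type(template, mapping):
    """Return the type of exception raised by format_map(), or None on success."""
    try:
        template.format_map(mapping)
    except Exception as exc:
        return type(exc)
    return None


# The original exception propagates instead of being replaced by KeyError.
print(lookup_error_type("{a}", BadMapping()))
# A genuinely missing key still raises KeyError.
print(lookup_error_type("{a}", {}))
```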
* bpo-29919: Remove unused imports found by pyflakes (#137) [Victor Stinner, 2017-03-27, files: 1, lines: -1/+0]
|   Also make minor PEP8 coding style fixes on modified imports.
* bpo-28598: Support __rmod__ for RHS subclasses of str in % string formatting operations (#51) [Martijn Pieters, 2017-02-23, files: 1, lines: -0/+9]
|   When you use `'%s' % SubClassOfStr()`, where `SubClassOfStr.__rmod__` exists, the reverse operation is ignored as normally such string formatting operations use the `PyUnicode_Format()` fast path. This patch tests for subclasses of `str` first and picks the slow path in that case.
|
|   Patch by Martijn Pieters.
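The situation described above can be reproduced with a small str subclass. This is a sketch assuming a Python version that includes the fix; `SubclassedStr` and its return value are made up for illustration:

```python
class SubclassedStr(str):
    """A str subclass that overrides the reflected % operator."""

    def __rmod__(self, other):
        return "__rmod__ called with {!r}".format(other)


# The right-hand operand is a str subclass that defines __rmod__,
# so the reflected operation takes precedence over plain formatting.
result = "lhs %% %r" % SubclassedStr("rhs")

# With a plain str on the right, ordinary %-formatting applies.
plain = "lhs %% %r" % "rhs"

print(result)
print(plain)
```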
* Issue #29145: Merge test from 3.6 [Martin Panter, 2017-01-14, files: 1, lines: -0/+7]
|\
| * Merge tests from 3.5 [Martin Panter, 2017-01-14, files: 1, lines: -0/+7]
| |\
| | * Issues #1621, #29145: Test for str.join() overflow [Martin Panter, 2017-01-12, files: 1, lines: -0/+7]
| | |
* | | Issue #28992: Use bytes.fromhex(). [Serhiy Storchaka, 2016-12-21, files: 1, lines: -7/+4]
| | |
* | | Issue #28822: Adjust indices handling of PyUnicode_FindChar(). [Xiang Zhang, 2016-12-20, files: 1, lines: -0/+23]
|/ /
* | Merge spelling and grammar from 3.5 [Martin Panter, 2016-12-18, files: 1, lines: -1/+1]
|\ \
| |/
| * Fix spelling and grammar in code comments and documentation [Martin Panter, 2016-12-18, files: 1, lines: -1/+1]
| |
* | Issue 28128: Print out better error/warning messages for invalid string escapes. Backport to 3.6. [Eric V. Smith, 2016-10-31, files: 1, lines: -7/+0]
| |
* | Merge from 3.5. [Serhiy Storchaka, 2016-10-08, files: 1, lines: -1/+44]
|\ \
| |/
| * Issue #28379: Added sanity checks and tests for PyUnicode_CopyCharacters(). [Serhiy Storchaka, 2016-10-08, files: 1, lines: -1/+44]
| |   Patch by Xiang Zhang.
* | test_invalid_sequences doesn't seem to have to stay in CAPITest. [Serhiy Storchaka, 2016-10-02, files: 1, lines: -7/+7]
| |   Reported by Xiang Zhang.
* | Issue #28295: Fixed the documentation and added tests for PyUnicode_AsUCS4(). [Serhiy Storchaka, 2016-10-02, files: 1, lines: -0/+17]
|\ \
| |/
| |   Original patch by Xiang Zhang.
| * Issue #28295: Fixed the documentation and added tests for PyUnicode_AsUCS4(). [Serhiy Storchaka, 2016-10-02, files: 1, lines: -0/+17]
| |   Original patch by Xiang Zhang.
* | Moved Unicode C API related tests to separate test class. [Serhiy Storchaka, 2016-10-02, files: 1, lines: -114/+117]
|\ \
| |/
| * Moved Unicode C API related tests to separate test class. [Serhiy Storchaka, 2016-10-02, files: 1, lines: -114/+117]
| |
* | #27364: Deprecate invalid escape strings in str/bytes. [R David Murray, 2016-09-08, files: 1, lines: -0/+7]
| |   Patch by Emanuel Barry, reviewed by Serhiy Storchaka and Martin Panter.
* | #27364: fix "incorrect" uses of escape character in the stdlib. [R David Murray, 2016-09-08, files: 1, lines: -2/+2]
| |   And most of the tools.
| |
| |   Patch by Emanuel Barry, reviewed by me, Serhiy Storchaka, and Martin Panter.
* | Anti-registration of various ABC methods. [Guido van Rossum, 2016-08-18, files: 1, lines: -0/+23]
|/
|   - Issue #25958: Support "anti-registration" of special methods from various ABCs, like __hash__, __iter__ or __len__. All these (and several more) can be set to None in an implementation class and the behavior will be as if the method is not defined at all. (Previously, this mechanism existed only for __hash__, to make mutable classes unhashable.) Code contributed by Andrew Barnert and Ivan Levkivskyi.
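A short sketch of what setting a special method to None looks like in practice. The class names are hypothetical and the example is an illustration under the behavior described above, not code from the commit:

```python
from collections.abc import Sized


class Indexable:
    """Supports len() and the legacy __getitem__ iteration protocol."""

    def __len__(self):
        return 3

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index


class NotIterable(Indexable):
    __iter__ = None  # opt out of iteration, including the __getitem__ fallback


class NotSized(Indexable):
    __len__ = None  # opt out of len() and of the Sized ABC


def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


print(list(Indexable()))                        # iterates via __getitem__
print(raises_type_error(iter, NotIterable()))   # iteration is disabled
print(isinstance(NotSized(), Sized))            # no longer counts as Sized
```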
* Correct “an” → “a” with “Unicode”, “user”, “UTF”, etc [Martin Panter, 2016-04-15, files: 1, lines: -1/+1]
|   This affects documentation, code comments, and a debugging message.
* Issue #26712: Unify (r)split, (l/r)strip tests into string_tests [Martin Panter, 2016-04-10, files: 1, lines: -4/+0]
|   This eliminates a few redundant test cases.
* Issue #26257: Eliminate buffer_tests.py and fix ByteArrayAsStringTest [Martin Panter, 2016-04-06, files: 1, lines: -8/+8]
|   ByteArrayAsStringTest.fixtype() was converting test data to bytes, not bytearray, therefore many of the test cases inherited in this class were not actually being run on the bytearray type.
|
|   The tests in buffer_tests.py were redundant with methods in string_tests.MixinStrUnicodeUserStringTest and string_tests.CommonTest. These methods are now moved into string_tests.BaseTest, where they will also get run for bytes and bytearray.
|
|   This change also moves test_additional_split(), test_additional_rsplit(), and test_strip() from CommonTest to BaseTest, meaning these tests are now run for bytes and bytearray. I plan to eliminate redundancies with existing tests in test_bytes.py soon.
* Issue #26494: Fixed crash on iterating exhausted iterators. [Serhiy Storchaka, 2016-03-30, files: 1, lines: -0/+4]
|   Affected classes are generic sequence iterators, iterators of str, bytes, bytearray, list, tuple, set, frozenset, dict, OrderedDict, corresponding views and os.scandir() iterator.
* Issue #26464: Fix unicode_fast_translate() again [Victor Stinner, 2016-03-01, files: 1, lines: -4/+10]
|   Initialize i variable if the string is non-ASCII.
* Fix str.translate() [Victor Stinner, 2016-03-01, files: 1, lines: -0/+4]
|   Issue #26464: Fix str.translate() when the string is ASCII and the first replacement removes a character, but the next replacement uses a non-ASCII character or a string longer than 1 character.
|
|   Regression introduced in Python 3.5.0.
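The case described here, an ASCII string where one mapping deletes a character and a later one expands or leaves ASCII, can be exercised like this. The inputs are illustrative and the expected results assume a Python version with the fix applied:

```python
# First replacement removes 'a'; the second maps 'b' to a longer string.
longer = "axb".translate(str.maketrans({"a": None, "b": "123"}))

# First replacement removes 'a'; the second maps 'b' to a non-ASCII character.
non_ascii = "axb".translate(str.maketrans({"a": None, "b": "\xe9"}))

print(longer)
print(non_ascii)
```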
* Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. [Serhiy Storchaka, 2015-12-02, files: 1, lines: -0/+17]
|\
| * Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. [Serhiy Storchaka, 2015-12-02, files: 1, lines: -0/+17]
| |
* | Issue #24731: Fixed crash on converting objects with special methods [Serhiy Storchaka, 2015-11-25, files: 1, lines: -4/+7]
|\ \
| |/
| |   __bytes__, __trunc__, and __float__ returning instances of subclasses of bytes, int, and float to subclasses of bytes, int, and float correspondingly.
| * Issue #24731: Fixed crash on converting objects with special methods [Serhiy Storchaka, 2015-11-25, files: 1, lines: -4/+7]
| |   __bytes__, __trunc__, and __float__ returning instances of subclasses of bytes, int, and float to subclasses of bytes, int, and float correspondingly.
* | Issue #22643: Skip test_case_operation_overflow on computers with low memory. [Serhiy Storchaka, 2015-11-07, files: 1, lines: -1/+9]
|\ \
| |/
| * Issue #22643: Skip test_case_operation_overflow on computers with low memory. [Serhiy Storchaka, 2015-11-07, files: 1, lines: -1/+9]
| |
* | Issue #24848: Fixed bugs in UTF-7 decoding of misformed data: [Serhiy Storchaka, 2015-10-02, files: 1, lines: -1/+2]
|\ \
| |/
| |   1. Non-ASCII bytes were accepted after shift sequence.
| |   2. A low surrogate could be emitted in case of error in high surrogate.
| |   3. In some circumstances the '\xfd' character was produced instead of the replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
| * Issue #24848: Fixed bugs in UTF-7 decoding of misformed data: [Serhiy Storchaka, 2015-10-02, files: 1, lines: -1/+2]
| |   1. Non-ASCII bytes were accepted after shift sequence.
| |   2. A low surrogate could be emitted in case of error in high surrogate.
* | Issue #22681: Added support for the koi8_t encoding. [Serhiy Storchaka, 2015-05-12, files: 1, lines: -3/+4]
| |
* | Issue #22682: Added support for the kz1048 encoding. [Serhiy Storchaka, 2015-05-12, files: 1, lines: -2/+2]
| |
* | Added explicit tests for issue #23803. [Serhiy Storchaka, 2015-03-29, files: 1, lines: -0/+2]
|\ \
| |/