path: root/Lib/test/test_unicodedata.py
Commit message | Author | Age | Files | Lines
* closes bpo-45190: Update Unicode data to version 14.0.0. (GH-28336) | Benjamin Peterson | 2021-09-14 | 1 | -2/+2
|
* bpo-43988: Use check disallow instantiation helper (GH-26392) | Erlend Egeberg Aasland | 2021-05-27 | 1 | -2/+2
|
* bpo-43916: Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to selected types (GH-25748) | Erlend Egeberg Aasland | 2021-04-30 | 1 | -1/+7
|     Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to the following types:
|     * _dbm.dbm
|     * _gdbm.gdbm
|     * _multibytecodec.MultibyteCodec
|     * _sre.SRE_Scanner
|     * _thread._localdummy
|     * _thread.lock
|     * _winapi.Overlapped
|     * array.arrayiterator
|     * functools.KeyWrapper
|     * functools._lru_list_elem
|     * pyexpat.xmlparser
|     * re.Match
|     * re.Pattern
|     * unicodedata.UCD
|     * zlib.Compress
|     * zlib.Decompress
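As a rough illustration of what this flag means for `unicodedata.UCD` when seen from Python code, the sketch below checks whether a type can be called with no arguments. It assumes a CPython build that includes this change; the helper name `can_instantiate` is hypothetical and is not the helper used by the test suite.

```python
import unicodedata

def can_instantiate(tp):
    """Return True if calling tp() with no arguments succeeds (hypothetical helper)."""
    try:
        tp()
    except TypeError:
        return False
    return True

# unicodedata.ucd_3_2_0 is an instance of the UCD type.
ucd_type = type(unicodedata.ucd_3_2_0)
print(ucd_type.__name__, can_instantiate(ucd_type))
```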
* bpo-43144: Mark unicodedata's test_normalization as requiring network (GH-24650) | Ammar Askar | 2021-02-26 | 1 | -0/+1
|     Co-authored-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
* bpo-39926: Update unicodedata checksum tests for Unicode 13.0 update. (GH-18913) | Benjamin Peterson | 2020-03-11 | 1 | -2/+2
|     I forgot these tests required the cpu resource.
* Update some www.unicode.org URLs to use HTTPS. (GH-18912) | Benjamin Peterson | 2020-03-11 | 1 | -1/+1
|
* closes bpo-37758: Extend unicodedata checksum tests to cover all of Unicode. (GH-15125) | Greg Price | 2019-09-12 | 1 | -5/+8
|     Unicode has grown since Python first gained support for it,
|     when Unicode itself was still rather new.
|
|     This pair of test cases was added in commit 6a20ee7de back in 2000,
|     and they haven't needed to change much since then.
|
|     But do change them to look beyond the Basic Multilingual Plane
|     (range(0x10000)) and cover all 17 planes of Unicode's final form.
|
|     This adds about 5 seconds to the test suite's runtime.
|     Mark the tests as CPU-using accordingly.
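A minimal sketch of the kind of checksum these tests compute, assuming a SHA-1 over a handful of per-character properties. The function name `property_checksum` and the exact selection of properties are illustrative assumptions, not the test file's actual code. Covering all 17 planes means iterating `range(0x110000)` instead of `range(0x10000)`.

```python
import hashlib
import unicodedata

def property_checksum(code_points):
    """SHA-1 over a few per-character properties (illustrative selection)."""
    h = hashlib.sha1()
    for cp in code_points:
        ch = chr(cp)
        data = [
            unicodedata.category(ch),
            unicodedata.bidirectional(ch),
            unicodedata.decomposition(ch),
            str(unicodedata.combining(ch)),
            str(unicodedata.mirrored(ch)),
        ]
        h.update("".join(data).encode("ascii"))
    return h.hexdigest()

# Basic Multilingual Plane only.
bmp_only = property_checksum(range(0x10000))
# All 17 planes would be range(0x110000), which is noticeably slower;
# here only the BMP plus one supplementary plane is hashed.
two_planes = property_checksum(range(0x20000))
```

The resulting digest depends on the Unicode version bundled with the interpreter, which is why the expected values change with each database upgrade in this log.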
* bpo-38043: Move unicodedata.normalize tests into test_unicodedata. (GH-15712) | Greg Price | 2019-09-10 | 1 | -11/+99
|     Having these in a separate file from the one that's named after the
|     module in the usual way makes it very easy to miss them when looking
|     for tests for these two functions.
|
|     (In fact when working recently on is_normalized, I'd been surprised
|     to see no tests for it here and concluded the function had evaded
|     being tested at all. I'd gone as far as to write up some tests
|     myself before I spotted this other file.)
|
|     Mostly this just means moving all the one file's code into the
|     other, and moving code from the module toplevel to inside the
|     test class to keep it tidily separate from the rest of the file's
|     code.
|
|     There's one substantive change, which reduces by a bit the amount
|     of code to be moved: we drop the `x > sys.maxunicode` conditional
|     and all the `RangeError` logic behind it. Now if that condition
|     ever occurs it will cause an error at `chr(x)`, and a test failure.
|     That's the right result because, since PEP 393 in Python 3.3,
|     there is no longer such a thing as an "unsupported character".
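A short sketch of the behaviour the last paragraph relies on: on Python 3.3 and later, `sys.maxunicode` is always 0x10FFFF, and `chr()` raises `ValueError` for anything larger. The helper name `chr_or_error` is hypothetical.

```python
import sys

# Since PEP 393 (Python 3.3), every build covers the full Unicode range.
print(hex(sys.maxunicode))

def chr_or_error(x):
    """Return chr(x), or the ValueError raised for out-of-range values."""
    try:
        return chr(x)
    except ValueError as exc:
        return exc

result = chr_or_error(sys.maxunicode + 1)
```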
* closes bpo-37966: Fully implement the UAX #15 quick-check algorithm. (GH-15558) | Greg Price | 2019-09-04 | 1 | -0/+2
|     The purpose of the `unicodedata.is_normalized` function is to answer
|     the question `str == unicodedata.normalize(form, str)` more
|     efficiently than writing just that, by using the "quick check"
|     optimization described in the Unicode standard in UAX #15.
|
|     However, it turns out the code doesn't implement the full algorithm
|     from the standard, and as a result we often miss the optimization and
|     end up having to compute the whole normalized string after all.
|
|     Implement the standard's algorithm. This greatly speeds up
|     `unicodedata.is_normalized` in many cases where our partial variant
|     of quick-check had been returning MAYBE and the standard algorithm
|     returns NO.
|
|     At a quick test on my desktop, the existing code takes about 4.4 ms/MB
|     (so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
|     has to do the slow normalize-and-compare:
|
|       $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
|           -- 'unicodedata.is_normalized("NFD", s)'
|       50 loops, best of 5: 4.39 msec per loop
|
|     With this patch, it gets the answer instantly (58 ns) on the same 1 MB
|     string:
|
|       $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
|           -- 'unicodedata.is_normalized("NFD", s)'
|       5000000 loops, best of 5: 58.2 nsec per loop
|
|     This restores a small optimization that the original version of this
|     code had for the `unicodedata.normalize` use case.
|
|     With this, that case is actually faster than in master!
|
|       $ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
|           -- 'unicodedata.normalize("NFD", s)'
|       500 loops, best of 5: 561 usec per loop
|
|       $ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
|           -- 'unicodedata.normalize("NFD", s)'
|       500 loops, best of 5: 512 usec per loop
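The equivalence described above can be sketched as follows, assuming Python 3.8 or later (where `unicodedata.is_normalized` exists). The function `is_normalized_slow` is a hypothetical name for the normalize-and-compare definition; it is not part of the module.

```python
import unicodedata

def is_normalized_slow(form, s):
    """Reference definition: normalize fully, then compare."""
    return s == unicodedata.normalize(form, s)

# U+F900 has a canonical decomposition, so the string is not in NFD.
s = "\uf900" * 1000
fast = unicodedata.is_normalized("NFD", s)
slow = is_normalized_slow("NFD", s)
```

Both calls should agree on every input; the quick check only changes how much work is needed to reach the answer.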
* bpo-37758: Clean out vestigial script-bits from test_unicodedata. (GH-15126) | Greg Price | 2019-08-13 | 1 | -17/+5
|     This file started life as a script, before conversion to a
|     `unittest` test file. Clear out some legacies of that conversion
|     that are a bit confusing about how it works.
|
|     Most notably, it's unlikely there's still a good reason to try to
|     recover from `unicodedata` failing to import -- as there was when
|     that logic was first added, when the module was very new. So take
|     that out entirely. Keep `self.db` working, though, to avoid a
|     noisy diff.
* closes bpo-36861: Update Unicode database to 12.1.0. (GH-13214) | Benjamin Peterson | 2019-05-09 | 1 | -1/+1
|     Adds ㋿.
* closes bpo-33376: Update to Unicode 12.0.0. (GH-12256) | Benjamin Peterson | 2019-03-10 | 1 | -2/+2
|
* bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (GH-1958) | Wonsup Yoon | 2018-06-15 | 1 | -0/+13
|     The Hangul composition boundary checks were wrong: the second
|     character must lie in [0x1161, 0x1176), not [0x1161, 0x1176], and
|     the third character in (0x11A7, 0x11C3), not [0x11A7, 0x11C3].
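A small sketch of the boundary behaviour after this fix, assuming a Python version that includes it. The constant names `V_BASE`, `V_COUNT`, `T_BASE` and `T_COUNT` are illustrative labels for the values used by the Unicode Hangul composition algorithm; they are not names exported by `unicodedata`.

```python
import unicodedata

V_BASE, V_COUNT = 0x1161, 21   # vowels: [0x1161, 0x1176)
T_BASE, T_COUNT = 0x11A7, 28   # trailing consonants: (0x11A7, 0x11C3)

def nfc(s):
    return unicodedata.normalize("NFC", s)

# The last valid vowel composes with the leading consonant U+1100 ...
last_vowel = nfc("\u1100" + chr(V_BASE + V_COUNT - 1))
# ... but U+1176, one past the range, must be left alone.
past_vowel = nfc("\u1100\u1176")
# U+11A7 and U+11C3 lie outside the open trailing-consonant range,
# so they must not be folded into the preceding syllable.
t_low = nfc("\uae30" + chr(T_BASE))
t_high = nfc("\uae30" + chr(T_BASE + T_COUNT))
```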
* update to Unicode 11.0.0 (closes bpo-33778) (GH-7439) | Benjamin Peterson | 2018-06-07 | 1 | -2/+2
|     Also, standardize indentation of generated tables.
* bpo-30736: upgrade to Unicode 10.0 (#2344) | Benjamin Peterson | 2017-06-23 | 1 | -2/+2
|     Straightforward. While we're at it, though, strip trailing whitespace
|     from generated tables.
* Unicode 9.0.0 | Benjamin Peterson | 2016-09-15 | 1 | -2/+6
|     Not completely mechanical since support for East Asian Width changes
|     (emoji codepoints became Wide) had to be added to unicodedata.
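The East Asian Width change can be observed by comparing the current database with the frozen 3.2.0 one. This is a minimal sketch, assuming Python 3.6 or later; U+231A (WATCH) is picked here as one example of an affected emoji code point.

```python
import unicodedata

watch = "\u231a"  # WATCH, an emoji code point
current = unicodedata.east_asian_width(watch)
# The frozen Unicode 3.2.0 database predates the change.
legacy = unicodedata.ucd_3_2_0.east_asian_width(watch)
```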
* Issue #23981: Update test_unicodedata to use script_helpers | Berker Peksag | 2015-10-22 | 1 | -10/+5
|     Patch by Christie.
* upgrade to Unicode 8.0.0 | Benjamin Peterson | 2015-06-27 | 1 | -2/+2
|
* Issue #21741: Update 147 test modules to use test discovery. | Zachary Ware | 2015-04-13 | 1 | -8/+1
|     I have compared output between pre- and post-patch runs of these tests
|     to make sure there's nothing missing and nothing broken, on both
|     Windows and Linux. The only differences I found were actually tests
|     that were previously *not* run.
* for some reason, you don't get the right checksum from an incremental build | Benjamin Peterson | 2014-07-07 | 1 | -2/+3
|
* upgrade to unicode 7.0.0 | Benjamin Peterson | 2014-07-06 | 1 | -2/+2
|
* Fix expected checksum for new unicodedata (after full rebuild) | Antoine Pitrou | 2013-10-11 | 1 | -1/+1
|
* upgrade unicode db to 6.3.0 (closes #19221) | Benjamin Peterson | 2013-10-10 | 1 | -1/+1
|
* upgrade to UCD 6.2 | Benjamin Peterson | 2012-09-29 | 1 | -1/+1
|
* update to Unicode 6.1 | Benjamin Peterson | 2012-02-21 | 1 | -2/+2
|
* use full unicode mappings for upper/lower/title case (#12736) | Benjamin Peterson | 2012-01-11 | 1 | -1/+1
|     Also broaden the category of characters that count as
|     lowercase/uppercase.
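With full case mappings, a single character can map to several characters. A brief sketch, assuming Python 3.3 or later; the three sample characters are chosen here for illustration.

```python
sharp_s = "\u00df"       # LATIN SMALL LETTER SHARP S
fi_ligature = "\ufb01"   # LATIN SMALL LIGATURE FI
dotted_i = "\u0130"      # LATIN CAPITAL LETTER I WITH DOT ABOVE

upper_sharp_s = sharp_s.upper()
upper_fi = fi_ligature.upper()
lower_dotted_i = dotted_i.lower()
```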
* Move UCS4-specific tests with the "normal" tests. | Ezio Melotti | 2011-09-29 | 1 | -8/+4
|
* Issue #10254: Fixed a crash and a regression introduced by the implementation of PRI 29. | Alexander Belopolsky | 2010-12-23 | 1 | -1/+14
* #9424: Replace deprecated assert* methods in the Python test suite. | Ezio Melotti | 2010-11-20 | 1 | -1/+1
|
* Fix resource warning in test_unicodedata. Patch by Brian Brazil. | Antoine Pitrou | 2010-10-30 | 1 | -0/+1
|
* Upgrade to Unicode 6.0.0. | Martin v. Löwis | 2010-10-11 | 1 | -2/+2
|     makeunicodedata.py: download all data files from unicode.org,
|       switch to extracting Unihan data from zip file.
|       Read linebreakprops and derivednormalizationprops even for
|       old versions, even though they are not used in delta records.
|     test:unicode.py: U+11000 is now assigned, use U+14000 instead.
* Add more tests to unicodedata with large code points | Amaury Forgeot d'Arc | 2010-08-18 | 1 | -0/+2
|     (the other functions were not affected by the recent change)
* Fix stupid typo in test. | Amaury Forgeot d'Arc | 2010-08-18 | 1 | -2/+2
|
* #5127: Even on narrow unicode builds, the C functions that access the Unicode | Amaury Forgeot d'Arc | 2010-08-18 | 1 | -0/+6
|     Database (Py_UNICODE_TOLOWER, Py_UNICODE_ISDECIMAL, and others) now accept
|     and return characters from the full Unicode range (Py_UCS4).
|
|     The differences from Python code are few:
|     - unicodedata.numeric(), unicodedata.decimal() and unicodedata.digit()
|       now return the correct value for large code points
|     - repr() may consider more characters as printable.
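A short sketch of the numeric lookups mentioned above for code points beyond the Basic Multilingual Plane. The two sample characters are chosen here for illustration.

```python
import unicodedata

monospace_seven = "\U0001D7FD"   # MATHEMATICAL MONOSPACE DIGIT SEVEN
aegean_9000 = "\U0001012A"       # AEGEAN NUMBER NINE THOUSAND

values = (
    unicodedata.decimal(monospace_seven),
    unicodedata.digit(monospace_seven),
    unicodedata.numeric(aegean_9000),
)
```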
* Issue #9337: Make float.__str__ identical to float.__repr__. | Mark Dickinson | 2010-08-04 | 1 | -5/+4
|     (And similarly for complex numbers.)
* Merged revisions 79494,79496 via svnmerge from | Florent Xicluna | 2010-03-30 | 1 | -1/+12
|     svn+ssh://pythondev@svn.python.org/python/trunk
|
|     ........
|       r79494 | florent.xicluna | 2010-03-30 10:24:06 +0200 (mar, 30 mar 2010) | 2 lines
|
|       #7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks
|       according to Unicode Standard Annex #14.
|     ........
|       r79496 | florent.xicluna | 2010-03-30 18:29:03 +0200 (mar, 30 mar 2010) | 2 lines
|
|       Highlight the change of behavior related to r79494. Now VT and FF
|       are linebreaks.
|     ........
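The behaviour change is visible through `str.splitlines()`. A minimal sketch, with a sample string chosen here for illustration:

```python
text = "a\x0bb\x0cc"   # VT (0x0B) and FF (0x0C) between letters
lines = text.splitlines()
```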
* Fixed a failure in test_bigmem. | Florent Xicluna | 2010-03-19 | 1 | -2/+2
|     Merged revision 79059 via svnmerge from
|     svn+ssh://pythondev@svn.python.org/python/trunk
|
|     ........
|       r79059 | florent.xicluna | 2010-03-18 22:50:06 +0100 (jeu, 18 mar 2010) | 2 lines
|
|       Issue #8024: Update the Unicode database to 5.2
|     ........
* Revert Unicode UCD 5.2 upgrade in 3.x. It broke repr() for unicode objects, and gave failures in test_bigmem. | Florent Xicluna | 2010-03-19 | 1 | -2/+2
|     Revert 79062, 79065 and 79083.
* Fix bad unicodedata checksum merge from trunk in r79062 | Florent Xicluna | 2010-03-19 | 1 | -1/+1
|
* Merged revisions 79059 via svnmerge from | Florent Xicluna | 2010-03-18 | 1 | -2/+2
|     svn+ssh://pythondev@svn.python.org/python/trunk
|
|     ........
|       r79059 | florent.xicluna | 2010-03-18 22:50:06 +0100 (jeu, 18 mar 2010) | 2 lines
|
|       Issue #8024: Update the Unicode database to 5.2
|     ........
* oops, fix the test of my previous commit about unicodedata and PR #29 (r78647) | Victor Stinner | 2010-03-04 | 1 | -1/+1
|
* Merged revisions 78646 via svnmerge from | Victor Stinner | 2010-03-04 | 1 | -0/+5
|     svn+ssh://pythondev@svn.python.org/python/trunk
|
|     ........
|       r78646 | victor.stinner | 2010-03-04 13:09:33 +0100 (jeu., 04 mars 2010) | 5 lines
|
|       Issue #1054943: Fix unicodedata.normalize('NFC', text) for the Public
|       Review Issue #29.
|
|       PR #29 was released in February 2004!
|     ........
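A rough sketch of the Public Review Issue #29 case: a starter followed by a combining mark and then another character that could otherwise compose with the starter. Under the corrected rule the intervening mark blocks the composition, so NFC leaves these strings unchanged. The sample strings are illustrative and assume a Python version that includes this fix.

```python
import unicodedata

# In each string, U+0300 sits between two characters that would
# compose if they were adjacent.
already_composed = ["\u0b47\u0300\u0b3e", "\u1100\u0300\u1161"]
results = [unicodedata.normalize("NFC", t) for t in already_composed]
```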
* use assert[Not]In where appropriate | Benjamin Peterson | 2010-01-19 | 1 | -1/+1
|     A patch from Dave Malcolm.
* Merged revisions 75272-75273 via svnmerge from | Amaury Forgeot d'Arc | 2009-10-06 | 1 | -2/+3
|     svn+ssh://pythondev@svn.python.org/python/trunk
|
|     ........
|       r75272 | amaury.forgeotdarc | 2009-10-06 21:56:32 +0200 (mar., 06 oct. 2009) | 5 lines
|
|       #1571184: makeunicodedata.py now generates the functions
|       _PyUnicode_ToNumeric, _PyUnicode_IsLinebreak and
|       _PyUnicode_IsWhitespace.
|
|       It now also parses the Unihan.txt for numeric values.
|     ........
|       r75273 | amaury.forgeotdarc | 2009-10-06 22:02:09 +0200 (mar., 06 oct. 2009) | 2 lines
|
|       Add Anders Chrigstrom to Misc/ACKS for his work on unicodedata.
|     ........
* convert old fail* assertions to assert* | Benjamin Peterson | 2009-06-30 | 1 | -7/+7
|
* Rename the surrogates handler to surrogatepass. | Martin v. Löwis | 2009-05-10 | 1 | -1/+1
|
* Issue #3672: Reject surrogates in utf-8 codec; add surrogates error | Martin v. Löwis | 2009-05-02 | 1 | -1/+2
|     handler.
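A minimal sketch of the resulting codec behaviour, using the handler under the name `surrogatepass` that it carries in released Python 3 versions. The variable names are illustrative.

```python
lone = "\ud800"  # an unpaired high surrogate

# The strict utf-8 codec rejects lone surrogates.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The 'surrogatepass' error handler lets the surrogate through
# in both directions.
raw = lone.encode("utf-8", "surrogatepass")
round_trip = raw.decode("utf-8", "surrogatepass")
```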
* Merged revisions 71972 via svnmerge from | Walter Dörwald | 2009-04-26 | 1 | -1/+1
|     svn+ssh://pythondev@svn.python.org/python/trunk
|
|     ........
|       r71972 | walter.doerwald | 2009-04-26 21:11:43 +0200 (So, 26 Apr 2009) | 2 lines
|
|       Fix typo.
|     ........
* Merged revisions 71947 via svnmerge from | Martin v. Löwis | 2009-04-26 | 1 | -1/+6
|     svn+ssh://pythondev@svn.python.org/python/trunk
|
|     ........
|       r71947 | martin.v.loewis | 2009-04-26 02:53:18 +0200 (So, 26 Apr 2009) | 3 lines
|
|       Issue #4971: Fix titlecase for characters that are their own
|       titlecase, but not their own uppercase.
|     ........
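One character of the kind this fix concerns is U+01C5, which is already in titlecase form but has a distinct uppercase form. A brief sketch, with the example character chosen here for illustration:

```python
dz = "\u01c5"  # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
title_form = dz.title()
upper_form = dz.upper()
```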
* Merged revisions 71894 via svnmerge from | Walter Dörwald | 2009-04-25 | 1 | -1/+14
|     svn+ssh://pythondev@svn.python.org/python/trunk
|
|     ........
|       r71894 | walter.doerwald | 2009-04-25 16:03:16 +0200 (Sa, 25 Apr 2009) | 4 lines
|
|       Issue #5828 (Invalid behavior of unicode.lower): Fixed bogus logic in
|       makeunicodedata.py and regenerated the Unicode database (This fixes
|       u'\u1d79'.lower() == '\x00').
|     ........
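The corrected behaviour can be checked directly. A small sketch in Python 3 syntax (the `u''` prefix from the commit message is not needed):

```python
insular_g = "\u1d79"  # LATIN SMALL LETTER INSULAR G
# Already lowercase, so lower() should return it unchanged
# rather than a NUL character.
lowered = insular_g.lower()
```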