Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | gh-96954: use a directed acyclic word graph for storing the unicodedata codepoint names (#97906) | CF Bolz-Tereick | 2023-11-04 | 2 | -218/+547 |
| | Co-authored-by: Łukasz Langa <lukasz@langa.pl> Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com> Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com> | ||||
* | Code: Update Donghee Na's name (#109744) | Hugo van Kemenade | 2023-09-25 | 4 | -4/+4 |
| | |||||
* | fixes gh-109559: Update `unicodedata` for Unicode 15.1.0 (GH-109560) | James Gerity | 2023-09-20 | 1 | -12/+17 |
| | Co-authored-by: Benjamin Peterson <benjamin@python.org> | ||||
* | bpo-47243: Duplicate entry in 'Objects/unicodetype_db.h' (GH-32376) | LiarPrincess | 2022-09-28 | 1 | -1/+1 |
| | Fix for duplicate 1st entry in 'Objects/unicodetype_db.h': `/* a list of unique character type descriptors */ const _PyUnicode_TypeRecord _PyUnicode_TypeRecords[] = { {0, 0, 0, 0, 0, 0}, {0, 0, 0, 0, 0, 0}, /* <--- HERE */ {0, 0, 0, 0, 0, 32}, {0, 0, 0, 0, 0, 48}, …` https://bugs.python.org/issue47243 Automerge-Triggered-By: GH:isidentical | ||||
* | closes gh-96734: Update to Unicode 15.0.0. (GH-96809) | Benjamin Peterson | 2022-09-13 | 1 | -2/+3 |
| | |||||
* | GH-96172 fix unicodedata.east_asian_width being wrong on unassigned code points (#96207) | Carl Friedrich Bolz-Tereick | 2022-08-26 | 1 | -7/+25 |
| | |||||
* | gh-96019: Fix caching of decompositions in makeunicodedata (GH-96020) | Carl Friedrich Bolz-Tereick | 2022-08-19 | 1 | -3/+7 |
| | |||||
* | gh-84508: tool to generate cjk traditional chinese mappings (gh-93272) | Davide Rizzo | 2022-06-11 | 1 | -0/+239 |
| | |||||
* | Revert "gh-84508: Add mapping files for Korean and Japanese. (gh-93309)" (#93320) | Dong-hee Na | 2022-05-29 | 4 | -38406/+0 |
| | This reverts commit dec1e9346d82fa4a4761389c81d36ef9d01f332b. | ||||
* | gh-84508: Add mapping files for Korean and Japanese. (gh-93309) | Dong-hee Na | 2022-05-28 | 4 | -0/+38406 |
| | |||||
* | closes bpo-45190: Update Unicode data to version 14.0.0. (GH-28336) | Benjamin Peterson | 2021-09-14 | 1 | -4/+4 |
| | |||||
* | bpo-40328: Add tool for generating cjk mapping headers (GH-19602) | Dong-hee Na | 2020-04-29 | 9 | -0/+51008 |
| | |||||
* | Update some www.unicode.org URLs to use HTTPS. (GH-18912) | Benjamin Peterson | 2020-03-11 | 1 | -2/+2 |
| | |||||
* | closes bpo-39926: Update Unicode to 13.0.0. (GH-18910) | Benjamin Peterson | 2020-03-11 | 1 | -4/+5 |
| | |||||
* | bpo-37760: Convert from length-18 lists to a dataclass, in makeunicodedata. (GH-15265) | Greg Price | 2019-09-12 | 1 | -62/+88 |
| | Now the fields have names! Much easier to keep straight as a reader than the elements of an 18-tuple. Runs about 10-15% slower: from 10.8s to 12.3s, on my laptop. Fortunately that's perfectly fine for this maintenance script. | ||||
* | bpo-37758: Cut always-constant conditionals on sys.maxunicode. (GH-15302) | Greg Price | 2019-09-09 | 1 | -4/+1 |
| | Since PEP 393 in Python 3.3, this value is always 0x10ffff, the maximum codepoint in Unicode; there's no longer such a thing as a UCS-2 build of Python, which couldn't properly represent some characters. There are a couple of spots left where we still condition on the value of this constant. Take them out. | ||||
* | bpo-37760: Avoid cluttering work tree with downloaded Unicode files. (GH-15128) | Greg Price | 2019-08-15 | 1 | -2/+5 |
| | |||||
* | bpo-37760: Factor out standard range-expanding logic in makeunicodedata. (GH-15248) | Greg Price | 2019-08-14 | 1 | -33/+35 |
| | Much like the lower-level logic in commit ef2af1ad4, we had 4 copies of this logic, written in a couple of different ways. They're all implementing the same standard, so write it just once. | ||||
* | bpo-37760: Constant-fold some old options in makeunicodedata. (GH-15129) | Greg Price | 2019-08-13 | 1 | -24/+20 |
| | The `expand` option was introduced in 2000 in commit fad27aee1. It appears to have been always set since it was committed, and what it does is tell the code to do something essential. So, just always do that, and cut the option. Also cut the `linebreakprops` option, which isn't consulted anymore. | ||||
* | bpo-37760: Factor out the basic UCD parsing logic of makeunicodedata. (GH-15130) | Greg Price | 2019-08-13 | 1 | -133/+109 |
| | There were 10 copies of this, and almost as many distinct versions of exactly how it was written. They're all implementing the same standard. Pull them out to the top, so the more interesting logic that remains becomes easier to read. | ||||
* | Clean up and reduce visual clutter in the makeunicode.py script. (GH-7558) | Stefan Behnel | 2019-06-01 | 1 | -263/+275 |
| | |||||
* | closes bpo-36861: Update Unicode database to 12.1.0. (GH-13214) | Benjamin Peterson | 2019-05-09 | 1 | -1/+1 |
| | Adds ㋿. | ||||
* | bpo-36642: make unicodedata const (GH-12855) | Inada Naoki | 2019-04-16 | 1 | -1/+1 |
| | |||||
* | bpo-22831: Use "with" to avoid possible fd leaks in tools (part 2). (GH-10927) | Serhiy Storchaka | 2019-03-30 | 2 | -12/+10 |
| | |||||
* | closes bpo-33376: Update to Unicode 12.0.0. (GH-12256) | Benjamin Peterson | 2019-03-10 | 1 | -1/+1 |
| | |||||
* | update to Unicode 11.0.0 (closes bpo-33778) (GH-7439) | Benjamin Peterson | 2018-06-07 | 1 | -20/+19 |
| | Also, standardize indentation of generated tables. | ||||
* | bpo-30736: upgrade to Unicode 10.0 (#2344) | Benjamin Peterson | 2017-06-23 | 1 | -4/+5 |
| | Straightforward. While we're at it, though, strip trailing whitespace from generated tables. | ||||
* | bpo-27425: Be more explicit in .gitattributes (GH-840) | Zachary Ware | 2017-06-10 | 1 | -7/+7 |
| | Updates checked-in line endings on several files. | ||||
* | bpo-30296 Remove unnecessary tuples, lists, sets, and dicts (#1489) | Jon Dufresne | 2017-05-18 | 1 | -1/+1 |
| | * Replaced list(<generator expression>) with list comprehension * Replaced dict(<generator expression>) with dict comprehension * Replaced set(<list literal>) with set literal * Replaced builtin func(<list comprehension>) with func(<generator expression>) when supported (e.g. any(), all(), tuple(), min(), & max()) | ||||
* | Unicode 9.0.0 | Benjamin Peterson | 2016-09-15 | 1 | -3/+8 |
| | Not completely mechanical since support for East Asian Width changes—emoji codepoints became Wide—had to be added to unicodedata. | ||||
* | #27364: fix "incorrect" uses of escape character in the stdlib. | R David Murray | 2016-09-08 | 1 | -5/+5 |
| | And most of the tools. Patch by Emanual Barry, reviewed by me, Serhiy Storchaka, and Martin Panter. | ||||
* | upgrade to Unicode 8.0.0 | Benjamin Peterson | 2015-06-27 | 1 | -3/+4 |
| | |||||
* | Issue #16261: Converted some bare except statements to except statements | Serhiy Storchaka | 2015-05-20 | 1 | -1/+1 |
| | with specified exception type. Original patch by Ramchandra Apte. | ||||
* | Closes #17202: Merge with 3.4 | Zachary Ware | 2015-04-13 | 1 | -7/+7 |
|\ | |||||
| * | Issue #17202: Add .bat to .hgeol to force them to CRLF. | Zachary Ware | 2015-04-13 | 1 | -7/+7 |
| | | Using LF can cause a script to fail if it tries to use a label that is split across 512 byte blocks. Who knows why. | ||||
* | | Issue #23181: More "codepoint" -> "code point". | Serhiy Storchaka | 2015-01-18 | 1 | -1/+1 |
|\ \ | |/ | |||||
| * | Issue #23181: More "codepoint" -> "code point". | Serhiy Storchaka | 2015-01-18 | 1 | -1/+1 |
| | | |||||
* | | Merge: #18176: Change generic UCD PropList link to version specific link. | R David Murray | 2014-10-10 | 1 | -1/+1 |
|\ \ | |/ | |||||
| * | #18176: Change generic UCD PropList link to version specific link. | R David Murray | 2014-10-10 | 1 | -1/+1 |
| | | |||||
* | | Merge: #18176: fix another reference and add it to the makeunicodedata comment. | R David Murray | 2014-10-09 | 1 | -0/+1 |
|\ \ | |/ | |||||
| * | #18176: fix another reference and add it to the makeunicodedata comment. | R David Murray | 2014-10-09 | 1 | -0/+1 |
| | | |||||
* | | Merge: #18176: updated stdtypes UCD link, added reminder to makeunicodedata. | R David Murray | 2014-10-09 | 1 | -0/+4 |
|\ \ | |/ | |||||
| * | #18176: updated stdtypes UCD link, added reminder to makeunicodedata. | R David Murray | 2014-10-09 | 1 | -0/+4 |
| | | Patch by Alexander Belopolsky. | ||||
* | | upgrade to unicode 7.0.0 | Benjamin Peterson | 2014-07-06 | 1 | -1/+1 |
|/ | |||||
* | Issue #19936: Added executable bits or shebang lines to Python scripts which | Serhiy Storchaka | 2014-01-16 | 1 | -0/+0 |
|\ | requires them. Disable executable bits and shebang lines in test and benchmark files in order to prevent using a random system python, and in source files of modules which don't provide command line interface. Fixed shebang lines in the unittestgui and checkpip scripts. | ||||
| * | Issue #19936: Added executable bits or shebang lines to Python scripts which | Serhiy Storchaka | 2014-01-16 | 1 | -0/+0 |
| | | requires them. Disable executable bits and shebang lines in test and benchmark files in order to prevent using a random system python, and in source files of modules which don't provide command line interface. Fixed shebang line to use python3 executable in the unittestgui script. | ||||
* | | #1097797: add the original mapping file | Andrew Kuchling | 2013-11-11 | 1 | -0/+258 |
| | | |||||
* | | Fix some PEP8-formatting problems in the generated code | Andrew Kuchling | 2013-11-11 | 1 | -9/+9 |
| | | |||||
* | | upgrade unicode db to 6.3.0 (closes #19221) | Benjamin Peterson | 2013-10-10 | 1 | -2/+2 |
| | | |||||
* | | #18803: merge with 3.3. | Ezio Melotti | 2013-08-25 | 1 | -1/+1 |
|\ \ | |/ |