path: root/Tools/unicode
Commit message  Author  Age  Files  Lines
* gh-96954: use a directed acyclic word graph for storing the unicodedata codepoint names (#97906)  CF Bolz-Tereick  2023-11-04  2  -218/+547
| Co-authored-by: Łukasz Langa <lukasz@langa.pl>
| Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
| Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>
* Code: Update Donghee Na's name (#109744)  Hugo van Kemenade  2023-09-25  4  -4/+4
|
* fixes gh-109559: Update `unicodedata` for Unicode 15.1.0 (GH-109560)  James Gerity  2023-09-20  1  -12/+17
| Co-authored-by: Benjamin Peterson <benjamin@python.org>
* bpo-47243: Duplicate entry in 'Objects/unicodetype_db.h' (GH-32376)  LiarPrincess  2022-09-28  1  -1/+1
| Fix for duplicate 1st entry in 'Objects/unicodetype_db.h':
|
| ```c
| /* a list of unique character type descriptors */
| const _PyUnicode_TypeRecord _PyUnicode_TypeRecords[] = {
|     {0, 0, 0, 0, 0, 0},
|     {0, 0, 0, 0, 0, 0}, <--- HERE
|     {0, 0, 0, 0, 0, 32},
|     {0, 0, 0, 0, 0, 48},
| …
| ```
|
| https://bugs.python.org/issue47243
|
| Automerge-Triggered-By: GH:isidentical
* closes gh-96734: Update to Unicode 15.0.0. (GH-96809)  Benjamin Peterson  2022-09-13  1  -2/+3
|
* GH-96172 fix unicodedata.east_asian_width being wrong on unassigned code points (#96207)  Carl Friedrich Bolz-Tereick  2022-08-26  1  -7/+25
|
* gh-96019: Fix caching of decompositions in makeunicodedata (GH-96020)  Carl Friedrich Bolz-Tereick  2022-08-19  1  -3/+7
|
* gh-84508: tool to generate cjk traditional chinese mappings (gh-93272)  Davide Rizzo  2022-06-11  1  -0/+239
|
* Revert "gh-84508: Add mapping files for Korean and Japanese. (gh-93309)" (#93320)  Dong-hee Na  2022-05-29  4  -38406/+0
| This reverts commit dec1e9346d82fa4a4761389c81d36ef9d01f332b.
* gh-84508: Add mapping files for Korean and Japanese. (gh-93309)  Dong-hee Na  2022-05-28  4  -0/+38406
|
* closes bpo-45190: Update Unicode data to version 14.0.0. (GH-28336)  Benjamin Peterson  2021-09-14  1  -4/+4
|
* bpo-40328: Add tool for generating cjk mapping headers (GH-19602)  Dong-hee Na  2020-04-29  9  -0/+51008
|
* Update some www.unicode.org URLs to use HTTPS. (GH-18912)  Benjamin Peterson  2020-03-11  1  -2/+2
|
* closes bpo-39926: Update Unicode to 13.0.0. (GH-18910)  Benjamin Peterson  2020-03-11  1  -4/+5
|
* bpo-37760: Convert from length-18 lists to a dataclass, in makeunicodedata. (GH-15265)  Greg Price  2019-09-12  1  -62/+88
| Now the fields have names! Much easier to keep straight as a reader than the elements of an 18-tuple.
|
| Runs about 10-15% slower: from 10.8s to 12.3s, on my laptop. Fortunately that's perfectly fine for this maintenance script.
* bpo-37758: Cut always-constant conditionals on sys.maxunicode. (GH-15302)  Greg Price  2019-09-09  1  -4/+1
| Since PEP 393 in Python 3.3, this value is always 0x10ffff, the maximum codepoint in Unicode; there's no longer such a thing as a UCS-2 build of Python, which couldn't properly represent some characters.
|
| There are a couple of spots left where we still condition on the value of this constant. Take them out.
* bpo-37760: Avoid cluttering work tree with downloaded Unicode files. (GH-15128)  Greg Price  2019-08-15  1  -2/+5
|
* bpo-37760: Factor out standard range-expanding logic in makeunicodedata. (GH-15248)  Greg Price  2019-08-14  1  -33/+35
| Much like the lower-level logic in commit ef2af1ad4, we had 4 copies of this logic, written in a couple of different ways. They're all implementing the same standard, so write it just once.
* bpo-37760: Constant-fold some old options in makeunicodedata. (GH-15129)  Greg Price  2019-08-13  1  -24/+20
| The `expand` option was introduced in 2000 in commit fad27aee1. It appears to have been always set since it was committed, and what it does is tell the code to do something essential. So, just always do that, and cut the option.
|
| Also cut the `linebreakprops` option, which isn't consulted anymore.
* bpo-37760: Factor out the basic UCD parsing logic of makeunicodedata. (GH-15130)  Greg Price  2019-08-13  1  -133/+109
| There were 10 copies of this, and almost as many distinct versions of exactly how it was written. They're all implementing the same standard. Pull them out to the top, so the more interesting logic that remains becomes easier to read.
* Clean up and reduce visual clutter in the makeunicode.py script. (GH-7558)  Stefan Behnel  2019-06-01  1  -263/+275
|
* closes bpo-36861: Update Unicode database to 12.1.0. (GH-13214)  Benjamin Peterson  2019-05-09  1  -1/+1
| Adds ㋿.
* bpo-36642: make unicodedata const (GH-12855)  Inada Naoki  2019-04-16  1  -1/+1
|
* bpo-22831: Use "with" to avoid possible fd leaks in tools (part 2). (GH-10927)  Serhiy Storchaka  2019-03-30  2  -12/+10
|
* closes bpo-33376: Update to Unicode 12.0.0. (GH-12256)  Benjamin Peterson  2019-03-10  1  -1/+1
|
* update to Unicode 11.0.0 (closes bpo-33778) (GH-7439)  Benjamin Peterson  2018-06-07  1  -20/+19
| Also, standardize indentation of generated tables.
* bpo-30736: upgrade to Unicode 10.0 (#2344)  Benjamin Peterson  2017-06-23  1  -4/+5
| Straightforward. While we're at it, though, strip trailing whitespace from generated tables.
* bpo-27425: Be more explicit in .gitattributes (GH-840)  Zachary Ware  2017-06-10  1  -7/+7
| Updates checked-in line endings on several files.
* bpo-30296 Remove unnecessary tuples, lists, sets, and dicts (#1489)  Jon Dufresne  2017-05-18  1  -1/+1
|   * Replaced list(<generator expression>) with list comprehension
|   * Replaced dict(<generator expression>) with dict comprehension
|   * Replaced set(<list literal>) with set literal
|   * Replaced builtin func(<list comprehension>) with func(<generator expression>) when supported (e.g. any(), all(), tuple(), min(), & max())
* Unicode 9.0.0  Benjamin Peterson  2016-09-15  1  -3/+8
| Not completely mechanical, since support for East Asian Width changes (emoji codepoints became Wide) had to be added to unicodedata.
* #27364: fix "incorrect" uses of escape character in the stdlib.  R David Murray  2016-09-08  1  -5/+5
| And most of the tools.
|
| Patch by Emanual Barry, reviewed by me, Serhiy Storchaka, and Martin Panter.
* upgrade to Unicode 8.0.0  Benjamin Peterson  2015-06-27  1  -3/+4
|
* Issue #16261: Converted some bare except statements to except statements with specified exception type.  Serhiy Storchaka  2015-05-20  1  -1/+1
| Original patch by Ramchandra Apte.
* Closes #17202: Merge with 3.4  Zachary Ware  2015-04-13  1  -7/+7
|\
| * Issue #17202: Add .bat to .hgeol to force them to CRLF.  Zachary Ware  2015-04-13  1  -7/+7
| | Using LF can cause a script to fail if it tries to use a label that is split across 512 byte blocks. Who knows why.
* | Issue #23181: More "codepoint" -> "code point".  Serhiy Storchaka  2015-01-18  1  -1/+1
|\ \
| |/
| * Issue #23181: More "codepoint" -> "code point".  Serhiy Storchaka  2015-01-18  1  -1/+1
| |
* | Merge: #18176: Change generic UCD PropList link to version specific link.  R David Murray  2014-10-10  1  -1/+1
|\ \
| |/
| * #18176: Change generic UCD PropList link to version specific link.  R David Murray  2014-10-10  1  -1/+1
| |
* | Merge: #18176: fix another reference and add it to the makeunicodedata comment.  R David Murray  2014-10-09  1  -0/+1
|\ \
| |/
| * #18176: fix another reference and add it to the makeunicodedata comment.  R David Murray  2014-10-09  1  -0/+1
| |
* | Merge: #18176: updated stdtypes UCD link, added reminder to makeunicodedata.  R David Murray  2014-10-09  1  -0/+4
|\ \
| |/
| * #18176: updated stdtypes UCD link, added reminder to makeunicodedata.  R David Murray  2014-10-09  1  -0/+4
| | Patch by Alexander Belopolsky.
* | upgrade to unicode 7.0.0  Benjamin Peterson  2014-07-06  1  -1/+1
|/
* Issue #19936: Added executable bits or shebang lines to Python scripts which require them.  Serhiy Storchaka  2014-01-16  1  -0/+0
|\
| | Disable executable bits and shebang lines in test and benchmark files in order to prevent using a random system python, and in source files of modules which don't provide a command line interface. Fixed shebang lines in the unittestgui and checkpip scripts.
| * Issue #19936: Added executable bits or shebang lines to Python scripts which require them.  Serhiy Storchaka  2014-01-16  1  -0/+0
| | Disable executable bits and shebang lines in test and benchmark files in order to prevent using a random system python, and in source files of modules which don't provide a command line interface. Fixed shebang line to use python3 executable in the unittestgui script.
* | #1097797: add the original mapping file  Andrew Kuchling  2013-11-11  1  -0/+258
| |
* | Fix some PEP8-formatting problems in the generated code  Andrew Kuchling  2013-11-11  1  -9/+9
| |
* | upgrade unicode db to 6.3.0 (closes #19221)  Benjamin Peterson  2013-10-10  1  -2/+2
| |
* | #18803: merge with 3.3.  Ezio Melotti  2013-08-25  1  -1/+1
|\ \
| |/