| Commit message (Collapse) | Author | Age | Files | Lines |
|
(GH-15265)
Now the fields have names! Much easier to keep straight as a
reader than the elements of an 18-tuple.
Runs about 10-15% slower: from 10.8s to 12.3s, on my laptop.
Fortunately that's perfectly fine for this maintenance script.
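As a rough illustration of the readability gain described above, the sketch below contrasts positional access with named fields. The `UcdRecord` class and its three fields are hypothetical and far smaller than the real record; the sample line is the UnicodeData.txt entry for U+0041.

```python
import dataclasses


# Hypothetical, trimmed-down record: the real script tracks many more fields.
@dataclasses.dataclass
class UcdRecord:
    codepoint: str
    name: str
    general_category: str


line = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
fields = line.split(";")

# Positional access: the reader has to remember what index 2 means.
category_by_index = fields[2]

# Named access: the field says what it is.
record = UcdRecord(*fields[:3])
category_by_name = record.general_category
```

Attribute lookup on an object costs a little more than tuple indexing, which is consistent with the modest slowdown the message reports.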
|
(GH-15248)
Much like the lower-level logic in commit ef2af1ad4, we had
4 copies of this logic, written in a couple of different ways.
They're all implementing the same standard, so write it just once.
|
The `expand` option was introduced in 2000 in commit fad27aee1.
It appears to have been always set since it was committed, and
what it does is tell the code to do something essential. So,
just always do that, and cut the option.
Also cut the `linebreakprops` option, which isn't consulted anymore.
|
There were 10 copies of this, and almost as many distinct versions of
exactly how it was written. They're all implementing the same
standard. Pull them out to the top, so the more interesting logic
that remains becomes easier to read.
|
Adds ㋿.
|
Also, standardize indentation of generated tables.
|
Straightforward. While we're at it, though, strip trailing whitespace from generated tables.
|
* Replaced list(<generator expression>) with list comprehension
* Replaced dict(<generator expression>) with dict comprehension
* Replaced set(<list literal>) with set literal
* Replaced builtin func(<list comprehension>) with func(<generator
expression>) when supported (e.g. any(), all(), tuple(), min(), &
max())
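The four rewrites listed above can be sketched side by side. The sample data is made up for illustration; each pair computes the same value.

```python
values = [3, 1, 2]

# list(<generator expression>) -> list comprehension
squares_old = list(v * v for v in values)
squares_new = [v * v for v in values]

# dict(<generator expression>) -> dict comprehension
index_old = dict((v, i) for i, v in enumerate(values))
index_new = {v: i for i, v in enumerate(values)}

# set(<list literal>) -> set literal
small_old = set([1, 2])
small_new = {1, 2}

# func(<list comprehension>) -> func(<generator expression>)
# The generator form avoids building a throwaway list.
any_old = any([v > 2 for v in values])
any_new = any(v > 2 for v in values)
```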
|
Not completely mechanical, since support for East Asian Width changes
(emoji codepoints became Wide) had to be added to unicodedata.
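A small check of the behaviour mentioned above, assuming a Python whose `unicodedata` already includes this update. The choice of U+1F600 as the sample emoji is ours, not the commit's.

```python
import unicodedata

# U+1F600 GRINNING FACE: an emoji code point, reported as Wide ("W").
emoji_width = unicodedata.east_asian_width("\U0001F600")

# U+0041 LATIN CAPITAL LETTER A is Narrow ("Na"), for comparison.
latin_width = unicodedata.east_asian_width("A")
```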
|
Patch by Alexander Belopolsky.
|
Also broaden the category of characters that count as lowercase/uppercase.
|
makeunicodedata.py: download all data files from unicode.org,
switch to extracting Unihan data from zip file.
Read linebreakprops and derivednormalizationprops even for
old versions, even though they are not used in delta records.
test_unicode.py: U+11000 is now assigned, use U+14000 instead.
|
The internal unicode database is now always used.
(after 5 years: see
http://mail.python.org/pipermail/python-dev/2004-December/050193.html
)
|
Database (Py_UNICODE_TOLOWER, Py_UNICODE_ISDECIMAL, and others) now accept
and return characters from the full Unicode range (Py_UCS4).
The differences from Python code are few:
- unicodedata.numeric(), unicodedata.decimal() and unicodedata.digit()
now return the correct value for large code points
- repr() may consider more characters as printable.
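The first point can be checked from Python with a digit outside the Basic Multilingual Plane. The sample character, U+1D7D3, is our choice for illustration.

```python
import unicodedata

# U+1D7D3 MATHEMATICAL BOLD DIGIT FIVE lies above U+FFFF.
bold_five = "\U0001D7D3"

as_decimal = unicodedata.decimal(bold_five)
as_digit = unicodedata.digit(bold_five)
as_numeric = unicodedata.numeric(bold_five)
```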
|
svn+ssh://pythondev@svn.python.org/python/trunk
........
r79494 | florent.xicluna | 2010-03-30 10:24:06 +0200 (Tue, 30 Mar 2010) | 2 lines
#7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14.
........
r79496 | florent.xicluna | 2010-03-30 18:29:03 +0200 (Tue, 30 Mar 2010) | 2 lines
Highlight the change of behavior related to r79494. Now VT and FF are linebreaks.
........
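On a current Python 3, the effect of this change is visible through `str.splitlines()`. The sample string is ours.

```python
# VT (0x0B) and FF (0x0C) are treated as line boundaries by str.splitlines().
text = "one\x0btwo\x0cthree"
parts = text.splitlines()

# bytes.splitlines() only splits on \n, \r and \r\n, so the same data
# held as bytes stays in one piece.
byte_parts = b"one\x0btwo\x0cthree".splitlines()
```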
|
svn+ssh://pythondev@svn.python.org/python/trunk
........
r78982 | florent.xicluna | 2010-03-15 15:00:58 +0100 (Mon, 15 Mar 2010) | 2 lines
Remove py3k deprecation warnings from these Unicode tools.
........
r78986 | florent.xicluna | 2010-03-15 19:08:58 +0100 (Mon, 15 Mar 2010) | 3 lines
Issue #7783 and #7787: open_urlresource invalidates the outdated files from the local cache.
Use this feature to fix test_normalization.
........
|
Merged revision 79059 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk
........
r79059 | florent.xicluna | 2010-03-18 22:50:06 +0100 (Thu, 18 Mar 2010) | 2 lines
Issue #8024: Update the Unicode database to 5.2
........
|
and gave failures in test_bigmem. Revert 79062, 79065 and 79083.
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
svn+ssh://pythondev@svn.python.org/python/trunk
........
r75396 | amaury.forgeotdarc | 2009-10-13 23:29:34 +0200 (Tue, 13 Oct 2009) | 3 lines
#7112: Fix compilation warning in unicodetype_db.h
makeunicodedata now generates double literals
........
|
svn+ssh://pythondev@svn.python.org/python/trunk
........
r75272 | amaury.forgeotdarc | 2009-10-06 21:56:32 +0200 (Tue, 06 Oct 2009) | 5 lines
#1571184: makeunicodedata.py now generates the functions _PyUnicode_ToNumeric,
_PyUnicode_IsLinebreak and _PyUnicode_IsWhitespace.
It now also parses the Unihan.txt for numeric values.
........
r75273 | amaury.forgeotdarc | 2009-10-06 22:02:09 +0200 (Tue, 06 Oct 2009) | 2 lines
Add Anders Chrigstrom to Misc/ACKS for his work on unicodedata.
........
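To give a feel for what "generates the functions" can mean in r75272, here is a minimal sketch of a Python helper that emits C source for a predicate. The helper name, the emitted function name, and the switch-based layout are all assumptions made for illustration; this is not the script's actual output.

```python
def emit_predicate_sketch(c_name, codepoints):
    """Return C source for a hypothetical predicate built as a switch."""
    lines = [
        f"int {c_name}(unsigned int ch)",
        "{",
        "    switch (ch) {",
    ]
    # One case label per code point, in ascending order.
    for cp in sorted(codepoints):
        lines.append(f"    case 0x{cp:04X}:")
    lines += [
        "        return 1;",
        "    }",
        "    return 0;",
        "}",
    ]
    return "\n".join(lines)


# Hypothetical input: a few code points the predicate should accept.
c_source = emit_predicate_sketch("is_whitespace_sketch", {0x20, 0x09, 0x0A})
```

Generating the function from the data files keeps the C code in step with the Unicode database instead of relying on a hand-maintained list.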
|
svn+ssh://pythondev@svn.python.org/python/trunk
........
r72054 | antoine.pitrou | 2009-04-27 23:53:26 +0200 (Mon, 27 Apr 2009) | 5 lines
Issue #1734234: Massively speedup `unicodedata.normalize()` when the
string is already in normalized form, by performing a quick check beforehand.
Original patch by Rauli Ruohonen.
........
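The case this commit speeds up is a string that is already in the requested form, so `normalize()` has nothing to change. A short illustration with sample strings of our choosing:

```python
import unicodedata

already_nfc = "caf\u00e9"   # precomposed e-acute, already NFC
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

# Already normalized: the result equals the input.
same = unicodedata.normalize("NFC", already_nfc)

# Not normalized: the pair is composed into the single code point.
composed = unicodedata.normalize("NFC", decomposed)
```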
|
svn+ssh://pythondev@svn.python.org/python/trunk
........
r71894 | walter.doerwald | 2009-04-25 16:03:16 +0200 (Sat, 25 Apr 2009) | 4 lines
Issue #5828 (Invalid behavior of unicode.lower): Fixed bogus logic in
makeunicodedata.py and regenerated the Unicode database (This fixes
u'\u1d79'.lower() == '\x00').
........
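The fixed behaviour can be checked on a current Python 3, where the `u` prefix is optional:

```python
# U+1D79 LATIN SMALL LETTER INSULAR G is already lowercase,
# so lower() should hand it back unchanged rather than a NUL character.
insular_g = "\u1d79"
lowered = insular_g.lower()
```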
|