path: root/Tools/unicode
Commit message (author, date, files changed, lines removed/added)
* #7643: Unicode code points VT (0x0B) and FF (0x0C) are line breaks according to Unicode Standard Annex #14. (Florent Xicluna, 2010-03-30, 1 file, -6/+38)
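Under Python 3.2 and later, `str.splitlines()` treats VT and FF as line boundaries, which matches this change. A short check (current CPython behavior, not code from the commit):

```python
# str.splitlines() breaks at VT (\x0b) and FF (\x0c) as well as \n and \r
text = "alpha\x0bbeta\x0cgamma"
parts = text.splitlines()
print(parts)  # ['alpha', 'beta', 'gamma']

# bytes.splitlines() only recognizes the ASCII newlines \n, \r and \r\n
raw_parts = text.encode("ascii").splitlines()
print(raw_parts)  # [b'alpha\x0bbeta\x0cgamma']
```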
* Issue #8024: Update the Unicode database to 5.2. (Florent Xicluna, 2010-03-18, 1 file, -1/+1)
* Remove py3k deprecation warnings from these Unicode tools. (Florent Xicluna, 2010-03-15, 3 files, -42/+27)
* Set svn:eol-style on various files. (Benjamin Peterson, 2010-03-08, 1 file, -61/+61)
* #7112: Fix compilation warning in unicodetype_db.h; makeunicodedata now generates double literals. (Amaury Forgeot d'Arc, 2009-10-13, 1 file, -0/+5)
* #1571184: makeunicodedata.py now generates the functions _PyUnicode_ToNumeric, _PyUnicode_IsLinebreak and _PyUnicode_IsWhitespace. It now also parses Unihan.txt for numeric values. (Amaury Forgeot d'Arc, 2009-10-06, 1 file, -8/+123)
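The generated numeric and whitespace data shows up at Python level through `unicodedata.numeric()` and `str.isspace()`. A minimal check, assuming a CPython build whose database includes the Unihan numeric values (U+4E94 is the ideograph for "five"):

```python
import unicodedata

# Numeric value taken from UnicodeData.txt (VULGAR FRACTION ONE HALF)
half = unicodedata.numeric("\u00bd")
# Numeric value for a CJK ideograph, which comes from the Unihan data
five = unicodedata.numeric("\u4e94")
print(half, five)  # 0.5 5.0

# IDEOGRAPHIC SPACE (category Zs) counts as whitespace
print("\u3000".isspace())  # True
```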
* #1616979: Add the cp720 (Arabic DOS) encoding. Since there is no official mapping file from unicode.org, the codec file is generated on Windows with the new genwincodec.py script. (Amaury Forgeot d'Arc, 2009-07-13, 2 files, -0/+68)
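Once generated, cp720 behaves like any other single-byte charmap codec. A small round-trip sketch, assuming the basic Arabic letters used here are all covered by the code page (byte values are deliberately not asserted):

```python
import codecs

# The generated codec is registered under its own name
info = codecs.lookup("cp720")

# "salam": SEEN, LAM, ALEF, MEEM; one byte per character in a single-byte code page
word = "\u0633\u0644\u0627\u0645"
encoded = word.encode("cp720")
decoded = encoded.decode("cp720")
print(info.name, len(encoded))  # cp720 4
```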
* Issue #1734234: Massively speed up `unicodedata.normalize()` when the string is already in normalized form, by performing a quick check beforehand. Original patch by Rauli Ruohonen. (Antoine Pitrou, 2009-04-27, 1 file, -5/+32)
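The check-then-normalize pattern can be sketched at Python level with `unicodedata.is_normalized()` (Python 3.8+). In current CPython `normalize()` already performs this quick check internally, so the hypothetical wrapper `nfc()` below only illustrates the logic:

```python
import unicodedata

def nfc(s: str) -> str:
    """Return s in NFC, skipping the full pass when s is already normalized."""
    # Quick check first: cheap when the answer is "already NFC"
    if unicodedata.is_normalized("NFC", s):
        return s
    # Otherwise do the full decomposition/recomposition
    return unicodedata.normalize("NFC", s)

print(nfc("e\u0301") == "\u00e9")  # True
```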
* Issue #5828 (invalid behavior of unicode.lower): Fixed bogus logic in makeunicodedata.py and regenerated the Unicode database. This fixes u'\u1d79'.lower() == '\x00'. (Walter Dörwald, 2009-04-25, 1 file, -22/+21)
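With the corrected tables, U+1D79 maps to itself when lowercased and to its capital counterpart U+A77D (added in Unicode 5.1) when uppercased. A quick sanity check against the current database:

```python
import unicodedata

ch = "\u1d79"  # LATIN SMALL LETTER INSULAR G
print(unicodedata.name(ch), unicodedata.category(ch))

# Already lowercase: lower() must return the character itself, not "\x00"
lowered = ch.lower()
# Uppercase counterpart is LATIN CAPITAL LETTER INSULAR G
uppered = ch.upper()
```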
* Issue #3811: The Unicode database was updated to 5.1. Reviewed by Fredrik Lundh and Marc-Andre Lemburg. (Martin v. Löwis, 2008-09-10, 1 file, -10/+30)
* Make more symbols static. (Martin v. Löwis, 2008-06-13, 1 file, -2/+2)
* Patch #2167 from calvin: Remove unused imports. (Christian Heimes, 2008-02-23, 1 file, -1/+1)
* Patch #1359618: Speed up the charmap encoder. (Martin v. Löwis, 2006-06-04, 2 files, -26/+27)
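Codecs generated by gencodec.py in current CPython build a reverse lookup structure for the encoder from their 256-entry decoding table using `codecs.charmap_build()`, an undocumented helper re-exported from `_codecs`. A sketch with a hypothetical table (Latin-1 with byte 0xA4 remapped to the euro sign); `encode()` and `decode()` are illustrative names:

```python
import codecs

# Hypothetical decoding table: byte i decodes to chr(i), except 0xA4 -> EURO SIGN
decoding_table = "".join(chr(i) for i in range(256))
decoding_table = decoding_table[:0xA4] + "\u20ac" + decoding_table[0xA5:]

# Build the compact reverse mapping used by the charmap encoder
encoding_table = codecs.charmap_build(decoding_table)

def encode(text: str) -> bytes:
    # charmap_encode returns (bytes, number of characters consumed)
    data, _consumed = codecs.charmap_encode(text, "strict", encoding_table)
    return data

def decode(data: bytes) -> str:
    # charmap_decode returns (str, number of bytes consumed)
    text, _consumed = codecs.charmap_decode(data, "strict", decoding_table)
    return text

print(encode("5\u20ac"))  # b'5\xa4'
```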
* When generating Python code, prefer to generate valid Python code. (Jack Diederich, 2006-05-26, 1 file, -3/+3)
* Don't add multiple empty lines at the end of the codec. With this, a regenerated codec should survive reindent.py unchanged. (Walter Dörwald, 2006-03-31, 1 file, -1/+1)
* Whitespace for generated code. (Walter Dörwald, 2006-03-27, 1 file, -0/+3)
* Patch #1443155: Add incremental codec support for the CJK codecs (reviewed by Walter Dörwald). (Hye-Shik Chang, 2006-03-26, 2 files, -1/+69)
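With incremental support, a CJK decoder can be fed byte chunks that split a multibyte character and will buffer the partial sequence between calls. A sketch using EUC-JP, where each of the two kanji below encodes to two bytes:

```python
import codecs

# "日本" in EUC-JP is 4 bytes; split so the chunk boundaries cut characters
data = "\u65e5\u672c".encode("euc_jp")
chunks = [data[:1], data[1:3], data[3:]]

decoder = codecs.getincrementaldecoder("euc_jp")()
# Partial characters are held back until the rest of their bytes arrive
pieces = [decoder.decode(chunk) for chunk in chunks]
pieces.append(decoder.decode(b"", final=True))
result = "".join(pieces)
```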
* Patch #1436130: codecs.lookup() now returns a CodecInfo object (a subclass of tuple) that provides incremental decoders and encoders (a way to use stateful codecs without the stream API). Functions codecs.getincrementaldecoder() and codecs.getincrementalencoder() have been added. (Walter Dörwald, 2006-03-15, 2 files, -22/+43)
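Because CodecInfo subclasses tuple, old code that unpacks the four-tuple still works, while new code can use the named attributes and the incremental coders. A short illustration with UTF-8:

```python
import codecs

info = codecs.lookup("utf-8")

# Backward compatible: still a (encode, decode, reader, writer) tuple
encode, decode, reader, writer = info
print(isinstance(info, tuple), info.name)  # True utf-8

# The incremental decoder keeps state, so a two-byte character split
# across calls is reassembled correctly
decoder = codecs.getincrementaldecoder("utf-8")()
first = decoder.decode(b"\xc3")               # incomplete, buffered
second = decoder.decode(b"\xa9", final=True)  # completes "é"

encoder = info.incrementalencoder()
encoded = encoder.encode("\u00e9", final=True)
```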
* Add changelog entry. (Martin v. Löwis, 2006-03-11, 1 file, -0/+1)
* Whitespace normalization. (Tim Peters, 2006-03-10, 1 file, -1/+1)
* Update Unicode database to Unicode 4.1. (Martin v. Löwis, 2006-03-09, 1 file, -11/+141)
* Whitespace normalization. (Tim Peters, 2005-12-25, 1 file, -3/+3)
* Add Makefile which allows easily rebuilding the charmap codecs. (Marc-André Lemburg, 2005-10-25, 1 file, -0/+81)
* Add custom mapping files used for generating some of the charmap codecs. (Marc-André Lemburg, 2005-10-25, 3 files, -0/+873)
* Apply some cosmetic fixes to the output of the script. Only include the decoding map if no table can be generated. (Marc-André Lemburg, 2005-10-25, 1 file, -15/+28)
* Add two new tools: one to compare codecs and show differences, and one to list all installed codecs. (Marc-André Lemburg, 2005-10-21, 2 files, -0/+94)
* Moved gencodec.py to the Tools/unicode/ directory. Added new support for decoding tables. Cleaned up the implementation a bit. (Marc-André Lemburg, 2005-10-21, 1 file, -0/+391)
* SF #989185: Drop unicode.iswide() and unicode.width() and add unicodedata.east_asian_width(). You can still implement your own simple width() function using it like this:

      import unicodedata

      def width(u):
          w = 0
          for c in unicodedata.normalize('NFC', u):
              cwidth = unicodedata.east_asian_width(c)
              # Wide and Fullwidth characters take two columns
              if cwidth in ('W', 'F'):
                  w += 2
              else:
                  w += 1
          return w

  (Hye-Shik Chang, 2004-08-04, 1 file, -6/+12)
* Whitespace normalization, via reindent.py. (Tim Peters, 2004-07-18, 1 file, -1/+0)
* - SF #962502: Add two more methods to the unicode type, width() and iswide(), for East Asian width manipulation. (Inspired by David Goodger, reviewed by Martin v. Loewis)
  - Move _PyUnicode_TypeRecord.flags to the end of the struct so that no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
  (Hye-Shik Chang, 2004-06-02, 1 file, -4/+29)
* Applying SF patch #949329 on behalf of Raymond Hettinger. (Armin Rigo, 2004-05-19, 1 file, -27/+26)
* Implement IDNA (Internationalized Domain Names in Applications). (Martin v. Löwis, 2003-04-18, 1 file, -0/+433)
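The resulting `idna` codec converts each non-ASCII label of a host name to its ASCII-compatible "xn--" form and back. A usage sketch with the standard example name:

```python
# Encode a host name with a non-ASCII label using the built-in "idna" codec
encoded = "b\u00fccher.example".encode("idna")
print(encoded)  # b'xn--bcher-kva.example'

# Decoding restores the Unicode form of each label
decoded = encoded.decode("idna")
```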
* Add unidata_version. Bump generator version number. (Martin v. Löwis, 2002-11-25, 1 file, -2/+6)
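`unidata_version` reports which Unicode Character Database version the interpreter was built with. Current CPython also keeps a frozen Unicode 3.2 view, `unicodedata.ucd_3_2_0`, which the IDNA codec relies on:

```python
import unicodedata

# Database version compiled into this interpreter (varies by Python release)
current = unicodedata.unidata_version
print(current)

# Frozen Unicode 3.2 view kept for IDNA/stringprep
legacy = unicodedata.ucd_3_2_0.unidata_version
print(legacy)  # 3.2.0
```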
* Sort names independent of the Python version. Fix hex constant warning. Include all First/Last blocks. (Martin v. Löwis, 2002-11-24, 1 file, -7/+11)
* Patch #626485: Support Unicode normalization. (Martin v. Löwis, 2002-11-23, 1 file, -3/+90)
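The normalization support is exposed as `unicodedata.normalize(form, s)` with the forms NFC, NFD, NFKC and NFKD. A few canonical and compatibility cases:

```python
import unicodedata

s = "\u00e9"  # é as one precomposed code point

# NFD splits it into "e" + COMBINING ACUTE ACCENT
decomposed = unicodedata.normalize("NFD", s)
# NFC recombines the pair into the precomposed form
composed = unicodedata.normalize("NFC", decomposed)
# NFKC also applies compatibility mappings, e.g. LATIN SMALL LIGATURE FI -> "fi"
folded = unicodedata.normalize("NFKC", "\ufb01")
```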
* Verify that lower-higher case deltas are 16-bit. (Martin v. Löwis, 2002-10-18, 1 file, -3/+11)
* Update to Unicode 3.2 database. (Martin v. Löwis, 2002-10-18, 1 file, -2/+2)
* Apply diff2.txt from SF patch http://www.python.org/sf/572113 (with one small bugfix in bgen/bgen/scantools.py). This replaces string module functions with string methods for the stuff in the Tools directory. Several uses of string.letters etc. are still remaining. (Walter Dörwald, 2002-09-11, 1 file, -7/+7)
* Unicode nits: Don't include unicodedatabase.h any more, and make sure to build *all* tables in makeunicodedata.py. (Fredrik Lundh, 2001-01-21, 1 file, -2/+2)
* Compress Unicode decomposition tables (this saves another 55k). (Fredrik Lundh, 2001-01-21, 1 file, -41/+76)
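The decomposition data stored in these tables can be read back through `unicodedata.decomposition()`, which returns hex code points and, for compatibility mappings, a `<tag>` prefix:

```python
import unicodedata

# Canonical decomposition: "e" followed by COMBINING ACUTE ACCENT
canonical = unicodedata.decomposition("\u00e9")
print(canonical)  # 0065 0301

# Compatibility decomposition carries a tag
compat = unicodedata.decomposition("\ufb01")
print(compat)  # <compat> 0066 0069

# Characters without a decomposition give an empty string
none = unicodedata.decomposition("a")
```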
* Forgot to check in the new makeunicodedata.py script. (Fredrik Lundh, 2001-01-21, 1 file, -17/+271)
* Added 38,642 missing characters to the Unicode database (first-last ranges), but thanks to the 2.0 compression scheme, this doesn't add a single byte to the resulting binaries (!). Closes bug #117524. (Fredrik Lundh, 2000-11-03, 1 file, -11/+39)
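UnicodeData.txt lists large blocks such as the Hangul syllables as a `<..., First>` line and a `<..., Last>` line, and every code point in between shares their properties. A simplified sketch of expanding such pairs; `expand_ranges()` is a hypothetical helper, not the function used in makeunicodedata.py:

```python
# The two UnicodeData.txt lines that bracket the Hangul syllable block
SAMPLE = [
    "AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;",
    "D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;",
]

def expand_ranges(lines):
    """Map code point -> field list, expanding <..., First>/<..., Last> pairs."""
    table = {}
    start = None  # (code point, fields) of a pending "First" line
    for line in lines:
        fields = line.split(";")
        code = int(fields[0], 16)
        name = fields[1]
        if name.endswith(", First>"):
            start = (code, fields)
        elif name.endswith(", Last>"):
            # Every code point in the range gets the First line's properties
            first_code, first_fields = start
            for cp in range(first_code, code + 1):
                table[cp] = first_fields
            start = None
        else:
            table[code] = fields
    return table

table = expand_ranges(SAMPLE)
print(len(table))  # 11172
```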
* Remove bogus stdout redirection and use of sys.__stdout__; use augmented print statement instead. (Fred Drake, 2000-10-26, 1 file, -46/+42)
* - Don't set the titlecase flag for uppercase letters (sorry, tim). (Fredrik Lundh, 2000-09-25, 1 file, -2/+2)
* Unicode database compression, step 3:
  - added decimal digit and digit properties to the unidb tables
  (Fredrik Lundh, 2000-09-25, 1 file, -4/+19)
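The decimal and digit properties are separate: every decimal digit also has a digit value, but some characters (such as superscripts) have a digit value without a decimal one. Shown with the `unicodedata` accessors:

```python
import unicodedata

three = "\u0663"    # ARABIC-INDIC DIGIT THREE
squared = "\u00b2"  # SUPERSCRIPT TWO

# A decimal digit has both properties
dec_three = unicodedata.decimal(three)
dig_three = unicodedata.digit(three)

# SUPERSCRIPT TWO has a digit value but no decimal value; with a default
# argument, decimal() returns the default instead of raising ValueError
dig_squared = unicodedata.digit(squared)
dec_squared = unicodedata.decimal(squared, None)
print(dec_three, dig_three, dig_squared, dec_squared)  # 3 3 2 None
```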
* Unicode database compression, step 3:
  - use unidb compression for the unicodectype module: smaller, faster, and slightly more portable...
  - also mention the unicode directory in Tools/README
  (Fredrik Lundh, 2000-09-25, 1 file, -9/+97)
* Unicode database compression, step 2:
  - fixed attributions
  - moved decomposition data to a separate table, in preparation for step 3 (which won't happen before 2.0 final, promise!)
  - use relative paths in the generator script
  I have a lot more stuff in the works for 2.1, but let's leave that for another day...
  (Fredrik Lundh, 2000-09-25, 1 file, -15/+47)
* Fiddled w/ /F's cool new splitbins function: documented it, generalized it a bit, sped it up a lot, primarily by removing the unused assumption that None was a legit bin entry (the function doesn't really need to assume that there's anything special about 0), added an optional "trace" argument, and in __debug__ mode added exhaustive verification that the decomposition is both correct and doesn't overstep any array bounds (which wasn't obvious to me from staring at the generated C code; now I feel safe!). Did not commit a new unicodedata_db.h, as the one produced by this version is identical to the one already checked in. (Tim Peters, 2000-09-25, 1 file, -26/+54)
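splitbins() splits one large lookup table into two smaller ones so that repeated blocks are stored only once. A simplified sketch of that idea, modeled on the description above rather than copied from makeunicodedata.py: it assumes a crude size model (1, 2 or 4 bytes per entry, chosen by the largest value), drops the trace argument, and keeps the exhaustive verification as assertions. For each candidate shift it cuts the table into blocks of `2**shift` entries, deduplicates them into `t2`, records block offsets in `t1`, and keeps the cheapest split:

```python
def bytes_per_entry(values):
    """Smallest C integer width (1, 2 or 4 bytes) holding every value (assumed model)."""
    top = max(values, default=0)
    if top < 256:
        return 1
    if top < 65536:
        return 2
    return 4

def splitbins(t):
    """Return (t1, t2, shift) such that, with mask = (1 << shift) - 1,
    t[i] == t2[(t1[i >> shift] << shift) + (i & mask)] for every i."""
    t = tuple(t)  # tuples so that blocks can be dict keys
    n = len(t) - 1
    maxshift = n.bit_length() - 1 if n > 0 else 0
    best, best_size = None, None
    for shift in range(maxshift + 1):
        size = 1 << shift
        t1, t2, cache = [], [], {}
        for i in range(0, len(t), size):
            block = t[i:i + size]
            index = cache.get(block)
            if index is None:
                # New block: append it to t2 and remember where it starts
                index = len(t2)
                cache[block] = index
                t2.extend(block)
            t1.append(index >> shift)
        total = len(t1) * bytes_per_entry(t1) + len(t2) * bytes_per_entry(t2)
        if best_size is None or total < best_size:
            best, best_size = (t1, t2, shift), total
    t1, t2, shift = best
    mask = (1 << shift) - 1
    # Exhaustive check that the split reproduces the original table
    for i, value in enumerate(t):
        assert value == t2[(t1[i >> shift] << shift) + (i & mask)]
    return best

table = [0] * 1000 + [1, 2, 3] * 10 + [0] * 500
t1, t2, shift = splitbins(table)
```

Long runs of identical entries (typical of Unicode property tables) collapse into a single shared block in `t2`, which is where the savings come from.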
* Unicode database compression, step 1:
  - use unidb compression for the unicodedata module. On Windows, the new unidatabase module is 120k, down from nearly 600k.
  (Fredrik Lundh, 2000-09-24, 1 file, -0/+202)