Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | #7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14. | Florent Xicluna | 2010-03-30 | 1 | -6/+38 |
| | |||||
* | Issue #8024: Update the Unicode database to 5.2 | Florent Xicluna | 2010-03-18 | 1 | -1/+1 |
| | |||||
* | Remove py3k deprecation warnings from these Unicode tools. | Florent Xicluna | 2010-03-15 | 3 | -42/+27 |
| | |||||
* | set svn:eol-style on various files | Benjamin Peterson | 2010-03-08 | 1 | -61/+61 |
| | |||||
* | #7112: Fix compilation warning in unicodetype_db.h | Amaury Forgeot d'Arc | 2009-10-13 | 1 | -0/+5 |
| | | | | makeunicodedata now generates double literals | ||||
* | #1571184: makeunicodedata.py now generates the functions _PyUnicode_ToNumeric, | Amaury Forgeot d'Arc | 2009-10-06 | 1 | -8/+123 |
| | | | | | | _PyUnicode_IsLinebreak and _PyUnicode_IsWhitespace. It now also parses the Unihan.txt for numeric values. | ||||
* | #1616979: Add the cp720 (Arabic DOS) encoding. | Amaury Forgeot d'Arc | 2009-07-13 | 2 | -0/+68 |
| | | | | | Since there is no official mapping file from unicode.org, the codec file is generated on Windows with the new genwincodec.py script. | ||||
* | Issue #1734234: Massively speedup `unicodedata.normalize()` when the | Antoine Pitrou | 2009-04-27 | 1 | -5/+32 |
| | | | | | string is already in normalized form, by performing a quick check beforehand. Original patch by Rauli Ruohonen. | ||||
* | Issue #5828 (Invalid behavior of unicode.lower): Fixed bogus logic in | Walter Dörwald | 2009-04-25 | 1 | -22/+21 |
| | | | | | makeunicodedata.py and regenerated the Unicode database (This fixes u'\u1d79'.lower() == '\x00'). | ||||
* | Issue #3811: The Unicode database was updated to 5.1. | Martin v. Löwis | 2008-09-10 | 1 | -10/+30 |
| | | | | Reviewed by Fredrik Lundh and Marc-Andre Lemburg. | ||||
* | Make more symbols static. | Martin v. Löwis | 2008-06-13 | 1 | -2/+2 |
| | |||||
* | Patch #2167 from calvin: Remove unused imports | Christian Heimes | 2008-02-23 | 1 | -1/+1 |
| | |||||
* | Patch #1359618: Speed-up charmap encoder. | Martin v. Löwis | 2006-06-04 | 2 | -26/+27 |
| | |||||
* | when generating python code prefer to generate valid python code | Jack Diederich | 2006-05-26 | 1 | -3/+3 |
| | |||||
* | Don't add multiple empty lines at the end of the codec. With this a | Walter Dörwald | 2006-03-31 | 1 | -1/+1 |
| | | | | regenerated codec should survive reindent.py unchanged. | ||||
* | Whitespace for generated code. | Walter Dörwald | 2006-03-27 | 1 | -0/+3 |
| | |||||
* | Patch #1443155: Add the incremental codecs support for CJK codecs. | Hye-Shik Chang | 2006-03-26 | 2 | -1/+69 |
| | | | | (reviewed by Walter Dörwald) | ||||
* | Patch #1436130: codecs.lookup() now returns a CodecInfo object (a subclass | Walter Dörwald | 2006-03-15 | 2 | -22/+43 |
| | | | | | | | of tuple) that provides incremental decoders and encoders (a way to use stateful codecs without the stream API). Functions codecs.getincrementaldecoder() and codecs.getincrementalencoder() have been added. | ||||
* | Add changelog entry. | Martin v. Löwis | 2006-03-11 | 1 | -0/+1 |
| | |||||
* | Whitespace normalization. | Tim Peters | 2006-03-10 | 1 | -1/+1 |
| | |||||
* | Update Unicode database to Unicode 4.1. | Martin v. Löwis | 2006-03-09 | 1 | -11/+141 |
| | |||||
* | Whitespace normalization. | Tim Peters | 2005-12-25 | 1 | -3/+3 |
| | |||||
* | Add Makefile which allows easily rebuilding the charmap codecs. | Marc-André Lemburg | 2005-10-25 | 1 | -0/+81 |
| | |||||
* | Add custom mapping files used for generating some of the charmap | Marc-André Lemburg | 2005-10-25 | 3 | -0/+873 |
| | | | | codecs. | ||||
* | Apply some cosmetic fixes to the output of the script. | Marc-André Lemburg | 2005-10-25 | 1 | -15/+28 |
| | | | | Only include the decoding map if no table can be generated. | ||||
* | Add two new tools to compare codecs and show differences and to | Marc-André Lemburg | 2005-10-21 | 2 | -0/+94 |
| | | | | list all installed codecs. | ||||
* | Moved gencodec.py to the Tools/unicode/ directory. | Marc-André Lemburg | 2005-10-21 | 1 | -0/+391 |
| | | | | | | Added new support for decoding tables. Cleaned up the implementation a bit. | ||||
* | SF #989185: Drop unicode.iswide() and unicode.width() and add | Hye-Shik Chang | 2004-08-04 | 1 | -6/+12 |
| | | | | | | | | | | | | unicodedata.east_asian_width(). You can still implement your own simple width() function using it like this: def width(u): w = 0 for c in unicodedata.normalize('NFC', u): cwidth = unicodedata.east_asian_width(c) if cwidth in ('W', 'F'): w += 2 else: w += 1 return w | ||||
* | Whitespace normalization, via reindent.py. | Tim Peters | 2004-07-18 | 1 | -1/+0 |
| | |||||
* | - SF #962502: Add two more methods for unicode type; width() and | Hye-Shik Chang | 2004-06-02 | 1 | -4/+29 |
| | | | | | | | iswide() for east asian width manipulation. (Inspired by David Goodger, Reviewed by Martin v. Loewis) - Move _PyUnicode_TypeRecord.flags to the end of the struct so that no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis) | ||||
* | Applying SF patch #949329 on behalf of Raymond Hettinger. | Armin Rigo | 2004-05-19 | 1 | -27/+26 |
| | |||||
* | Implement IDNA (Internationalized Domain Names in Applications). | Martin v. Löwis | 2003-04-18 | 1 | -0/+433 |
| | |||||
* | Add unidata_version. Bump generator version number. | Martin v. Löwis | 2002-11-25 | 1 | -2/+6 |
| | |||||
* | Sort names independent of the Python version. Fix hex constant warning. | Martin v. Löwis | 2002-11-24 | 1 | -7/+11 |
| | | | | Include all First/Last blocks. | ||||
* | Patch #626485: Support Unicode normalization. | Martin v. Löwis | 2002-11-23 | 1 | -3/+90 |
| | |||||
* | Verify that lower-higher case delta are 16-bit. | Martin v. Löwis | 2002-10-18 | 1 | -3/+11 |
| | |||||
* | Update to Unicode 3.2 database. | Martin v. Löwis | 2002-10-18 | 1 | -2/+2 |
| | |||||
* | Apply diff2.txt from SF patch http://www.python.org/sf/572113 | Walter Dörwald | 2002-09-11 | 1 | -7/+7 |
| | | | | | | | | (with one small bugfix in bgen/bgen/scantools.py) This replaces string module functions with string methods for the stuff in the Tools directory. Several uses of string.letters etc. are still remaining. | ||||
* | Unicode nits: Don't include unicodedatabase.h no more. And make sure | Fredrik Lundh | 2001-01-21 | 1 | -2/+2 |
| | | | | to build *all* tables in makeunicodedata.py. | ||||
* | compress unicode decomposition tables (this saves another 55k) | Fredrik Lundh | 2001-01-21 | 1 | -41/+76 |
| | |||||
* | forgot to check in the new makeunicodedata.py script | Fredrik Lundh | 2001-01-21 | 1 | -17/+271 |
| | |||||
* | Added 38,642 missing characters to the Unicode database (first-last | Fredrik Lundh | 2000-11-03 | 1 | -11/+39 |
| | | | | | | | ranges) -- but thanks to the 2.0 compression scheme, this doesn't add a single byte to the resulting binaries (!) Closes bug #117524 | ||||
* | Remove bogus stdout redirection and use of sys.__stdout__; use | Fred Drake | 2000-10-26 | 1 | -46/+42 |
| | | | | augmented print statement instead. | ||||
* | - don't set the titlecase flag for uppercase letters (sorry, tim) | Fredrik Lundh | 2000-09-25 | 1 | -2/+2 |
| | |||||
* | unicode database compression, step 3: | Fredrik Lundh | 2000-09-25 | 1 | -4/+19 |
| | | | | - added decimal digit and digit properties to the unidb tables | ||||
* | unicode database compression, step 3: | Fredrik Lundh | 2000-09-25 | 1 | -9/+97 |
| | | | | | | | - use unidb compression for the unicodectype module. smaller, faster, and slightly more portable... - also mention the unicode directory in Tools/README | ||||
* | unicode database compression, step 2: | Fredrik Lundh | 2000-09-25 | 1 | -15/+47 |
| | | | | | | | | | | - fixed attributions - moved decomposition data to a separate table, in preparation for step 3 (which won't happen before 2.0 final, promise!) - use relative paths in the generator script I have a lot more stuff in the works for 2.1, but let's leave that for another day... | ||||
* | Fiddled w/ /F's cool new splitbins function: documented it, generalized it | Tim Peters | 2000-09-25 | 1 | -26/+54 |
| | | | | | | | | | | a bit, sped it a lot primarily by removing the unused assumption that None was a legit bin entry (the function doesn't really need to assume that there's anything special about 0), added an optional "trace" argument, and in __debug__ mode added exhaustive verification that the decomposition is both correct and doesn't overstep any array bounds (which wasn't obvious to me from staring at the generated C code -- now I feel safe!). Did not commit a new unicodedata_db.h, as the one produced by this version is identical to the one already checked in. | ||||
* | unicode database compression, step 1: | Fredrik Lundh | 2000-09-24 | 1 | -0/+202 |
| | | | | - use unidb compression for the unicodedata module. on Windows, the new unidatabase module is 120k, down from nearly 600k. | ||||
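
The SF #989185 entry above (2004-08-04) carries a small `width()` example whose line breaks and indentation were lost in this listing. A reconstruction of that snippet, written for Python 3, might look like the following. The parameter and variable names and the type hints are changed for readability and are not from the original commit; the logic follows the commit message as quoted.

```python
import unicodedata


def width(text: str) -> int:
    """Approximate the display width of text in fixed-width cells.

    Reconstructed from the flattened snippet in the SF #989185 commit
    message; names and type hints here are illustrative assumptions.
    """
    total = 0
    # Normalize to NFC first so that a base character followed by a
    # combining mark is counted once wherever a precomposed form exists.
    for ch in unicodedata.normalize("NFC", text):
        # 'W' (Wide) and 'F' (Fullwidth) characters take two cells.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            total += 2
        else:
            total += 1
    return total
```

As in the original snippet, every other character counts as one cell, so combining marks that have no precomposed form are still counted, and the result is only an approximation of what a terminal would render.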