author | Andrew M. Kuchling <amk@amk.ca> | 2010-06-20 21:45:45 (GMT) |
---|---|---|
committer | Andrew M. Kuchling <amk@amk.ca> | 2010-06-20 21:45:45 (GMT) |
commit | 08982665b722059ca69ddf864ef22d2f8eb47b61 | |
tree | d1c439d94ed77be697f27233477e0d6e98cba4cb | |
parent | 801923681c6b03ea5f9ab916fa3fdd01335469ab | |
Note that Python 3.x isn't covered; add forward ref. for UTF-8; note error in 2.5 and up
-rw-r--r-- | Doc/howto/unicode.rst | 33 |
1 files changed, 24 insertions, 9 deletions
```diff
diff --git a/Doc/howto/unicode.rst b/Doc/howto/unicode.rst
index 4fb8873..ff3c721 100644
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst
@@ -2,10 +2,12 @@
   Unicode HOWTO
 *****************
 
-:Release: 1.02
+:Release: 1.03
 
-This HOWTO discusses Python's support for Unicode, and explains various problems
-that people commonly encounter when trying to work with Unicode.
+This HOWTO discusses Python 2.x's support for Unicode, and explains
+various problems that people commonly encounter when trying to work
+with Unicode. (This HOWTO has not yet been updated to cover the 3.x
+versions of Python.)
 
 Introduction to Unicode
 =======================
@@ -144,8 +146,9 @@ problems.
 4. Many Internet standards are defined in terms of textual data, and can't
    handle content with embedded zero bytes.
 
-Generally people don't use this encoding, instead choosing other encodings that
-are more efficient and convenient.
+Generally people don't use this encoding, instead choosing other
+encodings that are more efficient and convenient. UTF-8 is probably
+the most commonly supported encoding; it will be discussed below.
 
 Encodings don't have to handle every possible Unicode character, and most
 encodings don't. For example, Python's default encoding is the 'ascii'
@@ -222,8 +225,8 @@ Wikipedia entries are often helpful; see the entries for "character encoding"
 <http://en.wikipedia.org/wiki/UTF-8>, for example.
 
 
-Python's Unicode Support
-========================
+Python 2.x's Unicode Support
+============================
 
 Now that you've learned the rudiments of Unicode, we can look at Python's
 Unicode features.
@@ -272,7 +275,7 @@ Unicode result). The following examples show the differences::
     >>> unicode('\x80abc', errors='ignore')
     u'abc'
 
-Encodings are specified as strings containing the encoding's name. Python 2.4
+Encodings are specified as strings containing the encoding's name. Python 2.7
 comes with roughly 100 different encodings; see the Python Library Reference at
 :ref:`standard-encodings` for a list. Some encodings have multiple names; for
 example, 'latin-1', 'iso_8859_1' and '8859' are all
@@ -427,11 +430,19 @@ encoding declaration::
 
 When you run it with Python 2.4, it will output the following warning::
 
-    amk:~$ python p263.py
+    amk:~$ python2.4 p263.py
     sys:1: DeprecationWarning: Non-ASCII character '\xe9'
          in file p263.py on line 2, but no encoding declared;
          see http://www.python.org/peps/pep-0263.html for details
 
+Python 2.5 and higher are stricter and will produce a syntax error::
+
+    amk:~$ python2.5 p263.py
+    File "/tmp/p263.py", line 2
+    SyntaxError: Non-ASCII character '\xc3' in file /tmp/p263.py
+      on line 2, but no encoding declared; see
+      http://www.python.org/peps/pep-0263.html for details
+
 Unicode Properties
 ------------------
 
@@ -693,7 +704,11 @@ several links.
 
 Version 1.02: posted August 16 2005. Corrects factual errors.
 
+Version 1.03: posted June 20 2010. Notes that Python 3.x is not covered,
+and that the HOWTO only covers 2.x.
+
 
+.. comment Describe Python 3.x support (new section? new document?)
 .. comment Additional topic: building Python w/ UCS2 or UCS4 support
 .. comment Describe obscure -U switch somewhere?
 .. comment Describe use of codecs.StreamRecoder and StreamReaderWriter
```
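The hunk at old line 144 keeps the HOWTO's point that a fixed-width 32-bit encoding breaks standards that can't handle embedded zero bytes, and adds that UTF-8 is the most commonly supported alternative. A minimal sketch of that difference, assuming Python 3 rather than the 2.x the HOWTO covers; `utf-32-le` and `utf-8` are standard-library codec names, and the little-endian variant is picked only so that no byte-order mark is added to the count:

```python
# Compare a fixed-width 32-bit encoding with UTF-8 on the same text.
# Assumption: Python 3 str literals; "utf-32-le" avoids a leading BOM.
text = "Python"

utf32 = text.encode("utf-32-le")  # 4 bytes per code point
utf8 = text.encode("utf-8")       # 1 byte per ASCII code point

print(len(utf32), len(utf8))
print(b"\x00" in utf32, b"\x00" in utf8)

# Non-ASCII text: UTF-8 uses a multi-byte sequence, but only U+0000
# itself ever encodes to a zero byte.
accented = "\u00e9"               # LATIN SMALL LETTER E WITH ACUTE
print(accented.encode("utf-8"))
```

For pure-ASCII text UTF-8 is byte-for-byte identical to ASCII, which is a large part of why it interoperates with the zero-byte-sensitive tools the HOWTO mentions.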
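The hunk at old line 272 says some encodings have multiple names, giving 'latin-1', 'iso_8859_1' and '8859' as examples. A short sketch that checks those three names resolve to one codec, assuming Python 3; it uses only `codecs.lookup()` from the standard library, whose returned `CodecInfo.name` is the canonical codec name:

```python
import codecs

# The three names quoted in the HOWTO hunk; codecs.lookup() resolves
# each one through the standard encodings alias table.
names = ["latin-1", "iso_8859_1", "8859"]

# Collect the canonical name reported by each lookup.
canonical = {codecs.lookup(name).name for name in names}
print(canonical)

# Same codec, same bytes: U+00E9 encodes to the single byte 0xE9 each time.
encoded = [u"\u00e9".encode(name) for name in names]
print(encoded)
```

`codecs.lookup()` raises `LookupError` for a name it doesn't recognize, so the same call can also be used to validate a user-supplied encoding name before using it.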
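The new text in the hunk at old line 427 notes that Python 2.5 and higher reject a source file containing non-ASCII bytes when no encoding is declared. A minimal sketch of the same PEP 263 mechanism, assuming Python 3, where the default source encoding is UTF-8 rather than ASCII; the example therefore uses a lone Latin-1 byte 0xE9, which is not valid UTF-8, to trigger the error. The helper `try_compile` and the file name passed to `compile()` are hypothetical and only for illustration; `compile()` and `exec()` are built-ins:

```python
# Hypothetical one-line module whose string literal holds "é" as the
# single Latin-1 byte 0xE9 (stands in for the HOWTO's p263.py).
body = b'u = "\xe9"\n'


def try_compile(source: bytes) -> str:
    """Compile raw source bytes and report whether the parser accepted them."""
    try:
        code = compile(source, "p263.py", "exec")
    except SyntaxError as exc:
        return "SyntaxError: " + str(exc.msg)
    namespace = {}
    exec(code, namespace)
    return "ok: " + ascii(namespace["u"])


# No declaration: Python 3 assumes UTF-8, and 0xE9 alone is not valid
# UTF-8, so the parser raises SyntaxError.
print(try_compile(body))

# A PEP 263 declaration tells the parser how to decode the source bytes.
declared = b"# -*- coding: latin-1 -*-\n" + body
print(try_compile(declared))
```

PEP 263 only looks for the coding comment on the first or second line of the file, which is why the sketch prepends it rather than appending it.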