author Georg Brandl <georg@python.org> 2009-01-03 21:26:05 (GMT)
committer Georg Brandl <georg@python.org> 2009-01-03 21:26:05 (GMT)
commit a1c6a1cea5af1d3c7682a8e99b001b0904480e4d
tree 85461d1ba5237440b36f253b197779190226a294 /Doc/howto/unicode.rst
parent 48310cd3f2e02ced9ae836ccbcb67e9af3097d62
Merged revisions 68221 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r68221 | georg.brandl | 2009-01-03 22:04:55 +0100 (Sat, 03 Jan 2009) | 2 lines

  Remove tabs from the documentation.
........
Diffstat (limited to 'Doc/howto/unicode.rst')
-rw-r--r-- Doc/howto/unicode.rst 68
1 files changed, 34 insertions, 34 deletions
diff --git a/Doc/howto/unicode.rst b/Doc/howto/unicode.rst
index 8200723..60f7d7d 100644
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst
@@ -32,8 +32,8 @@ For a while people just wrote programs that didn't display accents. I remember
looking at Apple ][ BASIC programs, published in French-language publications in
the mid-1980s, that had lines like these::
- PRINT "FICHIER EST COMPLETE."
- PRINT "CARACTERE NON ACCEPTE."
+ PRINT "FICHIER EST COMPLETE."
+ PRINT "CARACTERE NON ACCEPTE."
Those messages should contain accents, and they just look wrong to someone who
can read French.
@@ -91,11 +91,11 @@ standard, a code point is written using the notation U+12ca to mean the
character with value 0x12ca (4810 decimal). The Unicode standard contains a lot
of tables listing characters and their corresponding code points::
- 0061 'a'; LATIN SMALL LETTER A
- 0062 'b'; LATIN SMALL LETTER B
- 0063 'c'; LATIN SMALL LETTER C
- ...
- 007B '{'; LEFT CURLY BRACKET
+ 0061 'a'; LATIN SMALL LETTER A
+ 0062 'b'; LATIN SMALL LETTER B
+ 0063 'c'; LATIN SMALL LETTER C
+ ...
+ 007B '{'; LEFT CURLY BRACKET
Strictly, these definitions imply that it's meaningless to say 'this is
character U+12ca'. U+12ca is a code point, which represents some particular
@@ -527,19 +527,19 @@ path will return the byte string versions of the filenames. For example,
assuming the default filesystem encoding is UTF-8, running the following
program::
- fn = 'filename\u4500abc'
- f = open(fn, 'w')
- f.close()
+ fn = 'filename\u4500abc'
+ f = open(fn, 'w')
+ f.close()
- import os
- print(os.listdir(b'.'))
- print(os.listdir('.'))
+ import os
+ print(os.listdir(b'.'))
+ print(os.listdir('.'))
will produce the following output::
- amk:~$ python t.py
- [b'.svn', b'filename\xe4\x94\x80abc', ...]
- ['.svn', 'filename\u4500abc', ...]
+ amk:~$ python t.py
+ [b'.svn', b'filename\xe4\x94\x80abc', ...]
+ ['.svn', 'filename\u4500abc', ...]
The first list contains UTF-8-encoded filenames, and the second list contains
the Unicode versions.
@@ -636,26 +636,26 @@ Version 1.1: Feb-Nov 2008. Updates the document with respect to Python 3 change
- [ ] Unicode introduction
- [ ] ASCII
- [ ] Terms
- - [ ] Character
- - [ ] Code point
- - [ ] Encodings
- - [ ] Common encodings: ASCII, Latin-1, UTF-8
+ - [ ] Character
+ - [ ] Code point
+ - [ ] Encodings
+ - [ ] Common encodings: ASCII, Latin-1, UTF-8
- [ ] Unicode Python type
- - [ ] Writing unicode literals
- - [ ] Obscurity: -U switch
- - [ ] Built-ins
- - [ ] unichr()
- - [ ] ord()
- - [ ] unicode() constructor
- - [ ] Unicode type
- - [ ] encode(), decode() methods
+ - [ ] Writing unicode literals
+ - [ ] Obscurity: -U switch
+ - [ ] Built-ins
+ - [ ] unichr()
+ - [ ] ord()
+ - [ ] unicode() constructor
+ - [ ] Unicode type
+ - [ ] encode(), decode() methods
- [ ] Unicodedata module for character properties
- [ ] I/O
- - [ ] Reading/writing Unicode data into files
- - [ ] Byte-order marks
- - [ ] Unicode filenames
+ - [ ] Reading/writing Unicode data into files
+ - [ ] Byte-order marks
+ - [ ] Unicode filenames
- [ ] Writing Unicode programs
- - [ ] Do everything in Unicode
- - [ ] Declaring source code encodings (PEP 263)
+ - [ ] Do everything in Unicode
+ - [ ] Declaring source code encodings (PEP 263)
- [ ] Other issues
- - [ ] Building Python (UCS2, UCS4)
+ - [ ] Building Python (UCS2, UCS4)
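
A note on the third hunk: its context describes how `os.listdir()` returns byte-string filenames when given a bytes path and Unicode filenames when given a str path. The following is a minimal sketch of that behaviour, not part of this commit. It assumes a filesystem encoding that can represent U+4500 (UTF-8, for instance); the helper name `list_both_ways` and the use of a temporary directory are illustrative choices, not taken from the HOWTO.

```python
import os
import tempfile


def list_both_ways(directory):
    """Return (bytes_names, str_names) for the entries of `directory`.

    `list_both_ways` is a hypothetical helper written for this sketch.
    """
    # A bytes path makes os.listdir() return bytes filenames.
    bytes_names = sorted(os.listdir(os.fsencode(directory)))
    # A str path makes os.listdir() return str (Unicode) filenames.
    str_names = sorted(os.listdir(directory))
    return bytes_names, str_names


# Work in a throwaway directory so the listing contains only our file.
with tempfile.TemporaryDirectory() as tmp:
    fn = 'filename\u4500abc'
    # Create an empty file whose name contains a non-ASCII character.
    with open(os.path.join(tmp, fn), 'w'):
        pass
    bytes_names, str_names = list_both_ways(tmp)
    print(bytes_names)
    print(str_names)
```

When the filesystem encoding is UTF-8, the bytes entry should be the UTF-8 encoding of the name, which corresponds to the `\xe4\x94\x80` sequence shown in the hunk. Comparing against `os.fsencode(fn)` instead of a hard-coded byte string keeps a check independent of the platform's encoding.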