author     Ezio Melotti <ezio.melotti@gmail.com>  2011-09-01 05:13:46 (GMT)
committer  Ezio Melotti <ezio.melotti@gmail.com>  2011-09-01 05:13:46 (GMT)
commit     c2d9a0226e785c6a77fc0de776f2eece92eebadb
tree       a95bd3ec0d2d9dd53c1f28d08bba30387f4fcacb /Doc
parent     7194efe8b403548c8f1efd6a933cefaf716f23af
parent     222b20844f28f37dbe5431eb293ef2b35df71ae7
Merge doc fix with 3.2.
Diffstat (limited to 'Doc')
-rw-r--r--  Doc/library/codecs.rst  9
1 files changed, 2 insertions, 7 deletions
diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst
index 90bd0dd..84593f2 100644
--- a/Doc/library/codecs.rst
+++ b/Doc/library/codecs.rst
@@ -840,7 +840,7 @@ There's another encoding that is able to encoding the full range of Unicode
characters: UTF-8. UTF-8 is an 8-bit encoding, which means there are no issues
with byte order in UTF-8. Each byte in a UTF-8 byte sequence consists of two
parts: Marker bits (the most significant bits) and payload bits. The marker bits
-are a sequence of zero to six 1 bits followed by a 0 bit. Unicode characters are
+are a sequence of zero to four ``1`` bits followed by a ``0`` bit. Unicode characters are
encoded like this (with x being payload bits, which when concatenated give the
Unicode character):
@@ -853,12 +853,7 @@ Unicode character):
+-----------------------------------+----------------------------------------------+
| ``U-00000800`` ... ``U-0000FFFF`` | 1110xxxx 10xxxxxx 10xxxxxx |
+-----------------------------------+----------------------------------------------+
-| ``U-00010000`` ... ``U-001FFFFF`` | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
-+-----------------------------------+----------------------------------------------+
-| ``U-00200000`` ... ``U-03FFFFFF`` | 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
-+-----------------------------------+----------------------------------------------+
-| ``U-04000000`` ... ``U-7FFFFFFF`` | 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
-| | 10xxxxxx |
+| ``U-00010000`` ... ``U-0010FFFF`` | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
+-----------------------------------+----------------------------------------------+
The least significant bit of the Unicode character is the rightmost x bit.
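The corrected table can be checked mechanically. The following is a minimal Python sketch that assumes only the bit layout shown in the patch. `utf8_encode` and `marker_ones` are hypothetical helper names used for illustration, not part of the `codecs` module. The output is compared against Python's built-in `str.encode("utf-8")`, and the sketch shows that no lead byte needs more than four marker ``1`` bits once the range stops at U+10FFFF.

```python
def utf8_encode(code_point: int) -> bytes:
    """Hypothetical helper: encode one code point by hand using the table's layout.

    Real code should simply use str.encode("utf-8").
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside U+0000..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogates are not valid scalar values and have no UTF-8 encoding.
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x80:
        # 0xxxxxxx: zero marker 1 bits, 7 payload bits
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx: 11 payload bits
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx: 16 payload bits
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 21 payload bits, range capped at U+10FFFF
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def marker_ones(byte: int) -> int:
    """Hypothetical helper: count the leading 1 bits (the marker) before the first 0 bit."""
    count = 0
    mask = 0x80
    while byte & mask:
        count += 1
        mask >>= 1
    return count


# Print each sample's bytes in binary so the marker and payload bits are visible.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):06X}", " ".join(f"{b:08b}" for b in encoded))
```

Each branch masks the payload into the ``x`` positions of the corresponding table row, so the rightmost ``x`` bit of the last byte carries the least significant bit of the code point.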