author Georg Brandl <georg@python.org> 2015-01-14 07:26:30 (GMT)
committer Georg Brandl <georg@python.org> 2015-01-14 07:26:30 (GMT)
commit 3be472b5f777fe5ebc0c1f4b6c0d96c73352db9c (patch)
tree addfeeb14af6240b6454926ef9a4cce2a51f9207 /Doc/library/codecs.rst
parent 1a8ada89f9b3d9b10654adce979046d865906a44 (diff)
Closes #23181: codepoint -> code point
Diffstat (limited to 'Doc/library/codecs.rst')
-rw-r--r-- Doc/library/codecs.rst 12
1 files changed, 6 insertions, 6 deletions
diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst
index b67e653..3510f69 100644
--- a/Doc/library/codecs.rst
+++ b/Doc/library/codecs.rst
@@ -841,7 +841,7 @@ methods and attributes from the underlying stream.
 Encodings and Unicode
 ---------------------
 
-Strings are stored internally as sequences of codepoints in
+Strings are stored internally as sequences of code points in
 range ``0x0``-``0x10FFFF``. (See :pep:`393` for
 more details about the implementation.)
 Once a string object is used outside of CPU and memory, endianness
@@ -852,23 +852,23 @@ There are a variety of different text serialisation codecs, which are
 collectivity referred to as :term:`text encodings <text encoding>`.
 
 The simplest text encoding (called ``'latin-1'`` or ``'iso-8859-1'``) maps
-the codepoints 0-255 to the bytes ``0x0``-``0xff``, which means that a string
-object that contains codepoints above ``U+00FF`` can't be encoded with this
+the code points 0-255 to the bytes ``0x0``-``0xff``, which means that a string
+object that contains code points above ``U+00FF`` can't be encoded with this
 codec. Doing so will raise a :exc:`UnicodeEncodeError` that looks
 like the following (although the details of the error message may differ):
 ``UnicodeEncodeError: 'latin-1' codec can't encode character '\u1234' in
 position 3: ordinal not in range(256)``.
 
 There's another group of encodings (the so called charmap encodings) that choose
-a different subset of all Unicode code points and how these codepoints are
+a different subset of all Unicode code points and how these code points are
 mapped to the bytes ``0x0``-``0xff``. To see how this is done simply open
 e.g. :file:`encodings/cp1252.py` (which is an encoding that is used primarily on
 Windows). There's a string constant with 256 characters that shows you which
 character is mapped to which byte value.
 
-All of these encodings can only encode 256 of the 1114112 codepoints
+All of these encodings can only encode 256 of the 1114112 code points
 defined in Unicode. A simple and straightforward way that can store each Unicode
-code point, is to store each codepoint as four consecutive bytes. There are two
+code point, is to store each code point as four consecutive bytes. There are two
 possibilities: store the bytes in big endian or in little endian order. These
 two encodings are called ``UTF-32-BE`` and ``UTF-32-LE`` respectively. Their
 disadvantage is that if e.g. you use ``UTF-32-BE`` on a little endian machine you
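
Note (not part of the commit): the documentation text touched by this diff describes how ``latin-1``, charmap codecs such as ``cp1252``, and the two fixed-byte-order UTF-32 variants behave. The following is a minimal sketch of that behaviour using only built-in ``str.encode``. The variable names and sample strings are illustrative assumptions, not taken from the patch.

```python
# Illustrative sketch only; sample strings and names are assumptions.

# latin-1 maps code points 0-255 directly to the bytes 0x00-0xff.
latin1_ok = "caf\u00e9".encode("latin-1")

# A code point above U+00FF cannot be encoded with latin-1.
try:
    "abc\u1234".encode("latin-1")
    latin1_error = None
except UnicodeEncodeError as exc:
    latin1_error = exc  # exc.start is the index of the offending character

# A charmap encoding picks a different 256-entry subset of Unicode.
# cp1252 maps the euro sign (U+20AC) to a single byte; latin-1 cannot.
euro_cp1252 = "\u20ac".encode("cp1252")

# UTF-32 stores every code point as four consecutive bytes,
# in big endian or little endian order.
utf32_be = "A".encode("utf-32-be")
utf32_le = "A".encode("utf-32-le")
highest = "\U0010FFFF".encode("utf-32-be")
```

Here ``latin1_error.start`` is 3, matching the "position 3" shown in the sample error message in the documentation, and the two UTF-32 results contain the same four bytes in opposite order.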