summaryrefslogtreecommitdiffstats
path: root/Doc/reference
diff options
context:
space:
mode:
Diffstat (limited to 'Doc/reference')
-rw-r--r--Doc/reference/datamodel.rst23
-rw-r--r--Doc/reference/lexical_analysis.rst17
2 files changed, 25 insertions, 15 deletions
diff --git a/Doc/reference/datamodel.rst b/Doc/reference/datamodel.rst
index d01b1f2..a93c09a 100644
--- a/Doc/reference/datamodel.rst
+++ b/Doc/reference/datamodel.rst
@@ -276,16 +276,16 @@ Sequences
single: integer
single: Unicode
- The items of a string object are Unicode code units. A Unicode code
- unit is represented by a string object of one item and can hold either
- a 16-bit or 32-bit value representing a Unicode ordinal (the maximum
- value for the ordinal is given in ``sys.maxunicode``, and depends on
- how Python is configured at compile time). Surrogate pairs may be
- present in the Unicode object, and will be reported as two separate
- items. The built-in functions :func:`chr` and :func:`ord` convert
- between code units and nonnegative integers representing the Unicode
- ordinals as defined in the Unicode Standard 3.0. Conversion from and to
- other encodings are possible through the string method :meth:`encode`.
+ A string is a sequence of values that represent Unicode codepoints.
+ All the codepoints in range ``U+0000 - U+10FFFF`` can be represented
+ in a string. Python doesn't have a :c:type:`chr` type, and
+ every character in the string is represented as a string object
+ with length ``1``. The built-in function :func:`ord` converts a
+ character to its codepoint (as an integer); :func:`chr` converts
+ an integer in range ``0 - 10FFFF`` to the corresponding character.
+ :meth:`str.encode` can be used to convert a :class:`str` to
+ :class:`bytes` using the given encoding, and :meth:`bytes.decode` can
+ be used to achieve the opposite.
Tuples
.. index::
@@ -1351,7 +1351,8 @@ access (use of, assignment to, or deletion of ``x.name``) for class instances.
.. method:: object.__dir__(self)
- Called when :func:`dir` is called on the object. A list must be returned.
+ Called when :func:`dir` is called on the object. A sequence must be
+ returned. :func:`dir` converts the returned sequence to a list and sorts it.
.. _descriptors:
diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index 4b49738..5900daa 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -492,13 +492,13 @@ Escape sequences only recognized in string literals are:
+-----------------+---------------------------------+-------+
| Escape Sequence | Meaning | Notes |
+=================+=================================+=======+
-| ``\N{name}`` | Character named *name* in the | |
+| ``\N{name}`` | Character named *name* in the | \(4) |
| | Unicode database | |
+-----------------+---------------------------------+-------+
-| ``\uxxxx`` | Character with 16-bit hex value | \(4) |
+| ``\uxxxx`` | Character with 16-bit hex value | \(5) |
| | *xxxx* | |
+-----------------+---------------------------------+-------+
-| ``\Uxxxxxxxx`` | Character with 32-bit hex value | \(5) |
+| ``\Uxxxxxxxx`` | Character with 32-bit hex value | \(6) |
| | *xxxxxxxx* | |
+-----------------+---------------------------------+-------+
@@ -516,10 +516,14 @@ Notes:
with the given value.
(4)
+ .. versionchanged:: 3.3
+ Support for name aliases [#]_ has been added.
+
+(5)
Individual code units which form parts of a surrogate pair can be encoded using
this escape sequence. Exactly four hex digits are required.
-(5)
+(6)
Any Unicode character can be encoded this way, but characters outside the Basic
Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is
compiled to use 16-bit code units (the default). Exactly eight hex digits
@@ -706,3 +710,8 @@ The following printing ASCII characters are not used in Python. Their
occurrence outside string literals and comments is an unconditional error::
$ ? `
+
+
+.. rubric:: Footnotes
+
+.. [#] http://www.unicode.org/Public/6.0.0/ucd/NameAliases.txt