Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/reference/datamodel.rst | 100 |
1 files changed, 36 insertions, 64 deletions
diff --git a/Doc/reference/datamodel.rst b/Doc/reference/datamodel.rst
index c2d3aee..6f6f42c 100644
--- a/Doc/reference/datamodel.rst
+++ b/Doc/reference/datamodel.rst
@@ -289,52 +289,21 @@ Sequences
          .. index::
             builtin: chr
             builtin: ord
-            object: string
-            single: character
-            single: byte
-            single: ASCII@ASCII
-
-         The items of a string are characters. There is no separate character type; a
-         character is represented by a string of one item. Characters represent (at
-         least) 8-bit bytes. The built-in functions :func:`chr` and :func:`ord` convert
-         between characters and nonnegative integers representing the byte values. Bytes
-         with the values 0-127 usually represent the corresponding ASCII values, but the
-         interpretation of values is up to the program. The string data type is also
-         used to represent arrays of bytes, e.g., to hold data read from a file.
-
-         .. index::
-            single: ASCII@ASCII
-            single: EBCDIC
-            single: character set
-            pair: string; comparison
-            builtin: chr
-            builtin: ord
-
-         (On systems whose native character set is not ASCII, strings may use EBCDIC in
-         their internal representation, provided the functions :func:`chr` and
-         :func:`ord` implement a mapping between ASCII and EBCDIC, and string comparison
-         preserves the ASCII order. Or perhaps someone can propose a better rule?)
-
-      Unicode
-         .. index::
-            builtin: unichr
-            builtin: ord
-            builtin: unicode
-            object: unicode
+            builtin: str
             single: character
             single: integer
             single: Unicode
 
-         The items of a Unicode object are Unicode code units. A Unicode code unit is
-         represented by a Unicode object of one item and can hold either a 16-bit or
-         32-bit value representing a Unicode ordinal (the maximum value for the ordinal
-         is given in ``sys.maxunicode``, and depends on how Python is configured at
-         compile time). Surrogate pairs may be present in the Unicode object, and will
-         be reported as two separate items. The built-in functions :func:`unichr` and
-         :func:`ord` convert between code units and nonnegative integers representing the
-         Unicode ordinals as defined in the Unicode Standard 3.0. Conversion from and to
-         other encodings are possible through the Unicode method :meth:`encode` and the
-         built-in function :func:`unicode`.
+         The items of a string object are Unicode code units. A Unicode code
+         unit is represented by a string object of one item and can hold either
+         a 16-bit or 32-bit value representing a Unicode ordinal (the maximum
+         value for the ordinal is given in ``sys.maxunicode``, and depends on
+         how Python is configured at compile time). Surrogate pairs may be
+         present in the Unicode object, and will be reported as two separate
+         items. The built-in functions :func:`chr` and :func:`ord` convert
+         between code units and nonnegative integers representing the Unicode
+         ordinals as defined in the Unicode Standard 3.0. Conversion from and to
+         other encodings are possible through the string method :meth:`encode`.
 
       Tuples
          .. index::
@@ -342,11 +311,12 @@ Sequences
             pair: singleton; tuple
             pair: empty; tuple
 
-         The items of a tuple are arbitrary Python objects. Tuples of two or more items
-         are formed by comma-separated lists of expressions. A tuple of one item (a
-         'singleton') can be formed by affixing a comma to an expression (an expression
-         by itself does not create a tuple, since parentheses must be usable for grouping
-         of expressions). An empty tuple can be formed by an empty pair of parentheses.
+         The items of a tuple are arbitrary Python objects. Tuples of two or
+         more items are formed by comma-separated lists of expressions. A tuple
+         of one item (a 'singleton') can be formed by affixing a comma to an
+         expression (an expression by itself does not create a tuple, since
+         parentheses must be usable for grouping of expressions). An empty
+         tuple can be formed by an empty pair of parentheses.
 
       .. % Immutable sequences
 
@@ -369,14 +339,23 @@ Sequences
       Lists
          .. index:: object: list
 
-         The items of a list are arbitrary Python objects. Lists are formed by placing a
-         comma-separated list of expressions in square brackets. (Note that there are no
-         special cases needed to form lists of length 0 or 1.)
+         The items of a list are arbitrary Python objects. Lists are formed by
+         placing a comma-separated list of expressions in square brackets. (Note
+         that there are no special cases needed to form lists of length 0 or 1.)
+
+      Bytes
+         .. index:: bytes, byte
+
+         A bytes object is a mutable array. The items are 8-bit bytes,
+         represented by integers in the range 0 <= x < 256. Bytes literals
+         (like ``b'abc'`` and the built-in function :func:`bytes` can be used to
+         construct bytes objects. Also, bytes objects can be decoded to strings
+         via the :meth:`decode` method.
 
       .. index:: module: array
 
-      The extension module :mod:`array` provides an additional example of a mutable
-      sequence type.
+      The extension module :mod:`array` provides an additional example of a
+      mutable sequence type.
 
       .. % Mutable sequences
 
@@ -1230,12 +1209,14 @@ Basic customization
       builtin: str
       builtin: print
 
-   Called by the :func:`str` built-in function and by the :func:`print`
-   function to compute the "informal" string representation of an object. This
-   differs from :meth:`__repr__` in that it does not have to be a valid Python
+   Called by the :func:`str` built-in function and by the :func:`print` function
+   to compute the "informal" string representation of an object. This differs
+   from :meth:`__repr__` in that it does not have to be a valid Python
    expression: a more convenient or concise representation may be used instead.
    The return value must be a string object.
 
+   .. XXX what about subclasses of string?
+
 
 .. method:: object.__format__(self, format_spec)
 
@@ -1355,15 +1336,6 @@ Basic customization
    :meth:`__bool__`, all its instances are considered true.
 
 
-.. method:: object.__unicode__(self)
-
-   .. index:: builtin: unicode
-
-   Called to implement :func:`unicode` builtin; should return a Unicode object.
-   When this method is not defined, string conversion is attempted, and the result
-   of string conversion is converted to Unicode using the system default encoding.
-
-
 .. _attribute-access:
 
 Customizing attribute access
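
Illustrative note (not part of the commit): the behaviour described by the added lines can be sketched as follows, assuming a released Python 3 interpreter. One assumption to flag: in released Python 3 the ``bytes`` type is immutable, and the mutable byte array that the diff text describes is spelled ``bytearray``, so the mutation step below uses ``bytearray``. The class name ``Point`` and all variable names are hypothetical and chosen only for the example.

```python
# Strings: an item of a string is itself a one-item string, and
# chr()/ord() convert between such items and integer Unicode ordinals.
s = "h\u00e9llo"
first = s[0]
code_point = ord("\u00e9")        # integer ordinal of the character
round_trip = chr(code_point)      # back to a one-item string

# Conversion to and from other encodings goes through str.encode()
# and the decode() method on the resulting bytes object.
encoded = s.encode("utf-8")
decoded = encoded.decode("utf-8")

# Bytes: items are integers in the range 0 <= x < 256.
items = list(b"abc")

# Assumption: released Python 3, where the mutable variant is bytearray.
buf = bytearray(b"abc")
buf[0] = ord("x")

# Tuples: a trailing comma makes a singleton; parentheses alone only group.
singleton = (1,)
grouped = (1)
empty = ()


class Point:
    """Hypothetical class showing __str__ next to __repr__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Conventionally looks like a valid Python expression.
        return f"Point({self.x!r}, {self.y!r})"

    def __str__(self):
        # "Informal" representation; must return a string object.
        return f"({self.x}, {self.y})"


p = Point(1, 2)
informal = str(p)
formal = repr(p)
```

With the ``__unicode__`` hook removed by this commit, ``__str__`` is the single method used by both ``str()`` and ``print()`` in the sketch above.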