author | Georg Brandl <georg@python.org> | 2007-08-31 09:22:56 (GMT) |
---|---|---|
committer | Georg Brandl <georg@python.org> | 2007-08-31 09:22:56 (GMT) |
commit | 4b49131f2bac48850671ca2aad29dc81b3c228b9 (patch) | |
tree | a031e2ac9b6244f63f80da5f7fae28b05e7a704a /Doc/library/stdtypes.rst | |
parent | 20594ccf07bc9907854dc751175899e3a673f89e (diff) | |
Commit #1068: new docs for PEP 3101. Also document the old string formatting as "old", and begin documenting str/unicode unification.
Diffstat (limited to 'Doc/library/stdtypes.rst')
-rw-r--r-- | Doc/library/stdtypes.rst | 185 |
1 files changed, 87 insertions, 98 deletions
diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst
index 34c943c..e7569ad 100644
--- a/Doc/library/stdtypes.rst
+++ b/Doc/library/stdtypes.rst
@@ -480,19 +480,18 @@ object) supplying the :meth:`__iter__` and :meth:`__next__` methods.
 
 .. _typesseq:
 
-Sequence Types --- :class:`str`, :class:`unicode`, :class:`list`, :class:`tuple`, :class:`buffer`, :class:`range`
-=================================================================================================================
+Sequence Types --- :class:`str`, :class:`bytes`, :class:`list`, :class:`tuple`, :class:`buffer`, :class:`range`
+===============================================================================================================
 
-There are six sequence types: strings, Unicode strings, lists, tuples, buffers,
-and range objects.
-(For other containers see the built in :class:`dict`, :class:`list`,
-:class:`set`, and :class:`tuple` classes, and the :mod:`collections`
-module.)
-
+There are five sequence types: strings, byte sequences, lists, tuples, buffers,
+and range objects. (For other containers see the built in :class:`dict`,
+:class:`list`, :class:`set`, and :class:`tuple` classes, and the
+:mod:`collections` module.)
 
 .. index::
    object: sequence
    object: string
+   object: bytes
    object: tuple
    object: list
    object: buffer
@@ -501,21 +500,32 @@ module.)
 String literals are written in single or double quotes: ``'xyzzy'``,
 ``"frobozz"``. See :ref:`strings` for more about string literals. In addition
 to the functionality described here, there are also string-specific methods
-described in the :ref:`string-methods` section. Lists are constructed with
-square brackets, separating items with commas: ``[a, b, c]``. Tuples are
-constructed by the comma operator (not within square brackets), with or without
-enclosing parentheses, but an empty tuple must have the enclosing parentheses,
-such as ``a, b, c`` or ``()``. A single item tuple must have a trailing comma,
-such as ``(d,)``.
+described in the :ref:`string-methods` section. Bytes objects can be
+constructed from literals too; use a ``b`` prefix with normal string syntax:
+``b'xyzzy'``.
+
+.. caveat::
+
+   While string objects are sequences of characters (represented by strings of
+   length 1), bytes objects are sequences of *integers* (between 0 and 255),
+   representing the ASCII value of single bytes. That means that for a bytes
+   object *b*, ``b[0]`` will be an integer, while ``b[0:1]`` will be a bytes
+   object of length 1.
+
+Lists are constructed with square brackets, separating items with commas: ``[a,
+b, c]``. Tuples are constructed by the comma operator (not within square
+brackets), with or without enclosing parentheses, but an empty tuple must have
+the enclosing parentheses, such as ``a, b, c`` or ``()``. A single item tuple
+must have a trailing comma, such as ``(d,)``.
 
 Buffer objects are not directly supported by Python syntax, but can be created
 by calling the builtin function :func:`buffer`. They don't support
 concatenation or repetition.
 
-Objects of type range are similar to buffers in that there is no specific syntax to
-create them, but they are created using the :func:`range` function. They don't
-support slicing, concatenation or repetition, and using ``in``, ``not in``,
-:func:`min` or :func:`max` on them is inefficient.
+Objects of type range are similar to buffers in that there is no specific syntax
+to create them, but they are created using the :func:`range` function. They
+don't support slicing, concatenation or repetition, and using ``in``, ``not
+in``, :func:`min` or :func:`max` on them is inefficient.
 
 Most sequence types support the following operations. The ``in`` and ``not in``
 operations have the same priorities as the comparison operations. The ``+`` and
@@ -555,12 +565,11 @@ are sequences of the same type; *n*, *i* and *j* are integers:
 | ``max(s)``       | largest item of *s*            |          |
 +------------------+--------------------------------+----------+
 
-Sequence types also support comparisons. In particular, tuples and lists
-are compared lexicographically by comparing corresponding
-elements. This means that to compare equal, every element must compare
-equal and the two sequences must be of the same type and have the same
-length. (For full details see :ref:`comparisons` in the language
-reference.)
+Sequence types also support comparisons. In particular, tuples and lists are
+compared lexicographically by comparing corresponding elements. This means that
+to compare equal, every element must compare equal and the two sequences must be
+of the same type and have the same length. (For full details see
+:ref:`comparisons` in the language reference.)
 
 .. index::
    triple: operations on; sequence; types
@@ -578,10 +587,8 @@ reference.)
 Notes:
 
 (1)
-   When *s* is a string or Unicode string object the ``in`` and ``not in``
-   operations act like a substring test. In Python versions before 2.3, *x* had to
-   be a string of length 1. In Python 2.3 and beyond, *x* may be a string of any
-   length.
+   When *s* is a string object, the ``in`` and ``not in`` operations act like a
+   substring test.
 
 (2)
    Values of *n* less than ``0`` are treated as ``0`` (which yields an empty
@@ -642,6 +649,8 @@ Notes:
       Formerly, string concatenation never occurred in-place.
 
 
+.. XXX add bytes methods
+
 .. _string-methods:
 
 String Methods
@@ -649,19 +658,15 @@ String Methods
 
 .. index:: pair: string; methods
 
-Below are listed the string methods which both 8-bit strings and Unicode objects
-support. In addition, Python's strings support the sequence type methods
-described in the :ref:`typesseq` section. To output formatted strings
-use template strings or the ``%`` operator described in the
-:ref:`string-formatting` section. Also, see the :mod:`re` module for
-string functions based on regular expressions.
+String objects support the methods listed below. In addition, Python's strings
+support the sequence type methods described in the :ref:`typesseq` section. To
+output formatted strings, see the :ref:`string-formatting` section. Also, see
+the :mod:`re` module for string functions based on regular expressions.
 
 .. method:: str.capitalize()
 
    Return a copy of the string with only its first character capitalized.
 
-   For 8-bit strings, this method is locale-dependent.
-
 
 .. method:: str.center(width[, fillchar])
 
@@ -679,6 +684,7 @@ string functions based on regular expressions.
    slice notation.
 
 
+.. XXX what about str.decode???
 .. method:: str.decode([encoding[, errors]])
 
    Decodes the string using the codec registered for *encoding*. *encoding*
@@ -737,6 +743,24 @@ string functions based on regular expressions.
    found.
 
 
+.. method:: str.format(format_string, *args, **ksargs)
+
+   Perform a string formatting operation. The *format_string* argument can
+   contain literal text or replacement fields delimited by braces ``{}``. Each
+   replacement field contains either the numeric index of a positional argument,
+   or the name of a keyword argument. Returns a copy of *format_string* where
+   each replacement field is replaced with the string value of the corresponding
+   argument.
+
+      >>> "The sum of 1 + 2 is {0}".format(1+2)
+      'The sum of 1 + 2 is 3'
+
+   See :ref:`formatstrings` for a description of the various formatting options
+   that can be specified in format strings.
+
+   .. versionadded:: 3.0
+
+
 .. method:: str.index(sub[, start[, end]])
 
    Like :meth:`find`, but raise :exc:`ValueError` when the substring is not found.
@@ -747,31 +771,23 @@ string functions based on regular expressions.
    Return true if all characters in the string are alphanumeric and there is at
    least one character, false otherwise.
 
-   For 8-bit strings, this method is locale-dependent.
-
 
 .. method:: str.isalpha()
 
    Return true if all characters in the string are alphabetic and there is at
    least one character, false otherwise.
 
-   For 8-bit strings, this method is locale-dependent.
-
 
 .. method:: str.isdigit()
 
    Return true if all characters in the string are digits and there is at least
    one character, false otherwise.
 
-   For 8-bit strings, this method is locale-dependent.
-
 
 .. method:: str.isidentifier()
 
    Return true if the string is a valid identifier according to the language
-   definition.
-
-   .. XXX link to the definition?
+   definition, section :ref:`identifiers`.
 
 
 .. method:: str.islower()
@@ -779,16 +795,12 @@ string functions based on regular expressions.
    Return true if all cased characters in the string are lowercase and there is
    at least one cased character, false otherwise.
 
-   For 8-bit strings, this method is locale-dependent.
-
 
 .. method:: str.isspace()
 
    Return true if there are only whitespace characters in the string and there
    is at least one character, false otherwise.
 
-   For 8-bit strings, this method is locale-dependent.
-
 
 .. method:: str.istitle()
 
@@ -796,16 +808,12 @@ string functions based on regular expressions.
    character, for example uppercase characters may only follow uncased characters
    and lowercase characters only cased ones. Return false otherwise.
 
-   For 8-bit strings, this method is locale-dependent.
-
 
 .. method:: str.isupper()
 
    Return true if all cased characters in the string are uppercase and there is
    at least one cased character, false otherwise.
 
-   For 8-bit strings, this method is locale-dependent.
-
 
 .. method:: str.join(seq)
 
@@ -827,8 +835,6 @@ string functions based on regular expressions.
 
    Return a copy of the string converted to lowercase.
 
-   For 8-bit strings, this method is locale-dependent.
-
 
 .. method:: str.lstrip([chars])
 
@@ -984,50 +990,31 @@ string functions based on regular expressions.
    Return a copy of the string with uppercase characters converted to lowercase
    and vice versa.
 
-   For 8-bit strings, this method is locale-dependent.
-
 
 .. method:: str.title()
 
    Return a titlecased version of the string: words start with uppercase
    characters, all remaining cased characters are lowercase.
 
-   For 8-bit strings, this method is locale-dependent.
-
-
-.. method:: str.translate(table[, deletechars])
 
-   Return a copy of the string where all characters occurring in the optional
-   argument *deletechars* are removed, and the remaining characters have been
-   mapped through the given translation table, which must be a string of length
-   256.
+.. method:: str.translate(map)
 
-   You can use the :func:`maketrans` helper function in the :mod:`string` module to
-   create a translation table. For string objects, set the *table* argument to
-   ``None`` for translations that only delete characters::
+   Returns a copy of the *s* where all characters have been mapped through the
+   *map* which must be a mapping of Unicode ordinals (integers) to Unicode
+   ordinals, strings or ``None``. Unmapped characters are left
+   untouched. Characters mapped to ``None`` are deleted.
 
-      >>> 'read this short text'.translate(None, 'aeiou')
-      'rd ths shrt txt'
-
-   .. versionadded:: 2.6
-      Support for a ``None`` *table* argument.
+   .. note::
 
-   For Unicode objects, the :meth:`translate` method does not accept the optional
-   *deletechars* argument. Instead, it returns a copy of the *s* where all
-   characters have been mapped through the given translation table which must be a
-   mapping of Unicode ordinals to Unicode ordinals, Unicode strings or ``None``.
-   Unmapped characters are left untouched. Characters mapped to ``None`` are
-   deleted. Note, a more flexible approach is to create a custom character mapping
-   codec using the :mod:`codecs` module (see :mod:`encodings.cp1251` for an
-   example).
+      A more flexible approach is to create a custom character mapping codec
+      using the :mod:`codecs` module (see :mod:`encodings.cp1251` for an
+      example).
 
 
 .. method:: str.upper()
 
    Return a copy of the string converted to uppercase.
 
-   For 8-bit strings, this method is locale-dependent.
-
 
 .. method:: str.zfill(width)
 
@@ -1037,10 +1024,10 @@ string functions based on regular expressions.
    .. versionadded:: 2.2.2
 
 
-.. _string-formatting:
+.. _old-string-formatting:
 
-String Formatting Operations
-----------------------------
+Old String Formatting Operations
+--------------------------------
 
 .. index::
    single: formatting, string (%)
@@ -1052,14 +1039,18 @@ String Formatting Operations
    single: % formatting
    single: % interpolation
 
-String and Unicode objects have one unique built-in operation: the ``%``
-operator (modulo). This is also known as the string *formatting* or
-*interpolation* operator. Given ``format % values`` (where *format* is a string
-or Unicode object), ``%`` conversion specifications in *format* are replaced
-with zero or more elements of *values*. The effect is similar to the using
-:cfunc:`sprintf` in the C language. If *format* is a Unicode object, or if any
-of the objects being converted using the ``%s`` conversion are Unicode objects,
-the result will also be a Unicode object.
+.. XXX better?
+
+.. note::
+
+   The formatting operations described here are obsolete and my go away in future
+   versions of Python. Use the new :ref:`string-formatting` in new code.
+
+String objects have one unique built-in operation: the ``%`` operator (modulo).
+This is also known as the string *formatting* or *interpolation* operator.
+Given ``format % values`` (where *format* is a string), ``%`` conversion
+specifications in *format* are replaced with zero or more elements of *values*.
+The effect is similar to the using :cfunc:`sprintf` in the C language.
 
 If *format* requires a single argument, *values* may be a single non-tuple
 object. [#]_ Otherwise, *values* must be a tuple with exactly the number of
@@ -1164,7 +1155,7 @@ The conversion types are:
 | ``'r'``    | String (converts any python object using            | \(5)  |
 |            | :func:`repr`).                                      |       |
 +------------+-----------------------------------------------------+-------+
-| ``'s'``    | String (converts any python object using            | \(6)  |
+| ``'s'``    | String (converts any python object using            |       |
 |            | :func:`str`).                                       |       |
 +------------+-----------------------------------------------------+-------+
 | ``'%'``    | No argument is converted, results in a ``'%'``      |       |
@@ -1203,9 +1194,6 @@ Notes:
 
    The precision determines the maximal number of characters used.
 
-(6)
-   If the object or format provided is a :class:`unicode` string, the resulting
-   string will also be :class:`unicode`.
 
    The precision determines the maximal number of characters used.
 
@@ -2019,6 +2007,7 @@ the particular object.
    on all file-like objects.
 
 
+.. XXX does this still apply?
 .. attribute:: file.encoding
 
    The encoding that this file uses. When Unicode strings are written to a file,