author | Georg Brandl <georg@python.org> | 2007-08-31 09:22:56 (GMT) |
---|---|---|
committer | Georg Brandl <georg@python.org> | 2007-08-31 09:22:56 (GMT) |
commit | 4b49131f2bac48850671ca2aad29dc81b3c228b9 (patch) | |
tree | a031e2ac9b6244f63f80da5f7fae28b05e7a704a | |
parent | 20594ccf07bc9907854dc751175899e3a673f89e (diff) | |
Commit #1068: new docs for PEP 3101. Also document the old string formatting as "old", and begin documenting str/unicode unification.
-rw-r--r-- | Doc/library/fpformat.rst | 4 |
-rw-r--r-- | Doc/library/functions.rst | 23 |
-rw-r--r-- | Doc/library/logging.rst | 8 |
-rw-r--r-- | Doc/library/stdtypes.rst | 185 |
-rw-r--r-- | Doc/library/string.rst | 612 |
-rw-r--r-- | Doc/library/strings.rst | 11 |
-rw-r--r-- | Doc/reference/datamodel.rst | 27 |
-rw-r--r-- | Doc/reference/expressions.rst | 28 |
-rw-r--r-- | Doc/tutorial/introduction.rst | 4 |
9 files changed, 513 insertions, 389 deletions
diff --git a/Doc/library/fpformat.rst b/Doc/library/fpformat.rst index 6627c81..1a84b72 100644 --- a/Doc/library/fpformat.rst +++ b/Doc/library/fpformat.rst @@ -12,8 +12,8 @@ numbers representations in 100% pure Python. .. note:: - This module is unnecessary: everything here can be done using the ``%`` string - interpolation operator described in the :ref:`string-formatting` section. + This module is unnecessary: everything here can be done using the string + formatting functions described in the :ref:`string-formatting` section. The :mod:`fpformat` module defines the following functions and an exception: diff --git a/Doc/library/functions.rst b/Doc/library/functions.rst index b0a5577c..168be0b 100644 --- a/Doc/library/functions.rst +++ b/Doc/library/functions.rst @@ -449,6 +449,22 @@ available. They are listed here in alphabetical order. The float type is described in :ref:`typesnumeric`. +.. function:: format(value[, format_spec]) + + .. index:: + pair: str; format + single: __format__ + + Convert a string or a number to a "formatted" representation, as controlled + by *format_spec*. The interpretation of *format_spec* will depend on the + type of the *value* argument, however there is a standard formatting syntax + that is used by most built-in types: :ref:`formatspec`. + + .. note:: + + ``format(value, format_spec)`` merely calls ``value.__format__(format_spec)``. + + .. function:: frozenset([iterable]) :noindex: @@ -990,10 +1006,9 @@ available. They are listed here in alphabetical order. For more information on strings see :ref:`typesseq` which describes sequence functionality (strings are sequences), and also the string-specific methods - described in the :ref:`string-methods` section. To output formatted strings - use template strings or the ``%`` operator described in the - :ref:`string-formatting` section. In addition see the :ref:`stringservices` - section. See also :func:`unicode`. + described in the :ref:`string-methods` section. 
To output formatted strings, + see the :ref:`string-formatting` section. In addition see the + :ref:`stringservices` section. .. function:: sum(iterable[, start]) diff --git a/Doc/library/logging.rst b/Doc/library/logging.rst index 7e8703a..e740682 100644 --- a/Doc/library/logging.rst +++ b/Doc/library/logging.rst @@ -611,8 +611,10 @@ This time, all messages with a severity of DEBUG or above were handled, and the format of the messages was also changed, and output went to the specified file rather than the console. -Formatting uses standard Python string formatting - see section -:ref:`string-formatting`. The format string takes the following common +.. XXX logging should probably be updated! + +Formatting uses the old Python string formatting - see section +:ref:`old-string-formatting`. The format string takes the following common specifiers. For a complete list of specifiers, consult the :class:`Formatter` documentation. @@ -1483,7 +1485,7 @@ A Formatter can be initialized with a format string which makes use of knowledge of the :class:`LogRecord` attributes - such as the default value mentioned above making use of the fact that the user's message and arguments are pre-formatted into a :class:`LogRecord`'s *message* attribute. This format string contains -standard python %-style mapping keys. See section :ref:`string-formatting` +standard python %-style mapping keys. See section :ref:`old-string-formatting` for more information on string formatting. Currently, the useful mapping keys in a :class:`LogRecord` are: diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst index 34c943c..e7569ad 100644 --- a/Doc/library/stdtypes.rst +++ b/Doc/library/stdtypes.rst @@ -480,19 +480,18 @@ object) supplying the :meth:`__iter__` and :meth:`__next__` methods. .. 
_typesseq: -Sequence Types --- :class:`str`, :class:`unicode`, :class:`list`, :class:`tuple`, :class:`buffer`, :class:`range` -================================================================================================================= +Sequence Types --- :class:`str`, :class:`bytes`, :class:`list`, :class:`tuple`, :class:`buffer`, :class:`range` +=============================================================================================================== -There are six sequence types: strings, Unicode strings, lists, tuples, buffers, -and range objects. -(For other containers see the built in :class:`dict`, :class:`list`, -:class:`set`, and :class:`tuple` classes, and the :mod:`collections` -module.) - +There are five sequence types: strings, byte sequences, lists, tuples, buffers, +and range objects. (For other containers see the built in :class:`dict`, +:class:`list`, :class:`set`, and :class:`tuple` classes, and the +:mod:`collections` module.) .. index:: object: sequence object: string + object: bytes object: tuple object: list object: buffer @@ -501,21 +500,32 @@ module.) String literals are written in single or double quotes: ``'xyzzy'``, ``"frobozz"``. See :ref:`strings` for more about string literals. In addition to the functionality described here, there are also string-specific methods -described in the :ref:`string-methods` section. Lists are constructed with -square brackets, separating items with commas: ``[a, b, c]``. Tuples are -constructed by the comma operator (not within square brackets), with or without -enclosing parentheses, but an empty tuple must have the enclosing parentheses, -such as ``a, b, c`` or ``()``. A single item tuple must have a trailing comma, -such as ``(d,)``. +described in the :ref:`string-methods` section. Bytes objects can be +constructed from literals too; use a ``b`` prefix with normal string syntax: +``b'xyzzy'``. + +.. 
caveat:: + + While string objects are sequences of characters (represented by strings of + length 1), bytes objects are sequences of *integers* (between 0 and 255), + representing the ASCII value of single bytes. That means that for a bytes + object *b*, ``b[0]`` will be an integer, while ``b[0:1]`` will be a bytes + object of length 1. + +Lists are constructed with square brackets, separating items with commas: ``[a, +b, c]``. Tuples are constructed by the comma operator (not within square +brackets), with or without enclosing parentheses, but an empty tuple must have +the enclosing parentheses, such as ``a, b, c`` or ``()``. A single item tuple +must have a trailing comma, such as ``(d,)``. Buffer objects are not directly supported by Python syntax, but can be created by calling the builtin function :func:`buffer`. They don't support concatenation or repetition. -Objects of type range are similar to buffers in that there is no specific syntax to -create them, but they are created using the :func:`range` function. They don't -support slicing, concatenation or repetition, and using ``in``, ``not in``, -:func:`min` or :func:`max` on them is inefficient. +Objects of type range are similar to buffers in that there is no specific syntax +to create them, but they are created using the :func:`range` function. They +don't support slicing, concatenation or repetition, and using ``in``, ``not +in``, :func:`min` or :func:`max` on them is inefficient. Most sequence types support the following operations. The ``in`` and ``not in`` operations have the same priorities as the comparison operations. The ``+`` and @@ -555,12 +565,11 @@ are sequences of the same type; *n*, *i* and *j* are integers: | ``max(s)`` | largest item of *s* | | +------------------+--------------------------------+----------+ -Sequence types also support comparisons. In particular, tuples and lists -are compared lexicographically by comparing corresponding -elements. 
This means that to compare equal, every element must compare -equal and the two sequences must be of the same type and have the same -length. (For full details see :ref:`comparisons` in the language -reference.) +Sequence types also support comparisons. In particular, tuples and lists are +compared lexicographically by comparing corresponding elements. This means that +to compare equal, every element must compare equal and the two sequences must be +of the same type and have the same length. (For full details see +:ref:`comparisons` in the language reference.) .. index:: triple: operations on; sequence; types @@ -578,10 +587,8 @@ reference.) Notes: (1) - When *s* is a string or Unicode string object the ``in`` and ``not in`` - operations act like a substring test. In Python versions before 2.3, *x* had to - be a string of length 1. In Python 2.3 and beyond, *x* may be a string of any - length. + When *s* is a string object, the ``in`` and ``not in`` operations act like a + substring test. (2) Values of *n* less than ``0`` are treated as ``0`` (which yields an empty @@ -642,6 +649,8 @@ Notes: Formerly, string concatenation never occurred in-place. +.. XXX add bytes methods + .. _string-methods: String Methods @@ -649,19 +658,15 @@ String Methods .. index:: pair: string; methods -Below are listed the string methods which both 8-bit strings and Unicode objects -support. In addition, Python's strings support the sequence type methods -described in the :ref:`typesseq` section. To output formatted strings -use template strings or the ``%`` operator described in the -:ref:`string-formatting` section. Also, see the :mod:`re` module for -string functions based on regular expressions. +String objects support the methods listed below. In addition, Python's strings +support the sequence type methods described in the :ref:`typesseq` section. To +output formatted strings, see the :ref:`string-formatting` section. 
Also, see +the :mod:`re` module for string functions based on regular expressions. .. method:: str.capitalize() Return a copy of the string with only its first character capitalized. - For 8-bit strings, this method is locale-dependent. - .. method:: str.center(width[, fillchar]) @@ -679,6 +684,7 @@ string functions based on regular expressions. slice notation. +.. XXX what about str.decode??? .. method:: str.decode([encoding[, errors]]) Decodes the string using the codec registered for *encoding*. *encoding* @@ -737,6 +743,24 @@ string functions based on regular expressions. found. +.. method:: str.format(format_string, *args, **ksargs) + + Perform a string formatting operation. The *format_string* argument can + contain literal text or replacement fields delimited by braces ``{}``. Each + replacement field contains either the numeric index of a positional argument, + or the name of a keyword argument. Returns a copy of *format_string* where + each replacement field is replaced with the string value of the corresponding + argument. + + >>> "The sum of 1 + 2 is {0}".format(1+2) + 'The sum of 1 + 2 is 3' + + See :ref:`formatstrings` for a description of the various formatting options + that can be specified in format strings. + + .. versionadded:: 3.0 + + .. method:: str.index(sub[, start[, end]]) Like :meth:`find`, but raise :exc:`ValueError` when the substring is not found. @@ -747,31 +771,23 @@ string functions based on regular expressions. Return true if all characters in the string are alphanumeric and there is at least one character, false otherwise. - For 8-bit strings, this method is locale-dependent. - .. method:: str.isalpha() Return true if all characters in the string are alphabetic and there is at least one character, false otherwise. - For 8-bit strings, this method is locale-dependent. - .. method:: str.isdigit() Return true if all characters in the string are digits and there is at least one character, false otherwise. 
- For 8-bit strings, this method is locale-dependent. - .. method:: str.isidentifier() Return true if the string is a valid identifier according to the language - definition. - - .. XXX link to the definition? + definition, section :ref:`identifiers`. .. method:: str.islower() @@ -779,16 +795,12 @@ string functions based on regular expressions. Return true if all cased characters in the string are lowercase and there is at least one cased character, false otherwise. - For 8-bit strings, this method is locale-dependent. - .. method:: str.isspace() Return true if there are only whitespace characters in the string and there is at least one character, false otherwise. - For 8-bit strings, this method is locale-dependent. - .. method:: str.istitle() @@ -796,16 +808,12 @@ string functions based on regular expressions. character, for example uppercase characters may only follow uncased characters and lowercase characters only cased ones. Return false otherwise. - For 8-bit strings, this method is locale-dependent. - .. method:: str.isupper() Return true if all cased characters in the string are uppercase and there is at least one cased character, false otherwise. - For 8-bit strings, this method is locale-dependent. - .. method:: str.join(seq) @@ -827,8 +835,6 @@ string functions based on regular expressions. Return a copy of the string converted to lowercase. - For 8-bit strings, this method is locale-dependent. - .. method:: str.lstrip([chars]) @@ -984,50 +990,31 @@ string functions based on regular expressions. Return a copy of the string with uppercase characters converted to lowercase and vice versa. - For 8-bit strings, this method is locale-dependent. - .. method:: str.title() Return a titlecased version of the string: words start with uppercase characters, all remaining cased characters are lowercase. - For 8-bit strings, this method is locale-dependent. - - -.. 
method:: str.translate(table[, deletechars]) - Return a copy of the string where all characters occurring in the optional - argument *deletechars* are removed, and the remaining characters have been - mapped through the given translation table, which must be a string of length - 256. +.. method:: str.translate(map) - You can use the :func:`maketrans` helper function in the :mod:`string` module to - create a translation table. For string objects, set the *table* argument to - ``None`` for translations that only delete characters:: + Returns a copy of the *s* where all characters have been mapped through the + *map* which must be a mapping of Unicode ordinals (integers) to Unicode + ordinals, strings or ``None``. Unmapped characters are left + untouched. Characters mapped to ``None`` are deleted. - >>> 'read this short text'.translate(None, 'aeiou') - 'rd ths shrt txt' - - .. versionadded:: 2.6 - Support for a ``None`` *table* argument. + .. note:: - For Unicode objects, the :meth:`translate` method does not accept the optional - *deletechars* argument. Instead, it returns a copy of the *s* where all - characters have been mapped through the given translation table which must be a - mapping of Unicode ordinals to Unicode ordinals, Unicode strings or ``None``. - Unmapped characters are left untouched. Characters mapped to ``None`` are - deleted. Note, a more flexible approach is to create a custom character mapping - codec using the :mod:`codecs` module (see :mod:`encodings.cp1251` for an - example). + A more flexible approach is to create a custom character mapping codec + using the :mod:`codecs` module (see :mod:`encodings.cp1251` for an + example). .. method:: str.upper() Return a copy of the string converted to uppercase. - For 8-bit strings, this method is locale-dependent. - .. method:: str.zfill(width) @@ -1037,10 +1024,10 @@ string functions based on regular expressions. .. versionadded:: 2.2.2 -.. _string-formatting: +.. 
_old-string-formatting: -String Formatting Operations ----------------------------- +Old String Formatting Operations +-------------------------------- .. index:: single: formatting, string (%) @@ -1052,14 +1039,18 @@ String Formatting Operations single: % formatting single: % interpolation -String and Unicode objects have one unique built-in operation: the ``%`` -operator (modulo). This is also known as the string *formatting* or -*interpolation* operator. Given ``format % values`` (where *format* is a string -or Unicode object), ``%`` conversion specifications in *format* are replaced -with zero or more elements of *values*. The effect is similar to the using -:cfunc:`sprintf` in the C language. If *format* is a Unicode object, or if any -of the objects being converted using the ``%s`` conversion are Unicode objects, -the result will also be a Unicode object. +.. XXX better? + +.. note:: + + The formatting operations described here are obsolete and my go away in future + versions of Python. Use the new :ref:`string-formatting` in new code. + +String objects have one unique built-in operation: the ``%`` operator (modulo). +This is also known as the string *formatting* or *interpolation* operator. +Given ``format % values`` (where *format* is a string), ``%`` conversion +specifications in *format* are replaced with zero or more elements of *values*. +The effect is similar to the using :cfunc:`sprintf` in the C language. If *format* requires a single argument, *values* may be a single non-tuple object. [#]_ Otherwise, *values* must be a tuple with exactly the number of @@ -1164,7 +1155,7 @@ The conversion types are: | ``'r'`` | String (converts any python object using | \(5) | | | :func:`repr`). | | +------------+-----------------------------------------------------+-------+ -| ``'s'`` | String (converts any python object using | \(6) | +| ``'s'`` | String (converts any python object using | | | | :func:`str`). 
| | +------------+-----------------------------------------------------+-------+ | ``'%'`` | No argument is converted, results in a ``'%'`` | | @@ -1203,9 +1194,6 @@ Notes: The precision determines the maximal number of characters used. -(6) - If the object or format provided is a :class:`unicode` string, the resulting - string will also be :class:`unicode`. The precision determines the maximal number of characters used. @@ -2019,6 +2007,7 @@ the particular object. on all file-like objects. +.. XXX does this still apply? .. attribute:: file.encoding The encoding that this file uses. When Unicode strings are written to a file, diff --git a/Doc/library/string.rst b/Doc/library/string.rst index aa2494b..4d79749 100644 --- a/Doc/library/string.rst +++ b/Doc/library/string.rst @@ -8,15 +8,13 @@ .. index:: module: re -The :mod:`string` module contains a number of useful constants and -classes, as well as some deprecated legacy functions that are also -available as methods on strings. In addition, Python's built-in string -classes support the sequence type methods described in the -:ref:`typesseq` section, and also the string-specific methods described -in the :ref:`string-methods` section. To output formatted strings use -template strings or the ``%`` operator described in the -:ref:`string-formatting` section. Also, see the :mod:`re` module for -string functions based on regular expressions. +The :mod:`string` module contains a number of useful constants and classes, as +well as some deprecated legacy functions that are also available as methods on +strings. In addition, Python's built-in string classes support the sequence type +methods described in the :ref:`typesseq` section, and also the string-specific +methods described in the :ref:`string-methods` section. To output formatted +strings, see the :ref:`string-formatting` section. Also, see the :mod:`re` +module for string functions based on regular expressions. 
String constants @@ -78,6 +76,354 @@ The constants defined in this module are: vertical tab. +.. _string-formatting: + +String Formatting +----------------- + +Starting in Python 3.0, the built-in string class provides the ability to do +complex variable substitutions and value formatting via the :func:`format` +method described in :pep:`3101`. The :class:`Formatter` class in the +:mod:`string` module allows you to create and customize your own string +formatting behaviors using the same implementation as the built-in +:meth:`format` method. + +.. class:: Formatter + + The :class:`Formatter` class has the following public methods: + + .. method:: format(format_string, *args, *kwargs) + + :meth:`format` is the primary API method. It takes a format template + string, and an arbitrary set of positional and keyword argument. + :meth:`format` is just a wrapper that calls :meth:`vformat`. + + .. method:: vformat(format_string, args, kwargs) + + This function does the actual work of formatting. It is exposed as a + separate function for cases where you want to pass in a predefined + dictionary of arguments, rather than unpacking and repacking the + dictionary as individual arguments using the ``*args`` and ``**kwds`` + syntax. :meth:`vformat` does the work of breaking up the format template + string into character data and replacement fields. It calls the various + methods described below. + + In addition, the :class:`Formatter` defines a number of methods that are + intended to be replaced by subclasses: + + .. method:: parse(format_string) + + Loop over the format_string and return an iterable of tuples + (*literal_text*, *field_name*, *format_spec*, *conversion*). This is used + by :meth:`vformat` to break the string in to either literal text, or + replacement fields. + + The values in the tuple conceptually represent a span of literal text + followed by a single replacement field. 
If there is no literal text + (which can happen if two replacement fields occur consecutively), then + *literal_text* will be a zero-length string. If there is no replacement + field, then the values of *field_name*, *format_spec* and *conversion* + will be ``None``. + + .. method:: get_field(field_name, args, kwargs, used_args) + + Given *field_name* as returned by :meth:`parse` (see above), convert it to + an object to be formatted. The default version takes strings of the form + defined in :pep:`3101`, such as "0[name]" or "label.title". It records + which args have been used in *used_args*. *args* and *kwargs* are as + passed in to :meth:`vformat`. + + .. method:: get_value(key, args, kwargs) + + Retrieve a given field value. The *key* argument will be either an + integer or a string. If it is an integer, it represents the index of the + positional argument in *args*; if it is a string, then it represents a + named argument in *kwargs*. + + The *args* parameter is set to the list of positional arguments to + :meth:`vformat`, and the *kwargs* parameter is set to the dictionary of + keyword arguments. + + For compound field names, these functions are only called for the first + component of the field name; Subsequent components are handled through + normal attribute and indexing operations. + + So for example, the field expression '0.name' would cause + :meth:`get_value` to be called with a *key* argument of 0. The ``name`` + attribute will be looked up after :meth:`get_value` returns by calling the + built-in :func:`getattr` function. + + If the index or keyword refers to an item that does not exist, then an + :exc:`IndexError` or :exc:`KeyError` should be raised. + + .. method:: check_unused_args(used_args, args, kwargs) + + Implement checking for unused arguments if desired. 
The arguments to this + function is the set of all argument keys that were actually referred to in + the format string (integers for positional arguments, and strings for + named arguments), and a reference to the *args* and *kwargs* that was + passed to vformat. The set of unused args can be calculated from these + parameters. :meth:`check_unused_args` is assumed to throw an exception if + the check fails. + + .. method:: format_field(value, format_spec) + + :meth:`format_field` simply calls the global :func:`format` built-in. The + method is provided so that subclasses can override it. + + .. method:: convert_field(value, conversion) + + Converts the value (returned by :meth:`get_field`) given a conversion type + (as in the tuple returned by the :meth:`parse` method.) The default + version understands 'r' (repr) and 's' (str) conversion types. + + .. versionadded:: 3.0 + +.. _formatstrings: + +Format String Syntax +-------------------- + +The :meth:`str.format` method and the :class:`Formatter` class share the same +syntax for format strings (although in the case of :class:`Formatter`, +subclasses can define their own format string syntax.) + +Format strings contain "replacement fields" surrounded by curly braces ``{}``. +Anything that is not contained in braces is considered literal text, which is +copied unchanged to the output. If you need to include a brace character in the +literal text, it can be escaped by doubling: ``{{`` and ``}}``. + +The grammar for a replacement field is as follows: + + .. productionlist:: sf + replacement_field: "{" `field_name` ["!" `conversion`] [":" `format_spec`] "}" + field_name: (`identifier` | `integer`) ("." 
`attribute_name` | "[" element_index "]")* + attribute_name: `identifier` + element_index: `integer` + conversion: "r" | "s" + format_spec: <described in the next section> + +In less formal terms, the replacement field starts with a *field_name*, which +can either be a number (for a positional argument), or an identifier (for +keyword arguments). Following this is an optional *conversion* field, which is +preceded by an exclamation point ``'!'``, and a *format_spec*, which is preceded +by a colon ``':'``. + +The *field_name* itself begins with either a number or a keyword. If it's a +number, it refers to a positional argument, and if it's a keyword it refers to a +named keyword argument. This can be followed by any number of index or +attribute expressions. An expression of the form ``'.name'`` selects the named +attribute using :func:`getattr`, while an expression of the form ``'[index]'`` +does an index lookup using :func:`__getitem__`. + +Some simple format string examples:: + + "First, thou shalt count to {0}" # References first positional argument + "My quest is {name}" # References keyword argument 'name' + "Weight in tons {0.weight}" # 'weight' attribute of first positional arg + "Units destroyed: {players[0]}" # First element of keyword argument 'players'. + +The *conversion* field causes a type coercion before formatting. Normally, the +job of formatting a value is done by the :meth:`__format__` method of the value +itself. However, in some cases it is desirable to force a type to be formatted +as a string, overriding its own definition of formatting. By converting the +value to a string before calling :meth:`__format__`, the normal formatting logic +is bypassed. + +Two conversion flags are currently supported: ``'!s'`` which calls :func:`str()` +on the value, and ``'!r'`` which calls :func:`repr()`. 
+ +Some examples:: + + "Harold's a clever {0!s}" # Calls str() on the argument first + "Bring out the holy {name!r}" # Calls repr() on the argument first + +The *format_spec* field contains a specification of how the value should be +presented, including such details as field width, alignment, padding, decimal +precision and so on. Each value type can define it's own "formatting +mini-language" or interpretation of the *format_spec*. + +Most built-in types support a common formatting mini-language, which is +described in the next section. + +A *format_spec* field can also include nested replacement fields within it. +These nested replacement fields can contain only a field name; conversion flags +and format specifications are not allowed. The replacement fields within the +format_spec are substituted before the *format_spec* string is interpreted. +This allows the formatting of a value to be dynamically specified. + +For example, suppose you wanted to have a replacement field whose field width is +determined by another variable:: + + "A man with two {0:{1}}".format("noses", 10) + +This would first evaluate the inner replacement field, making the format string +effectively:: + + "A man with two {0:10}" + +Then the outer replacement field would be evaluated, producing:: + + "noses " + +Which is subsitituted into the string, yielding:: + + "A man with two noses " + +(The extra space is because we specified a field width of 10, and because left +alignment is the default for strings.) + +.. versionadded:: 3.0 + +.. _formatspec: + +Format Specification Mini-Language +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +"Format specifications" are used within replacement fields contained within a +format string to define how individual values are presented (see +:ref:`formatstrings`.) They can also be passed directly to the builtin +:func:`format` function. Each formattable type may define how the format +specification is to be interpreted. 
+ +Most built-in types implement the following options for format specifications, +although some of the formatting options are only supported by the numeric types. + +A general convention is that an empty format string (``""``) produces the same +result as if you had called :func:`str()` on the value. + +The general form of a *standard format specifier* is: + +.. productionlist:: sf + format_spec: [[`fill`]`align`][`sign`][0][`width`][.`precision`][`type`] + fill: <a character other than '}'> + align: "<" | ">" | "=" | "^" + sign: "+" | "-" | " " + width: `integer` + precision: `integer` + type: "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "x" | "X" | "%" + +The *fill* character can be any character other than '}' (which signifies the +end of the field). The presence of a fill character is signaled by the *next* +character, which must be one of the alignment options. If the second character +of *format_spec* is not a valid alignment option, then it is assumed that both +the fill character and the alignment option are absent. + +The meaning of the various alignment options is as follows: + + +---------+----------------------------------------------------------+ + | Option | Meaning | + +=========+==========================================================+ + | ``'<'`` | Forces the field to be left-aligned within the available | + | | space (This is the default.) | + +---------+----------------------------------------------------------+ + | ``'>'`` | Forces the field to be right-aligned within the | + | | available space. | + +---------+----------------------------------------------------------+ + | ``'='`` | Forces the padding to be placed after the sign (if any) | + | | but before the digits. This is used for printing fields | + | | in the form '+000000120'. This alignment option is only | + | | valid for numeric types. 
| + +---------+----------------------------------------------------------+ + | ``'^'`` | Forces the field to be centered within the available | + | | space. | + +---------+----------------------------------------------------------+ + +Note that unless a minimum field width is defined, the field width will always +be the same size as the data to fill it, so that the alignment option has no +meaning in this case. + +The *sign* option is only valid for number types, and can be one of the +following: + + +---------+----------------------------------------------------------+ + | Option | Meaning | + +=========+==========================================================+ + | ``'+'`` | indicates that a sign should be used for both | + | | positive as well as negative numbers. | + +---------+----------------------------------------------------------+ + | ``'-'`` | indicates that a sign should be used only for negative | + | | numbers (this is the default behavior). | + +---------+----------------------------------------------------------+ + | space | indicates that a leading space should be used on | + | | positive numbers, and a minus sign on negative numbers. | + +---------+----------------------------------------------------------+ + +*width* is a decimal integer defining the minimum field width. If not +specified, then the field width will be determined by the content. + +If the *width* field is preceded by a zero (``'0'``) character, this enables +zero-padding. This is equivalent to an *alignment* type of ``'='`` and a *fill* +character of ``'0'``. + +The *precision* is a decimal number indicating how many digits should be +displayed after the decimal point for a floating point value. For non-number +types the field indicates the maximum field size - in other words, how many +characters will be used from the field content. The *precision* is ignored for +integer values. + +Finally, the *type* determines how the data should be presented. 
+ +The available integer presentation types are: + + +---------+----------------------------------------------------------+ + | Type | Meaning | + +=========+==========================================================+ + | ``'b'`` | Binary. Outputs the number in base 2. | + +---------+----------------------------------------------------------+ + | ``'c'`` | Character. Converts the integer to the corresponding | + | | unicode character before printing. | + +---------+----------------------------------------------------------+ + | ``'d'`` | Decimal Integer. Outputs the number in base 10. | + +---------+----------------------------------------------------------+ + | ``'o'`` | Octal format. Outputs the number in base 8. | + +---------+----------------------------------------------------------+ + | ``'x'`` | Hex format. Outputs the number in base 16, using lower- | + | | case letters for the digits above 9. | + +---------+----------------------------------------------------------+ + | ``'X'`` | Hex format. Outputs the number in base 16, using upper- | + | | case letters for the digits above 9. | + +---------+----------------------------------------------------------+ + | None | the same as ``'d'`` | + +---------+----------------------------------------------------------+ + +The available presentation types for floating point and decimal values are: + + +---------+----------------------------------------------------------+ + | Type | Meaning | + +=========+==========================================================+ + | ``'e'`` | Exponent notation. Prints the number in scientific | + | | notation using the letter 'e' to indicate the exponent. | + +---------+----------------------------------------------------------+ + | ``'E'`` | Exponent notation. Same as ``'e'`` except it uses an | + | | upper case 'E' as the separator character. | + +---------+----------------------------------------------------------+ + | ``'f'`` | Fixed point. 
Displays the number as a fixed-point | + | | number. | + +---------+----------------------------------------------------------+ + | ``'F'`` | Fixed point. Same as ``'f'``. | + +---------+----------------------------------------------------------+ + | ``'g'`` | General format. This prints the number as a fixed-point | + | | number, unless the number is too large, in which case | + | | it switches to ``'e'`` exponent notation. | + +---------+----------------------------------------------------------+ + | ``'G'`` | General format. Same as ``'g'`` except switches to | + | | ``'E'`` if the number gets to large. | + +---------+----------------------------------------------------------+ + | ``'n'`` | Number. This is the same as ``'g'``, except that it uses | + | | the current locale setting to insert the appropriate | + | | number separator characters. | + +---------+----------------------------------------------------------+ + | ``'%'`` | Percentage. Multiplies the number by 100 and displays | + | | in fixed (``'f'``) format, followed by a percent sign. | + +---------+----------------------------------------------------------+ + | None | similar to ``'g'``, except that it prints at least one | + | | digit after the decimal point. | + +---------+----------------------------------------------------------+ + + +.. _template-strings: + Template strings ---------------- @@ -208,6 +554,7 @@ They are not available as string methods. leading and trailing whitespace. +.. XXX is obsolete with unicode.translate .. function:: maketrans(from, to) Return a translation table suitable for passing to :func:`translate`, that will @@ -219,250 +566,3 @@ They are not available as string methods. Don't use strings derived from :const:`lowercase` and :const:`uppercase` as arguments; in some locales, these don't have the same length. For case conversions, always use :func:`lower` and :func:`upper`. 
- - -Deprecated string functions ---------------------------- - -The following list of functions are also defined as methods of string and -Unicode objects; see section :ref:`string-methods` for more information on -those. You should consider these functions as deprecated, although they will -not be removed until Python 3.0. The functions defined in this module are: - - -.. function:: atof(s) - - .. deprecated:: 2.0 - Use the :func:`float` built-in function. - - .. index:: builtin: float - - Convert a string to a floating point number. The string must have the standard - syntax for a floating point literal in Python, optionally preceded by a sign - (``+`` or ``-``). Note that this behaves identical to the built-in function - :func:`float` when passed a string. - - .. note:: - - .. index:: - single: NaN - single: Infinity - - When passing in a string, values for NaN and Infinity may be returned, depending - on the underlying C library. The specific set of strings accepted which cause - these values to be returned depends entirely on the C library and is known to - vary. - - -.. function:: atoi(s[, base]) - - .. deprecated:: 2.0 - Use the :func:`int` built-in function. - - .. index:: builtin: eval - - Convert string *s* to an integer in the given *base*. The string must consist - of one or more digits, optionally preceded by a sign (``+`` or ``-``). The - *base* defaults to 10. If it is 0, a default base is chosen depending on the - leading characters of the string (after stripping the sign): ``0x`` or ``0X`` - means 16, ``0`` means 8, anything else means 10. If *base* is 16, a leading - ``0x`` or ``0X`` is always accepted, though not required. This behaves - identically to the built-in function :func:`int` when passed a string. (Also - note: for a more flexible interpretation of numeric literals, use the built-in - function :func:`eval`.) - - -.. function:: atol(s[, base]) - - .. deprecated:: 2.0 - Use the :func:`long` built-in function. - - .. 
index:: builtin: long - - Convert string *s* to a long integer in the given *base*. The string must - consist of one or more digits, optionally preceded by a sign (``+`` or ``-``). - The *base* argument has the same meaning as for :func:`atoi`. A trailing ``l`` - or ``L`` is not allowed, except if the base is 0. Note that when invoked - without *base* or with *base* set to 10, this behaves identical to the built-in - function :func:`long` when passed a string. - - -.. function:: capitalize(word) - - Return a copy of *word* with only its first character capitalized. - - -.. function:: expandtabs(s[, tabsize]) - - Expand tabs in a string replacing them by one or more spaces, depending on the - current column and the given tab size. The column number is reset to zero after - each newline occurring in the string. This doesn't understand other non-printing - characters or escape sequences. The tab size defaults to 8. - - -.. function:: find(s, sub[, start[,end]]) - - Return the lowest index in *s* where the substring *sub* is found such that - *sub* is wholly contained in ``s[start:end]``. Return ``-1`` on failure. - Defaults for *start* and *end* and interpretation of negative values is the same - as for slices. - - -.. function:: rfind(s, sub[, start[, end]]) - - Like :func:`find` but find the highest index. - - -.. function:: index(s, sub[, start[, end]]) - - Like :func:`find` but raise :exc:`ValueError` when the substring is not found. - - -.. function:: rindex(s, sub[, start[, end]]) - - Like :func:`rfind` but raise :exc:`ValueError` when the substring is not found. - - -.. function:: count(s, sub[, start[, end]]) - - Return the number of (non-overlapping) occurrences of substring *sub* in string - ``s[start:end]``. Defaults for *start* and *end* and interpretation of negative - values are the same as for slices. - - -.. function:: lower(s) - - Return a copy of *s*, but with upper case letters converted to lower case. - - -.. 
function:: split(s[, sep[, maxsplit]]) - - Return a list of the words of the string *s*. If the optional second argument - *sep* is absent or ``None``, the words are separated by arbitrary strings of - whitespace characters (space, tab, newline, return, formfeed). If the second - argument *sep* is present and not ``None``, it specifies a string to be used as - the word separator. The returned list will then have one more item than the - number of non-overlapping occurrences of the separator in the string. The - optional third argument *maxsplit* defaults to 0. If it is nonzero, at most - *maxsplit* number of splits occur, and the remainder of the string is returned - as the final element of the list (thus, the list will have at most - ``maxsplit+1`` elements). - - The behavior of split on an empty string depends on the value of *sep*. If *sep* - is not specified, or specified as ``None``, the result will be an empty list. - If *sep* is specified as any string, the result will be a list containing one - element which is an empty string. - - -.. function:: rsplit(s[, sep[, maxsplit]]) - - Return a list of the words of the string *s*, scanning *s* from the end. To all - intents and purposes, the resulting list of words is the same as returned by - :func:`split`, except when the optional third argument *maxsplit* is explicitly - specified and nonzero. When *maxsplit* is nonzero, at most *maxsplit* number of - splits -- the *rightmost* ones -- occur, and the remainder of the string is - returned as the first element of the list (thus, the list will have at most - ``maxsplit+1`` elements). - - .. versionadded:: 2.4 - - -.. function:: splitfields(s[, sep[, maxsplit]]) - - This function behaves identically to :func:`split`. (In the past, :func:`split` - was only used with one argument, while :func:`splitfields` was only used with - two arguments.) - - -.. function:: join(words[, sep]) - - Concatenate a list or tuple of words with intervening occurrences of *sep*. 
- The default value for *sep* is a single space character. It is always true that - ``string.join(string.split(s, sep), sep)`` equals *s*. - - -.. function:: joinfields(words[, sep]) - - This function behaves identically to :func:`join`. (In the past, :func:`join` - was only used with one argument, while :func:`joinfields` was only used with two - arguments.) Note that there is no :meth:`joinfields` method on string objects; - use the :meth:`join` method instead. - - -.. function:: lstrip(s[, chars]) - - Return a copy of the string with leading characters removed. If *chars* is - omitted or ``None``, whitespace characters are removed. If given and not - ``None``, *chars* must be a string; the characters in the string will be - stripped from the beginning of the string this method is called on. - - .. versionchanged:: 2.2.3 - The *chars* parameter was added. The *chars* parameter cannot be passed in - earlier 2.2 versions. - - -.. function:: rstrip(s[, chars]) - - Return a copy of the string with trailing characters removed. If *chars* is - omitted or ``None``, whitespace characters are removed. If given and not - ``None``, *chars* must be a string; the characters in the string will be - stripped from the end of the string this method is called on. - - .. versionchanged:: 2.2.3 - The *chars* parameter was added. The *chars* parameter cannot be passed in - earlier 2.2 versions. - - -.. function:: strip(s[, chars]) - - Return a copy of the string with leading and trailing characters removed. If - *chars* is omitted or ``None``, whitespace characters are removed. If given and - not ``None``, *chars* must be a string; the characters in the string will be - stripped from the both ends of the string this method is called on. - - .. versionchanged:: 2.2.3 - The *chars* parameter was added. The *chars* parameter cannot be passed in - earlier 2.2 versions. - - -.. 
function:: swapcase(s) - - Return a copy of *s*, but with lower case letters converted to upper case and - vice versa. - - -.. function:: translate(s, table[, deletechars]) - - Delete all characters from *s* that are in *deletechars* (if present), and then - translate the characters using *table*, which must be a 256-character string - giving the translation for each character value, indexed by its ordinal. If - *table* is ``None``, then only the character deletion step is performed. - - -.. function:: upper(s) - - Return a copy of *s*, but with lower case letters converted to upper case. - - -.. function:: ljust(s, width) - rjust(s, width) - center(s, width) - - These functions respectively left-justify, right-justify and center a string in - a field of given width. They return a string that is at least *width* - characters wide, created by padding the string *s* with spaces until the given - width on the right, left or both sides. The string is never truncated. - - -.. function:: zfill(s, width) - - Pad a numeric string on the left with zero digits until the given width is - reached. Strings starting with a sign are handled correctly. - - -.. function:: replace(str, old, new[, maxreplace]) - - Return a copy of string *str* with all occurrences of substring *old* replaced - by *new*. If the optional argument *maxreplace* is given, the first - *maxreplace* occurrences are replaced. - diff --git a/Doc/library/strings.rst b/Doc/library/strings.rst index 5c8ec4b..dfb272f 100644 --- a/Doc/library/strings.rst +++ b/Doc/library/strings.rst @@ -8,12 +8,11 @@ String Services The modules described in this chapter provide a wide range of string manipulation operations. -In addition, Python's built-in string classes support the sequence type -methods described in the :ref:`typesseq` section, and also the -string-specific methods described in the :ref:`string-methods` section. 
-To output formatted strings use template strings or the ``%`` operator -described in the :ref:`string-formatting` section. Also, see the -:mod:`re` module for string functions based on regular expressions. +In addition, Python's built-in string classes support the sequence type methods +described in the :ref:`typesseq` section, and also the string-specific methods +described in the :ref:`string-methods` section. To output formatted strings, +see the :ref:`string-formatting` section. Also, see the :mod:`re` module for +string functions based on regular expressions. .. toctree:: diff --git a/Doc/reference/datamodel.rst b/Doc/reference/datamodel.rst index baa6eaa..ea48148 100644 --- a/Doc/reference/datamodel.rst +++ b/Doc/reference/datamodel.rst @@ -1279,15 +1279,36 @@ Basic customization .. index:: builtin: str - statement: print + builtin: print - Called by the :func:`str` built-in function and by the :keyword:`print` - statement to compute the "informal" string representation of an object. This + Called by the :func:`str` built-in function and by the :func:`print` + function to compute the "informal" string representation of an object. This differs from :meth:`__repr__` in that it does not have to be a valid Python expression: a more convenient or concise representation may be used instead. The return value must be a string object. +.. method:: object.__format__(self, format_spec) + + .. index:: + pair: string; conversion + builtin: str + builtin: print + + Called by the :func:`format` built-in function (and by extension, the + :meth:`format` method of class :class:`str`) to produce a "formatted" + string representation of an object. The ``format_spec`` argument is + a string that contains a description of the formatting options desired. 
+ The interpretation of the ``format_spec`` argument is up to the type + implementing :meth:`__format__`, however most classes will either + delegate formatting to one of the built-in types, or use a similar + formatting option syntax. + + See :ref:`formatspec` for a description of the standard formatting syntax. + + The return value must be a string object. + + .. method:: object.__lt__(self, other) object.__le__(self, other) object.__eq__(self, other) diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst index ef71a80..8dbdc31 100644 --- a/Doc/reference/expressions.rst +++ b/Doc/reference/expressions.rst @@ -5,12 +5,10 @@ Expressions *********** -.. index:: single: expression +.. index:: expression, BNF This chapter explains the meaning of the elements of expressions in Python. -.. index:: single: BNF - **Syntax Notes:** In this and the following chapters, extended BNF notation will be used to describe syntax, not lexical analysis. When (one alternative of) a syntax rule has the form @@ -18,8 +16,6 @@ syntax rule has the form .. productionlist:: * name: `othername` -.. index:: single: syntax - and no semantics are given, the semantics of this form of ``name`` are the same as for ``othername``. @@ -852,9 +848,9 @@ identities hold approximately where ``x/y`` is replaced by ``floor(x/y)`` or ``floor(x/y) - 1`` [#]_. In addition to performing the modulo operation on numbers, the ``%`` operator is -also overloaded by string and unicode objects to perform string formatting (also +also overloaded by string objects to perform string formatting (also known as interpolation). The syntax for string formatting is described in the -Python Library Reference, section :ref:`string-formatting`. +Python Library Reference, section :ref:`old-string-formatting`. The floor division operator, the modulo operator, and the :func:`divmod` function are not defined for complex numbers. 
Instead, convert to a @@ -985,9 +981,12 @@ Comparison of objects of the same type depends on the type: * Numbers are compared arithmetically. +* Bytes objects are compared lexicographically using the numeric values of + their elements. + * Strings are compared lexicographically using the numeric equivalents (the - result of the built-in function :func:`ord`) of their characters. Unicode and - 8-bit strings are fully interoperable in this behavior. [#]_ + result of the built-in function :func:`ord`) of their characters. [#]_ + String and bytes object can't be compared! * Tuples and lists are compared lexicographically using comparison of corresponding elements. This means that to compare equal, each element must @@ -1020,11 +1019,10 @@ particular, dictionaries support membership testing as a nicer way of spelling For the list and tuple types, ``x in y`` is true if and only if there exists an index *i* such that ``x == y[i]`` is true. -For the Unicode and string types, ``x in y`` is true if and only if *x* is a -substring of *y*. An equivalent test is ``y.find(x) != -1``. Note, *x* and *y* -need not be the same type; consequently, ``u'ab' in 'abc'`` will return -``True``. Empty strings are always considered to be a substring of any other -string, so ``"" in "abc"`` will return ``True``. +For the string and bytes types, ``x in y`` is true if and only if *x* is a +substring of *y*. An equivalent test is ``y.find(x) != -1``. Empty strings are +always considered to be a substring of any other string, so ``"" in "abc"`` will +return ``True``. .. versionchanged:: 2.3 Previously, *x* was required to be a string of length ``1``. @@ -1272,7 +1270,7 @@ groups from right to left). cases, Python returns the latter result, in order to preserve that ``divmod(x,y)[0] * y + x % y`` be very close to ``x``. -.. [#] While comparisons between unicode strings make sense at the byte +.. 
[#] While comparisons between strings make sense at the byte level, they may be counter-intuitive to users. For example, the strings ``u"\u00C7"`` and ``u"\u0327\u0043"`` compare differently, even though they both represent the same unicode character (LATIN diff --git a/Doc/tutorial/introduction.rst b/Doc/tutorial/introduction.rst index 29178e8..456307d 100644 --- a/Doc/tutorial/introduction.rst +++ b/Doc/tutorial/introduction.rst @@ -399,8 +399,8 @@ The built-in function :func:`len` returns the length of a string:: basic transformations and searching. :ref:`string-formatting` - The formatting operations invoked when strings are the - left operand of the ``%`` operator are described in more detail here. + The formatting operations invoked by the :meth:`format` string method are + described in more detail here. .. _tut-unicodestrings: |