From e321c2f37d20fe4c3dc81f966d29216efe486917 Mon Sep 17 00:00:00 2001 From: Georg Brandl Date: Mon, 12 May 2008 16:45:43 +0000 Subject: #2836: backport new string formatting docs. --- Doc/library/stdtypes.rst | 22 +++ Doc/library/string.rst | 349 ++++++++++++++++++++++++++++++++++++++++++++++- Doc/whatsnew/2.6.rst | 3 +- 3 files changed, 371 insertions(+), 3 deletions(-) diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst index 5288212..c6679fd 100644 --- a/Doc/library/stdtypes.rst +++ b/Doc/library/stdtypes.rst @@ -783,6 +783,28 @@ string functions based on regular expressions. found. +.. method:: str.format(format_string, *args, **kwargs) + + Perform a string formatting operation. The *format_string* argument can + contain literal text or replacement fields delimited by braces ``{}``. Each + replacement field contains either the numeric index of a positional argument, + or the name of a keyword argument. Returns a copy of *format_string* where + each replacement field is replaced with the string value of the corresponding + argument. + + >>> "The sum of 1 + 2 is {0}".format(1+2) + 'The sum of 1 + 2 is 3' + + See :ref:`formatstrings` for a description of the various formatting options + that can be specified in format strings. + + This method of string formatting is the new standard in Python 3.0, and + should be preferred to the ``%`` formatting described in + :ref:`string-formatting` in new code. + + .. versionadded:: 2.6 + + .. method:: str.index(sub[, start[, end]]) Like :meth:`find`, but raise :exc:`ValueError` when the substring is not found. diff --git a/Doc/library/string.rst b/Doc/library/string.rst index c188b82..09f9182 100644 --- a/Doc/library/string.rst +++ b/Doc/library/string.rst @@ -1,4 +1,3 @@ - :mod:`string` --- Common string operations ========================================== @@ -104,6 +103,354 @@ The constants defined in this module are: :func:`strip` and :func:`split` is undefined. +.. 
_string-formatting: + +String Formatting +----------------- + +Starting in Python 2.6, the built-in str and unicode classes provide the ability +to do complex variable substitutions and value formatting via the +:meth:`str.format` method described in :pep:`3101`. The :class:`Formatter` class in the +:mod:`string` module allows you to create and customize your own string +formatting behaviors using the same implementation as the built-in +:meth:`format` method. + +.. class:: Formatter + + The :class:`Formatter` class has the following public methods: + + .. method:: format(format_string, *args, **kwargs) + + :meth:`format` is the primary API method. It takes a format template + string, and an arbitrary set of positional and keyword arguments. + :meth:`format` is just a wrapper that calls :meth:`vformat`. + + .. method:: vformat(format_string, args, kwargs) + + This function does the actual work of formatting. It is exposed as a + separate function for cases where you want to pass in a predefined + dictionary of arguments, rather than unpacking and repacking the + dictionary as individual arguments using the ``*args`` and ``**kwargs`` + syntax. :meth:`vformat` does the work of breaking up the format template + string into character data and replacement fields. It calls the various + methods described below. + + In addition, the :class:`Formatter` defines a number of methods that are + intended to be replaced by subclasses: + + .. method:: parse(format_string) + + Loop over the format_string and return an iterable of tuples + (*literal_text*, *field_name*, *format_spec*, *conversion*). This is used + by :meth:`vformat` to break the string into either literal text or + replacement fields. + + The values in the tuple conceptually represent a span of literal text + followed by a single replacement field. If there is no literal text + (which can happen if two replacement fields occur consecutively), then + *literal_text* will be a zero-length string. 
If there is no replacement + field, then the values of *field_name*, *format_spec* and *conversion* + will be ``None``. + + .. method:: get_field(field_name, args, kwargs) + + Given *field_name* as returned by :meth:`parse` (see above), convert it to + an object to be formatted. Returns a tuple (obj, used_key). The default + version takes strings of the form defined in :pep:`3101`, such as + "0[name]" or "label.title". *args* and *kwargs* are as passed in to + :meth:`vformat`. The return value *used_key* has the same meaning as the + *key* parameter to :meth:`get_value`. + + .. method:: get_value(key, args, kwargs) + + Retrieve a given field value. The *key* argument will be either an + integer or a string. If it is an integer, it represents the index of the + positional argument in *args*; if it is a string, then it represents a + named argument in *kwargs*. + + The *args* parameter is set to the list of positional arguments to + :meth:`vformat`, and the *kwargs* parameter is set to the dictionary of + keyword arguments. + + For compound field names, these functions are only called for the first + component of the field name; subsequent components are handled through + normal attribute and indexing operations. + + So for example, the field expression '0.name' would cause + :meth:`get_value` to be called with a *key* argument of 0. The ``name`` + attribute will be looked up after :meth:`get_value` returns by calling the + built-in :func:`getattr` function. + + If the index or keyword refers to an item that does not exist, then an + :exc:`IndexError` or :exc:`KeyError` should be raised. + + .. method:: check_unused_args(used_args, args, kwargs) + + Implement checking for unused arguments if desired. The arguments to this + function are the set of all argument keys that were actually referred to in + the format string (integers for positional arguments, and strings for + named arguments), and references to the *args* and *kwargs* that were + passed to :meth:`vformat`. 
The set of unused args can be calculated from these + parameters. :meth:`check_unused_args` is assumed to throw an exception if + the check fails. + + .. method:: format_field(value, format_spec) + + :meth:`format_field` simply calls the global :func:`format` built-in. The + method is provided so that subclasses can override it. + + .. method:: convert_field(value, conversion) + + Converts the value (returned by :meth:`get_field`) given a conversion type + (as in the tuple returned by the :meth:`parse` method.) The default + version understands 'r' (repr) and 's' (str) conversion types. + + +.. _formatstrings: + +Format String Syntax +-------------------- + +The :meth:`str.format` method and the :class:`Formatter` class share the same +syntax for format strings (although in the case of :class:`Formatter`, +subclasses can define their own format string syntax.) + +Format strings contain "replacement fields" surrounded by curly braces ``{}``. +Anything that is not contained in braces is considered literal text, which is +copied unchanged to the output. If you need to include a brace character in the +literal text, it can be escaped by doubling: ``{{`` and ``}}``. + +The grammar for a replacement field is as follows: + + .. productionlist:: sf + replacement_field: "{" `field_name` ["!" `conversion`] [":" `format_spec`] "}" + field_name: (`identifier` | `integer`) ("." `attribute_name` | "[" element_index "]")* + attribute_name: `identifier` + element_index: `integer` + conversion: "r" | "s" + format_spec: + +In less formal terms, the replacement field starts with a *field_name*, which +can either be a number (for a positional argument), or an identifier (for +keyword arguments). Following this is an optional *conversion* field, which is +preceded by an exclamation point ``'!'``, and a *format_spec*, which is preceded +by a colon ``':'``. + +The *field_name* itself begins with either a number or a keyword. 
If it's a +number, it refers to a positional argument, and if it's a keyword it refers to a +named keyword argument. This can be followed by any number of index or +attribute expressions. An expression of the form ``'.name'`` selects the named +attribute using :func:`getattr`, while an expression of the form ``'[index]'`` +does an index lookup using :meth:`__getitem__`. + +Some simple format string examples:: + + "First, thou shalt count to {0}" # References first positional argument + "My quest is {name}" # References keyword argument 'name' + "Weight in tons {0.weight}" # 'weight' attribute of first positional arg + "Units destroyed: {players[0]}" # First element of keyword argument 'players'. + +The *conversion* field causes a type coercion before formatting. Normally, the +job of formatting a value is done by the :meth:`__format__` method of the value +itself. However, in some cases it is desirable to force a type to be formatted +as a string, overriding its own definition of formatting. By converting the +value to a string before calling :meth:`__format__`, the normal formatting logic +is bypassed. + +Two conversion flags are currently supported: ``'!s'``, which calls :func:`str` +on the value, and ``'!r'``, which calls :func:`repr`. + +Some examples:: + + "Harold's a clever {0!s}" # Calls str() on the argument first + "Bring out the holy {name!r}" # Calls repr() on the argument first + +The *format_spec* field contains a specification of how the value should be +presented, including such details as field width, alignment, padding, decimal +precision and so on. Each value type can define its own "formatting +mini-language" or interpretation of the *format_spec*. + +Most built-in types support a common formatting mini-language, which is +described in the next section. + +A *format_spec* field can also include nested replacement fields within it. 
+These nested replacement fields can contain only a field name; conversion flags +and format specifications are not allowed. The replacement fields within the +*format_spec* are substituted before the *format_spec* string is interpreted. +This allows the formatting of a value to be dynamically specified. + +For example, suppose you wanted to have a replacement field whose field width is +determined by another variable:: + + "A man with two {0:{1}}".format("noses", 10) + +This would first evaluate the inner replacement field, making the format string +effectively:: + + "A man with two {0:10}" + +Then the outer replacement field would be evaluated, producing:: + + "noses     " + +which is substituted into the string, yielding:: + + "A man with two noses     " + +(The extra space is because we specified a field width of 10, and because left +alignment is the default for strings.) + + +.. _formatspec: + +Format Specification Mini-Language +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +"Format specifications" are used within replacement fields contained within a +format string to define how individual values are presented (see +:ref:`formatstrings`.) They can also be passed directly to the built-in +:func:`format` function. Each formattable type may define how the format +specification is to be interpreted. + +Most built-in types implement the following options for format specifications, +although some of the formatting options are only supported by the numeric types. + +A general convention is that an empty format string (``""``) produces the same +result as if you had called :func:`str` on the value. + +The general form of a *standard format specifier* is: + +.. 
productionlist:: sf + format_spec: [[`fill`]`align`][`sign`][0][`width`][.`precision`][`type`] + fill: + align: "<" | ">" | "=" | "^" + sign: "+" | "-" | " " + width: `integer` + precision: `integer` + type: "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "x" | "X" | "%" + +The *fill* character can be any character other than '}' (which signifies the +end of the field). The presence of a fill character is signaled by the *next* +character, which must be one of the alignment options. If the second character +of *format_spec* is not a valid alignment option, then it is assumed that both +the fill character and the alignment option are absent. + +The meaning of the various alignment options is as follows: + + +---------+----------------------------------------------------------+ + | Option | Meaning | + +=========+==========================================================+ + | ``'<'`` | Forces the field to be left-aligned within the available | + | | space (This is the default.) | + +---------+----------------------------------------------------------+ + | ``'>'`` | Forces the field to be right-aligned within the | + | | available space. | + +---------+----------------------------------------------------------+ + | ``'='`` | Forces the padding to be placed after the sign (if any) | + | | but before the digits. This is used for printing fields | + | | in the form '+000000120'. This alignment option is only | + | | valid for numeric types. | + +---------+----------------------------------------------------------+ + | ``'^'`` | Forces the field to be centered within the available | + | | space. | + +---------+----------------------------------------------------------+ + +Note that unless a minimum field width is defined, the field width will always +be the same size as the data to fill it, so that the alignment option has no +meaning in this case. 
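The alignment options in the table above can be tried directly with the built-in :func:`format` function. The following is a minimal sketch only: the sample values, widths and fill characters are arbitrary choices made for illustration and are not part of the patch.

```python
# Illustrative values only; the width (10) and fill characters are arbitrary.
left = format("spam", "<10")        # '<': left-aligned, padded on the right
right = format("spam", ">10")       # '>': right-aligned, padded on the left
centered = format("spam", "*^10")   # '^': centered, using '*' as the fill
signed = format(120, "0=+10")       # '=': padding goes between sign and digits
no_width = format("spam", "<")      # no width given, so alignment has no effect

for value in (left, right, centered, signed, no_width):
    print(repr(value))
```

The last call shows the point made in the note above: without a minimum field width there is no padding to distribute, so the alignment option changes nothing.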
+ +The *sign* option is only valid for number types, and can be one of the +following: + + +---------+----------------------------------------------------------+ + | Option | Meaning | + +=========+==========================================================+ + | ``'+'`` | indicates that a sign should be used for both | + | | positive as well as negative numbers. | + +---------+----------------------------------------------------------+ + | ``'-'`` | indicates that a sign should be used only for negative | + | | numbers (this is the default behavior). | + +---------+----------------------------------------------------------+ + | space | indicates that a leading space should be used on | + | | positive numbers, and a minus sign on negative numbers. | + +---------+----------------------------------------------------------+ + +*width* is a decimal integer defining the minimum field width. If not +specified, then the field width will be determined by the content. + +If the *width* field is preceded by a zero (``'0'``) character, this enables +zero-padding. This is equivalent to an *alignment* type of ``'='`` and a *fill* +character of ``'0'``. + +The *precision* is a decimal number indicating how many digits should be +displayed after the decimal point for a floating point value. For non-number +types the field indicates the maximum field size - in other words, how many +characters will be used from the field content. The *precision* is ignored for +integer values. + +Finally, the *type* determines how the data should be presented. + +The available integer presentation types are: + + +---------+----------------------------------------------------------+ + | Type | Meaning | + +=========+==========================================================+ + | ``'b'`` | Binary. Outputs the number in base 2. | + +---------+----------------------------------------------------------+ + | ``'c'`` | Character. 
Converts the integer to the corresponding | + | | unicode character before printing. | + +---------+----------------------------------------------------------+ + | ``'d'`` | Decimal Integer. Outputs the number in base 10. | + +---------+----------------------------------------------------------+ + | ``'o'`` | Octal format. Outputs the number in base 8. | + +---------+----------------------------------------------------------+ + | ``'x'`` | Hex format. Outputs the number in base 16, using lower- | + | | case letters for the digits above 9. | + +---------+----------------------------------------------------------+ + | ``'X'`` | Hex format. Outputs the number in base 16, using upper- | + | | case letters for the digits above 9. | + +---------+----------------------------------------------------------+ + | ``'n'`` | Number. This is the same as ``'d'``, except that it uses | + | | the current locale setting to insert the appropriate | + | | number separator characters. | + +---------+----------------------------------------------------------+ + | None | the same as ``'d'`` | + +---------+----------------------------------------------------------+ + +The available presentation types for floating point and decimal values are: + + +---------+----------------------------------------------------------+ + | Type | Meaning | + +=========+==========================================================+ + | ``'e'`` | Exponent notation. Prints the number in scientific | + | | notation using the letter 'e' to indicate the exponent. | + +---------+----------------------------------------------------------+ + | ``'E'`` | Exponent notation. Same as ``'e'`` except it uses an | + | | upper case 'E' as the separator character. | + +---------+----------------------------------------------------------+ + | ``'f'`` | Fixed point. Displays the number as a fixed-point | + | | number. | + +---------+----------------------------------------------------------+ + | ``'F'`` | Fixed point. 
Same as ``'f'``. | + +---------+----------------------------------------------------------+ + | ``'g'`` | General format. This prints the number as a fixed-point | + | | number, unless the number is too large, in which case | + | | it switches to ``'e'`` exponent notation. | + +---------+----------------------------------------------------------+ + | ``'G'`` | General format. Same as ``'g'`` except switches to | + | | ``'E'`` if the number gets too large. | + +---------+----------------------------------------------------------+ + | ``'n'`` | Number. This is the same as ``'g'``, except that it uses | + | | the current locale setting to insert the appropriate | + | | number separator characters. | + +---------+----------------------------------------------------------+ + | ``'%'`` | Percentage. Multiplies the number by 100 and displays | + | | in fixed (``'f'``) format, followed by a percent sign. | + +---------+----------------------------------------------------------+ + | None | the same as ``'g'`` | + +---------+----------------------------------------------------------+ + + Template strings ---------------- diff --git a/Doc/whatsnew/2.6.rst b/Doc/whatsnew/2.6.rst index e274020..c0316f6 100644 --- a/Doc/whatsnew/2.6.rst +++ b/Doc/whatsnew/2.6.rst @@ -612,8 +612,7 @@ can be formatted as a general number or in exponential notation: '3.750000e+00' A variety of presentation types are available. Consult the 2.6 -documentation for a complete list (XXX add link, once it's in the 2.6 -docs), but here's a sample:: +documentation for a :ref:`complete list `; here's a sample:: 'b' - Binary. Outputs the number in base 2. 'c' - Character. Converts the integer to the corresponding -- cgit v0.12