#2836: backport new string formatting docs.

author: Georg Brandl <georg@python.org> 2008-05-12 16:45:43 (GMT)
committer: Georg Brandl <georg@python.org> 2008-05-12 16:45:43 (GMT)
commit: e321c2f37d20fe4c3dc81f966d29216efe486917 (patch)
tree: a04cfc57154ea0e27599d5d9696ccef03f505a9c
parent: 23da6e654586bd59af566c6ed5d3e89bc55e8b23 (diff)
download: cpython-e321c2f37d20fe4c3dc81f966d29216efe486917.zip
cpython-e321c2f37d20fe4c3dc81f966d29216efe486917.tar.gz
cpython-e321c2f37d20fe4c3dc81f966d29216efe486917.tar.bz2
3 files changed, 371 insertions, 3 deletions
diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst
index 5288212..c6679fd 100644
--- a/Doc/library/stdtypes.rst
+++ b/Doc/library/stdtypes.rst
@@ -783,6 +783,28 @@ string functions based on regular expressions.
    found.
 
 
+.. method:: str.format(*args, **kwargs)
+
+   Perform a string formatting operation.  The string on which this method is
+   called can contain literal text or replacement fields delimited by braces
+   ``{}``.  Each replacement field contains either the numeric index of a
+   positional argument, or the name of a keyword argument.  Returns a copy of
+   the string where each replacement field is replaced with the string value of
+   the corresponding argument.
+
+      >>> "The sum of 1 + 2 is {0}".format(1+2)
+      'The sum of 1 + 2 is 3'
+
+   See :ref:`formatstrings` for a description of the various formatting options
+   that can be specified in format strings.
+
+   This method of string formatting is the new standard in Python 3.0.  In new
+   code, it should be preferred to the ``%`` formatting described in
+   :ref:`string-formatting`.
+
+   .. versionadded:: 2.6
+
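A minimal illustration of positional and keyword replacement fields, assuming a Python 3 interpreter (the explicit indices used here are also valid in 2.6); the argument values are made up:

```python
# Positional fields are referenced by index, keyword fields by name.
greeting = "{0}, {1}! My name is {name}.".format("Hello", "world", name="Ann")
print(greeting)  # Hello, world! My name is Ann.

# The same positional argument may be referenced more than once.
word = "{0}{1}{0}".format("abra", "cad")
print(word)  # abracadabra
```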
+
 .. method:: str.index(sub[, start[, end]])
 
    Like :meth:`find`, but raise :exc:`ValueError` when the substring is not found.
diff --git a/Doc/library/string.rst b/Doc/library/string.rst
index c188b82..09f9182 100644
--- a/Doc/library/string.rst
+++ b/Doc/library/string.rst
@@ -1,4 +1,3 @@
-
 :mod:`string` --- Common string operations
 ==========================================
 
@@ -104,6 +103,354 @@ The constants defined in this module are:
    :func:`strip` and :func:`split` is undefined.
 
 
+.. _new-string-formatting:
+
+String Formatting
+-----------------
+
+Starting in Python 2.6, the built-in str and unicode classes provide the ability
+to do complex variable substitutions and value formatting via the
+:meth:`str.format` method described in :pep:`3101`.  The :class:`Formatter`
+class in the :mod:`string` module allows you to create and customize your own
+string formatting behaviors using the same implementation as the built-in
+:meth:`format` method.
+
+.. class:: Formatter
+
+   The :class:`Formatter` class has the following public methods:
+
+   .. method:: format(format_string, *args, **kwargs)
+
+      :meth:`format` is the primary API method.  It takes a format template
+      string and an arbitrary set of positional and keyword arguments.
+      :meth:`format` is just a wrapper that calls :meth:`vformat`.
+
+   .. method:: vformat(format_string, args, kwargs)
+   
+      This method does the actual work of formatting.  It is exposed as a
+      separate method for cases where you want to pass in a predefined
+      dictionary of arguments, rather than unpacking and repacking the
+      dictionary as individual arguments using the ``*args`` and ``**kwargs``
+      syntax.  :meth:`vformat` does the work of breaking up the format template
+      string into character data and replacement fields.  It calls the various
+      methods described below.
+
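A sketch of the two entry points, assuming Python 3; the template and argument values are illustrative only.  Both calls should give the same text, since ``format()`` forwards to ``vformat()``:

```python
import string

fmt = string.Formatter()

# format() accepts the arguments directly, like str.format() ...
direct = fmt.format("{0} has {count} items", "cart", count=3)

# ... while vformat() takes an existing tuple and dict as-is,
# with no *args/**kwargs unpacking at the call site.
args = ("cart",)
kwargs = {"count": 3}
via_vformat = fmt.vformat("{0} has {count} items", args, kwargs)

print(direct)                 # cart has 3 items
print(direct == via_vformat)  # True
```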
+   In addition, the :class:`Formatter` defines a number of methods that are
+   intended to be replaced by subclasses:
+
+   .. method:: parse(format_string)
+   
+      Loop over the *format_string* and return an iterable of tuples
+      (*literal_text*, *field_name*, *format_spec*, *conversion*).  This is used
+      by :meth:`vformat` to break the string into either literal text or
+      replacement fields.
+      
+      The values in the tuple conceptually represent a span of literal text
+      followed by a single replacement field.  If there is no literal text
+      (which can happen if two replacement fields occur consecutively), then
+      *literal_text* will be a zero-length string.  If there is no replacement
+      field, then the values of *field_name*, *format_spec* and *conversion*
+      will be ``None``.
+
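The tuples produced by ``parse()`` can be inspected directly.  This sketch assumes Python 3, where a field that has no format spec reports an empty string (not ``None``) for *format_spec*; ``None`` appears only when there is no field at all:

```python
import string

fmt = string.Formatter()

# Literal text, a plain field, a field with conversion and spec, then trailing text.
parsed = list(fmt.parse("a{0}b{name!r:>5}c"))
for item in parsed:
    print(item)
# ('a', '0', '', None)
# ('b', 'name', '>5', 'r')
# ('c', None, None, None)

# Two consecutive fields: each tuple carries an empty literal_text.
consecutive = list(fmt.parse("{0}{1}"))
print(consecutive)  # [('', '0', '', None), ('', '1', '', None)]
```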
+   .. method:: get_field(field_name, args, kwargs)
+
+      Given *field_name* as returned by :meth:`parse` (see above), convert it to
+      an object to be formatted.  Returns a tuple (*obj*, *used_key*).  The
+      default version takes strings of the form defined in :pep:`3101`, such as
+      ``"0[name]"`` or ``"label.title"``.  *args* and *kwargs* are as passed in
+      to :meth:`vformat`.  The return value *used_key* has the same meaning as
+      the *key* parameter to :meth:`get_value`.
+
+   .. method:: get_value(key, args, kwargs)
+   
+      Retrieve a given field value.  The *key* argument will be either an
+      integer or a string.  If it is an integer, it represents the index of the
+      positional argument in *args*; if it is a string, then it represents a
+      named argument in *kwargs*.
+
+      The *args* parameter is set to the list of positional arguments to
+      :meth:`vformat`, and the *kwargs* parameter is set to the dictionary of
+      keyword arguments.
+
+      For compound field names, these functions are only called for the first
+      component of the field name; subsequent components are handled through
+      normal attribute and indexing operations.
+
+      For example, the field expression ``'0.name'`` would cause
+      :meth:`get_value` to be called with a *key* argument of 0.  The ``name``
+      attribute will be looked up after :meth:`get_value` returns by calling the
+      built-in :func:`getattr` function.
+
+      If the index or keyword refers to an item that does not exist, then an
+      :exc:`IndexError` or :exc:`KeyError` should be raised.
+
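One way to use the ``get_value()`` hook is to supply a placeholder for missing keyword arguments.  ``DefaultFormatter`` below is a hypothetical subclass written for Python 3, not part of the :mod:`string` module; the second call also shows that only the first component of ``"0.real"`` reaches ``get_value()``:

```python
import string


class DefaultFormatter(string.Formatter):
    """Hypothetical subclass: missing keyword arguments become a placeholder."""

    def __init__(self, placeholder="?"):
        super().__init__()
        self.placeholder = placeholder

    def get_value(self, key, args, kwargs):
        # key is an int for positional fields and a str for named fields.
        if isinstance(key, str):
            return kwargs.get(key, self.placeholder)
        return super().get_value(key, args, kwargs)


fmt = DefaultFormatter()
sentence = fmt.format("{name} is {age} years old", name="Ann")
print(sentence)  # Ann is ? years old

# get_value() sees only the key 0; ".real" is then resolved with getattr().
real_part = fmt.format("{0.real}", 3)
print(real_part)  # 3
```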
+   .. method:: check_unused_args(used_args, args, kwargs)
+
+      Implement checking for unused arguments if desired.  The arguments to this
+      method are the set of all argument keys that were actually referred to in
+      the format string (integers for positional arguments, and strings for
+      named arguments), and a reference to the *args* and *kwargs* that were
+      passed to :meth:`vformat`.  The set of unused arguments can be calculated
+      from these parameters.  :meth:`check_unused_args` is assumed to raise an
+      exception if the check fails.
+
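A sketch of a strict checker, assuming Python 3.  ``StrictFormatter`` is a hypothetical name; it computes the unused keys exactly as described above, by subtracting *used_args* from every index and name that was supplied:

```python
import string


class StrictFormatter(string.Formatter):
    """Hypothetical subclass that rejects arguments the template never uses."""

    def check_unused_args(self, used_args, args, kwargs):
        # Every positional index and keyword name that was passed in ...
        supplied = set(range(len(args))) | set(kwargs)
        # ... minus the keys the format string actually referred to.
        unused = supplied - set(used_args)
        if unused:
            # key=str lets ints and strs be sorted together.
            raise ValueError("unused arguments: %s" % sorted(unused, key=str))


fmt = StrictFormatter()
ok = fmt.format("{0} and {x}", 1, x=2)
print(ok)  # 1 and 2

error_message = None
try:
    fmt.format("{0}", 1, 2)
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # unused arguments: [1]
```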
+   .. method:: format_field(value, format_spec)
+
+      :meth:`format_field` simply calls the global :func:`format` built-in.  The
+      method is provided so that subclasses can override it.
+
+   .. method:: convert_field(value, conversion)
+   
+      Converts the value (returned by :meth:`get_field`) given a conversion type
+      (as in the tuple returned by the :meth:`parse` method).  The default
+      version understands ``'r'`` (repr) and ``'s'`` (str) conversion types.
+
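Both ``convert_field()`` and ``format_field()`` can be overridden together.  The sketch below assumes Python 3; ``ShoutingFormatter`` and its ``'u'`` (upper-case) conversion are hypothetical additions, not standard conversions, and ``None`` is rendered as a blank that still honours the width:

```python
import string


class ShoutingFormatter(string.Formatter):
    """Hypothetical subclass adding a '!u' conversion and blank output for None."""

    def convert_field(self, value, conversion):
        if conversion == "u":
            return str(value).upper()
        # Fall back to the standard handling of None, 'r' and 's'.
        return super().convert_field(value, conversion)

    def format_field(self, value, format_spec):
        if value is None:
            value = ""  # keep width/alignment from format_spec
        return super().format_field(value, format_spec)


fmt = ShoutingFormatter()
shouted = fmt.format("{0!u} {1!r}", "hello", "world")
print(shouted)  # HELLO 'world'

blank = fmt.format("[{0:>4}]", None)
print(blank)  # [    ]
```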
+
+.. _formatstrings:
+
+Format String Syntax
+--------------------
+
+The :meth:`str.format` method and the :class:`Formatter` class share the same
+syntax for format strings (although in the case of :class:`Formatter`,
+subclasses can define their own format string syntax).
+
+Format strings contain "replacement fields" surrounded by curly braces ``{}``.
+Anything that is not contained in braces is considered literal text, which is
+copied unchanged to the output.  If you need to include a brace character in the
+literal text, it can be escaped by doubling: ``{{`` and ``}}``.
+
+The grammar for a replacement field is as follows:
+
+   .. productionlist:: sf
+      replacement_field: "{" `field_name` ["!" `conversion`] [":" `format_spec`] "}"
+      field_name: (`identifier` | `integer`) ("." `attribute_name` | "[" `element_index` "]")*
+      attribute_name: `identifier`
+      element_index: `integer` | `index_string`
+      index_string: <any source character except "]"> +
+      conversion: "r" | "s"
+      format_spec: <described in the next section>
+      
+In less formal terms, the replacement field starts with a *field_name*, which
+can either be a number (for a positional argument) or an identifier (for
+keyword arguments).  Following this is an optional *conversion* field, which is
+preceded by an exclamation point ``'!'``, and an optional *format_spec*, which
+is preceded by a colon ``':'``.
+
+The *field_name* itself begins with either a number or a keyword.  If it's a
+number, it refers to a positional argument, and if it's a keyword it refers to a
+named keyword argument.  This can be followed by any number of index or
+attribute expressions. An expression of the form ``'.name'`` selects the named
+attribute using :func:`getattr`, while an expression of the form ``'[index]'``
+does an index lookup using :meth:`__getitem__`.
+
+Some simple format string examples::
+
+   "First, thou shalt count to {0}" # References first positional argument
+   "My quest is {name}"             # References keyword argument 'name'
+   "Weight in tons {0.weight}"      # 'weight' attribute of first positional arg
+   "Units destroyed: {players[0]}"  # First element of keyword argument 'players'.
+   
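The example format strings above can be exercised as follows, assuming Python 3.  ``Cargo`` is a hypothetical class that exists only to provide a ``weight`` attribute, and the values are made up:

```python
class Cargo:
    """Hypothetical class used only to show attribute lookup."""

    def __init__(self, weight):
        self.weight = weight


by_attribute = "Weight in tons {0.weight}".format(Cargo(12))
by_index = "Units destroyed: {players[0]}".format(players=["Ann", "Bob"])
# A non-numeric index is passed to __getitem__ as a string key.
by_key = "{0[name]} lives in {0[city]}".format({"name": "Ann", "city": "Oslo"})
# Doubled braces are copied to the output as single literal braces.
escaped = "{{0}} is replaced by {0}".format(42)

print(by_attribute)  # Weight in tons 12
print(by_index)      # Units destroyed: Ann
print(by_key)        # Ann lives in Oslo
print(escaped)       # {0} is replaced by 42
```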
+The *conversion* field causes a type coercion before formatting.  Normally, the
+job of formatting a value is done by the :meth:`__format__` method of the value
+itself.  However, in some cases it is desirable to force a type to be formatted
+as a string, overriding its own definition of formatting.  By converting the
+value to a string before calling :meth:`__format__`, the normal formatting logic
+is bypassed.
+
+Two conversion flags are currently supported: ``'!s'`` which calls :func:`str`
+on the value, and ``'!r'`` which calls :func:`repr`.
+
+Some examples::
+
+   "Harold's a clever {0!s}"        # Calls str() on the argument first
+   "Bring out the holy {name!r}"    # Calls repr() on the argument first
+
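``datetime.date`` makes the effect of a conversion visible, because its ``__format__`` treats a non-empty spec as a ``strftime()`` pattern.  A sketch assuming Python 3; the date is arbitrary:

```python
import datetime

day = datetime.date(2008, 5, 12)

# date.__format__ interprets the spec as a strftime() pattern.
year_only = "{0:%Y}".format(day)
# Without a conversion, ">12" is also handed to strftime(), which
# copies it through unchanged because it contains no directives.
unconverted = "{0:>12}".format(day)
# !s converts to str first, so the ordinary string alignment applies.
converted = "{0!s:>12}".format(day)
# !r uses repr() instead of str().
as_repr = "{0!r}".format(day)

print(year_only)    # 2008
print(unconverted)  # >12
print(converted)    #   2008-05-12
print(as_repr)      # datetime.date(2008, 5, 12)
```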
+The *format_spec* field contains a specification of how the value should be
+presented, including such details as field width, alignment, padding, decimal
+precision and so on.  Each value type can define its own "formatting
+mini-language" or interpretation of the *format_spec*.
+
+Most built-in types support a common formatting mini-language, which is
+described in the next section.
+
+A *format_spec* field can also include nested replacement fields within it.
+These nested replacement fields can contain only a field name; conversion flags
+and format specifications are not allowed.  The replacement fields within the
+format_spec are substituted before the *format_spec* string is interpreted.
+This allows the formatting of a value to be dynamically specified.
+
+For example, suppose you wanted to have a replacement field whose field width is
+determined by another variable::
+
+   "A man with two {0:{1}}".format("noses", 10)
+
+This would first evaluate the inner replacement field, making the format string
+effectively::
+
+   "A man with two {0:10}"
+
+Then the outer replacement field would be evaluated, producing::
+
+   "noses     "
+   
+This is substituted into the string, yielding::
+
+   "A man with two noses     "
+
+(The extra spaces appear because we specified a field width of 10, and because
+left alignment is the default for strings.)
+
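Nested fields can supply both width and precision, and the width can be computed at run time, for example to line up a column.  A sketch assuming Python 3; the names list is made up:

```python
# Width 8 and precision 3 both come from later positional arguments.
value = "{0:{1}.{2}f}".format(3.14159, 8, 3)
print(repr(value))  # '   3.142'

# Pad every name to the width of the longest one.
names = ["Ann", "Bartholomew", "Cy"]
width = max(len(n) for n in names)
rows = ["{0:{1}}|".format(n, width) for n in names]
for row in rows:
    print(row)
# Ann        |
# Bartholomew|
# Cy         |
```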
+
+.. _formatspec:
+
+Format Specification Mini-Language
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+"Format specifications" are used within replacement fields contained within a
+format string to define how individual values are presented (see
+:ref:`formatstrings`).  They can also be passed directly to the built-in
+:func:`format` function.  Each formattable type may define how the format
+specification is to be interpreted.
+
+Most built-in types implement the following options for format specifications,
+although some of the formatting options are only supported by the numeric types.
+
+A general convention is that an empty format specification (``""``) produces
+the same result as if you had called :func:`str` on the value.
+
+The general form of a *standard format specifier* is:
+
+.. productionlist:: sf
+   format_spec: [[`fill`]`align`][`sign`][0][`width`][.`precision`][`type`]
+   fill: <a character other than '}'>
+   align: "<" | ">" | "=" | "^"
+   sign: "+" | "-" | " "
+   width: `integer`
+   precision: `integer`
+   type: "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "x" | "X" | "%"
+   
+The *fill* character can be any character other than '}' (which signifies the
+end of the field).  The presence of a fill character is signaled by the *next*
+character, which must be one of the alignment options. If the second character
+of *format_spec* is not a valid alignment option, then it is assumed that both
+the fill character and the alignment option are absent.
+
+The meaning of the various alignment options is as follows:
+
+   +---------+----------------------------------------------------------+
+   | Option  | Meaning                                                  |
+   +=========+==========================================================+
+   | ``'<'`` | Forces the field to be left-aligned within the available |
+   |         | space (this is the default for most objects).            |
+   +---------+----------------------------------------------------------+
+   | ``'>'`` | Forces the field to be right-aligned within the          |
+   |         | available space (this is the default for numbers).       |
+   +---------+----------------------------------------------------------+
+   | ``'='`` | Forces the padding to be placed after the sign (if any)  |
+   |         | but before the digits.  This is used for printing fields |
+   |         | in the form '+000000120'. This alignment option is only  |
+   |         | valid for numeric types.                                 |
+   +---------+----------------------------------------------------------+
+   | ``'^'`` | Forces the field to be centered within the available     |
+   |         | space.                                                   |
+   +---------+----------------------------------------------------------+
+
+Note that unless a minimum field width is defined, the field width will always
+be the same size as the data to fill it, so that the alignment option has no
+meaning in this case.
+
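The alignment options can be tried with the built-in ``format()`` function.  A sketch assuming Python 3, with arbitrary values and widths:

```python
left = format("abc", "<7")        # fill defaults to a space
right = format("abc", ">7")
centered = format("abc", "*^7")   # '*' is the fill character
signed = format(-42, "0=8")       # padding goes between sign and digits
number = format(42, "6")          # numbers are right-aligned when no align is given
too_narrow = format("abc", "2")   # width is a minimum; nothing is cut off

for result in (left, right, centered, signed, number, too_narrow):
    print(repr(result))
# 'abc    '
# '    abc'
# '**abc**'
# '-0000042'
# '    42'
# 'abc'
```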
+The *sign* option is only valid for number types, and can be one of the
+following:
+
+   +---------+----------------------------------------------------------+
+   | Option  | Meaning                                                  |
+   +=========+==========================================================+
+   | ``'+'`` | indicates that a sign should be used for both            |
+   |         | positive as well as negative numbers.                    |
+   +---------+----------------------------------------------------------+
+   | ``'-'`` | indicates that a sign should be used only for negative   |
+   |         | numbers (this is the default behavior).                  |
+   +---------+----------------------------------------------------------+
+   | space   | indicates that a leading space should be used on         |
+   |         | positive numbers, and a minus sign on negative numbers.  |
+   +---------+----------------------------------------------------------+
+
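The three sign options applied to a positive and a negative integer, as a sketch assuming Python 3:

```python
results = {}
for sign in ("+", "-", " "):
    # Format the same magnitude with both signs under each option.
    results[sign] = (format(7, sign), format(-7, sign))
    print(repr(sign), results[sign])
# '+' ('+7', '-7')
# '-' ('7', '-7')
# ' ' (' 7', '-7')
```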
+*width* is a decimal integer defining the minimum field width.  If not
+specified, then the field width will be determined by the content.
+
+If the *width* field is preceded by a zero (``'0'``) character, this enables
+zero-padding.  This is equivalent to an *alignment* type of ``'='`` and a *fill*
+character of ``'0'``.
+
+The *precision* is a decimal number indicating how many digits should be
+displayed after the decimal point for a floating point value formatted with
+``'f'`` and ``'F'``, or before and after the decimal point for a floating point
+value formatted with ``'g'`` or ``'G'``.  For non-number types the field
+indicates the maximum field size: in other words, how many characters will be
+used from the field content.  A *precision* is not allowed for integer values.
+
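Width, zero-padding and precision in combination, as a sketch assuming Python 3 (the values are arbitrary):

```python
padded = format(42, "6")              # minimum width 6
zero_padded = format(42, "06")        # leading '0' enables zero-padding
negative = format(-42, "06")          # zeros go after the sign
two_places = format(3.14159, ".2f")   # two digits after the decimal point
truncated = format("abcdef", ".3")    # precision caps a string's length
both = format("abcdef", "5.3")        # cut to 3 chars, then padded to 5

for result in (padded, zero_padded, negative, two_places, truncated, both):
    print(repr(result))
# '    42'
# '000042'
# '-00042'
# '3.14'
# 'abc'
# 'abc  '
```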
+Finally, the *type* determines how the data should be presented.
+
+The available integer presentation types are:
+
+   +---------+----------------------------------------------------------+
+   | Type    | Meaning                                                  |
+   +=========+==========================================================+
+   | ``'b'`` | Binary. Outputs the number in base 2.                    |
+   +---------+----------------------------------------------------------+
+   | ``'c'`` | Character. Converts the integer to the corresponding     |
+   |         | unicode character before printing.                       |
+   +---------+----------------------------------------------------------+
+   | ``'d'`` | Decimal Integer. Outputs the number in base 10.          |
+   +---------+----------------------------------------------------------+
+   | ``'o'`` | Octal format. Outputs the number in base 8.              |
+   +---------+----------------------------------------------------------+
+   | ``'x'`` | Hex format. Outputs the number in base 16, using         |
+   |         | lowercase letters for the digits above 9.                |
+   +---------+----------------------------------------------------------+
+   | ``'X'`` | Hex format. Outputs the number in base 16, using         |
+   |         | uppercase letters for the digits above 9.                |
+   +---------+----------------------------------------------------------+
+   | ``'n'`` | Number. This is the same as ``'d'``, except that it uses |
+   |         | the current locale setting to insert the appropriate     |
+   |         | number separator characters.                             |
+   +---------+----------------------------------------------------------+
+   | None    | the same as ``'d'``                                      |
+   +---------+----------------------------------------------------------+
+                                                                         
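Each integer presentation type applied to a sample value, as a sketch assuming Python 3.  ``'n'`` is left out because its output depends on the active locale:

```python
int_examples = {
    "b": format(10, "b"),    # binary
    "c": format(65, "c"),    # character with code point 65
    "d": format(10, "d"),    # decimal
    "o": format(10, "o"),    # octal
    "x": format(255, "x"),   # lower-case hex
    "X": format(255, "X"),   # upper-case hex
    "": format(255, ""),     # no type: same as 'd'
}
for spec, text in int_examples.items():
    print(repr(spec), repr(text))
# 'b' '1010'
# 'c' 'A'
# 'd' '10'
# 'o' '12'
# 'x' 'ff'
# 'X' 'FF'
# '' '255'
```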
+The available presentation types for floating point and decimal values are:
+                                                                         
+   +---------+----------------------------------------------------------+
+   | Type    | Meaning                                                  |
+   +=========+==========================================================+
+   | ``'e'`` | Exponent notation. Prints the number in scientific       |
+   |         | notation using the letter 'e' to indicate the exponent.  |
+   +---------+----------------------------------------------------------+
+   | ``'E'`` | Exponent notation. Same as ``'e'`` except it uses an     |
+   |         | upper case 'E' as the separator character.               |
+   +---------+----------------------------------------------------------+
+   | ``'f'`` | Fixed point. Displays the number as a fixed-point        |
+   |         | number.                                                  |
+   +---------+----------------------------------------------------------+
+   | ``'F'`` | Fixed point. Same as ``'f'``.                            |
+   +---------+----------------------------------------------------------+
+   | ``'g'`` | General format. This prints the number as a fixed-point  |
+   |         | number, unless the number is too large, in which case    |
+   |         | it switches to ``'e'`` exponent notation.                |
+   +---------+----------------------------------------------------------+
+   | ``'G'`` | General format. Same as ``'g'`` except switches to       |
+   |         | ``'E'`` if the number gets too large.                    |
+   +---------+----------------------------------------------------------+
+   | ``'n'`` | Number. This is the same as ``'g'``, except that it uses |
+   |         | the current locale setting to insert the appropriate     |
+   |         | number separator characters.                             |
+   +---------+----------------------------------------------------------+
+   | ``'%'`` | Percentage. Multiplies the number by 100 and displays    |
+   |         | in fixed (``'f'``) format, followed by a percent sign.   |
+   +---------+----------------------------------------------------------+
+   | None    | the same as ``'g'``                                      |
+   +---------+----------------------------------------------------------+
+
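The floating point presentation types on a couple of sample values, as a sketch assuming Python 3.  ``'n'`` and the no-type case are omitted: the first is locale-dependent, and the second differs in detail between Python versions:

```python
x = 1234.5678
float_examples = {
    "e": format(x, "e"),            # 6 digits after the point by default
    "E": format(x, "E"),
    "f": format(x, "f"),
    ".2f": format(x, ".2f"),
    "g": format(x, "g"),            # fixed-point while the value is small enough
    "g-large": format(1e20, "g"),   # switches to exponent notation
    "G-large": format(1e20, "G"),
    "%": format(0.25, "%"),         # multiplied by 100, 'f' style
    ".1%": format(0.25, ".1%"),
}
for spec, text in float_examples.items():
    print(repr(spec), repr(text))
# 'e' '1.234568e+03'
# 'E' '1.234568E+03'
# 'f' '1234.567800'
# '.2f' '1234.57'
# 'g' '1234.57'
# 'g-large' '1e+20'
# 'G-large' '1E+20'
# '%' '25.000000%'
# '.1%' '25.0%'
```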
+
 Template strings
 ----------------
 
diff --git a/Doc/whatsnew/2.6.rst b/Doc/whatsnew/2.6.rst
index e274020..c0316f6 100644
--- a/Doc/whatsnew/2.6.rst
+++ b/Doc/whatsnew/2.6.rst
@@ -612,8 +612,7 @@ can be formatted as a general number or in exponential notation:
     '3.750000e+00'
 
 A variety of presentation types are available.  Consult the 2.6
-documentation for a complete list (XXX add link, once it's in the 2.6
-docs), but here's a sample::
+documentation for a :ref:`complete list <formatspec>`; here's a sample::
 
         'b' - Binary. Outputs the number in base 2.
         'c' - Character. Converts the integer to the corresponding