Issue #8593: Fix, reorder and improve the documentation for argument parsing

author: Antoine Pitrou <solipsis@pitrou.net> 2010-05-03 15:57:23 (GMT)
committer: Antoine Pitrou <solipsis@pitrou.net> 2010-05-03 15:57:23 (GMT)
commit: 83fd9b97b8e206260fe31b27b421d617fb5b9e1b (patch)
tree: 515c16276aba7beb3671b915b6dbbd1949e6f546 /Doc/c-api/arg.rst
parent: d23ea06175c62b166d8a6c7a2efc090bf7ead779 (diff)
download: cpython-83fd9b97b8e206260fe31b27b421d617fb5b9e1b.zip
cpython-83fd9b97b8e206260fe31b27b421d617fb5b9e1b.tar.gz
cpython-83fd9b97b8e206260fe31b27b421d617fb5b9e1b.tar.bz2
1 files changed, 142 insertions, 97 deletions
diff --git a/Doc/c-api/arg.rst b/Doc/c-api/arg.rst
index 0eb77ff..bf6ef5c 100644
--- a/Doc/c-api/arg.rst
+++ b/Doc/c-api/arg.rst
@@ -14,6 +14,10 @@ The first three of these functions described, :cfunc:`PyArg_ParseTuple`,
 strings* which are used to tell the function about the expected arguments.  The
 format strings use the same syntax for each of these functions.
 
+-----------------
+Parsing arguments
+-----------------
+
 A format string consists of zero or more "format units."  A format unit
 describes one Python object; it is usually a single character or a parenthesized
 sequence of format units.  With a few exceptions, a format unit that is not a
@@ -23,75 +27,108 @@ unit; the entry in (round) parentheses is the Python object type that matches
 the format unit; and the entry in [square] brackets is the type of the C
 variable(s) whose address should be passed.
 
-``s`` (string or Unicode object) [const char \*]
-   Convert a Python string or Unicode object to a C pointer to a character string.
-   You must not provide storage for the string itself; a pointer to an existing
-   string is stored into the character pointer variable whose address you pass.
-   The C string is NUL-terminated.  The Python string must not contain embedded NUL
-   bytes; if it does, a :exc:`TypeError` exception is raised. Unicode objects are
-   converted to C strings using the default encoding.  If this conversion fails, a
-   :exc:`UnicodeError` is raised.
+Strings and buffers
+-------------------
 
-   Starting with Python 2.5 the type of the length argument can be
-   controlled by defining the macro :cmacro:`PY_SSIZE_T_CLEAN` before
-   including :file:`Python.h`.  If the macro is defined, length is a
-   :ctype:`Py_ssize_t` rather than an int.
-
-``s*`` (string, Unicode, or any buffer compatible object) [Py_buffer]
-   This is similar to ``s``, but the code fills a :ctype:`Py_buffer` structure
-   provided by the caller.  In this case the Python string may contain embedded
-   null bytes.  Unicode objects pass back a pointer to the default encoded
-   string version of the object if such a conversion is possible.  The
-   underlying buffer is locked, so that the caller can subsequently use the
-   buffer even inside a ``Py_BEGIN_ALLOW_THREADS`` block.  **The caller is
-   responsible** for calling ``PyBuffer_Release`` with the structure after it
-   has processed the data.
+These formats do not expect you to provide raw storage for the returned string
+or bytes.  Also, you won't have to release any memory yourself, except with
+the ``es``, ``es#``, ``et`` and ``et#`` formats.
 
-``s#`` (string, Unicode or any read buffer compatible object) [const char \*, int or :ctype:`Py_ssize_t`]
-   This variant on ``s`` stores into two C variables, the first one a pointer to
-   a character string, the second one its length.  In this case the Python
-   string may contain embedded null bytes.  Unicode objects pass back a pointer
-   to the default encoded string version of the object if such a conversion is
-   possible.  All other read-buffer compatible objects pass back a reference to
-   the raw internal data representation.  Since this format doesn't allow
-   writable buffer compatible objects like byte arrays, ``s*`` is to be
-   preferred.
-
-   The type of the length argument (int or :ctype:`Py_ssize_t`) is controlled by
+However, when a :ctype:`Py_buffer` structure gets filled, the underlying
+buffer is locked so that the caller can subsequently use the buffer even
+inside a ``Py_BEGIN_ALLOW_THREADS`` block without the risk of mutable data
+being resized or destroyed.  As a result, **you have to call**
+:cfunc:`PyBuffer_Release` after you have finished processing the data (or
+in any early abort case).
+
+Unless otherwise stated, buffers are not NUL-terminated.
+
+.. note::
+   For all ``#`` variants of formats (``s#``, ``y#``, etc.), the type of
+   the length argument (int or :ctype:`Py_ssize_t`) is controlled by
    defining the macro :cmacro:`PY_SSIZE_T_CLEAN` before including
-   :file:`Python.h`.  If the macro was defined, length is a :ctype:`Py_ssize_t`
-   rather than an int.  This behavior will change in a future Python version to
-   only support :ctype:`Py_ssize_t` and drop int support.  It is best to always
-   define :cmacro:`PY_SSIZE_T_CLEAN`.
+   :file:`Python.h`.  If the macro was defined, length is a
+   :ctype:`Py_ssize_t` rather than an int.  This behavior will change
+   in a future Python version to only support :ctype:`Py_ssize_t` and
+   drop int support.  It is best to always define :cmacro:`PY_SSIZE_T_CLEAN`.
+
+
+``s`` (Unicode object) [const char \*]
+   Convert a Unicode object to a C pointer to a character string.
+   A pointer to an existing string is stored in the character pointer
+   variable whose address you pass.  The C string is NUL-terminated.
+   The Python string must not contain embedded NUL bytes; if it does,
+   a :exc:`TypeError` exception is raised. Unicode objects are converted
+   to C strings using the default encoding.  If this conversion fails, a
+   :exc:`UnicodeError` is raised.
 
-``y`` (bytes object) [const char \*]
-   This variant on ``s`` converts a Python bytes or bytearray object to a C
-   pointer to a character string.  The bytes object must not contain embedded
-   NUL bytes; if it does, a :exc:`TypeError` exception is raised.
+   .. note::
+      This format does not accept bytes-like objects.  If you want to accept
+      filesystem paths and convert them to C character strings, it is
+      preferrable to use the ``O&`` format with :cfunc:`PyUnicode_FSConverter`
+      as *converter*.
 
-``y*`` (bytes object) [Py_buffer \*]
-   This is to ``s*`` as ``y`` is to ``s``.
+``s*`` (Unicode object or any buffer compatible object) [Py_buffer]
+   This format accepts Unicode objects as well as objects supporting the
+   buffer protocol (such as :class:`bytes` or :class:`bytearray` objects).
+   It fills a :ctype:`Py_buffer` structure provided by the caller.
+   Unicode objects are converted to C strings using the default encoding.
+   In this case the resulting C string may contain embedded NUL bytes.
 
-``y#`` (bytes object) [const char \*, int]
-   This variant on ``s#`` stores into two C variables, the first one a pointer
-   to a character string, the second one its length.  This only accepts bytes
-   objects, no byte arrays.
+``s#`` (string, Unicode or any read buffer compatible object) [const char \*, int or :ctype:`Py_ssize_t`]
+   Like ``s*``, except that it doesn't accept mutable buffer-like objects
+   such as :class:`bytearray`.  The result is stored into two C variables,
+   the first one a pointer to a C string, the second one its length.
+   The string may contain embedded null bytes.
 
-``z`` (string or ``None``) [const char \*]
+``z`` (Unicode object or ``None``) [const char \*]
    Like ``s``, but the Python object may also be ``None``, in which case the C
    pointer is set to *NULL*.
 
-``z*`` (string or ``None`` or any buffer compatible object) [Py_buffer]
-   This is to ``s*`` as ``z`` is to ``s``.
+``z*`` (Unicode object or ``None`` or any buffer compatible object) [Py_buffer]
+   Like ``s*``, but the Python object may also be ``None``, in which case the
+   ``buf`` member of the :ctype:`Py_buffer` structure is set to *NULL*.
 
-``z#`` (string or ``None`` or any read buffer compatible object) [const char \*, int]
-   This is to ``s#`` as ``z`` is to ``s``.
+``z#`` (Unicode object or ``None`` or any read buffer compatible object) [const char \*, int]
+   Like ``s#``, but the Python object may also be ``None``, in which case the C
+   pointer is set to *NULL*.
+
+``y`` (bytes object) [const char \*]
+   This format converts a bytes-like object to a C pointer to a character
+   string; it does not accept Unicode objects.  The bytes buffer must not
+   contain embedded NUL bytes; if it does, a :exc:`TypeError`
+   exception is raised.
+
+``y*`` (any buffer compatible object) [Py_buffer \*]
+   This variant on ``s*`` doesn't accept Unicode objects, only objects
+   supporting the buffer protocol.  **This is the recommended way to accept
+   binary data.**
+
+``y#`` (bytes object) [const char \*, int]
+   This variant on ``s#`` doesn't accept Unicode objects, only bytes-like
+   objects.
+
+``S`` (bytes object) [PyBytesObject \*]
+   Requires that the Python object is a :class:`bytes` object, without
+   attempting any conversion.  Raises :exc:`TypeError` if the object is not
+   a bytes object.  The C variable may also be declared as :ctype:`PyObject\*`.
+
+``Y`` (bytearray object) [PyByteArrayObject \*]
+   Requires that the Python object is a :class:`bytearray` object, without
+   attempting any conversion.  Raises :exc:`TypeError` if the object is not
+   a bytearray object.  The C variable may also be declared as :ctype:`PyObject\*`.
 
 ``u`` (Unicode object) [Py_UNICODE \*]
    Convert a Python Unicode object to a C pointer to a NUL-terminated buffer of
-   16-bit Unicode (UTF-16) data.  As with ``s``, there is no need to provide
-   storage for the Unicode data buffer; a pointer to the existing Unicode data is
-   stored into the :ctype:`Py_UNICODE` pointer variable whose address you pass.
+   Unicode characters.  You must pass the address of a :ctype:`Py_UNICODE`
+   pointer variable, which will be filled with the pointer to an existing
+   Unicode buffer.  Please note that the width of a :ctype:`Py_UNICODE`
+   character depends on compilation options (it is either 16 or 32 bits).
+
+   ..note ::
+      Since ``u`` doesn't give you back the length of the string, and it
+      may contain embedded NUL characters, it is recommended to use ``u#``
+      or ``U`` instead.
 
 ``u#`` (Unicode object) [Py_UNICODE \*, int]
    This variant on ``u`` stores into two C variables, the first one a pointer to a
@@ -100,11 +137,40 @@ variable(s) whose address should be passed.
    array.
 
 ``Z`` (Unicode or ``None``) [Py_UNICODE \*]
-   Like ``s``, but the Python object may also be ``None``, in which case the C
-   pointer is set to *NULL*.
+   Like ``u``, but the Python object may also be ``None``, in which case the
+   :ctype:`Py_UNICODE` pointer is set to *NULL*.
 
 ``Z#`` (Unicode or ``None``) [Py_UNICODE \*, int]
-   This is to ``u#`` as ``Z`` is to ``u``.
+   Like ``u#``, but the Python object may also be ``None``, in which case the
+   :ctype:`Py_UNICODE` pointer is set to *NULL*.
+
+``U`` (Unicode object) [PyUnicodeObject \*]
+   Requires that the Python object is a Unicode object, without attempting
+   any conversion.  Raises :exc:`TypeError` if the object is not a Unicode
+   object.  The C variable may also be declared as :ctype:`PyObject\*`.
+
+``t#`` (read-only character buffer) [char \*, int]
+   Like ``s#``, but accepts any object which implements the read-only buffer
+   interface.  The :ctype:`char\*` variable is set to point to the first byte of
+   the buffer, and the :ctype:`int` is set to the length of the buffer.  Only
+   single-segment buffer objects are accepted; :exc:`TypeError` is raised for all
+   others.
+
+``w`` (read-write character buffer) [char \*]
+   Similar to ``s``, but accepts any object which implements the read-write buffer
+   interface.  The caller must determine the length of the buffer by other means,
+   or use ``w#`` instead.  Only single-segment buffer objects are accepted;
+   :exc:`TypeError` is raised for all others.
+
+``w*`` (read-write byte-oriented buffer) [Py_buffer]
+   This is to ``w`` what ``s*`` is to ``s``.
+
+``w#`` (read-write character buffer) [char \*, int]
+   Like ``s#``, but accepts any object which implements the read-write buffer
+   interface.  The :ctype:`char \*` variable is set to point to the first byte
+   of the buffer, and the :ctype:`int` is set to the length of the buffer.
+   Only single-segment buffer objects are accepted; :exc:`TypeError` is raised
+   for all others.
 
 ``es`` (string, Unicode object or character buffer compatible object) [const char \*encoding, char \*\*buffer]
    This variant on ``s`` is used for encoding Unicode and objects convertible to
@@ -165,6 +231,9 @@ variable(s) whose address should be passed.
    them. Instead, the implementation assumes that the string object uses the
    encoding passed in as parameter.
 
+Numbers
+-------
+
 ``b`` (integer) [unsigned char]
    Convert a nonnegative Python integer to an unsigned tiny int, stored in a C
    :ctype:`unsigned char`.
@@ -207,13 +276,13 @@ variable(s) whose address should be passed.
 ``n`` (integer) [Py_ssize_t]
    Convert a Python integer to a C :ctype:`Py_ssize_t`.
 
-``c`` (string of length 1) [char]
-   Convert a Python character, represented as a byte string of length 1, to a C
-   :ctype:`char`.
+``c`` (bytes object of length 1) [char]
+   Convert a Python byte, represented as a :class:`bytes` object of length 1,
+   to a C :ctype:`char`.
 
-``C`` (string of length 1) [int]
-   Convert a Python character, represented as a unicode string of length 1, to a
-   C :ctype:`int`.
+``C`` (Unicode object of length 1) [int]
+   Convert a Python character, represented as a :class:`str`: object of
+   length 1, to a C :ctype:`int`.
 
 ``f`` (float) [float]
    Convert a Python floating point number to a C :ctype:`float`.
@@ -224,6 +293,9 @@ variable(s) whose address should be passed.
 ``D`` (complex) [Py_complex]
    Convert a Python complex number to a C :ctype:`Py_complex` structure.
 
+Other objects
+-------------
+
 ``O`` (object) [PyObject \*]
    Store a Python object (without any conversion) in a C object pointer.  The C
    program thus receives the actual object that was passed.  The object's reference
@@ -258,39 +330,6 @@ variable(s) whose address should be passed.
    .. versionchanged:: 3.1
       Py_CLEANUP_SUPPORTED was added.
 
-``S`` (string) [PyStringObject \*]
-   Like ``O`` but requires that the Python object is a string object.  Raises
-   :exc:`TypeError` if the object is not a string object.  The C variable may also
-   be declared as :ctype:`PyObject\*`.
-
-``U`` (Unicode string) [PyUnicodeObject \*]
-   Like ``O`` but requires that the Python object is a Unicode object.  Raises
-   :exc:`TypeError` if the object is not a Unicode object.  The C variable may also
-   be declared as :ctype:`PyObject\*`.
-
-``t#`` (read-only character buffer) [char \*, int]
-   Like ``s#``, but accepts any object which implements the read-only buffer
-   interface.  The :ctype:`char\*` variable is set to point to the first byte of
-   the buffer, and the :ctype:`int` is set to the length of the buffer.  Only
-   single-segment buffer objects are accepted; :exc:`TypeError` is raised for all
-   others.
-
-``w`` (read-write character buffer) [char \*]
-   Similar to ``s``, but accepts any object which implements the read-write buffer
-   interface.  The caller must determine the length of the buffer by other means,
-   or use ``w#`` instead.  Only single-segment buffer objects are accepted;
-   :exc:`TypeError` is raised for all others.
-
-``w*`` (read-write byte-oriented buffer) [Py_buffer]
-   This is to ``w`` what ``s*`` is to ``s``.
-
-``w#`` (read-write character buffer) [char \*, int]
-   Like ``s#``, but accepts any object which implements the read-write buffer
-   interface.  The :ctype:`char \*` variable is set to point to the first byte
-   of the buffer, and the :ctype:`int` is set to the length of the buffer.
-   Only single-segment buffer objects are accepted; :exc:`TypeError` is raised
-   for all others.
-
 ``(items)`` (tuple) [*matching-items*]
    The object must be a Python sequence whose length is the number of format units
    in *items*.  The C arguments must correspond to the individual format units in
@@ -339,6 +378,8 @@ false and raise an appropriate exception. When the
 of the format units, the variables at the addresses corresponding to that
 and the following format units are left untouched.
 
+API Functions
+-------------
 
 .. cfunction:: int PyArg_ParseTuple(PyObject *args, const char *format, ...)
 
@@ -424,6 +465,10 @@ and the following format units are left untouched.
       PyArg_ParseTuple(args, "O|O:ref", &object, &callback)
 
 
+---------------
+Building values
+---------------
+
 .. cfunction:: PyObject* Py_BuildValue(const char *format, ...)
 
    Create a new value based on a format string similar to those accepted by the
author	Antoine Pitrou <solipsis@pitrou.net>	2010-05-03 15:57:23 (GMT)
committer	Antoine Pitrou <solipsis@pitrou.net>	2010-05-03 15:57:23 (GMT)
commit	83fd9b97b8e206260fe31b27b421d617fb5b9e1b (patch)
tree	515c16276aba7beb3671b915b6dbbd1949e6f546 /Doc/c-api/arg.rst
parent	d23ea06175c62b166d8a6c7a2efc090bf7ead779 (diff)
download	cpython-83fd9b97b8e206260fe31b27b421d617fb5b9e1b.zip cpython-83fd9b97b8e206260fe31b27b421d617fb5b9e1b.tar.gz cpython-83fd9b97b8e206260fe31b27b421d617fb5b9e1b.tar.bz2