Split "Extending & Embedding" into separate files, one per chapter.

author: Fred Drake <fdrake@acm.org> 2001-08-20 19:30:29 (GMT)
committer: Fred Drake <fdrake@acm.org> 2001-08-20 19:30:29 (GMT)
commit: cc8f44b8847d65ba62b3d34bf4b7613414ba0fae (patch)
tree: 701be3a763a37672598eae3a6c3c2e540f608b32 /Doc/ext/extending.tex
parent: 1ba6bada67b2eca079b13f2deebff91696df909b (diff)
1 files changed, 1695 insertions, 0 deletions
diff --git a/Doc/ext/extending.tex b/Doc/ext/extending.tex
new file mode 100644
index 0000000..ee1b678
--- /dev/null
+++ b/Doc/ext/extending.tex
@@ -0,0 +1,1695 @@
+\chapter{Extending Python with C or \Cpp{} \label{intro}}
+
+
+It is quite easy to add new built-in modules to Python, if you know
+how to program in C.  Such \dfn{extension modules} can do two things
+that can't be done directly in Python: they can implement new built-in
+object types, and they can call C library functions and system calls.
+
+To support extensions, the Python API (Application Programmer's
+Interface) defines a set of functions, macros and variables that
+provide access to most aspects of the Python run-time system.  The
+Python API is incorporated in a C source file by including the header
+\code{"Python.h"}.
+
+The compilation of an extension module depends on its intended use as
+well as on your system setup; details are given in later chapters.
+
+
+\section{A Simple Example
+         \label{simpleExample}}
+
+Let's create an extension module called \samp{spam} (the favorite food
+of Monty Python fans...) and let's say we want to create a Python
+interface to the C library function \cfunction{system()}.\footnote{An
+interface for this function already exists in the standard module
+\module{os} --- it was chosen as a simple and straightforward example.}
+This function takes a null-terminated character string as argument and
+returns an integer.  We want this function to be callable from Python
+as follows:
+
+\begin{verbatim}
+>>> import spam
+>>> status = spam.system("ls -l")
+\end{verbatim}
+
+Begin by creating a file \file{spammodule.c}.  (Historically, if a
+module is called \samp{spam}, the C file containing its implementation
+is called \file{spammodule.c}; if the module name is very long, like
+\samp{spammify}, the file name can be just \file{spammify.c}.)
+
+The first line of our file can be:
+
+\begin{verbatim}
+#include <Python.h>
+\end{verbatim}
+
+which pulls in the Python API (you can add a comment describing the
+purpose of the module and a copyright notice if you like).
+
+All user-visible symbols defined by \code{"Python.h"} have a prefix of
+\samp{Py} or \samp{PY}, except those defined in standard header files.
+For convenience, and since they are used extensively by the Python
+interpreter, \code{"Python.h"} includes a few standard header files:
+\code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>}, and
+\code{<stdlib.h>}.  If the latter header file does not exist on your
+system, it declares the functions \cfunction{malloc()},
+\cfunction{free()} and \cfunction{realloc()} directly.
+
+The next thing we add to our module file is the C function that will
+be called when the Python expression \samp{spam.system(\var{string})}
+is evaluated (we'll see shortly how it ends up being called):
+
+\begin{verbatim}
+static PyObject *
+spam_system(PyObject *self, PyObject *args)
+{
+    char *command;
+    int sts;
+
+    if (!PyArg_ParseTuple(args, "s", &command))
+        return NULL;
+    sts = system(command);
+    return Py_BuildValue("i", sts);
+}
+\end{verbatim}
+
+There is a straightforward translation from the argument list in
+Python (for example, the single expression \code{"ls -l"}) to the
+arguments passed to the C function.  The C function always has two
+arguments, conventionally named \var{self} and \var{args}.
+
+The \var{self} argument is only used when the C function implements a
+built-in method, not a function. In the example, \var{self} will
+always be a \NULL{} pointer, since we are defining a function, not a
+method.  (This is done so that the interpreter doesn't have to
+understand two different types of C functions.)
+
+The \var{args} argument will be a pointer to a Python tuple object
+containing the arguments.  Each item of the tuple corresponds to an
+argument in the call's argument list.  The arguments are Python
+objects --- in order to do anything with them in our C function we have
+to convert them to C values.  The function \cfunction{PyArg_ParseTuple()}
+in the Python API checks the argument types and converts them to C
+values.  It uses a template string to determine the required types of
+the arguments as well as the types of the C variables into which to
+store the converted values.  More about this later.
+
+\cfunction{PyArg_ParseTuple()} returns true (nonzero) if all arguments have
+the right type and their converted values have been stored in the variables
+whose addresses are passed.  It returns false (zero) if an invalid
+argument list was passed.  In the latter case it also raises an
+appropriate exception so the calling function can return
+\NULL{} immediately (as we saw in the example).
+
+
+\section{Intermezzo: Errors and Exceptions
+         \label{errors}}
+
+An important convention throughout the Python interpreter is the
+following: when a function fails, it should set an exception condition
+and return an error value (usually a \NULL{} pointer).  Exceptions
+are stored in a static global variable inside the interpreter; if this
+variable is \NULL{} no exception has occurred.  A second global
+variable stores the ``associated value'' of the exception (the second
+argument to \keyword{raise}).  A third variable contains the stack
+traceback in case the error originated in Python code.  These three
+variables are the C equivalents of the Python variables
+\code{sys.exc_type}, \code{sys.exc_value} and \code{sys.exc_traceback} (see
+the section on module \module{sys} in the
+\citetitle[../lib/lib.html]{Python Library Reference}).  It is
+important to know about them to understand how errors are passed
+around.
+
+The Python API defines a number of functions to set various types of
+exceptions.
+
+The most common one is \cfunction{PyErr_SetString()}.  Its arguments
+are an exception object and a C string.  The exception object is
+usually a predefined object like \cdata{PyExc_ZeroDivisionError}.  The
+C string indicates the cause of the error and is converted to a
+Python string object and stored as the ``associated value'' of the
+exception.
+
+Another useful function is \cfunction{PyErr_SetFromErrno()}, which only
+takes an exception argument and constructs the associated value by
+inspection of the global variable \cdata{errno}.  The most
+general function is \cfunction{PyErr_SetObject()}, which takes two object
+arguments, the exception and its associated value.  You don't need to
+\cfunction{Py_INCREF()} the objects passed to any of these functions.
+
+You can test non-destructively whether an exception has been set with
+\cfunction{PyErr_Occurred()}.  This returns the current exception object,
+or \NULL{} if no exception has occurred.  You normally don't need
+to call \cfunction{PyErr_Occurred()} to see whether an error occurred in a
+function call, since you should be able to tell from the return value.
+
+When a function \var{f} that calls another function \var{g} detects
+that the latter fails, \var{f} should itself return an error value
+(usually \NULL{} or \code{-1}).  It should \emph{not} call one of the
+\cfunction{PyErr_*()} functions --- one has already been called by \var{g}.
+\var{f}'s caller is then supposed to also return an error indication
+to \emph{its} caller, again \emph{without} calling \cfunction{PyErr_*()},
+and so on --- the most detailed cause of the error was already
+reported by the function that first detected it.  Once the error
+reaches the Python interpreter's main loop, the loop aborts the currently
+executing Python code and tries to find an exception handler specified
+by the Python programmer.
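+
+The propagation rule can be sketched in plain C, without the Python API.
+This is only an illustration of the convention: \cdata{demo_error},
+\cfunction{demo_g()} and \cfunction{demo_f()} are hypothetical names, and
+a string pointer stands in for the interpreter's real exception state.
+

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the interpreter's error indicator:
   NULL means that no error is pending. */
static const char *demo_error = NULL;

/* demo_g() is the function that first detects the problem, so it
   records the cause and returns the error value -1. */
static int
demo_g(int value)
{
    if (value < 0) {
        demo_error = "value must not be negative";
        return -1;
    }
    return value * 2;
}

/* demo_f() sees that demo_g() failed and passes the failure on.  It
   does not record a second error; the detailed cause is already set. */
static int
demo_f(int value)
{
    int doubled = demo_g(value);

    if (doubled < 0)
        return -1;
    return doubled + 1;
}

/* Exercise both the success path and the failure path. */
static void
demo_check(void)
{
    assert(demo_f(3) == 7);
    assert(demo_error == NULL);
    assert(demo_f(-1) == -1);
    assert(strcmp(demo_error, "value must not be negative") == 0);
}
```

+
+Only \cfunction{demo_g()} touches the error indicator, so the message
+seen by the outermost caller is the one written where the failure was
+first detected.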
+
+(There are situations where a module can actually give a more detailed
+error message by calling another \cfunction{PyErr_*()} function, and in
+such cases it is fine to do so.  As a general rule, however, this is
+not necessary, and can cause information about the cause of the error
+to be lost: most operations can fail for a variety of reasons.)
+
+To ignore an exception set by a function call that failed, the exception
+condition must be cleared explicitly by calling \cfunction{PyErr_Clear()}. 
+The only time C code should call \cfunction{PyErr_Clear()} is if it doesn't
+want to pass the error on to the interpreter but wants to handle it
+completely by itself (possibly by trying something else, or pretending
+nothing went wrong).
+
+Every failing \cfunction{malloc()} call must be turned into an
+exception --- the direct caller of \cfunction{malloc()} (or
+\cfunction{realloc()}) must call \cfunction{PyErr_NoMemory()} and
+return a failure indicator itself.  All the object-creating functions
+(for example, \cfunction{PyInt_FromLong()}) already do this, so this
+note is only relevant to those who call \cfunction{malloc()} directly.
+
+Also note that, with the important exception of
+\cfunction{PyArg_ParseTuple()} and friends, functions that return an
+integer status usually return a positive value or zero for success and
+\code{-1} for failure, like \UNIX{} system calls.
+
+Finally, be careful to clean up garbage (by making
+\cfunction{Py_XDECREF()} or \cfunction{Py_DECREF()} calls for objects
+you have already created) when you return an error indicator!
+
+The choice of which exception to raise is entirely yours.  There are
+predeclared C objects corresponding to all built-in Python exceptions,
+such as \cdata{PyExc_ZeroDivisionError}, which you can use directly.
+Of course, you should choose exceptions wisely --- don't use
+\cdata{PyExc_TypeError} to mean that a file couldn't be opened (that
+should probably be \cdata{PyExc_IOError}).  If something's wrong with
+the argument list, the \cfunction{PyArg_ParseTuple()} function usually
+raises \cdata{PyExc_TypeError}.  If you have an argument whose value
+must be in a particular range or must satisfy other conditions,
+\cdata{PyExc_ValueError} is appropriate.
+
+You can also define a new exception that is unique to your module.
+For this, you usually declare a static object variable at the
+beginning of your file:
+
+\begin{verbatim}
+static PyObject *SpamError;
+\end{verbatim}
+
+and initialize it in your module's initialization function
+(\cfunction{initspam()}) with an exception object (leaving out
+the error checking for now):
+
+\begin{verbatim}
+void
+initspam()
+{
+    PyObject *m, *d;
+
+    m = Py_InitModule("spam", SpamMethods);
+    d = PyModule_GetDict(m);
+    SpamError = PyErr_NewException("spam.error", NULL, NULL);
+    PyDict_SetItemString(d, "error", SpamError);
+}
+\end{verbatim}
+
+Note that the Python name for the exception object is
+\exception{spam.error}.  The \cfunction{PyErr_NewException()} function
+may create a class whose base class is \exception{Exception} (unless
+another class is passed in instead of \NULL); \exception{Exception} is
+described in the \citetitle[../lib/lib.html]{Python Library Reference}
+under ``Built-in Exceptions.''
+
+Note also that the \cdata{SpamError} variable retains a reference to
+the newly created exception class; this is intentional!  Since the
+exception could be removed from the module by external code, an owned
+reference to the class is needed to ensure that it will not be
+discarded, causing \cdata{SpamError} to become a dangling pointer.
+Should it become a dangling pointer, C code which raises the exception
+could cause a core dump or other unintended side effects.
+
+
+\section{Back to the Example
+         \label{backToExample}}
+
+Going back to our example function, you should now be able to
+understand this statement:
+
+\begin{verbatim}
+    if (!PyArg_ParseTuple(args, "s", &command))
+        return NULL;
+\end{verbatim}
+
+It returns \NULL{} (the error indicator for functions returning
+object pointers) if an error is detected in the argument list, relying
+on the exception set by \cfunction{PyArg_ParseTuple()}.  Otherwise the
+string value of the argument has been copied to the local variable
+\cdata{command}.  This is a pointer assignment and you are not supposed
+to modify the string to which it points (so in Standard C, the variable
+\cdata{command} should properly be declared as \samp{const char
+*command}).
+
+The next statement is a call to the \UNIX{} function
+\cfunction{system()}, passing it the string we just got from
+\cfunction{PyArg_ParseTuple()}:
+
+\begin{verbatim}
+    sts = system(command);
+\end{verbatim}
+
+Our \function{spam.system()} function must return the value of
+\cdata{sts} as a Python object.  This is done using the function
+\cfunction{Py_BuildValue()}, which is something like the inverse of
+\cfunction{PyArg_ParseTuple()}: it takes a format string and an
+arbitrary number of C values, and returns a new Python object.
+More info on \cfunction{Py_BuildValue()} is given later.
+
+\begin{verbatim}
+    return Py_BuildValue("i", sts);
+\end{verbatim}
+
+In this case, it will return an integer object.  (Yes, even integers
+are objects on the heap in Python!)
+
+If you have a C function that returns no useful value (a function
+returning \ctype{void}), the corresponding Python function must return
+\code{None}.   You need this idiom to do so:
+
+\begin{verbatim}
+    Py_INCREF(Py_None);
+    return Py_None;
+\end{verbatim}
+
+\cdata{Py_None} is the C name for the special Python object
+\code{None}.  It is a genuine Python object rather than a \NULL{}
+pointer, which means ``error'' in most contexts, as we have seen.
+
+
+\section{The Module's Method Table and Initialization Function
+         \label{methodTable}}
+
+I promised to show how \cfunction{spam_system()} is called from Python
+programs.  First, we need to list its name and address in a ``method
+table'':
+
+\begin{verbatim}
+static PyMethodDef SpamMethods[] = {
+    ...
+    {"system",  spam_system, METH_VARARGS},
+    ...
+    {NULL,      NULL}        /* Sentinel */
+};
+\end{verbatim}
+
+Note the third entry (\samp{METH_VARARGS}).  This is a flag telling
+the interpreter the calling convention to be used for the C
+function.  It should normally be \samp{METH_VARARGS} or
+\samp{METH_VARARGS | METH_KEYWORDS}; a value of \code{0} means that an
+obsolete variant of \cfunction{PyArg_ParseTuple()} is used.
+
+When using only \samp{METH_VARARGS}, the function should expect
+the Python-level parameters to be passed in as a tuple acceptable for
+parsing via \cfunction{PyArg_ParseTuple()}; more information on this
+function is provided below.
+
+The \constant{METH_KEYWORDS} bit may be set in the third field if
+keyword arguments should be passed to the function.  In this case, the
+C function should accept a third \samp{PyObject *} parameter which
+will be a dictionary of keywords.  Use
+\cfunction{PyArg_ParseTupleAndKeywords()} to parse the arguments to
+such a function.
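+
+The role of the sentinel entry can be illustrated with a small table
+walk in plain C.  This sketch does not use the Python API and makes no
+claim about how the interpreter is implemented; \ctype{DemoMethodDef},
+\cdata{demo_methods} and \cfunction{demo_find()} are hypothetical names
+chosen for the illustration.
+

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*demo_func)(int);

/* Hypothetical, simplified analogue of a method table entry. */
typedef struct {
    const char *name;   /* NULL marks the end of the table */
    demo_func   func;
} DemoMethodDef;

static int demo_twice(int x)  { return 2 * x; }
static int demo_negate(int x) { return -x; }

static DemoMethodDef demo_methods[] = {
    {"twice",  demo_twice},
    {"negate", demo_negate},
    {NULL,     NULL}        /* Sentinel */
};

/* Walk the table until the sentinel is reached; no separate length
   has to be passed around.  Returns NULL if the name is not listed. */
static demo_func
demo_find(const DemoMethodDef *table, const char *name)
{
    for (; table->name != NULL; table++) {
        if (strcmp(table->name, name) == 0)
            return table->func;
    }
    return NULL;
}

/* Look up two listed names and one that is absent. */
static void
demo_table_check(void)
{
    assert(demo_find(demo_methods, "twice")(21) == 42);
    assert(demo_find(demo_methods, "negate")(5) == -5);
    assert(demo_find(demo_methods, "missing") == NULL);
}
```

+
+Forgetting the sentinel in such a table would let the loop run past
+the end of the array, which is why the final entry is mandatory.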
+
+The method table must be passed to the interpreter in the module's
+initialization function.  The initialization function must be named
+\cfunction{init\var{name}()}, where \var{name} is the name of the
+module, and should be the only non-\keyword{static} item defined in
+the module file:
+
+\begin{verbatim}
+void
+initspam()
+{
+    (void) Py_InitModule("spam", SpamMethods);
+}
+\end{verbatim}
+
+Note that for \Cpp, this function must be declared \code{extern "C"}.
+
+When the Python program imports module \module{spam} for the first
+time, \cfunction{initspam()} is called. (See below for comments about
+embedding Python.)  It calls
+\cfunction{Py_InitModule()}, which creates a ``module object'' (which
+is inserted in the dictionary \code{sys.modules} under the key
+\code{"spam"}), and inserts built-in function objects into the newly
+created module based upon the table (an array of \ctype{PyMethodDef}
+structures) that was passed as its second argument.
+\cfunction{Py_InitModule()} returns a pointer to the module object
+that it creates (which is unused here).  It aborts with a fatal error
+if the module could not be initialized satisfactorily, so the caller
+doesn't need to check for errors.
+
+When embedding Python, the \cfunction{initspam()} function is not
+called automatically unless there's an entry in the
+\cdata{_PyImport_Inittab} table.  The easiest way to handle this is to
+initialize your statically-linked modules by directly
+calling \cfunction{initspam()} after the call to
+\cfunction{Py_Initialize()} or \cfunction{PyMac_Initialize()}:
+
+\begin{verbatim}
+int main(int argc, char **argv)
+{
+    /* Pass argv[0] to the Python interpreter */
+    Py_SetProgramName(argv[0]);
+
+    /* Initialize the Python interpreter.  Required. */
+    Py_Initialize();
+
+    /* Add a static module */
+    initspam();
+\end{verbatim}
+
+An example may be found in the file \file{Demo/embed/demo.c} in the
+Python source distribution.
+
+\strong{Note:}  Removing entries from \code{sys.modules} or importing
+compiled modules into multiple interpreters within a process (or
+following a \cfunction{fork()} without an intervening
+\cfunction{exec()}) can create problems for some extension modules.
+Extension module authors should exercise caution when initializing
+internal data structures.
+Note also that the \function{reload()} function can be used with
+extension modules, and will call the module initialization function
+(\cfunction{initspam()} in the example), but will not load the module
+again if it was loaded from a dynamically loadable object file
+(\file{.so} on \UNIX, \file{.dll} on Windows).
+
+A more substantial example module is included in the Python source
+distribution as \file{Modules/xxmodule.c}.  This file may be used as a 
+template or simply read as an example.  The \program{modulator.py}
+script included in the source distribution or Windows install provides 
+a simple graphical user interface for declaring the functions and
+objects which a module should implement, and can generate a template
+which can be filled in.  The script lives in the
+\file{Tools/modulator/} directory; see the \file{README} file there
+for more information.
+
+
+\section{Compilation and Linkage
+         \label{compilation}}
+
+There are two more things to do before you can use your new extension:
+compiling and linking it with the Python system.  If you use dynamic
+loading, the details depend on the style of dynamic loading your
+system uses; see the chapters about building extension modules on
+\UNIX{} (chapter \ref{building-on-unix}) and Windows (chapter
+\ref{building-on-windows}) for more information about this.
+% XXX Add information about MacOS  
+
+If you can't use dynamic loading, or if you want to make your module a
+permanent part of the Python interpreter, you will have to change the
+configuration setup and rebuild the interpreter.  Luckily, this is
+very simple: just place your file (\file{spammodule.c} for example) in
+the \file{Modules/} directory of an unpacked source distribution, add
+a line to the file \file{Modules/Setup.local} describing your file:
+
+\begin{verbatim}
+spam spammodule.o
+\end{verbatim}
+
+and rebuild the interpreter by running \program{make} in the toplevel
+directory.  You can also run \program{make} in the \file{Modules/}
+subdirectory, but then you must first rebuild \file{Makefile}
+there by running `\program{make} Makefile'.  (This is necessary each
+time you change the \file{Setup} file.)
+
+If your module requires additional libraries to link with, these can
+be listed on the line in the configuration file as well, for instance:
+
+\begin{verbatim}
+spam spammodule.o -lX11
+\end{verbatim}
+
+\section{Calling Python Functions from C
+         \label{callingPython}}
+
+So far we have concentrated on making C functions callable from
+Python.  The reverse is also useful: calling Python functions from C.
+This is especially the case for libraries that support so-called
+``callback'' functions.  If a C interface makes use of callbacks, the
+equivalent Python interface often needs to provide a callback mechanism to the
+Python programmer; the implementation will require calling the Python
+callback functions from a C callback.  Other uses are also imaginable.
+
+Fortunately, the Python interpreter is easily called recursively, and
+there is a standard interface to call a Python function.  (I won't
+dwell on how to call the Python parser with a particular string as
+input --- if you're interested, have a look at the implementation of
+the \programopt{-c} command line option in \file{Python/pythonmain.c}
+from the Python source code.)
+
+Calling a Python function is easy.  First, the Python program must
+somehow pass you the Python function object.  You should provide a
+function (or some other interface) to do this.  When this function is
+called, save a pointer to the Python function object (be careful to
+\cfunction{Py_INCREF()} it!) in a global variable --- or wherever you
+see fit. For example, the following function might be part of a module
+definition:
+
+\begin{verbatim}
+static PyObject *my_callback = NULL;
+
+static PyObject *
+my_set_callback(PyObject *dummy, PyObject *args)
+{
+    PyObject *result = NULL;
+    PyObject *temp;
+
+    if (PyArg_ParseTuple(args, "O:set_callback", &temp)) {
+        if (!PyCallable_Check(temp)) {
+            PyErr_SetString(PyExc_TypeError, "parameter must be callable");
+            return NULL;
+        }
+        Py_XINCREF(temp);         /* Add a reference to new callback */
+        Py_XDECREF(my_callback);  /* Dispose of previous callback */
+        my_callback = temp;       /* Remember new callback */
+        /* Boilerplate to return "None" */
+        Py_INCREF(Py_None);
+        result = Py_None;
+    }
+    return result;
+}
+\end{verbatim}
+
+This function must be registered with the interpreter using the
+\constant{METH_VARARGS} flag; this is described in section
+\ref{methodTable}, ``The Module's Method Table and Initialization
+Function.''  The \cfunction{PyArg_ParseTuple()} function and its
+arguments are documented in section \ref{parseTuple}, ``Extracting
+Parameters in Extension Functions.''
+
+The macros \cfunction{Py_XINCREF()} and \cfunction{Py_XDECREF()}
+increment/decrement the reference count of an object and are safe in
+the presence of \NULL{} pointers (but note that \var{temp} will not be 
+\NULL{} in this context).  More information on them is given in section
+\ref{refcounts}, ``Reference Counts.''
+
+Later, when it is time to call the function, you call the C function
+\cfunction{PyEval_CallObject()}.  This function has two arguments, both
+pointers to arbitrary Python objects: the Python function, and the
+argument list.  The argument list must always be a tuple object, whose
+length is the number of arguments.  To call the Python function with
+no arguments, pass an empty tuple; to call it with one argument, pass
+a singleton tuple.  \cfunction{Py_BuildValue()} returns a tuple when its
+format string consists of zero or more format codes between
+parentheses.  For example:
+
+\begin{verbatim}
+    int arg;
+    PyObject *arglist;
+    PyObject *result;
+    ...
+    arg = 123;
+    ...
+    /* Time to call the callback */
+    arglist = Py_BuildValue("(i)", arg);
+    result = PyEval_CallObject(my_callback, arglist);
+    Py_DECREF(arglist);
+\end{verbatim}
+
+\cfunction{PyEval_CallObject()} returns a Python object pointer: this is
+the return value of the Python function.  \cfunction{PyEval_CallObject()} is
+``reference-count-neutral'' with respect to its arguments.  In the
+example a new tuple was created to serve as the argument list, which
+is \cfunction{Py_DECREF()}-ed immediately after the call.
+
+The return value of \cfunction{PyEval_CallObject()} is ``new'': either it
+is a brand new object, or it is an existing object whose reference
+count has been incremented.  So, unless you want to save it in a
+global variable, you should somehow \cfunction{Py_DECREF()} the result,
+even (especially!) if you are not interested in its value.
+
+Before you do this, however, it is important to check that the return
+value isn't \NULL{}.  If it is, the Python function terminated by
+raising an exception.  If the C code that called
+\cfunction{PyEval_CallObject()} is called from Python, it should now
+return an error indication to its Python caller, so the interpreter
+can print a stack trace, or the calling Python code can handle the
+exception.  If this is not possible or desirable, the exception should
+be cleared by calling \cfunction{PyErr_Clear()}.  For example:
+
+\begin{verbatim}
+    if (result == NULL)
+        return NULL; /* Pass error back */
+    ...use result...
+    Py_DECREF(result); 
+\end{verbatim}
+
+Depending on the desired interface to the Python callback function,
+you may also have to provide an argument list to
+\cfunction{PyEval_CallObject()}.  In some cases the argument list is
+also provided by the Python program, through the same interface that
+specified the callback function.  It can then be saved and used in the
+same manner as the function object.  In other cases, you may have to
+construct a new tuple to pass as the argument list.  The simplest way
+to do this is to call \cfunction{Py_BuildValue()}.  For example, if
+you want to pass an integral event code, you might use the following
+code:
+
+\begin{verbatim}
+    PyObject *arglist;
+    ...
+    arglist = Py_BuildValue("(l)", eventcode);
+    result = PyEval_CallObject(my_callback, arglist);
+    Py_DECREF(arglist);
+    if (result == NULL)
+        return NULL; /* Pass error back */
+    /* Here maybe use the result */
+    Py_DECREF(result);
+\end{verbatim}
+
+Note the placement of \samp{Py_DECREF(arglist)} immediately after the
+call, before the error check!  Also note that, strictly speaking, this
+code is not complete: \cfunction{Py_BuildValue()} may run out of
+memory, and this should be checked.
+
+
+\section{Extracting Parameters in Extension Functions
+         \label{parseTuple}}
+
+The \cfunction{PyArg_ParseTuple()} function is declared as follows:
+
+\begin{verbatim}
+int PyArg_ParseTuple(PyObject *arg, char *format, ...);
+\end{verbatim}
+
+The \var{arg} argument must be a tuple object containing an argument
+list passed from Python to a C function.  The \var{format} argument
+must be a format string, whose syntax is explained below.  The
+remaining arguments must be addresses of variables whose type is
+determined by the format string.  For the conversion to succeed, the
+\var{arg} object must match the format and the format must be
+exhausted.  On success, \cfunction{PyArg_ParseTuple()} returns true,
+otherwise it returns false and raises an appropriate exception.
+
+Note that while \cfunction{PyArg_ParseTuple()} checks that the Python
+arguments have the required types, it cannot check the validity of the
+addresses of C variables passed to the call: if you make mistakes
+there, your code will probably crash or at least overwrite random bits
+in memory.  So be careful!
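+
+The standard C function \cfunction{sscanf()} has the same property and
+makes a convenient analogy.  The sketch below is plain C rather than
+Python API code, and \cfunction{demo_parse_pair()} is a hypothetical
+helper: the format string alone decides which pointer types are
+expected, and the function itself cannot verify the addresses it
+receives.
+

```c
#include <assert.h>
#include <stdio.h>

/* Parse "<int> <double>" from text.  Returns 1 on success and 0 on
   failure, mirroring the true/false result of PyArg_ParseTuple(). */
static int
demo_parse_pair(const char *text, int *count, double *scale)
{
    /* "%d" requires an int * and "%lf" requires a double *.  Passing a
       pointer of any other type here would be undefined behavior that
       sscanf() has no way to detect at run time. */
    return sscanf(text, "%d %lf", count, scale) == 2;
}

/* Exercise a well-formed input and a malformed one. */
static void
demo_parse_check(void)
{
    int count = 0;
    double scale = 0.0;

    assert(demo_parse_pair("3 0.5", &count, &scale));
    assert(count == 3);
    assert(scale == 0.5);
    assert(!demo_parse_pair("oops", &count, &scale));
}
```

+
+As with \cfunction{PyArg_ParseTuple()}, the caller checks the return
+value before using the converted variables.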
+
+A format string consists of zero or more ``format units''.  A format
+unit describes one Python object; it is usually a single character or
+a parenthesized sequence of format units.  With a few exceptions, a
+format unit that is not a parenthesized sequence normally corresponds
+to a single address argument to \cfunction{PyArg_ParseTuple()}.  In the
+following description, the quoted form is the format unit; the entry
+in (round) parentheses is the Python object type that matches the
+format unit; and the entry in [square] brackets is the type of the C
+variable(s) whose address should be passed.  (Use the \samp{\&}
+operator to pass a variable's address.)
+
+Note that any Python object references which are provided to the
+caller are \emph{borrowed} references; do not decrement their
+reference count!
+
+\begin{description}
+
+\item[\samp{s} (string or Unicode object) {[char *]}]
+Convert a Python string or Unicode object to a C pointer to a
+character string.  You must not provide storage for the string
+itself; a pointer to an existing string is stored into the character
+pointer variable whose address you pass.  The C string is
+null-terminated.  The Python string must not contain embedded null
+bytes; if it does, a \exception{TypeError} exception is raised.
+Unicode objects are converted to C strings using the default
+encoding. If this conversion fails, an \exception{UnicodeError} is
+raised.
+
+\item[\samp{s\#} (string, Unicode or any read buffer compatible object) 
+{[char *, int]}]
+This variant on \samp{s} stores into two C variables, the first one a
+pointer to a character string, the second one its length.  In this
+case the Python string may contain embedded null bytes.  Unicode
+objects pass back a pointer to the default encoded string version of the
+object if such a conversion is possible. All other read buffer
+compatible objects pass back a reference to the raw internal data
+representation.
+
+\item[\samp{z} (string or \code{None}) {[char *]}]
+Like \samp{s}, but the Python object may also be \code{None}, in which
+case the C pointer is set to \NULL{}.
+
+\item[\samp{z\#} (string or \code{None} or any read buffer compatible object) 
+{[char *, int]}]
+This is to \samp{s\#} as \samp{z} is to \samp{s}.
+
+\item[\samp{u} (Unicode object) {[Py_UNICODE *]}]
+Convert a Python Unicode object to a C pointer to a null-terminated
+buffer of 16-bit Unicode (UTF-16) data.  As with \samp{s}, there is no need
+to provide storage for the Unicode data buffer; a pointer to the
+existing Unicode data is stored into the Py_UNICODE pointer variable whose
+address you pass.  
+
+\item[\samp{u\#} (Unicode object) {[Py_UNICODE *, int]}]
+This variant on \samp{u} stores into two C variables, the first one
+a pointer to a Unicode data buffer, the second one its length.
+
+\item[\samp{es} (string, Unicode object or character buffer compatible
+object) {[const char *encoding, char **buffer]}]
+This variant on \samp{s} is used for encoding Unicode and objects
+convertible to Unicode into a character buffer. It only works for
+encoded data without embedded null bytes.
+
+The variant takes two C arguments.  The first, which is only read, is a
+pointer to an encoding name string (\var{encoding}); the second is a
+pointer to a pointer to a character buffer (\var{**buffer}), through
+which the buffer holding the encoded data is returned.
+
+The encoding name must map to a registered codec. If set to \NULL{},
+the default encoding is used.
+
+\cfunction{PyArg_ParseTuple()} will allocate a buffer of the needed
+size using \cfunction{PyMem_NEW()}, copy the encoded data into this
+buffer and adjust \var{*buffer} to reference the newly allocated
+storage. The caller is responsible for calling
+\cfunction{PyMem_Free()} to free the allocated buffer after usage.
+
+\item[\samp{et} (string, Unicode object or character buffer compatible
+object) {[const char *encoding, char **buffer]}]
+Same as \samp{es} except that string objects are passed through without
+recoding them. Instead, the implementation assumes that the string
+object uses the encoding passed in as parameter.
+
+\item[\samp{es\#} (string, Unicode object or character buffer compatible
+object) {[const char *encoding, char **buffer, int *buffer_length]}]
+This variant on \samp{s\#} is used for encoding Unicode and objects
+convertible to Unicode into a character buffer. It takes three C
+arguments.  The first, which is only read, is a pointer to an encoding
+name string (\var{encoding}); the second is a pointer to a pointer to a
+character buffer (\var{**buffer}, the buffer used for storing the
+encoded data); and the third is a pointer to an integer
+(\var{*buffer_length}, the buffer length).
+
+The encoding name must map to a registered codec. If set to \NULL{},
+the default encoding is used.
+
+There are two modes of operation: 
+
+If \var{*buffer} points to a \NULL{} pointer,
+\cfunction{PyArg_ParseTuple()} will allocate a buffer of the needed
+size using \cfunction{PyMem_NEW()}, copy the encoded data into this
+buffer and adjust \var{*buffer} to reference the newly allocated
+storage. The caller is responsible for calling
+\cfunction{PyMem_Free()} to free the allocated buffer after usage.
+
+If \var{*buffer} points to a non-\NULL{} pointer (an already allocated
+buffer), \cfunction{PyArg_ParseTuple()} will use this location as
+buffer and interpret \var{*buffer_length} as buffer size. It will then
+copy the encoded data into the buffer and 0-terminate it. Buffer
+overflow is signalled with an exception.
+
+In both cases, \var{*buffer_length} is set to the length of the
+encoded data without the trailing 0-byte.
+
+\item[\samp{et\#} (string, Unicode object or character buffer compatible
+object) {[const char *encoding, char **buffer, int *buffer_length]}]
+Same as \samp{es\#} except that string objects are passed through without
+recoding them. Instead, the implementation assumes that the string
+object uses the encoding passed in as parameter.
+
+\item[\samp{b} (integer) {[char]}]
+Convert a Python integer to a tiny int, stored in a C \ctype{char}.
+
+\item[\samp{h} (integer) {[short int]}]
+Convert a Python integer to a C \ctype{short int}.
+
+\item[\samp{i} (integer) {[int]}]
+Convert a Python integer to a plain C \ctype{int}.
+
+\item[\samp{l} (integer) {[long int]}]
+Convert a Python integer to a C \ctype{long int}.
+
+\item[\samp{c} (string of length 1) {[char]}]
+Convert a Python character, represented as a string of length 1, to a
+C \ctype{char}.
+
+\item[\samp{f} (float) {[float]}]
+Convert a Python floating point number to a C \ctype{float}.
+
+\item[\samp{d} (float) {[double]}]
+Convert a Python floating point number to a C \ctype{double}.
+
+\item[\samp{D} (complex) {[Py_complex]}]
+Convert a Python complex number to a C \ctype{Py_complex} structure.
+
+\item[\samp{O} (object) {[PyObject *]}]
+Store a Python object (without any conversion) in a C object pointer.
+The C program thus receives the actual object that was passed.  The
+object's reference count is not increased.  The pointer stored is not
+\NULL{}.
+
+\item[\samp{O!} (object) {[\var{typeobject}, PyObject *]}]
+Store a Python object in a C object pointer.  This is similar to
+\samp{O}, but takes two C arguments: the first is the address of a
+Python type object, the second is the address of the C variable (of
+type \ctype{PyObject *}) into which the object pointer is stored.
+If the Python object does not have the required type,
+\exception{TypeError} is raised.
+
+\item[\samp{O\&} (object) {[\var{converter}, \var{anything}]}]
+Convert a Python object to a C variable through a \var{converter}
+function.  This takes two arguments: the first is a function, the
+second is the address of a C variable (of arbitrary type), converted
+to \ctype{void *}.  The \var{converter} function in turn is called as
+follows:
+
+\var{status}\code{ = }\var{converter}\code{(}\var{object}, \var{address}\code{);}
+
+where \var{object} is the Python object to be converted and
+\var{address} is the \ctype{void *} argument that was passed to
+\cfunction{PyArg_ParseTuple()}.  The returned \var{status} should be
+\code{1} for a successful conversion and \code{0} if the conversion
+has failed.  When the conversion fails, the \var{converter} function
+should raise an exception.
+
+\item[\samp{S} (string) {[PyStringObject *]}]
+Like \samp{O} but requires that the Python object is a string object.
+Raises \exception{TypeError} if the object is not a string object.
+The C variable may also be declared as \ctype{PyObject *}.
+
+\item[\samp{U} (Unicode string) {[PyUnicodeObject *]}]
+Like \samp{O} but requires that the Python object is a Unicode object.
+Raises \exception{TypeError} if the object is not a Unicode object.
+The C variable may also be declared as \ctype{PyObject *}.
+
+\item[\samp{t\#} (read-only character buffer) {[char *, int]}]
+Like \samp{s\#}, but accepts any object which implements the read-only 
+buffer interface.  The \ctype{char *} variable is set to point to the
+first byte of the buffer, and the \ctype{int} is set to the length of
+the buffer.  Only single-segment buffer objects are accepted;
+\exception{TypeError} is raised for all others.
+
+\item[\samp{w} (read-write character buffer) {[char *]}]
+Similar to \samp{s}, but accepts any object which implements the
+read-write buffer interface.  The caller must determine the length of
+the buffer by other means, or use \samp{w\#} instead.  Only
+single-segment buffer objects are accepted; \exception{TypeError} is
+raised for all others.
+
+\item[\samp{w\#} (read-write character buffer) {[char *, int]}]
+Like \samp{s\#}, but accepts any object which implements the
+read-write buffer interface.  The \ctype{char *} variable is set to
+point to the first byte of the buffer, and the \ctype{int} is set to
+the length of the buffer.  Only single-segment buffer objects are
+accepted; \exception{TypeError} is raised for all others.
+
+\item[\samp{(\var{items})} (tuple) {[\var{matching-items}]}]
+The object must be a Python sequence whose length is the number of
+format units in \var{items}.  The C arguments must correspond to the
+individual format units in \var{items}.  Format units for sequences
+may be nested.
+
+\strong{Note:} Prior to Python version 1.5.2, this format specifier
+only accepted a tuple containing the individual parameters, not an
+arbitrary sequence.  Code which previously caused
+\exception{TypeError} to be raised here may now proceed without an
+exception.  This is not expected to be a problem for existing code.
+
+\end{description}
+
+It is possible to pass Python long integers where integers are
+requested; however no proper range checking is done --- the most
+significant bits are silently truncated when the receiving field is
+too small to receive the value (actually, the semantics are inherited
+from downcasts in C --- your mileage may vary).
+
+A few other characters have a meaning in a format string.  These may
+not occur inside nested parentheses.  They are:
+
+\begin{description}
+
+\item[\samp{|}]
+Indicates that the remaining arguments in the Python argument list are
+optional.  The C variables corresponding to optional arguments should
+be initialized to their default value --- when an optional argument is
+not specified, \cfunction{PyArg_ParseTuple()} does not touch the contents
+of the corresponding C variable(s).
+
+\item[\samp{:}]
+The list of format units ends here; the string after the colon is used
+as the function name in error messages (the ``associated value'' of
+the exception that \cfunction{PyArg_ParseTuple()} raises).
+
+\item[\samp{;}]
+The list of format units ends here; the string after the semicolon is
+used as the error message \emph{instead} of the default error message.
+Clearly, \samp{:} and \samp{;} mutually exclude each other.
+
+\end{description}
+
+Some example calls:
+
+\begin{verbatim}
+    int ok;
+    int i, j;
+    long k, l;
+    char *s;
+    int size;
+
+    ok = PyArg_ParseTuple(args, ""); /* No arguments */
+        /* Python call: f() */
+\end{verbatim}
+
+\begin{verbatim}
+    ok = PyArg_ParseTuple(args, "s", &s); /* A string */
+        /* Possible Python call: f('whoops!') */
+\end{verbatim}
+
+\begin{verbatim}
+    ok = PyArg_ParseTuple(args, "lls", &k, &l, &s); /* Two longs and a string */
+        /* Possible Python call: f(1, 2, 'three') */
+\end{verbatim}
+
+\begin{verbatim}
+    ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size);
+        /* A pair of ints and a string, whose size is also returned */
+        /* Possible Python call: f((1, 2), 'three') */
+\end{verbatim}
+
+\begin{verbatim}
+    {
+        char *file;
+        char *mode = "r";
+        int bufsize = 0;
+        ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize);
+        /* A string, and optionally another string and an integer */
+        /* Possible Python calls:
+           f('spam')
+           f('spam', 'w')
+           f('spam', 'wb', 100000) */
+    }
+\end{verbatim}
+
+\begin{verbatim}
+    {
+        int left, top, right, bottom, h, v;
+        ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)",
+                 &left, &top, &right, &bottom, &h, &v);
+        /* A rectangle and a point */
+        /* Possible Python call:
+           f(((0, 0), (400, 300)), (10, 10)) */
+    }
+\end{verbatim}
+
+\begin{verbatim}
+    {
+        Py_complex c;
+        ok = PyArg_ParseTuple(args, "D:myfunction", &c);
+        /* a complex, also providing a function name for errors */
+        /* Possible Python call: myfunction(1+2j) */
+    }
+\end{verbatim}
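+
+The \samp{es} format unit hands back a buffer that the caller must
+release.  The following is a minimal sketch of that pattern, not taken
+from the Python sources; the function name \cfunction{spam_utf8len()}
+and the choice of the \code{"utf-8"} codec are assumptions made for
+illustration:
+
+\begin{verbatim}
+#include "Python.h"
+#include <string.h>
+
+/* Hypothetical example: return the number of bytes in the UTF-8
+   encoding of the single argument. */
+static PyObject *
+spam_utf8len(PyObject *self, PyObject *args)
+{
+    char *buffer = NULL;
+    PyObject *result;
+
+    /* "utf-8" is read as the encoding name; &buffer receives the
+       newly allocated, null-terminated encoded data. */
+    if (!PyArg_ParseTuple(args, "es", "utf-8", &buffer))
+        return NULL;
+
+    result = Py_BuildValue("i", (int)strlen(buffer));
+    PyMem_Free(buffer);     /* the caller owns the buffer */
+    return result;
+}
+\end{verbatim}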
+
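+The \samp{O\&} format unit can be illustrated with a small converter.
+This is only a sketch: \cfunction{convert_nonempty()} and
+\cfunction{spam_count()} are hypothetical names, and the rule that an
+empty sequence is rejected is an assumption chosen to show both ways a
+converter can fail (by passing on an existing exception, or by raising
+one itself).
+
+\begin{verbatim}
+#include "Python.h"
+
+/* Hypothetical converter: store the length of a non-empty sequence
+   in the int that address points to.  Returns 1 on success and 0,
+   with an exception set, on failure. */
+static int
+convert_nonempty(PyObject *object, void *address)
+{
+    int length = (int)PySequence_Size(object);
+
+    if (length < 0)
+        return 0;           /* not a sequence; exception already set */
+    if (length == 0) {
+        PyErr_SetString(PyExc_ValueError, "sequence must not be empty");
+        return 0;
+    }
+    *(int *)address = length;
+    return 1;
+}
+
+static PyObject *
+spam_count(PyObject *self, PyObject *args)
+{
+    int length;
+
+    if (!PyArg_ParseTuple(args, "O&", convert_nonempty, &length))
+        return NULL;
+    return Py_BuildValue("i", length);
+}
+\end{verbatim}
+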
+
+\section{Keyword Parameters for Extension Functions
+         \label{parseTupleAndKeywords}}
+
+The \cfunction{PyArg_ParseTupleAndKeywords()} function is declared as
+follows:
+
+\begin{verbatim}
+int PyArg_ParseTupleAndKeywords(PyObject *arg, PyObject *kwdict,
+                                char *format, char **kwlist, ...);
+\end{verbatim}
+
+The \var{arg} and \var{format} parameters are identical to those of the
+\cfunction{PyArg_ParseTuple()} function.  The \var{kwdict} parameter
+is the dictionary of keywords received as the third parameter from the
+Python runtime.  The \var{kwlist} parameter is a \NULL{}-terminated
+list of strings which identify the parameters; the names are matched
+with the type information from \var{format} from left to right.  On
+success, \cfunction{PyArg_ParseTupleAndKeywords()} returns true,
+otherwise it returns false and raises an appropriate exception.
+
+\strong{Note:}  Nested tuples cannot be parsed when using keyword
+arguments!  Keyword parameters passed in which are not present in the
+\var{kwlist} will cause \exception{TypeError} to be raised.
+
+Here is an example module which uses keywords, based on an example by
+Geoff Philbrick (\email{philbrick@hks.com}):%
+\index{Philbrick, Geoff}
+
+\begin{verbatim}
+#include <stdio.h>
+#include "Python.h"
+
+static PyObject *
+keywdarg_parrot(PyObject *self, PyObject *args, PyObject *keywds)
+{
+    int voltage;
+    char *state = "a stiff";
+    char *action = "voom";
+    char *type = "Norwegian Blue";
+
+    static char *kwlist[] = {"voltage", "state", "action", "type", NULL};
+
+    if (!PyArg_ParseTupleAndKeywords(args, keywds, "i|sss", kwlist, 
+                                     &voltage, &state, &action, &type))
+        return NULL; 
+  
+    printf("-- This parrot wouldn't %s if you put %i Volts through it.\n", 
+           action, voltage);
+    printf("-- Lovely plumage, the %s -- It's %s!\n", type, state);
+
+    Py_INCREF(Py_None);
+
+    return Py_None;
+}
+
+static PyMethodDef keywdarg_methods[] = {
+    /* The cast of the function is necessary since PyCFunction values
+     * only take two PyObject* parameters, and keywdarg_parrot() takes
+     * three.
+     */
+    {"parrot", (PyCFunction)keywdarg_parrot, METH_VARARGS|METH_KEYWORDS},
+    {NULL,  NULL}   /* sentinel */
+};
+
+void
+initkeywdarg(void)
+{
+    /* Create the module and add the functions */
+    Py_InitModule("keywdarg", keywdarg_methods);
+}
+\end{verbatim}
+
+
+\section{Building Arbitrary Values
+         \label{buildValue}}
+
+The \cfunction{Py_BuildValue()} function is the counterpart to
+\cfunction{PyArg_ParseTuple()}.  It is declared as follows:
+
+\begin{verbatim}
+PyObject *Py_BuildValue(char *format, ...);
+\end{verbatim}
+
+It recognizes a set of format units similar to the ones recognized by
+\cfunction{PyArg_ParseTuple()}, but the arguments (which are input to the
+function, not output) must not be pointers, just values.  It returns a
+new Python object, suitable for returning from a C function called
+from Python.
+
+One difference with \cfunction{PyArg_ParseTuple()}: while the latter
+requires its first argument to be a tuple (since Python argument lists
+are always represented as tuples internally),
+\cfunction{Py_BuildValue()} does not always build a tuple.  It builds
+a tuple only if its format string contains two or more format units.
+If the format string is empty, it returns \code{None}; if it contains
+exactly one format unit, it returns whatever object is described by
+that format unit.  To force it to return a tuple of size 0 or one,
+parenthesize the format string.
+
+When memory buffers are passed as parameters to supply data to build
+objects, as for the \samp{s} and \samp{s\#} formats, the required data
+is copied.  Buffers provided by the caller are never referenced by the
+objects created by \cfunction{Py_BuildValue()}.  In other words, if
+your code invokes \cfunction{malloc()} and passes the allocated memory
+to \cfunction{Py_BuildValue()}, your code is responsible for
+calling \cfunction{free()} for that memory once
+\cfunction{Py_BuildValue()} returns.
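+
+A minimal sketch of this rule follows.  The helper
+\cfunction{make_greeting()} is a hypothetical name used only for
+illustration; it is not part of the Python API:
+
+\begin{verbatim}
+#include "Python.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* Hypothetical helper: build the Python string "Hello, <name>". */
+static PyObject *
+make_greeting(const char *name)
+{
+    /* room for "Hello, " (7 characters), the name and the null byte */
+    char *buffer = malloc(strlen(name) + 8);
+    PyObject *result;
+
+    if (buffer == NULL)
+        return PyErr_NoMemory();
+    sprintf(buffer, "Hello, %s", name);
+
+    result = Py_BuildValue("s", buffer);  /* the data is copied */
+    free(buffer);                         /* so it can be freed here */
+    return result;
+}
+\end{verbatim}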
+
+In the following description, the quoted form is the format unit; the
+entry in (round) parentheses is the Python object type that the format
+unit will return; and the entry in [square] brackets is the type of
+the C value(s) to be passed.
+
+The characters space, tab, colon and comma are ignored in format
+strings (but not within format units such as \samp{s\#}).  This can be
+used to make long format strings a tad more readable.
+
+\begin{description}
+
+\item[\samp{s} (string) {[char *]}]
+Convert a null-terminated C string to a Python object.  If the C
+string pointer is \NULL{}, \code{None} is used.
+
+\item[\samp{s\#} (string) {[char *, int]}]
+Convert a C string and its length to a Python object.  If the C string
+pointer is \NULL{}, the length is ignored and \code{None} is
+returned.
+
+\item[\samp{z} (string or \code{None}) {[char *]}]
+Same as \samp{s}.
+
+\item[\samp{z\#} (string or \code{None}) {[char *, int]}]
+Same as \samp{s\#}.
+
+\item[\samp{u} (Unicode string) {[Py_UNICODE *]}]
+Convert a null-terminated buffer of Unicode (UCS-2) data to a Python
+Unicode object.  If the Unicode buffer pointer is \NULL{},
+\code{None} is returned.
+
+\item[\samp{u\#} (Unicode string) {[Py_UNICODE *, int]}]
+Convert a Unicode (UCS-2) data buffer and its length to a Python
+Unicode object.  If the Unicode buffer pointer is \NULL{}, the length
+is ignored and \code{None} is returned.
+
+\item[\samp{i} (integer) {[int]}]
+Convert a plain C \ctype{int} to a Python integer object.
+
+\item[\samp{b} (integer) {[char]}]
+Same as \samp{i}.
+
+\item[\samp{h} (integer) {[short int]}]
+Same as \samp{i}.
+
+\item[\samp{l} (integer) {[long int]}]
+Convert a C \ctype{long int} to a Python integer object.
+
+\item[\samp{c} (string of length 1) {[char]}]
+Convert a C \ctype{int} representing a character to a Python string of
+length 1.
+
+\item[\samp{d} (float) {[double]}]
+Convert a C \ctype{double} to a Python floating point number.
+
+\item[\samp{f} (float) {[float]}]
+Same as \samp{d}.
+
+\item[\samp{D} (complex) {[Py_complex *]}]
+Convert a C \ctype{Py_complex} structure to a Python complex number.
+
+\item[\samp{O} (object) {[PyObject *]}]
+Pass a Python object untouched (except for its reference count, which
+is incremented by one).  If the object passed in is a \NULL{}
+pointer, it is assumed that the call producing the argument found an
+error and set an exception.  Therefore,
+\cfunction{Py_BuildValue()} will return \NULL{} without raising a new
+exception.  If no exception has been raised yet,
+\cdata{PyExc_SystemError} is set.
+
+\item[\samp{S} (object) {[PyObject *]}]
+Same as \samp{O}.
+
+\item[\samp{U} (object) {[PyObject *]}]
+Same as \samp{O}.
+
+\item[\samp{N} (object) {[PyObject *]}]
+Same as \samp{O}, except it doesn't increment the reference count on
+the object.  Useful when the object is created by a call to an object
+constructor in the argument list.
+
+\item[\samp{O\&} (object) {[\var{converter}, \var{anything}]}]
+Convert \var{anything} to a Python object through a \var{converter}
+function.  The function is called with \var{anything} (which should be
+compatible with \ctype{void *}) as its argument and should return a
+``new'' Python object, or \NULL{} if an error occurred.
+
+\item[\samp{(\var{items})} (tuple) {[\var{matching-items}]}]
+Convert a sequence of C values to a Python tuple with the same number
+of items.
+
+\item[\samp{[\var{items}]} (list) {[\var{matching-items}]}]
+Convert a sequence of C values to a Python list with the same number
+of items.
+
+\item[\samp{\{\var{items}\}} (dictionary) {[\var{matching-items}]}]
+Convert a sequence of C values to a Python dictionary.  Each pair of
+consecutive C values adds one item to the dictionary, serving as key
+and value, respectively.
+
+\end{description}
+
+If there is an error in the format string, the
+\cdata{PyExc_SystemError} exception is raised and \NULL{} returned.
+
+Examples (to the left the call, to the right the resulting Python value):
+
+\begin{verbatim}
+    Py_BuildValue("")                        None
+    Py_BuildValue("i", 123)                  123
+    Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
+    Py_BuildValue("s", "hello")              'hello'
+    Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
+    Py_BuildValue("s#", "hello", 4)          'hell'
+    Py_BuildValue("()")                      ()
+    Py_BuildValue("(i)", 123)                (123,)
+    Py_BuildValue("(ii)", 123, 456)          (123, 456)
+    Py_BuildValue("(i,i)", 123, 456)         (123, 456)
+    Py_BuildValue("[i,i]", 123, 456)         [123, 456]
+    Py_BuildValue("{s:i,s:i}",
+                  "abc", 123, "def", 456)    {'abc': 123, 'def': 456}
+    Py_BuildValue("((ii)(ii)) (ii)",
+                  1, 2, 3, 4, 5, 6)          (((1, 2), (3, 4)), (5, 6))
+\end{verbatim}
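+
+The difference between \samp{O} and \samp{N} matters when an argument
+is a freshly created object.  The sketch below uses the hypothetical
+helper \cfunction{make_result()}; it assumes the caller wants a tuple
+holding a new empty list and a count:
+
+\begin{verbatim}
+#include "Python.h"
+
+/* Hypothetical helper: return the tuple ([], count). */
+static PyObject *
+make_result(int count)
+{
+    PyObject *list = PyList_New(0);
+
+    if (list == NULL)
+        return NULL;
+    /* "N" hands our reference to the new tuple, so no Py_DECREF()
+       is needed; with "O" we would have to call Py_DECREF(list)
+       after Py_BuildValue() returns. */
+    return Py_BuildValue("(Ni)", list, count);
+}
+\end{verbatim}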
+
+
+\section{Reference Counts
+         \label{refcounts}}
+
+In languages like C or \Cpp{}, the programmer is responsible for
+dynamic allocation and deallocation of memory on the heap.  In C,
+this is done using the functions \cfunction{malloc()} and
+\cfunction{free()}.  In \Cpp{}, the operators \keyword{new} and
+\keyword{delete} are used with essentially the same meaning; they are
+commonly implemented using \cfunction{malloc()} and
+\cfunction{free()}, so we'll restrict the following discussion to the
+latter.
+
+Every block of memory allocated with \cfunction{malloc()} should
+eventually be returned to the pool of available memory by exactly one
+call to \cfunction{free()}.  It is important to call
+\cfunction{free()} at the right time.  If a block's address is
+forgotten but \cfunction{free()} is not called for it, the memory it
+occupies cannot be reused until the program terminates.  This is
+called a \dfn{memory leak}.  On the other hand, if a program calls
+\cfunction{free()} for a block and then continues to use the block, it
+creates a conflict with re-use of the block through another
+\cfunction{malloc()} call.  This is called \dfn{using freed memory}.
+It has the same bad consequences as referencing uninitialized data ---
+core dumps, wrong results, mysterious crashes.
+
+Common causes of memory leaks are unusual paths through the code.  For
+instance, a function may allocate a block of memory, do some
+calculation, and then free the block again.  Now a change in the
+requirements for the function may add a test to the calculation that
+detects an error condition and can return prematurely from the
+function.  It's easy to forget to free the allocated memory block when
+taking this premature exit, especially when it is added later to the
+code.  Such leaks, once introduced, often go undetected for a long
+time: the error exit is taken only in a small fraction of all calls,
+and most modern machines have plenty of virtual memory, so the leak
+only becomes apparent in a long-running process that uses the leaking
+function frequently.  Therefore, it's important to prevent leaks from
+happening by having a coding convention or strategy that minimizes
+this kind of error.
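+
+One such convention, sketched here in plain C, routes every exit taken
+after the allocation through a single cleanup label.  The function
+\cfunction{count_words()} and its rule that a tab character is an
+error are invented for this illustration:
+
+\begin{verbatim}
+#include <stdlib.h>
+#include <string.h>
+
+/* Count the space-separated words in text; return -1 on error. */
+int
+count_words(const char *text)
+{
+    int count = -1;         /* pessimistic default: report an error */
+    char *copy, *word;
+
+    copy = malloc(strlen(text) + 1);
+    if (copy == NULL)
+        return -1;          /* nothing to free yet */
+    strcpy(copy, text);
+
+    /* The test added later: tabs are an error.  Jumping to the
+       cleanup label instead of returning keeps the free() call. */
+    if (strchr(copy, '\t') != NULL)
+        goto done;
+
+    count = 0;
+    for (word = strtok(copy, " "); word != NULL; word = strtok(NULL, " "))
+        count++;
+
+done:
+    free(copy);
+    return count;
+}
+\end{verbatim}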
+
+Since Python makes heavy use of \cfunction{malloc()} and
+\cfunction{free()}, it needs a strategy to avoid memory leaks as well
+as the use of freed memory.  The chosen method is called
+\dfn{reference counting}.  The principle is simple: every object
+contains a counter, which is incremented when a reference to the
+object is stored somewhere, and which is decremented when a reference
+to it is deleted.  When the counter reaches zero, the last reference
+to the object has been deleted and the object is freed.
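+
+The principle can be modelled in a few lines of plain C.  This toy
+model is not how Python implements its objects; every name in it
+(\ctype{toy_object}, \cfunction{toy_new()}, \cfunction{toy_incref()},
+\cfunction{toy_decref()}, \cdata{toy_live}) is hypothetical:
+
+\begin{verbatim}
+#include <stdlib.h>
+#include <string.h>
+
+typedef struct {
+    int refcount;           /* number of references to this object */
+    char *text;
+} toy_object;
+
+static int toy_live = 0;    /* objects not yet freed, for illustration */
+
+/* Create an object; the caller holds the first reference. */
+static toy_object *
+toy_new(const char *text)
+{
+    toy_object *op = malloc(sizeof(toy_object));
+
+    if (op == NULL)
+        return NULL;
+    op->text = malloc(strlen(text) + 1);
+    if (op->text == NULL) {
+        free(op);
+        return NULL;
+    }
+    strcpy(op->text, text);
+    op->refcount = 1;
+    toy_live++;
+    return op;
+}
+
+/* A reference to the object is stored somewhere. */
+static void
+toy_incref(toy_object *op)
+{
+    op->refcount++;
+}
+
+/* A reference is deleted; the last one frees the object. */
+static void
+toy_decref(toy_object *op)
+{
+    if (--op->refcount == 0) {
+        free(op->text);
+        free(op);
+        toy_live--;
+    }
+}
+\end{verbatim}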
+
+An alternative strategy is called \dfn{automatic garbage collection}.
+(Sometimes, reference counting is also referred to as a garbage
+collection strategy, hence my use of ``automatic'' to distinguish the
+two.)  The big advantage of automatic garbage collection is that the
+user doesn't need to call \cfunction{free()} explicitly.  (Another claimed
+advantage is an improvement in speed or memory usage --- this is no
+hard fact however.)  The disadvantage is that for C, there is no
+truly portable automatic garbage collector, while reference counting
+can be implemented portably (as long as the functions \cfunction{malloc()}
+and \cfunction{free()} are available --- which the C Standard guarantees).
+Maybe some day a sufficiently portable automatic garbage collector
+will be available for C.  Until then, we'll have to live with
+reference counts.
+
+\subsection{Reference Counting in Python
+            \label{refcountsInPython}}
+
+There are two macros, \code{Py_INCREF(x)} and \code{Py_DECREF(x)},
+which handle the incrementing and decrementing of the reference count.
+\cfunction{Py_DECREF()} also frees the object when the count reaches zero.
+For flexibility, it doesn't call \cfunction{free()} directly --- rather, it
+makes a call through a function pointer in the object's \dfn{type
+object}.  For this purpose (and others), every object also contains a
+pointer to its type object.
+
+The big question now remains: when to use \code{Py_INCREF(x)} and
+\code{Py_DECREF(x)}?  Let's first introduce some terms.  Nobody
+``owns'' an object; however, you can \dfn{own a reference} to an
+object.  An object's reference count is now defined as the number of
+owned references to it.  The owner of a reference is responsible for
+calling \cfunction{Py_DECREF()} when the reference is no longer
+needed.  Ownership of a reference can be transferred.  There are three
+ways to dispose of an owned reference: pass it on, store it, or call
+\cfunction{Py_DECREF()}.  Forgetting to dispose of an owned reference
+creates a memory leak.
+
+It is also possible to \dfn{borrow}\footnote{The metaphor of
+``borrowing'' a reference is not completely correct: the owner still
+has a copy of the reference.} a reference to an object.  The borrower
+of a reference should not call \cfunction{Py_DECREF()}.  The borrower must
+not hold on to the object longer than the owner from which it was
+borrowed.  Using a borrowed reference after the owner has disposed of
+it risks using freed memory and should be avoided
+completely.\footnote{Checking that the reference count is at least 1
+\strong{does not work} --- the reference count itself could be in
+freed memory and may thus be reused for another object!}
+
+The advantage of borrowing over owning a reference is that you don't
+need to take care of disposing of the reference on all possible paths
+through the code --- in other words, with a borrowed reference you
+don't run the risk of leaking when a premature exit is taken.  The
+disadvantage of borrowing over owning is that there are some subtle
+situations where in seemingly correct code a borrowed reference can be
+used after the owner from which it was borrowed has in fact disposed
+of it.
+
+A borrowed reference can be changed into an owned reference by calling
+\cfunction{Py_INCREF()}.  This does not affect the status of the owner from
+which the reference was borrowed --- it creates a new owned reference,
+and with it full owner responsibilities (the new owner must dispose of
+the reference properly, just as the previous owner must).
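+
+As a sketch of this step, the hypothetical function
+\cfunction{spam_remember()} below keeps its argument in a static
+variable (\cdata{remembered}, also an invented name), so it must turn
+the borrowed reference into an owned one:
+
+\begin{verbatim}
+#include "Python.h"
+
+static PyObject *remembered = NULL;   /* an owned reference, or NULL */
+
+/* Hypothetical function: keep the argument for later use. */
+static PyObject *
+spam_remember(PyObject *self, PyObject *args)
+{
+    PyObject *object, *previous;
+
+    if (!PyArg_ParseTuple(args, "O", &object))
+        return NULL;
+    /* object is a borrowed reference at this point */
+
+    Py_INCREF(object);      /* now we own a reference of our own */
+    previous = remembered;
+    remembered = object;
+    Py_XDECREF(previous);   /* dispose of the reference we replace */
+
+    Py_INCREF(Py_None);
+    return Py_None;
+}
+\end{verbatim}
+
+The old value is released only after \cdata{remembered} has been
+updated, so the variable never refers to an object that may already
+have been freed.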
+
+
+\subsection{Ownership Rules
+            \label{ownershipRules}}
+
+Whenever an object reference is passed into or out of a function, it
+is part of the function's interface specification whether ownership is
+transferred with the reference or not.
+
+Most functions that return a reference to an object pass on ownership
+with the reference.  In particular, all functions whose purpose is
+to create a new object, such as \cfunction{PyInt_FromLong()} and
+\cfunction{Py_BuildValue()}, pass ownership to the receiver.  Even if
+the object you receive is not actually brand new, you still receive
+ownership of the reference.  For instance,
+\cfunction{PyInt_FromLong()} maintains a cache of popular values and can
+return a reference to a cached item.
+
+Many functions that extract objects from other objects also transfer
+ownership with the reference, for instance
+\cfunction{PyObject_GetAttrString()}.  The picture is less clear, here,
+however, since a few common routines are exceptions:
+\cfunction{PyTuple_GetItem()}, \cfunction{PyList_GetItem()},
+\cfunction{PyDict_GetItem()}, and \cfunction{PyDict_GetItemString()}
+all return references that you borrow from the tuple, list or
+dictionary.
+
+The function \cfunction{PyImport_AddModule()} also returns a borrowed
+reference, even though it may actually create the object it returns:
+this is possible because an owned reference to the object is stored in
+\code{sys.modules}.
+
+When you pass an object reference into another function, in general,
+the function borrows the reference from you --- if it needs to store
+it, it will use \cfunction{Py_INCREF()} to become an independent
+owner.  There are exactly two important exceptions to this rule:
+\cfunction{PyTuple_SetItem()} and \cfunction{PyList_SetItem()}.  These
+functions take over ownership of the item passed to them --- even if
+they fail!  (Note that \cfunction{PyDict_SetItem()} and friends don't
+take over ownership --- they are ``normal.'')
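+
+The two conventions can be compared side by side.  This is a sketch
+only; \cfunction{make_pair()} and \cfunction{make_dict()} are
+hypothetical helpers:
+
+\begin{verbatim}
+#include "Python.h"
+
+/* Hypothetical helper: return the tuple (1, 2). */
+static PyObject *
+make_pair(void)
+{
+    PyObject *tuple = PyTuple_New(2);
+    PyObject *item;
+    int i;
+
+    if (tuple == NULL)
+        return NULL;
+    for (i = 0; i < 2; i++) {
+        item = Py_BuildValue("i", i + 1);
+        /* PyTuple_SetItem() takes over our reference to item,
+           even if it fails, so item is never Py_DECREF()ed here. */
+        if (item == NULL || PyTuple_SetItem(tuple, i, item) < 0) {
+            Py_DECREF(tuple);
+            return NULL;
+        }
+    }
+    return tuple;
+}
+
+/* Hypothetical helper: return the dictionary {'one': 1}. */
+static PyObject *
+make_dict(void)
+{
+    PyObject *dict = PyDict_New();
+    PyObject *value;
+
+    if (dict == NULL)
+        return NULL;
+    value = Py_BuildValue("i", 1);
+    if (value == NULL) {
+        Py_DECREF(dict);
+        return NULL;
+    }
+    /* PyDict_SetItemString() does not take over ownership, so we
+       dispose of our own reference whether or not it succeeds. */
+    if (PyDict_SetItemString(dict, "one", value) < 0) {
+        Py_DECREF(value);
+        Py_DECREF(dict);
+        return NULL;
+    }
+    Py_DECREF(value);
+    return dict;
+}
+\end{verbatim}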
+
+When a C function is called from Python, it borrows references to its
+arguments from the caller.  The caller owns a reference to the object,
+so the borrowed reference's lifetime is guaranteed until the function
+returns.  Only when such a borrowed reference must be stored or passed
+on must it be turned into an owned reference by calling
+\cfunction{Py_INCREF()}.
+
+The object reference returned from a C function that is called from
+Python must be an owned reference --- ownership is transferred from the
+function to its caller.
+
+
+\subsection{Thin Ice
+            \label{thinIce}}
+
+There are a few situations where seemingly harmless use of a borrowed
+reference can lead to problems.  These all have to do with implicit
+invocations of the interpreter, which can cause the owner of a
+reference to dispose of it.
+
+The first and most important case to know about is using
+\cfunction{Py_DECREF()} on an unrelated object while borrowing a
+reference to a list item.  For instance:
+
+\begin{verbatim}
+void
+bug(PyObject *list) {
+    PyObject *item = PyList_GetItem(list, 0);
+
+    PyList_SetItem(list, 1, PyInt_FromLong(0L));
+    PyObject_Print(item, stdout, 0); /* BUG! */
+}
+\end{verbatim}
+
+This function first borrows a reference to \code{list[0]}, then
+replaces \code{list[1]} with the value \code{0}, and finally prints
+the borrowed reference.  Looks harmless, right?  But it's not!
+
+Let's follow the control flow into \cfunction{PyList_SetItem()}.  The list
+owns references to all its items, so when item 1 is replaced, it has
+to dispose of the original item 1.  Now let's suppose the original
+item 1 was an instance of a user-defined class, and let's further
+suppose that the class defined a \method{__del__()} method.  If this
+class instance has a reference count of 1, disposing of it will call
+its \method{__del__()} method.
+
+Since it is written in Python, the \method{__del__()} method can execute
+arbitrary Python code.  Could it perhaps do something to invalidate
+the reference to \code{item} in \cfunction{bug()}?  You bet!  Assuming
+that the list passed into \cfunction{bug()} is accessible to the
+\method{__del__()} method, it could execute a statement to the effect of
+\samp{del list[0]}, and assuming this was the last reference to that
+object, it would free the memory associated with it, thereby
+invalidating \code{item}.
+
+The solution, once you know the source of the problem, is easy:
+temporarily increment the reference count.  The correct version of the
+function reads:
+
+\begin{verbatim}
+void
+no_bug(PyObject *list) {
+    PyObject *item = PyList_GetItem(list, 0);
+
+    Py_INCREF(item);
+    PyList_SetItem(list, 1, PyInt_FromLong(0L));
+    PyObject_Print(item, stdout, 0);
+    Py_DECREF(item);
+}
+\end{verbatim}
+
+This is a true story.  An older version of Python contained variants
+of this bug and someone spent a considerable amount of time in a C
+debugger to figure out why his \method{__del__()} methods would fail...
+
+The second case of problems with a borrowed reference is a variant
+involving threads.  Normally, multiple threads in the Python
+interpreter can't get in each other's way, because there is a global
+lock protecting Python's entire object space.  However, it is possible
+to temporarily release this lock using the macro
+\code{Py_BEGIN_ALLOW_THREADS}, and to re-acquire it using
+\code{Py_END_ALLOW_THREADS}.  This is common around blocking I/O
+calls, to let other threads use the processor while waiting for the I/O to
+complete.  Obviously, the following function has the same problem as
+the previous one:
+
+\begin{verbatim}
+void
+bug(PyObject *list) {
+    PyObject *item = PyList_GetItem(list, 0);
+    Py_BEGIN_ALLOW_THREADS
+    ...some blocking I/O call...
+    Py_END_ALLOW_THREADS
+    PyObject_Print(item, stdout, 0); /* BUG! */
+}
+\end{verbatim}
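+
+The remedy is the same as before: own a reference for as long as the
+item is used.  In the sketch below the function name
+\cfunction{no_bug_threads()} is hypothetical, and the call to
+\cfunction{fflush()} merely stands in for a real blocking I/O call:
+
+\begin{verbatim}
+#include "Python.h"
+#include <stdio.h>
+
+/* Returns 0 on success, -1 with an exception set on failure. */
+static int
+no_bug_threads(PyObject *list)
+{
+    PyObject *item = PyList_GetItem(list, 0);
+    int status;
+
+    if (item == NULL)
+        return -1;
+    Py_INCREF(item);        /* keep item alive while the lock is released */
+    Py_BEGIN_ALLOW_THREADS
+    fflush(stdout);         /* stands in for a blocking I/O call */
+    Py_END_ALLOW_THREADS
+    status = PyObject_Print(item, stdout, 0);
+    Py_DECREF(item);
+    return status;
+}
+\end{verbatim}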
+
+
+\subsection{NULL Pointers
+            \label{nullPointers}}
+
+In general, functions that take object references as arguments do not
+expect you to pass them \NULL{} pointers, and will dump core (or
+cause later core dumps) if you do so.  Functions that return object
+references generally return \NULL{} only to indicate that an
+exception occurred.  The reason for not testing for \NULL{}
+arguments is that functions often pass the objects they receive on to
+other functions --- if each function were to test for \NULL{},
+there would be a lot of redundant tests and the code would run more
+slowly.
+
+It is better to test for \NULL{} only at the ``source:'' when a
+pointer that may be \NULL{} is received, for example, from
+\cfunction{malloc()} or from a function that may raise an exception.
+
+The macros \cfunction{Py_INCREF()} and \cfunction{Py_DECREF()}
+do not check for \NULL{} pointers --- however, their variants
+\cfunction{Py_XINCREF()} and \cfunction{Py_XDECREF()} do.
+
+The macros for checking for a particular object type
+(\code{Py\var{type}_Check()}) don't check for \NULL{} pointers ---
+again, there is much code that calls several of these in a row to test
+an object against various different expected types, and this would
+generate redundant tests.  There are no variants with \NULL{}
+checking.
+
+The C function calling mechanism guarantees that the argument list
+passed to C functions (\code{args} in the examples) is never
+\NULL{} --- in fact it guarantees that it is always a tuple.\footnote{
+These guarantees don't hold when you use the ``old'' style
+calling convention --- this is still found in much existing code.}
+
+It is a severe error to ever let a \NULL{} pointer ``escape'' to
+the Python user.
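+
+Testing at the source might look like the following sketch, in which
+each pointer is checked once, right where it is produced.  The helper
+\cfunction{make_range()} is a hypothetical name:
+
+\begin{verbatim}
+#include "Python.h"
+
+/* Hypothetical helper: return the new list [0, 1, ..., n-1], or
+   NULL with an exception set. */
+static PyObject *
+make_range(int n)
+{
+    PyObject *list = PyList_New(n);
+    PyObject *item;
+    int i;
+
+    if (list == NULL)       /* tested where the pointer is received */
+        return NULL;
+    for (i = 0; i < n; i++) {
+        item = Py_BuildValue("i", i);
+        if (item == NULL) { /* never pass NULL on to PyList_SetItem() */
+            Py_DECREF(list);
+            return NULL;
+        }
+        PyList_SetItem(list, i, item);
+    }
+    return list;            /* known to be non-NULL here */
+}
+\end{verbatim}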
+
+% Frank Stajano:
+% A pedagogically buggy example, along the lines of the previous listing, 
+% would be helpful here -- showing in more concrete terms what sort of 
+% actions could cause the problem. I can't very well imagine it from the 
+% description.
+
+
+\section{Writing Extensions in \Cpp{}
+         \label{cplusplus}}
+
+It is possible to write extension modules in \Cpp{}.  Some restrictions
+apply.  If the main program (the Python interpreter) is compiled and
+linked by the C compiler, global or static objects with constructors
+cannot be used.  This is not a problem if the main program is linked
+by the \Cpp{} compiler.  Functions that will be called by the
+Python interpreter (in particular, module initialization functions)
+have to be declared using \code{extern "C"}.
+It is unnecessary to enclose the Python header files in
+\code{extern "C" \{...\}} --- they use this form already if the symbol
+\samp{__cplusplus} is defined (all recent \Cpp{} compilers define this
+symbol).
+
+
+\section{Providing a C API for an Extension Module
+         \label{using-cobjects}}
+\sectionauthor{Konrad Hinsen}{hinsen@cnrs-orleans.fr}
+
+Many extension modules just provide new functions and types to be
+used from Python, but sometimes the code in an extension module can
+be useful for other extension modules. For example, an extension
+module could implement a type ``collection'' which works like lists
+without order. Just like the standard Python list type has a C API
+which permits extension modules to create and manipulate lists, this
+new collection type should have a set of C functions for direct
+manipulation from other extension modules.
+
+At first sight this seems easy: just write the functions (without
+declaring them \keyword{static}, of course), provide an appropriate
+header file, and document the C API. And in fact this would work if
+all extension modules were always linked statically with the Python
+interpreter. When modules are used as shared libraries, however, the
+symbols defined in one module may not be visible to another module.
+The details of visibility depend on the operating system; some systems
+use one global namespace for the Python interpreter and all extension
+modules (Windows, for example), whereas others require an explicit
+list of imported symbols at module link time (AIX is one example), or
+offer a choice of different strategies (most Unices). And even if
+symbols are globally visible, the module whose functions one wishes to
+call might not have been loaded yet!
+
+Portability therefore requires making no assumptions about
+symbol visibility. This means that all symbols in extension modules
+should be declared \keyword{static}, except for the module's
+initialization function, in order to avoid name clashes with other
+extension modules (as discussed in section~\ref{methodTable}). And it
+means that symbols that \emph{should} be accessible from other
+extension modules must be exported in a different way.
+
+Python provides a special mechanism to pass C-level information
+(pointers) from one extension module to another one: CObjects.
+A CObject is a Python data type which stores a pointer (\ctype{void
+*}).  CObjects can only be created and accessed via their C API, but
+they can be passed around like any other Python object. In particular, 
+they can be assigned to a name in an extension module's namespace.
+Other extension modules can then import this module, retrieve the
+value of this name, and then retrieve the pointer from the CObject.
+
+There are many ways in which CObjects can be used to export the C API
+of an extension module. Each name could get its own CObject, or all C
+API pointers could be stored in an array whose address is published in
+a CObject. And the various tasks of storing and retrieving the pointers
+can be distributed in different ways between the module providing the
+code and the client modules.
+
+The following example demonstrates an approach that puts most of the
+burden on the writer of the exporting module, which is appropriate
+for commonly used library modules. It stores all C API pointers
+(just one in the example!) in an array of \ctype{void} pointers which
+becomes the value of a CObject. The header file corresponding to
+the module provides a macro that takes care of importing the module
+and retrieving its C API pointers; client modules only have to call
+this macro before accessing the C API.
+
+The exporting module is a modification of the \module{spam} module from
+section~\ref{simpleExample}. The function \function{spam.system()}
+does not call the C library function \cfunction{system()} directly,
+but a function \cfunction{PySpam_System()}, which would of course do
+something more complicated in reality (such as adding ``spam'' to
+every command). This function \cfunction{PySpam_System()} is also
+exported to other extension modules.
+
+The function \cfunction{PySpam_System()} is a plain C function,
+declared \keyword{static} like everything else:
+
+\begin{verbatim}
+static int
+PySpam_System(command)
+    char *command;
+{
+    return system(command);
+}
+\end{verbatim}
+
+The function \cfunction{spam_system()} is modified in a trivial way:
+
+\begin{verbatim}
+static PyObject *
+spam_system(self, args)
+    PyObject *self;
+    PyObject *args;
+{
+    char *command;
+    int sts;
+
+    if (!PyArg_ParseTuple(args, "s", &command))
+        return NULL;
+    sts = PySpam_System(command);
+    return Py_BuildValue("i", sts);
+}
+\end{verbatim}
+
+At the beginning of the module, right after the line
+
+\begin{verbatim}
+#include "Python.h"
+\end{verbatim}
+
+two more lines must be added:
+
+\begin{verbatim}
+#define SPAM_MODULE
+#include "spammodule.h"
+\end{verbatim}
+
+The \code{\#define} is used to tell the header file that it is being
+included in the exporting module, not a client module. Finally,
+the module's initialization function must take care of initializing
+the C API pointer array:
+
+\begin{verbatim}
+void
+initspam()
+{
+    PyObject *m;
+    static void *PySpam_API[PySpam_API_pointers];
+    PyObject *c_api_object;
+
+    m = Py_InitModule("spam", SpamMethods);
+    if (m == NULL)
+        return;
+
+    /* Initialize the C API pointer array */
+    PySpam_API[PySpam_System_NUM] = (void *)PySpam_System;
+
+    /* Create a CObject containing the API pointer array's address */
+    c_api_object = PyCObject_FromVoidPtr((void *)PySpam_API, NULL);
+
+    if (c_api_object != NULL) {
+        /* Create a name for this object in the module's namespace */
+        PyObject *d = PyModule_GetDict(m);
+
+        PyDict_SetItemString(d, "_C_API", c_api_object);
+        Py_DECREF(c_api_object);
+    }
+}
+\end{verbatim}
+
+Note that \code{PySpam_API} is declared \code{static}; otherwise
+the pointer array would disappear when \code{initspam} terminates!
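+
+The difference can be sketched in plain C, independent of the Python
+API.  The names \cfunction{make_table()}, \cfunction{table_sum()} and
+\code{TABLE_SIZE} are hypothetical.  Because the array has static
+storage duration, its address is still valid after the function that
+filled it has returned; with an automatic (non-\code{static}) array
+the returned pointer would dangle:
+
```c
/* Hypothetical table size, for illustration only */
#define TABLE_SIZE 2

/* Returns the address of a table that outlives the call, because the
   array has static storage duration. */
static int *
make_table(void)
{
    static int table[TABLE_SIZE];   /* exists for the whole program run */

    table[0] = 10;
    table[1] = 20;
    return table;
}

/* Reads the table after make_table() has already returned. */
static int
table_sum(void)
{
    int *t = make_table();

    return t[0] + t[1];
}
```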
+
+The bulk of the work is in the header file \file{spammodule.h},
+which looks like this:
+
+\begin{verbatim}
+#ifndef Py_SPAMMODULE_H
+#define Py_SPAMMODULE_H
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Header file for spammodule */
+
+/* C API functions */
+#define PySpam_System_NUM 0
+#define PySpam_System_RETURN int
+#define PySpam_System_PROTO (char *command)
+
+/* Total number of C API pointers */
+#define PySpam_API_pointers 1
+
+
+#ifdef SPAM_MODULE
+/* This section is used when compiling spammodule.c */
+
+static PySpam_System_RETURN PySpam_System PySpam_System_PROTO;
+
+#else
+/* This section is used in modules that use spammodule's API */
+
+static void **PySpam_API;
+
+#define PySpam_System \
+ (*(PySpam_System_RETURN (*)PySpam_System_PROTO) PySpam_API[PySpam_System_NUM])
+
+#define import_spam() \
+{ \
+  PyObject *module = PyImport_ImportModule("spam"); \
+  if (module != NULL) { \
+    PyObject *module_dict = PyModule_GetDict(module); \
+    PyObject *c_api_object = PyDict_GetItemString(module_dict, "_C_API"); \
+    if (c_api_object != NULL && PyCObject_Check(c_api_object)) { \
+      PySpam_API = (void **)PyCObject_AsVoidPtr(c_api_object); \
+    } \
+  } \
+}
+
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* !defined(Py_SPAMMODULE_H) */
+\end{verbatim}
+
+All that a client module must do in order to have access to the
+function \cfunction{PySpam_System()} is to call the function (or
+rather macro) \cfunction{import_spam()} in its initialization
+function:
+
+\begin{verbatim}
+void
+initclient()
+{
+    PyObject *m;
+
+    m = Py_InitModule("client", ClientMethods);
+    if (m == NULL)
+        return;
+    import_spam();
+}
+\end{verbatim}
+
+The main disadvantage of this approach is that the file
+\file{spammodule.h} is rather complicated. However, the
+basic structure is the same for each function that is
+exported, so it has to be learned only once.
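+
+To illustrate how the pattern repeats, here is a stand-alone sketch in
+plain C with two table entries.  It does not use the Python API: the
+functions \cfunction{PySpam_Length()} and \cfunction{PySpam_Twice()}
+are hypothetical, and \cfunction{spam_export_api()} is an assumed
+stand-in for the step in which the table address travels inside a
+CObject.  Like the header above, it stores function pointers as
+\ctype{void *}, which ISO C does not guarantee but common platforms
+support:
+
```c
#include <string.h>

/* Three macros per exported function, as in spammodule.h.
   Both functions are hypothetical stand-ins. */
#define PySpam_Length_NUM 0
#define PySpam_Length_RETURN int
#define PySpam_Length_PROTO (const char *text)

#define PySpam_Twice_NUM 1
#define PySpam_Twice_RETURN int
#define PySpam_Twice_PROTO (int value)

/* Total number of C API pointers */
#define PySpam_API_pointers 2

/* ---- exporting side ---- */

static PySpam_Length_RETURN
spam_length_impl PySpam_Length_PROTO
{
    return (int)strlen(text);
}

static PySpam_Twice_RETURN
spam_twice_impl PySpam_Twice_PROTO
{
    return 2 * value;
}

/* Static storage, so the table stays valid after it has been filled */
static void *spam_api_table[PySpam_API_pointers];

/* Assumed stand-in for the initialization function: fill the table
   and hand out its address. */
static void **
spam_export_api(void)
{
    spam_api_table[PySpam_Length_NUM] = (void *)spam_length_impl;
    spam_api_table[PySpam_Twice_NUM] = (void *)spam_twice_impl;
    return spam_api_table;
}

/* ---- client side ---- */

static void **PySpam_API;

/* Each macro casts one table slot back to its function pointer type */
#define PySpam_Length \
 (*(PySpam_Length_RETURN (*)PySpam_Length_PROTO) PySpam_API[PySpam_Length_NUM])
#define PySpam_Twice \
 (*(PySpam_Twice_RETURN (*)PySpam_Twice_PROTO) PySpam_API[PySpam_Twice_NUM])

/* Calls both exported functions through the table. */
static int
client_demo(const char *text)
{
    PySpam_API = spam_export_api();   /* stands in for import_spam() */
    return PySpam_Twice(PySpam_Length(text));
}
```
+
+Adding a further function means adding one more group of three macros,
+one more table slot, and one more access macro on the client side.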
+
+Finally it should be mentioned that CObjects offer additional
+functionality, which is especially useful for memory allocation and
+deallocation of the pointer stored in a CObject. The details
+are described in the \citetitle[../api/api.html]{Python/C API
+Reference Manual} in the section ``CObjects'' and in the
+implementation of CObjects (files \file{Include/cobject.h} and
+\file{Objects/cobject.c} in the Python source code distribution).