author Fred Drake <fdrake@acm.org> 2001-10-12 19:01:43 (GMT)
committer Fred Drake <fdrake@acm.org> 2001-10-12 19:01:43 (GMT)
commit 3adf79e3e2ac4ba0c2960997234c0d36c40468a8
tree 86cbac99bf498cbc2db49feb345b4bd4a17608f4 /Doc/api/intro.tex
parent 716aac0448ef9fb6f3fd8c82237a7e73e9adb307
Break the Python/C API manual into smaller files by chapter. This manual
has grown beyond what font-lock will work with using the default (X)Emacs settings. Indentation of the description has been made consistent, and a number of smaller markup adjustments have been made as well.
Diffstat (limited to 'Doc/api/intro.tex')
-rw-r--r-- Doc/api/intro.tex 558
1 files changed, 558 insertions, 0 deletions
diff --git a/Doc/api/intro.tex b/Doc/api/intro.tex
new file mode 100644
index 0000000..d148ba8
--- /dev/null
+++ b/Doc/api/intro.tex
@@ -0,0 +1,558 @@
+\chapter{Introduction \label{intro}}
+
+
+The Application Programmer's Interface to Python gives C and
+\Cpp{} programmers access to the Python interpreter at a variety of
+levels. The API is equally usable from \Cpp{}, but for brevity it is
+generally referred to as the Python/C API. There are two
+fundamentally different reasons for using the Python/C API. The first
+reason is to write \emph{extension modules} for specific purposes;
+these are C modules that extend the Python interpreter. This is
+probably the most common use. The second reason is to use Python as a
+component in a larger application; this technique is generally
+referred to as \dfn{embedding} Python in an application.
+
+Writing an extension module is a relatively well-understood process,
+where a ``cookbook'' approach works well. There are several tools
+that automate the process to some extent. While people have embedded
+Python in other applications since its early existence, the process of
+embedding Python is less straightforward than writing an extension.
+
+Many API functions are useful independent of whether you're embedding
+or extending Python; moreover, most applications that embed Python
+will need to provide a custom extension as well, so it's probably a
+good idea to become familiar with writing an extension before
+attempting to embed Python in a real application.
+
+
+\section{Include Files \label{includes}}
+
+All function, type and macro definitions needed to use the Python/C
+API are included in your code by the following line:
+
+\begin{verbatim}
+#include "Python.h"
+\end{verbatim}
+
+This implies inclusion of the following standard headers:
+\code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>},
+\code{<limits.h>}, and \code{<stdlib.h>} (if available).
+Since Python may define some pre-processor definitions which affect
+the standard headers on some systems, you must include \file{Python.h}
+before any standard headers are included.
+
+All user-visible names defined by \file{Python.h} (except those defined by
+the included standard headers) have one of the prefixes \samp{Py} or
+\samp{_Py}. Names beginning with \samp{_Py} are for internal use by
+the Python implementation and should not be used by extension writers.
+Structure member names do not have a reserved prefix.
+
+\strong{Important:} user code should never define names that begin
+with \samp{Py} or \samp{_Py}. This confuses the reader, and
+jeopardizes the portability of the user code to future Python
+versions, which may define additional names beginning with one of
+these prefixes.
+
+The header files are typically installed with Python. On \UNIX, these
+are located in the directories
+\file{\envvar{prefix}/include/python\var{version}/} and
+\file{\envvar{exec_prefix}/include/python\var{version}/}, where
+\envvar{prefix} and \envvar{exec_prefix} are defined by the
+corresponding parameters to Python's \program{configure} script and
+\var{version} is \code{sys.version[:3]}. On Windows, the headers are
+installed in \file{\envvar{prefix}/include}, where \envvar{prefix} is
+the installation directory specified to the installer.
+
+To include the headers, place both directories (if different) on your
+compiler's search path for includes. Do \emph{not} place the parent
+directories on the search path and then use
+\samp{\#include <python\shortversion/Python.h>}; this will break on
+multi-platform builds since the platform independent headers under
+\envvar{prefix} include the platform specific headers from
+\envvar{exec_prefix}.
+
+\Cpp{} users should note that though the API is defined entirely using
+C, the header files do properly declare the entry points to be
+\code{extern "C"}, so there is no need to do anything special to use
+the API from \Cpp.
+
+
+\section{Objects, Types and Reference Counts \label{objects}}
+
+Most Python/C API functions have one or more arguments as well as a
+return value of type \ctype{PyObject*}. This type is a pointer
+to an opaque data type representing an arbitrary Python
+object. Since all Python object types are treated the same way by the
+Python language in most situations (e.g., assignments, scope rules,
+and argument passing), it is only fitting that they should be
+represented by a single C type. Almost all Python objects live on the
+heap: you never declare an automatic or static variable of type
+\ctype{PyObject}; only pointer variables of type \ctype{PyObject*} can
+be declared. The sole exceptions are the type objects\obindex{type};
+since these must never be deallocated, they are typically static
+\ctype{PyTypeObject} objects.
+
+All Python objects (even Python integers) have a \dfn{type} and a
+\dfn{reference count}. An object's type determines what kind of object
+it is (e.g., an integer, a list, or a user-defined function; there are
+many more as explained in the \citetitle[../ref/ref.html]{Python
+Reference Manual}). For each of the well-known types there is a macro
+to check whether an object is of that type; for instance,
+\samp{PyList_Check(\var{a})} is true if (and only if) the object
+pointed to by \var{a} is a Python list.
+
+
+\subsection{Reference Counts \label{refcounts}}
+
+The reference count is important because today's computers have a
+finite (and often severely limited) memory size; it counts how many
+different places there are that have a reference to an object. Such a
+place could be another object, or a global (or static) C variable, or
+a local variable in some C function. When an object's reference count
+becomes zero, the object is deallocated. If it contains references to
+other objects, their reference counts are decremented. Those other
+objects may be deallocated in turn, if this decrement makes their
+reference count become zero, and so on. (There's an obvious problem
+with objects that reference each other here; for now, the solution is
+``don't do that.'')
+
+Reference counts are always manipulated explicitly. The normal way is
+to use the macro \cfunction{Py_INCREF()}\ttindex{Py_INCREF()} to
+increment an object's reference count by one, and
+\cfunction{Py_DECREF()}\ttindex{Py_DECREF()} to decrement it by
+one. The \cfunction{Py_DECREF()} macro is considerably more complex
+than the incref one, since it must check whether the reference count
+becomes zero and then cause the object's deallocator to be called.
+The deallocator is a function pointer contained in the object's type
+structure. The type-specific deallocator takes care of decrementing
+the reference counts for other objects contained in the object if this
+is a compound object type, such as a list, as well as performing any
+additional finalization that's needed. There's no chance that the
+reference count can overflow; at least as many bits are used to hold
+the reference count as there are distinct memory locations in virtual
+memory (assuming \code{sizeof(long) >= sizeof(char*)}). Thus, the
+reference count increment is a simple operation.
+
+It is not necessary to increment an object's reference count for every
+local variable that contains a pointer to an object. In theory, the
+object's reference count goes up by one when the variable is made to
+point to it and it goes down by one when the variable goes out of
+scope. However, these two cancel each other out, so at the end the
+reference count hasn't changed. The only real reason to use the
+reference count is to prevent the object from being deallocated as
+long as our variable is pointing to it. If we know that there is at
+least one other reference to the object that lives at least as long as
+our variable, there is no need to increment the reference count
+temporarily. An important situation where this arises is in objects
+that are passed as arguments to C functions in an extension module
+that are called from Python; the call mechanism guarantees to hold a
+reference to every argument for the duration of the call.
+
+However, a common pitfall is to extract an object from a list and
+hold on to it for a while without incrementing its reference count.
+Some other operation might conceivably remove the object from the
+list, decrementing its reference count and possibly deallocating it.
+The real danger is that innocent-looking operations may invoke
+arbitrary Python code which could do this; there is a code path which
+allows control to flow back to the user from a \cfunction{Py_DECREF()},
+so almost any operation is potentially dangerous.
+
+A safe approach is to always use the generic operations (functions
+whose name begins with \samp{PyObject_}, \samp{PyNumber_},
+\samp{PySequence_} or \samp{PyMapping_}). These operations always
+increment the reference count of the object they return. This leaves
+the caller with the responsibility to call
+\cfunction{Py_DECREF()} when they are done with the result; this soon
+becomes second nature.
+
+
+\subsubsection{Reference Count Details \label{refcountDetails}}
+
+The reference count behavior of functions in the Python/C API is best
+explained in terms of \emph{ownership of references}. Note that we
+talk of owning references, never of owning objects; objects are always
+shared! When a function owns a reference, it has to dispose of it
+properly --- either by passing ownership on (usually to its caller) or
+by calling \cfunction{Py_DECREF()} or \cfunction{Py_XDECREF()}. When
+a function passes ownership of a reference on to its caller, the
+caller is said to receive a \emph{new} reference. When no ownership
+is transferred, the caller is said to \emph{borrow} the reference.
+Nothing needs to be done for a borrowed reference.
+
+Conversely, when a calling function passes in a reference to an
+object, there are two possibilities: the function \emph{steals} a
+reference to the object, or it does not. Few functions steal
+references; the two notable exceptions are
+\cfunction{PyList_SetItem()}\ttindex{PyList_SetItem()} and
+\cfunction{PyTuple_SetItem()}\ttindex{PyTuple_SetItem()}, which
+steal a reference to the item (but not to the tuple or list into which
+the item is put!). These functions were designed to steal a reference
+because of a common idiom for populating a tuple or list with newly
+created objects; for example, the code to create the tuple \code{(1,
+2, "three")} could look like this (forgetting about error handling for
+the moment; a better way to code this is shown below):
+
+\begin{verbatim}
+PyObject *t;
+
+t = PyTuple_New(3);
+PyTuple_SetItem(t, 0, PyInt_FromLong(1L));
+PyTuple_SetItem(t, 1, PyInt_FromLong(2L));
+PyTuple_SetItem(t, 2, PyString_FromString("three"));
+\end{verbatim}
+
+Incidentally, \cfunction{PyTuple_SetItem()} is the \emph{only} way to
+set tuple items; \cfunction{PySequence_SetItem()} and
+\cfunction{PyObject_SetItem()} refuse to do this since tuples are an
+immutable data type. You should only use
+\cfunction{PyTuple_SetItem()} for tuples that you are creating
+yourself.
+
+Equivalent code for populating a list can be written using
+\cfunction{PyList_New()} and \cfunction{PyList_SetItem()}. Such code
+can also use \cfunction{PySequence_SetItem()}; this illustrates the
+difference between the two (the extra \cfunction{Py_DECREF()} calls):
+
+\begin{verbatim}
+PyObject *l, *x;
+
+l = PyList_New(3);
+x = PyInt_FromLong(1L);
+PySequence_SetItem(l, 0, x); Py_DECREF(x);
+x = PyInt_FromLong(2L);
+PySequence_SetItem(l, 1, x); Py_DECREF(x);
+x = PyString_FromString("three");
+PySequence_SetItem(l, 2, x); Py_DECREF(x);
+\end{verbatim}
+
+You might find it strange that the ``recommended'' approach takes more
+code. However, in practice, you will rarely use these ways of
+creating and populating a tuple or list. There's a generic function,
+\cfunction{Py_BuildValue()}, that can create most common objects from
+C values, directed by a \dfn{format string}. For example, the
+above two blocks of code could be replaced by the following (which
+also takes care of the error checking):
+
+\begin{verbatim}
+PyObject *t, *l;
+
+t = Py_BuildValue("(iis)", 1, 2, "three");
+l = Py_BuildValue("[iis]", 1, 2, "three");
+\end{verbatim}
+
+It is much more common to use \cfunction{PyObject_SetItem()} and
+friends with items whose references you are only borrowing, like
+arguments that were passed in to the function you are writing. In
+that case, their behaviour regarding reference counts is much saner,
+since you don't have to increment a reference count so you can give a
+reference away (``have it be stolen''). For example, this function
+sets all items of a list (actually, any mutable sequence) to a given
+item:
+
+\begin{verbatim}
+int set_all(PyObject *target, PyObject *item)
+{
+    int i, n;
+
+    n = PyObject_Length(target);
+    if (n < 0)
+        return -1;
+    for (i = 0; i < n; i++) {
+        if (PySequence_SetItem(target, i, item) < 0) /* takes a C int index */
+            return -1;
+    }
+    return 0;
+}
+\end{verbatim}
+\ttindex{set_all()}
+
+The situation is slightly different for function return values.
+While passing a reference to most functions does not change your
+ownership responsibilities for that reference, many functions that
+return a reference to an object give you ownership of the reference.
+The reason is simple: in many cases, the returned object is created
+on the fly, and the reference you get is the only reference to the
+object. Therefore, the generic functions that return object
+references, like \cfunction{PyObject_GetItem()} and
+\cfunction{PySequence_GetItem()}, always return a new reference (the
+caller becomes the owner of the reference).
+
+It is important to realize that whether you own a reference returned
+by a function depends on which function you call only --- \emph{the
+plumage} (the type of the object passed as an
+argument to the function) \emph{doesn't enter into it!} Thus, if you
+extract an item from a list using \cfunction{PyList_GetItem()}, you
+don't own the reference --- but if you obtain the same item from the
+same list using \cfunction{PySequence_GetItem()} (which happens to
+take exactly the same arguments), you do own a reference to the
+returned object.
+
+Here is an example of how you could write a function that computes the
+sum of the items in a list of integers; once using
+\cfunction{PyList_GetItem()}\ttindex{PyList_GetItem()}, and once using
+\cfunction{PySequence_GetItem()}\ttindex{PySequence_GetItem()}.
+
+\begin{verbatim}
+long sum_list(PyObject *list)
+{
+    int i, n;
+    long total = 0;
+    PyObject *item;
+
+    n = PyList_Size(list);
+    if (n < 0)
+        return -1; /* Not a list */
+    for (i = 0; i < n; i++) {
+        item = PyList_GetItem(list, i); /* Can't fail */
+        if (!PyInt_Check(item)) continue; /* Skip non-integers */
+        total += PyInt_AsLong(item);
+    }
+    return total;
+}
+\end{verbatim}
+\ttindex{sum_list()}
+
+\begin{verbatim}
+long sum_sequence(PyObject *sequence)
+{
+    int i, n;
+    long total = 0;
+    PyObject *item;
+    n = PySequence_Length(sequence);
+    if (n < 0)
+        return -1; /* Has no length */
+    for (i = 0; i < n; i++) {
+        item = PySequence_GetItem(sequence, i);
+        if (item == NULL)
+            return -1; /* Not a sequence, or other failure */
+        if (PyInt_Check(item))
+            total += PyInt_AsLong(item);
+        Py_DECREF(item); /* Discard reference ownership */
+    }
+    return total;
+}
+\end{verbatim}
+\ttindex{sum_sequence()}
+
+
+\subsection{Types \label{types}}
+
+There are few other data types that play a significant role in
+the Python/C API; most are simple C types such as \ctype{int},
+\ctype{long}, \ctype{double} and \ctype{char*}. A few structure types
+are used to describe static tables used to list the functions exported
+by a module or the data attributes of a new object type, and another
+is used to describe the value of a complex number. These will
+be discussed together with the functions that use them.
+
+
+\section{Exceptions \label{exceptions}}
+
+The Python programmer only needs to deal with exceptions if specific
+error handling is required; unhandled exceptions are automatically
+propagated to the caller, then to the caller's caller, and so on, until
+they reach the top-level interpreter, where they are reported to the
+user accompanied by a stack traceback.
+
+For C programmers, however, error checking always has to be explicit.
+All functions in the Python/C API can raise exceptions, unless an
+explicit claim is made otherwise in a function's documentation. In
+general, when a function encounters an error, it sets an exception,
+discards any object references that it owns, and returns an
+error indicator --- usually \NULL{} or \code{-1}. A few functions
+return a Boolean true/false result, with false indicating an error.
+Very few functions return no explicit error indicator or have an
+ambiguous return value, and require explicit testing for errors with
+\cfunction{PyErr_Occurred()}\ttindex{PyErr_Occurred()}.
+
+Exception state is maintained in per-thread storage (this is
+equivalent to using global storage in an unthreaded application). A
+thread can be in one of two states: an exception has occurred, or not.
+The function \cfunction{PyErr_Occurred()} can be used to check for
+this: it returns a borrowed reference to the exception type object
+when an exception has occurred, and \NULL{} otherwise. There are a
+number of functions to set the exception state:
+\cfunction{PyErr_SetString()}\ttindex{PyErr_SetString()} is the most
+common (though not the most general) function to set the exception
+state, and \cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} clears the
+exception state.
+
+The full exception state consists of three objects (all of which can
+be \NULL): the exception type, the corresponding exception
+value, and the traceback. These have the same meanings as the Python
+\withsubitem{(in module sys)}{
+ \ttindex{exc_type}\ttindex{exc_value}\ttindex{exc_traceback}}
+objects \code{sys.exc_type}, \code{sys.exc_value}, and
+\code{sys.exc_traceback}; however, they are not the same: the Python
+objects represent the last exception being handled by a Python
+\keyword{try} \ldots\ \keyword{except} statement, while the C level
+exception state only exists while an exception is being passed on
+between C functions until it reaches the Python bytecode interpreter's
+main loop, which takes care of transferring it to \code{sys.exc_type}
+and friends.
+
+Note that starting with Python 1.5, the preferred, thread-safe way to
+access the exception state from Python code is to call the function
+\withsubitem{(in module sys)}{\ttindex{exc_info()}}
+\function{sys.exc_info()}, which returns the per-thread exception state
+for Python code. Also, the semantics of both ways to access the
+exception state have changed so that a function which catches an
+exception will save and restore its thread's exception state so as to
+preserve the exception state of its caller. This prevents common bugs
+in exception handling code caused by an innocent-looking function
+overwriting the exception being handled; it also reduces the often
+unwanted lifetime extension for objects that are referenced by the
+stack frames in the traceback.
+
+As a general principle, a function that calls another function to
+perform some task should check whether the called function raised an
+exception, and if so, pass the exception state on to its caller. It
+should discard any object references that it owns, and return an
+error indicator, but it should \emph{not} set another exception ---
+that would overwrite the exception that was just raised, and lose
+important information about the exact cause of the error.
+
+A simple example of detecting exceptions and passing them on is shown
+in the \cfunction{sum_sequence()}\ttindex{sum_sequence()} example
+above. It so happens that that example doesn't need to clean up any
+owned references when it detects an error. The following example
+function shows some error cleanup. First, to remind you why you like
+Python, we show the equivalent Python code:
+
+\begin{verbatim}
+def incr_item(dict, key):
+    try:
+        item = dict[key]
+    except KeyError:
+        item = 0
+    dict[key] = item + 1
+\end{verbatim}
+\ttindex{incr_item()}
+
+Here is the corresponding C code, in all its glory:
+
+\begin{verbatim}
+int incr_item(PyObject *dict, PyObject *key)
+{
+    /* Objects all initialized to NULL for Py_XDECREF */
+    PyObject *item = NULL, *const_one = NULL, *incremented_item = NULL;
+    int rv = -1; /* Return value initialized to -1 (failure) */
+
+    item = PyObject_GetItem(dict, key);
+    if (item == NULL) {
+        /* Handle KeyError only: */
+        if (!PyErr_ExceptionMatches(PyExc_KeyError))
+            goto error;
+
+        /* Clear the error and use zero: */
+        PyErr_Clear();
+        item = PyInt_FromLong(0L);
+        if (item == NULL)
+            goto error;
+    }
+    const_one = PyInt_FromLong(1L);
+    if (const_one == NULL)
+        goto error;
+
+    incremented_item = PyNumber_Add(item, const_one);
+    if (incremented_item == NULL)
+        goto error;
+
+    if (PyObject_SetItem(dict, key, incremented_item) < 0)
+        goto error;
+    rv = 0; /* Success */
+    /* Continue with cleanup code */
+
+ error:
+    /* Cleanup code, shared by success and failure path */
+
+    /* Use Py_XDECREF() to ignore NULL references */
+    Py_XDECREF(item);
+    Py_XDECREF(const_one);
+    Py_XDECREF(incremented_item);
+
+    return rv; /* -1 for error, 0 for success */
+}
+\end{verbatim}
+\ttindex{incr_item()}
+
+This example represents an endorsed use of the \keyword{goto} statement
+in C! It illustrates the use of
+\cfunction{PyErr_ExceptionMatches()}\ttindex{PyErr_ExceptionMatches()} and
+\cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} to
+handle specific exceptions, and the use of
+\cfunction{Py_XDECREF()}\ttindex{Py_XDECREF()} to
+dispose of owned references that may be \NULL{} (note the
+\character{X} in the name; \cfunction{Py_DECREF()} would crash when
+confronted with a \NULL{} reference). It is important that the
+variables used to hold owned references are initialized to \NULL{} for
+this to work; likewise, the proposed return value is initialized to
+\code{-1} (failure) and only set to success after the final call made
+is successful.
+
+
+\section{Embedding Python \label{embedding}}
+
+The one important task that only embedders (as opposed to extension
+writers) of the Python interpreter have to worry about is the
+initialization, and possibly the finalization, of the Python
+interpreter. Most functionality of the interpreter can only be used
+after the interpreter has been initialized.
+
+The basic initialization function is
+\cfunction{Py_Initialize()}\ttindex{Py_Initialize()}.
+This initializes the table of loaded modules, and creates the
+fundamental modules \module{__builtin__}\refbimodindex{__builtin__},
+\module{__main__}\refbimodindex{__main__}, \module{sys}\refbimodindex{sys},
+and \module{exceptions}.\refbimodindex{exceptions} It also initializes
+the module search path (\code{sys.path}).%
+\indexiii{module}{search}{path}
+\withsubitem{(in module sys)}{\ttindex{path}}
+
+\cfunction{Py_Initialize()} does not set the ``script argument list''
+(\code{sys.argv}). If this variable is needed by Python code that
+will be executed later, it must be set explicitly with a call to
+\code{PySys_SetArgv(\var{argc},
+\var{argv})}\ttindex{PySys_SetArgv()} subsequent to the call to
+\cfunction{Py_Initialize()}.
+
+On most systems (in particular, on \UNIX{} and Windows, although the
+details are slightly different),
+\cfunction{Py_Initialize()} calculates the module search path based
+upon its best guess for the location of the standard Python
+interpreter executable, assuming that the Python library is found in a
+fixed location relative to the Python interpreter executable. In
+particular, it looks for a directory named
+\file{lib/python\shortversion} relative to the parent directory where
+the executable named \file{python} is found on the shell command
+search path (the environment variable \envvar{PATH}).
+
+For instance, if the Python executable is found in
+\file{/usr/local/bin/python}, it will assume that the libraries are in
+\file{/usr/local/lib/python\shortversion}. (In fact, this particular path
+is also the ``fallback'' location, used when no executable file named
+\file{python} is found along \envvar{PATH}.) The user can override
+this behavior by setting the environment variable \envvar{PYTHONHOME},
+or insert additional directories in front of the standard path by
+setting \envvar{PYTHONPATH}.
+
+The embedding application can steer the search by calling
+\code{Py_SetProgramName(\var{file})}\ttindex{Py_SetProgramName()} \emph{before} calling
+\cfunction{Py_Initialize()}. Note that \envvar{PYTHONHOME} still
+overrides this and \envvar{PYTHONPATH} is still inserted in front of
+the standard path. An application that requires total control has to
+provide its own implementation of
+\cfunction{Py_GetPath()}\ttindex{Py_GetPath()},
+\cfunction{Py_GetPrefix()}\ttindex{Py_GetPrefix()},
+\cfunction{Py_GetExecPrefix()}\ttindex{Py_GetExecPrefix()}, and
+\cfunction{Py_GetProgramFullPath()}\ttindex{Py_GetProgramFullPath()} (all
+defined in \file{Modules/getpath.c}).
+
+Sometimes, it is desirable to ``uninitialize'' Python. For instance,
+the application may want to start over (make another call to
+\cfunction{Py_Initialize()}) or the application is simply done with its
+use of Python and wants to free all memory allocated by Python. This
+can be accomplished by calling \cfunction{Py_Finalize()}. The function
+\cfunction{Py_IsInitialized()}\ttindex{Py_IsInitialized()} returns
+true if Python is currently in the initialized state. More
+information about these functions is given in a later chapter.