diff options
author | Fred Drake <fdrake@acm.org> | 2001-10-12 19:01:43 (GMT) |
---|---|---|
committer | Fred Drake <fdrake@acm.org> | 2001-10-12 19:01:43 (GMT) |
commit | 3adf79e3e2ac4ba0c2960997234c0d36c40468a8 (patch) | |
tree | 86cbac99bf498cbc2db49feb345b4bd4a17608f4 /Doc/api/intro.tex | |
parent | 716aac0448ef9fb6f3fd8c82237a7e73e9adb307 (diff) | |
download | cpython-3adf79e3e2ac4ba0c2960997234c0d36c40468a8.zip cpython-3adf79e3e2ac4ba0c2960997234c0d36c40468a8.tar.gz cpython-3adf79e3e2ac4ba0c2960997234c0d36c40468a8.tar.bz2 |
Break the Python/C API manual into smaller files by chapter. This manual
has grown beyond what font-lock will work with using the default (X)Emacs
settings.
Indentation of the description has been made consistent, and a number of
smaller markup adjustments have been made as well.
Diffstat (limited to 'Doc/api/intro.tex')
-rw-r--r-- | Doc/api/intro.tex | 558 |
1 files changed, 558 insertions, 0 deletions
diff --git a/Doc/api/intro.tex b/Doc/api/intro.tex new file mode 100644 index 0000000..d148ba8 --- /dev/null +++ b/Doc/api/intro.tex @@ -0,0 +1,558 @@ +\chapter{Introduction \label{intro}} + + +The Application Programmer's Interface to Python gives C and +\Cpp{} programmers access to the Python interpreter at a variety of +levels. The API is equally usable from \Cpp{}, but for brevity it is +generally referred to as the Python/C API. There are two +fundamentally different reasons for using the Python/C API. The first +reason is to write \emph{extension modules} for specific purposes; +these are C modules that extend the Python interpreter. This is +probably the most common use. The second reason is to use Python as a +component in a larger application; this technique is generally +referred to as \dfn{embedding} Python in an application. + +Writing an extension module is a relatively well-understood process, +where a ``cookbook'' approach works well. There are several tools +that automate the process to some extent. While people have embedded +Python in other applications since its early existence, the process of +embedding Python is less straightforward than writing an extension. + +Many API functions are useful independent of whether you're embedding +or extending Python; moreover, most applications that embed Python +will need to provide a custom extension as well, so it's probably a +good idea to become familiar with writing an extension before +attempting to embed Python in a real application. + + +\section{Include Files \label{includes}} + +All function, type and macro definitions needed to use the Python/C +API are included in your code by the following line: + +\begin{verbatim} +#include "Python.h" +\end{verbatim} + +This implies inclusion of the following standard headers: +\code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>}, +\code{<limits.h>}, and \code{<stdlib.h>} (if available). +Since Python may define some pre-processor definitions which affect +the standard headers on some systems, you must include \file{Python.h} +before any standard headers are included. + +All user visible names defined by Python.h (except those defined by +the included standard headers) have one of the prefixes \samp{Py} or +\samp{_Py}. Names beginning with \samp{_Py} are for internal use by +the Python implementation and should not be used by extension writers. +Structure member names do not have a reserved prefix. + +\strong{Important:} user code should never define names that begin +with \samp{Py} or \samp{_Py}. This confuses the reader, and +jeopardizes the portability of the user code to future Python +versions, which may define additional names beginning with one of +these prefixes. + +The header files are typically installed with Python. On \UNIX, these +are located in the directories +\file{\envvar{prefix}/include/python\var{version}/} and +\file{\envvar{exec_prefix}/include/python\var{version}/}, where +\envvar{prefix} and \envvar{exec_prefix} are defined by the +corresponding parameters to Python's \program{configure} script and +\var{version} is \code{sys.version[:3]}. On Windows, the headers are +installed in \file{\envvar{prefix}/include}, where \envvar{prefix} is +the installation directory specified to the installer. + +To include the headers, place both directories (if different) on your +compiler's search path for includes. Do \emph{not} place the parent +directories on the search path and then use +\samp{\#include <python\shortversion/Python.h>}; this will break on +multi-platform builds since the platform independent headers under +\envvar{prefix} include the platform specific headers from +\envvar{exec_prefix}. + +\Cpp{} users should note that though the API is defined entirely using +C, the header files do properly declare the entry points to be +\code{extern "C"}, so there is no need to do anything special to use +the API from \Cpp. + + +\section{Objects, Types and Reference Counts \label{objects}} + +Most Python/C API functions have one or more arguments as well as a +return value of type \ctype{PyObject*}. This type is a pointer +to an opaque data type representing an arbitrary Python +object. Since all Python object types are treated the same way by the +Python language in most situations (e.g., assignments, scope rules, +and argument passing), it is only fitting that they should be +represented by a single C type. Almost all Python objects live on the +heap: you never declare an automatic or static variable of type +\ctype{PyObject}, only pointer variables of type \ctype{PyObject*} can +be declared. The sole exception are the type objects\obindex{type}; +since these must never be deallocated, they are typically static +\ctype{PyTypeObject} objects. + +All Python objects (even Python integers) have a \dfn{type} and a +\dfn{reference count}. An object's type determines what kind of object +it is (e.g., an integer, a list, or a user-defined function; there are +many more as explained in the \citetitle[../ref/ref.html]{Python +Reference Manual}). For each of the well-known types there is a macro +to check whether an object is of that type; for instance, +\samp{PyList_Check(\var{a})} is true if (and only if) the object +pointed to by \var{a} is a Python list. + + +\subsection{Reference Counts \label{refcounts}} + +The reference count is important because today's computers have a +finite (and often severely limited) memory size; it counts how many +different places there are that have a reference to an object. Such a +place could be another object, or a global (or static) C variable, or +a local variable in some C function. When an object's reference count +becomes zero, the object is deallocated. If it contains references to +other objects, their reference count is decremented. Those other +objects may be deallocated in turn, if this decrement makes their +reference count become zero, and so on. (There's an obvious problem +with objects that reference each other here; for now, the solution is +``don't do that.'') + +Reference counts are always manipulated explicitly. The normal way is +to use the macro \cfunction{Py_INCREF()}\ttindex{Py_INCREF()} to +increment an object's reference count by one, and +\cfunction{Py_DECREF()}\ttindex{Py_DECREF()} to decrement it by +one. The \cfunction{Py_DECREF()} macro is considerably more complex +than the incref one, since it must check whether the reference count +becomes zero and then cause the object's deallocator to be called. +The deallocator is a function pointer contained in the object's type +structure. The type-specific deallocator takes care of decrementing +the reference counts for other objects contained in the object if this +is a compound object type, such as a list, as well as performing any +additional finalization that's needed. There's no chance that the +reference count can overflow; at least as many bits are used to hold +the reference count as there are distinct memory locations in virtual +memory (assuming \code{sizeof(long) >= sizeof(char*)}). Thus, the +reference count increment is a simple operation. + +It is not necessary to increment an object's reference count for every +local variable that contains a pointer to an object. In theory, the +object's reference count goes up by one when the variable is made to +point to it and it goes down by one when the variable goes out of +scope. However, these two cancel each other out, so at the end the +reference count hasn't changed. The only real reason to use the +reference count is to prevent the object from being deallocated as +long as our variable is pointing to it. If we know that there is at +least one other reference to the object that lives at least as long as +our variable, there is no need to increment the reference count +temporarily. An important situation where this arises is in objects +that are passed as arguments to C functions in an extension module +that are called from Python; the call mechanism guarantees to hold a +reference to every argument for the duration of the call. + +However, a common pitfall is to extract an object from a list and +hold on to it for a while without incrementing its reference count. +Some other operation might conceivably remove the object from the +list, decrementing its reference count and possible deallocating it. +The real danger is that innocent-looking operations may invoke +arbitrary Python code which could do this; there is a code path which +allows control to flow back to the user from a \cfunction{Py_DECREF()}, +so almost any operation is potentially dangerous. + +A safe approach is to always use the generic operations (functions +whose name begins with \samp{PyObject_}, \samp{PyNumber_}, +\samp{PySequence_} or \samp{PyMapping_}). These operations always +increment the reference count of the object they return. This leaves +the caller with the responsibility to call +\cfunction{Py_DECREF()} when they are done with the result; this soon +becomes second nature. + + +\subsubsection{Reference Count Details \label{refcountDetails}} + +The reference count behavior of functions in the Python/C API is best +explained in terms of \emph{ownership of references}. Note that we +talk of owning references, never of owning objects; objects are always +shared! When a function owns a reference, it has to dispose of it +properly --- either by passing ownership on (usually to its caller) or +by calling \cfunction{Py_DECREF()} or \cfunction{Py_XDECREF()}. When +a function passes ownership of a reference on to its caller, the +caller is said to receive a \emph{new} reference. When no ownership +is transferred, the caller is said to \emph{borrow} the reference. +Nothing needs to be done for a borrowed reference. + +Conversely, when a calling function passes it a reference to an +object, there are two possibilities: the function \emph{steals} a +reference to the object, or it does not. Few functions steal +references; the two notable exceptions are +\cfunction{PyList_SetItem()}\ttindex{PyList_SetItem()} and +\cfunction{PyTuple_SetItem()}\ttindex{PyTuple_SetItem()}, which +steal a reference to the item (but not to the tuple or list into which +the item is put!). These functions were designed to steal a reference +because of a common idiom for populating a tuple or list with newly +created objects; for example, the code to create the tuple \code{(1, +2, "three")} could look like this (forgetting about error handling for +the moment; a better way to code this is shown below): + +\begin{verbatim} +PyObject *t; + +t = PyTuple_New(3); +PyTuple_SetItem(t, 0, PyInt_FromLong(1L)); +PyTuple_SetItem(t, 1, PyInt_FromLong(2L)); +PyTuple_SetItem(t, 2, PyString_FromString("three")); +\end{verbatim} + +Incidentally, \cfunction{PyTuple_SetItem()} is the \emph{only} way to +set tuple items; \cfunction{PySequence_SetItem()} and +\cfunction{PyObject_SetItem()} refuse to do this since tuples are an +immutable data type. You should only use +\cfunction{PyTuple_SetItem()} for tuples that you are creating +yourself. + +Equivalent code for populating a list can be written using +\cfunction{PyList_New()} and \cfunction{PyList_SetItem()}. Such code +can also use \cfunction{PySequence_SetItem()}; this illustrates the +difference between the two (the extra \cfunction{Py_DECREF()} calls): + +\begin{verbatim} +PyObject *l, *x; + +l = PyList_New(3); +x = PyInt_FromLong(1L); +PySequence_SetItem(l, 0, x); Py_DECREF(x); +x = PyInt_FromLong(2L); +PySequence_SetItem(l, 1, x); Py_DECREF(x); +x = PyString_FromString("three"); +PySequence_SetItem(l, 2, x); Py_DECREF(x); +\end{verbatim} + +You might find it strange that the ``recommended'' approach takes more +code. However, in practice, you will rarely use these ways of +creating and populating a tuple or list. There's a generic function, +\cfunction{Py_BuildValue()}, that can create most common objects from +C values, directed by a \dfn{format string}. For example, the +above two blocks of code could be replaced by the following (which +also takes care of the error checking): + +\begin{verbatim} +PyObject *t, *l; + +t = Py_BuildValue("(iis)", 1, 2, "three"); +l = Py_BuildValue("[iis]", 1, 2, "three"); +\end{verbatim} + +It is much more common to use \cfunction{PyObject_SetItem()} and +friends with items whose references you are only borrowing, like +arguments that were passed in to the function you are writing. In +that case, their behaviour regarding reference counts is much saner, +since you don't have to increment a reference count so you can give a +reference away (``have it be stolen''). For example, this function +sets all items of a list (actually, any mutable sequence) to a given +item: + +\begin{verbatim} +int set_all(PyObject *target, PyObject *item) +{ + int i, n; + + n = PyObject_Length(target); + if (n < 0) + return -1; + for (i = 0; i < n; i++) { + if (PyObject_SetItem(target, i, item) < 0) + return -1; + } + return 0; +} +\end{verbatim} +\ttindex{set_all()} + +The situation is slightly different for function return values. +While passing a reference to most functions does not change your +ownership responsibilities for that reference, many functions that +return a referece to an object give you ownership of the reference. +The reason is simple: in many cases, the returned object is created +on the fly, and the reference you get is the only reference to the +object. Therefore, the generic functions that return object +references, like \cfunction{PyObject_GetItem()} and +\cfunction{PySequence_GetItem()}, always return a new reference (the +caller becomes the owner of the reference). + +It is important to realize that whether you own a reference returned +by a function depends on which function you call only --- \emph{the +plumage} (the type of the type of the object passed as an +argument to the function) \emph{doesn't enter into it!} Thus, if you +extract an item from a list using \cfunction{PyList_GetItem()}, you +don't own the reference --- but if you obtain the same item from the +same list using \cfunction{PySequence_GetItem()} (which happens to +take exactly the same arguments), you do own a reference to the +returned object. + +Here is an example of how you could write a function that computes the +sum of the items in a list of integers; once using +\cfunction{PyList_GetItem()}\ttindex{PyList_GetItem()}, and once using +\cfunction{PySequence_GetItem()}\ttindex{PySequence_GetItem()}. + +\begin{verbatim} +long sum_list(PyObject *list) +{ + int i, n; + long total = 0; + PyObject *item; + + n = PyList_Size(list); + if (n < 0) + return -1; /* Not a list */ + for (i = 0; i < n; i++) { + item = PyList_GetItem(list, i); /* Can't fail */ + if (!PyInt_Check(item)) continue; /* Skip non-integers */ + total += PyInt_AsLong(item); + } + return total; +} +\end{verbatim} +\ttindex{sum_list()} + +\begin{verbatim} +long sum_sequence(PyObject *sequence) +{ + int i, n; + long total = 0; + PyObject *item; + n = PySequence_Length(sequence); + if (n < 0) + return -1; /* Has no length */ + for (i = 0; i < n; i++) { + item = PySequence_GetItem(sequence, i); + if (item == NULL) + return -1; /* Not a sequence, or other failure */ + if (PyInt_Check(item)) + total += PyInt_AsLong(item); + Py_DECREF(item); /* Discard reference ownership */ + } + return total; +} +\end{verbatim} +\ttindex{sum_sequence()} + + +\subsection{Types \label{types}} + +There are few other data types that play a significant role in +the Python/C API; most are simple C types such as \ctype{int}, +\ctype{long}, \ctype{double} and \ctype{char*}. A few structure types +are used to describe static tables used to list the functions exported +by a module or the data attributes of a new object type, and another +is used to describe the value of a complex number. These will +be discussed together with the functions that use them. + + +\section{Exceptions \label{exceptions}} + +The Python programmer only needs to deal with exceptions if specific +error handling is required; unhandled exceptions are automatically +propagated to the caller, then to the caller's caller, and so on, until +they reach the top-level interpreter, where they are reported to the +user accompanied by a stack traceback. + +For C programmers, however, error checking always has to be explicit. +All functions in the Python/C API can raise exceptions, unless an +explicit claim is made otherwise in a function's documentation. In +general, when a function encounters an error, it sets an exception, +discards any object references that it owns, and returns an +error indicator --- usually \NULL{} or \code{-1}. A few functions +return a Boolean true/false result, with false indicating an error. +Very few functions return no explicit error indicator or have an +ambiguous return value, and require explicit testing for errors with +\cfunction{PyErr_Occurred()}\ttindex{PyErr_Occurred()}. + +Exception state is maintained in per-thread storage (this is +equivalent to using global storage in an unthreaded application). A +thread can be in one of two states: an exception has occurred, or not. +The function \cfunction{PyErr_Occurred()} can be used to check for +this: it returns a borrowed reference to the exception type object +when an exception has occurred, and \NULL{} otherwise. There are a +number of functions to set the exception state: +\cfunction{PyErr_SetString()}\ttindex{PyErr_SetString()} is the most +common (though not the most general) function to set the exception +state, and \cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} clears the +exception state. + +The full exception state consists of three objects (all of which can +be \NULL): the exception type, the corresponding exception +value, and the traceback. These have the same meanings as the Python +\withsubitem{(in module sys)}{ + \ttindex{exc_type}\ttindex{exc_value}\ttindex{exc_traceback}} +objects \code{sys.exc_type}, \code{sys.exc_value}, and +\code{sys.exc_traceback}; however, they are not the same: the Python +objects represent the last exception being handled by a Python +\keyword{try} \ldots\ \keyword{except} statement, while the C level +exception state only exists while an exception is being passed on +between C functions until it reaches the Python bytecode interpreter's +main loop, which takes care of transferring it to \code{sys.exc_type} +and friends. + +Note that starting with Python 1.5, the preferred, thread-safe way to +access the exception state from Python code is to call the function +\withsubitem{(in module sys)}{\ttindex{exc_info()}} +\function{sys.exc_info()}, which returns the per-thread exception state +for Python code. Also, the semantics of both ways to access the +exception state have changed so that a function which catches an +exception will save and restore its thread's exception state so as to +preserve the exception state of its caller. This prevents common bugs +in exception handling code caused by an innocent-looking function +overwriting the exception being handled; it also reduces the often +unwanted lifetime extension for objects that are referenced by the +stack frames in the traceback. + +As a general principle, a function that calls another function to +perform some task should check whether the called function raised an +exception, and if so, pass the exception state on to its caller. It +should discard any object references that it owns, and return an +error indicator, but it should \emph{not} set another exception --- +that would overwrite the exception that was just raised, and lose +important information about the exact cause of the error. + +A simple example of detecting exceptions and passing them on is shown +in the \cfunction{sum_sequence()}\ttindex{sum_sequence()} example +above. It so happens that that example doesn't need to clean up any +owned references when it detects an error. The following example +function shows some error cleanup. First, to remind you why you like +Python, we show the equivalent Python code: + +\begin{verbatim} +def incr_item(dict, key): + try: + item = dict[key] + except KeyError: + item = 0 + dict[key] = item + 1 +\end{verbatim} +\ttindex{incr_item()} + +Here is the corresponding C code, in all its glory: + +\begin{verbatim} +int incr_item(PyObject *dict, PyObject *key) +{ + /* Objects all initialized to NULL for Py_XDECREF */ + PyObject *item = NULL, *const_one = NULL, *incremented_item = NULL; + int rv = -1; /* Return value initialized to -1 (failure) */ + + item = PyObject_GetItem(dict, key); + if (item == NULL) { + /* Handle KeyError only: */ + if (!PyErr_ExceptionMatches(PyExc_KeyError)) + goto error; + + /* Clear the error and use zero: */ + PyErr_Clear(); + item = PyInt_FromLong(0L); + if (item == NULL) + goto error; + } + const_one = PyInt_FromLong(1L); + if (const_one == NULL) + goto error; + + incremented_item = PyNumber_Add(item, const_one); + if (incremented_item == NULL) + goto error; + + if (PyObject_SetItem(dict, key, incremented_item) < 0) + goto error; + rv = 0; /* Success */ + /* Continue with cleanup code */ + + error: + /* Cleanup code, shared by success and failure path */ + + /* Use Py_XDECREF() to ignore NULL references */ + Py_XDECREF(item); + Py_XDECREF(const_one); + Py_XDECREF(incremented_item); + + return rv; /* -1 for error, 0 for success */ +} +\end{verbatim} +\ttindex{incr_item()} + +This example represents an endorsed use of the \keyword{goto} statement +in C! It illustrates the use of +\cfunction{PyErr_ExceptionMatches()}\ttindex{PyErr_ExceptionMatches()} and +\cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} to +handle specific exceptions, and the use of +\cfunction{Py_XDECREF()}\ttindex{Py_XDECREF()} to +dispose of owned references that may be \NULL{} (note the +\character{X} in the name; \cfunction{Py_DECREF()} would crash when +confronted with a \NULL{} reference). It is important that the +variables used to hold owned references are initialized to \NULL{} for +this to work; likewise, the proposed return value is initialized to +\code{-1} (failure) and only set to success after the final call made +is successful. + + +\section{Embedding Python \label{embedding}} + +The one important task that only embedders (as opposed to extension +writers) of the Python interpreter have to worry about is the +initialization, and possibly the finalization, of the Python +interpreter. Most functionality of the interpreter can only be used +after the interpreter has been initialized. + +The basic initialization function is +\cfunction{Py_Initialize()}\ttindex{Py_Initialize()}. +This initializes the table of loaded modules, and creates the +fundamental modules \module{__builtin__}\refbimodindex{__builtin__}, +\module{__main__}\refbimodindex{__main__}, \module{sys}\refbimodindex{sys}, +and \module{exceptions}.\refbimodindex{exceptions} It also initializes +the module search path (\code{sys.path}).% +\indexiii{module}{search}{path} +\withsubitem{(in module sys)}{\ttindex{path}} + +\cfunction{Py_Initialize()} does not set the ``script argument list'' +(\code{sys.argv}). If this variable is needed by Python code that +will be executed later, it must be set explicitly with a call to +\code{PySys_SetArgv(\var{argc}, +\var{argv})}\ttindex{PySys_SetArgv()} subsequent to the call to +\cfunction{Py_Initialize()}. + +On most systems (in particular, on \UNIX{} and Windows, although the +details are slightly different), +\cfunction{Py_Initialize()} calculates the module search path based +upon its best guess for the location of the standard Python +interpreter executable, assuming that the Python library is found in a +fixed location relative to the Python interpreter executable. In +particular, it looks for a directory named +\file{lib/python\shortversion} relative to the parent directory where +the executable named \file{python} is found on the shell command +search path (the environment variable \envvar{PATH}). + +For instance, if the Python executable is found in +\file{/usr/local/bin/python}, it will assume that the libraries are in +\file{/usr/local/lib/python\shortversion}. (In fact, this particular path +is also the ``fallback'' location, used when no executable file named +\file{python} is found along \envvar{PATH}.) The user can override +this behavior by setting the environment variable \envvar{PYTHONHOME}, +or insert additional directories in front of the standard path by +setting \envvar{PYTHONPATH}. + +The embedding application can steer the search by calling +\code{Py_SetProgramName(\var{file})}\ttindex{Py_SetProgramName()} \emph{before} calling +\cfunction{Py_Initialize()}. Note that \envvar{PYTHONHOME} still +overrides this and \envvar{PYTHONPATH} is still inserted in front of +the standard path. An application that requires total control has to +provide its own implementation of +\cfunction{Py_GetPath()}\ttindex{Py_GetPath()}, +\cfunction{Py_GetPrefix()}\ttindex{Py_GetPrefix()}, +\cfunction{Py_GetExecPrefix()}\ttindex{Py_GetExecPrefix()}, and +\cfunction{Py_GetProgramFullPath()}\ttindex{Py_GetProgramFullPath()} (all +defined in \file{Modules/getpath.c}). + +Sometimes, it is desirable to ``uninitialize'' Python. For instance, +the application may want to start over (make another call to +\cfunction{Py_Initialize()}) or the application is simply done with its +use of Python and wants to free all memory allocated by Python. This +can be accomplished by calling \cfunction{Py_Finalize()}. The function +\cfunction{Py_IsInitialized()}\ttindex{Py_IsInitialized()} returns +true if Python is currently in the initialized state. More +information about these functions is given in a later chapter. |