author: Guido van Rossum <guido@python.org>, 1995-02-15 15:53:08 (GMT)
committer: Guido van Rossum <guido@python.org>, 1995-02-15 15:53:08 (GMT)
commit: d1883588aec09fcba1ad512cd889e5837087e318
tree: e4b4464323558d66f94c4ffb1a0145048f886d1d /Doc/libpickle.tex
parent: e1ff7adbf64d50583687e4d51cc12eabb7a01c31
added docs for pickle, shelve and copy
Diffstat (limited to 'Doc/libpickle.tex')
-rw-r--r-- Doc/libpickle.tex 170
1 files changed, 170 insertions, 0 deletions
diff --git a/Doc/libpickle.tex b/Doc/libpickle.tex
new file mode 100644
index 0000000..a9d5fa4
--- /dev/null
+++ b/Doc/libpickle.tex
@@ -0,0 +1,170 @@
+\section{Standard module \sectcode{pickle}}
+\stmodindex{pickle}
+\index{persistency}
+\indexii{persistent}{objects}
+\indexii{serializing}{objects}
+\indexii{marshalling}{objects}
+\indexii{flattening}{objects}
+\indexii{pickling}{objects}
+
+The \code{pickle} module implements a basic but powerful algorithm for
+``pickling'' (a.k.a. serializing, marshalling or flattening) nearly
+arbitrary Python objects. This is a more primitive notion than
+persistency --- although \code{pickle} reads and writes file objects,
+it does not handle the issue of naming persistent objects, nor the
+(even more complicated) area of concurrent access to persistent
+objects. The \code{pickle} module can transform a complex object into
+a byte stream and it can transform the byte stream into an object with
+the same internal structure. The most obvious thing to do with these
+byte streams is to write them onto a file, but it is also conceivable
+to send them across a network or store them in a database. The module
+\code{shelve} provides a simple interface to pickle and unpickle
+objects on ``dbm''-style database files.
+\stmodindex{shelve}
+
+Unlike the built-in module \code{marshal}, \code{pickle} handles the
+following correctly:
+\stmodindex{marshal}
+
+\begin{itemize}
+
+\item recursive objects
+
+\item pointer sharing
+
+\item instances of user-defined classes
+
+\end{itemize}
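+
+The first two cases can be checked with a short round trip. This is a
+minimal sketch, assuming a modern Python 3 interpreter in which pickles
+are byte strings; \code{io.BytesIO} stands in for a file, and the names
+\code{shared}, \code{data} and \code{restored} are illustrative only.

```python
import io
import pickle

# One dictionary referenced twice (pointer sharing) ...
shared = {"k": 1}
data = [shared, shared]
# ... and a list that contains itself (a recursive object).
data.append(data)

buf = io.BytesIO()
pickle.Pickler(buf).dump(data)

buf.seek(0)
restored = pickle.Unpickler(buf).load()

# The copy has the same internal structure as the original:
# restored[0] and restored[1] are one object, and restored[2] is restored.
print(restored[0] is restored[1], restored[2] is restored)
```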
+
+The data format used by \code{pickle} is Python-specific. This has
+the advantage that there are no restrictions imposed by external
+standards such as CORBA (which probably can't represent pointer
+sharing or recursive objects); however, it means that non-Python
+programs may not be able to reconstruct pickled Python objects.
+
+The \code{pickle} data format uses a printable ASCII representation.
+This is slightly more voluminous than a binary representation.
+However, small integers actually take {\em less} space when
+represented as minimal-size decimal strings than when represented as
+32-bit binary numbers, and strings are only much longer if they
+contain many control characters or 8-bit characters. The big
+advantage of using printable ASCII (and of some other characteristics
+of \code{pickle}'s representation) is that for debugging or recovery
+purposes it is possible for a human to read the pickled file with a
+standard text editor. (I could have gone a step further and used a
+notation like S-expressions, but the parser would have been
+considerably more complicated and slower, and the files would probably
+have become much larger.)
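+
+The printable representation can be inspected directly. The sketch
+below assumes a modern Python 3 interpreter, where the original
+printable format has to be requested with \code{protocol=0} because
+newer binary formats are the default; the sample value is arbitrary.

```python
import pickle

# protocol=0 selects the printable ASCII format described above
# (assumption: modern Python 3, which defaults to a binary protocol).
text = pickle.dumps([1, "abc", 2.5], protocol=0)

# Every byte is plain ASCII, so the pickle can be shown as text.
print(text.decode("ascii"))

# The printable form still round-trips to an equal object.
value = pickle.loads(text)
```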
+
+The \code{pickle} module doesn't handle code objects, which the
+\code{marshal} module does. I suppose \code{pickle} could, and maybe
+it should, but there's probably no great need for it right now (as
+long as \code{marshal} continues to be used for reading and writing
+code objects), and at least this avoids the possibility of smuggling
+Trojan horses into a program.
+\stmodindex{marshal}
+
+For the benefit of persistency modules written using \code{pickle}, it
+supports the notion of a reference to an object outside the pickled
+data stream. Such objects are referenced by a name, which is an
+arbitrary string of printable ASCII characters. The resolution of
+such names is not defined by the \code{pickle} module --- the
+persistent object module will have to implement a method
+\code{persistent_load}. To write references to persistent objects,
+the persistent object module must define a method \code{persistent_id} which
+returns either \code{None} or the persistent ID of the object.
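+
+One way to supply the two methods is sketched below. It assumes a
+modern Python 3 interpreter, where they are provided by subclassing
+\code{Pickler} and \code{Unpickler}; the \code{Record} class, the
+\code{STORE} dictionary and the subclass names are hypothetical
+stand-ins for a real persistent object module.

```python
import io
import pickle

class Record:
    """Hypothetical object that lives in an external store."""
    def __init__(self, key):
        self.key = key

# Stand-in for a real database, keyed by persistent name.
STORE = {"r1": Record("r1")}

class StorePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a name for externally stored objects, None for all others.
        if isinstance(obj, Record):
            return obj.key
        return None

class StoreUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the name back to the live object.
        return STORE[pid]

buf = io.BytesIO()
StorePickler(buf).dump(["payload", STORE["r1"]])

buf.seek(0)
restored = StoreUnpickler(buf).load()
```

+Only the name \code{"r1"} is written to the stream, so after loading,
+\code{restored[1]} is the very object held in \code{STORE}, not a copy.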
+
+There are some restrictions on the pickling of class instances.
+
+First of all, the class must be defined at the top level in a module.
+
+Next, it must normally be possible to create class instances by
+calling the class without arguments. If this is undesirable, the
+class can define a method \code{__getinitargs__()}, which should
+return a {\em tuple} containing the arguments to be passed to the
+class constructor (\code{__init__()}).
+\ttindex{__getinitargs__}
+\ttindex{__init__}
+
+Classes can further influence how they are pickled --- if the class
+defines the method \code{__getstate__()}, it is called and the state it
+returns is pickled as the contents for the instance, and if the class
+defines the method \code{__setstate__()}, it is called with the
+unpickled state. (Note that these methods can also be used to
+implement copying class instances.) If there is no
+\code{__getstate__()} method, the instance's \code{__dict__} is
+pickled. If there is no \code{__setstate__()} method, the pickled
+object must be a dictionary and its items are assigned to the new
+instance's dictionary. (If a class defines both \code{__getstate__()}
+and \code{__setstate__()}, the state object needn't be a dictionary
+--- these methods can do what they want.) This protocol is also used
+by the shallow and deep copying operations defined in the \code{copy}
+module.
+\ttindex{__getstate__}
+\ttindex{__setstate__}
+\ttindex{__dict__}
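+
+A minimal sketch of a class that defines both methods follows,
+assuming a modern Python 3 interpreter. The \code{Connection} class and
+its attributes are hypothetical; the point is that the state is a tuple
+rather than a dictionary and that one attribute is deliberately left out.

```python
import pickle

class Connection:
    """Hypothetical class with one attribute that should not be pickled."""
    def __init__(self, host="localhost"):
        self.host = host
        self.cache = {}              # transient; rebuilt on load

    def __getstate__(self):
        # The state need not be a dictionary, because __setstate__
        # below knows how to interpret it.
        return (self.host,)

    def __setstate__(self, state):
        (self.host,) = state
        self.cache = {}              # start with an empty cache again

c = Connection("example.org")
c.cache["x"] = 1

d = pickle.loads(pickle.dumps(c))
print(d.host, d.cache)
```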
+
+Note that when class instances are pickled, their class's code and
+data are not pickled along with them. Only the instance data is
+pickled. This is done on purpose, so you can fix bugs in a class or
+add methods and still load objects that were created with an earlier
+version of the class. If you plan to have long-lived objects that
+will see many versions of a class, it may be worthwhile to put a version
+number in the objects so that suitable conversions can be made by the
+class's \code{__setstate__()} method.
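+
+The version-number idea might look as follows. This is only a sketch
+under stated assumptions: the \code{Point} class, the \code{"version"}
+key and the rule that version 1 lacked a \code{y} attribute are all
+invented for illustration.

```python
import pickle

class Point:
    VERSION = 2   # hypothetical: version 1 instances stored only "x"

    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

    def __getstate__(self):
        # Store the instance data together with the class version.
        state = dict(self.__dict__)
        state["version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("version", 1)   # no key means the oldest format
        if version < 2:
            state.setdefault("y", 0)        # convert old state
        self.__dict__.update(state)

p = pickle.loads(pickle.dumps(Point(3, 4)))

# Simulate state that was saved by the older version of the class.
old = Point.__new__(Point)
old.__setstate__({"x": 7})
```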
+
+The interface can be summarized as follows.
+
+To pickle an object \code{x} onto a file \code{f}, open for writing:
+
+\begin{verbatim}
+p = pickle.Pickler(f)
+p.dump(x)
+\end{verbatim}
+
+To unpickle an object \code{x} from a file \code{f}, open for reading:
+
+\begin{verbatim}
+u = pickle.Unpickler(f)
+x = u.load()
+\end{verbatim}
+
+The \code{Pickler} class only calls the method \code{f.write} with a
+string argument. The \code{Unpickler} calls the methods \code{f.read}
+(with an integer argument) and \code{f.readline} (without argument),
+both returning a string. It is explicitly allowed to pass non-file
+objects here, as long as they have the right methods.
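+
+The sketch below passes non-file objects that have only the methods
+named above. It assumes a modern Python 3 interpreter, where the data
+exchanged are byte strings; \code{Sink} and \code{Source} are
+hypothetical helper classes, not part of \code{pickle}.

```python
import pickle

class Sink:
    """Minimal writer: only write() is provided."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)

class Source:
    """Minimal reader: only read(n) and readline() are provided."""
    def __init__(self, data):
        self.data, self.pos = data, 0

    def read(self, n):
        out = self.data[self.pos:self.pos + n]
        self.pos += len(out)
        return out

    def readline(self):
        end = self.data.find(b"\n", self.pos)
        end = len(self.data) if end < 0 else end + 1
        out = self.data[self.pos:end]
        self.pos = end
        return out

sink = Sink()
pickle.Pickler(sink).dump({"a": [1, 2, 3]})

source = Source(b"".join(sink.chunks))
value = pickle.Unpickler(source).load()
```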
+
+The following types can be pickled:
+\begin{itemize}
+
+\item \code{None}
+
+\item integers, long integers, floating point numbers
+
+\item strings
+
+\item tuples, lists and dictionaries containing only picklable objects
+
+\item class instances whose \code{__dict__} or the result of calling
+\code{__getstate__()} is picklable
+
+\end{itemize}
+
+Attempts to pickle unpicklable objects will raise an exception; when
+this happens, an unspecified number of bytes may have been written to
+the file argument.
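+
+A caller that must not leave a damaged file behind can pickle into a
+buffer first, as in this sketch. It assumes a modern Python 3
+interpreter, in which a generator is one example of an unpicklable
+object; the exact exception type depends on the object, so the sketch
+catches \code{Exception} broadly.

```python
import io
import pickle

buf = io.BytesIO()
gen = (i for i in range(3))          # generators cannot be pickled

try:
    pickle.Pickler(buf).dump(["ok so far", gen])
    raised = False
except Exception as exc:             # exact exception type varies by object
    raised = True
    print(type(exc).__name__, exc)

# Some bytes may or may not have reached the buffer before the failure,
# so its contents must be discarded rather than treated as a pickle.
partial = buf.getvalue()
```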
+
+It is possible to make multiple calls to \code{Pickler.dump()} or to
+\code{Unpickler.load()}, as long as there is a one-to-one
+correspondence between \code{Pickler} and \code{Unpickler} objects and
+between \code{dump} and \code{load} calls for any pair of
+corresponding \code{Pickler} and \code{Unpickler} objects. {\em Warning}:
+this is intended for pickling multiple objects without intervening
+modifications to the objects or their parts. If you modify an object
+and then pickle it again using the same \code{Pickler} instance, the
+object is not pickled again --- a reference to it is pickled and the
+\code{Unpickler} will return the old value, not the modified one. (There
+are two problems here: (a) detecting changes, and (b) marshalling a
+minimal set of changes. I have no answers. Garbage collection may
+also become a problem here.)
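+
+The warning can be reproduced with a few lines. This is a minimal
+sketch, assuming a modern Python 3 interpreter with \code{io.BytesIO}
+standing in for a file; the variable names are illustrative.

```python
import io
import pickle

buf = io.BytesIO()
p = pickle.Pickler(buf)

items = [1]
p.dump(items)
items.append(2)      # modified after the first dump
p.dump(items)        # only a reference to the first pickle is written

buf.seek(0)
u = pickle.Unpickler(buf)
first = u.load()
second = u.load()

# Both loads yield the old value; the appended element is lost.
print(first, second, second is first)
```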