author | Georg Brandl <georg@python.org> | 2007-08-15 14:28:01 (GMT) |
---|---|---|
committer | Georg Brandl <georg@python.org> | 2007-08-15 14:28:01 (GMT) |
commit | 8ec7f656134b1230ab23003a94ba3266d7064122 (patch) | |
tree | bc730d5fb3302dc375edd26b26f750d609b61d72 /Doc/library/pickle.rst | |
parent | f56181ff53ba00b7bed3997a4dccd9a1b6217b57 (diff) | |
Move the 2.6 reST doc tree in place.
Diffstat (limited to 'Doc/library/pickle.rst')
-rw-r--r-- | Doc/library/pickle.rst | 868 |
1 files changed, 868 insertions, 0 deletions
diff --git a/Doc/library/pickle.rst b/Doc/library/pickle.rst
new file mode 100644
index 0000000..ab19ff8
--- /dev/null
+++ b/Doc/library/pickle.rst
@@ -0,0 +1,868 @@
+
+:mod:`pickle` --- Python object serialization
+=============================================
+
+.. index::
+   single: persistence
+   pair: persistent; objects
+   pair: serializing; objects
+   pair: marshalling; objects
+   pair: flattening; objects
+   pair: pickling; objects
+
+.. module:: pickle
+   :synopsis: Convert Python objects to streams of bytes and back.
+
+
+.. % Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
+.. % Rewritten by Barry Warsaw <barry@zope.com>
+
+The :mod:`pickle` module implements a fundamental, but powerful algorithm for
+serializing and de-serializing a Python object structure. "Pickling" is the
+process whereby a Python object hierarchy is converted into a byte stream, and
+"unpickling" is the inverse operation, whereby a byte stream is converted back
+into an object hierarchy. Pickling (and unpickling) is alternatively known as
+"serialization", "marshalling," [#]_ or "flattening"; however, to avoid
+confusion, the terms used here are "pickling" and "unpickling".
+
+This documentation describes both the :mod:`pickle` module and the
+:mod:`cPickle` module.
+
+
+Relationship to other Python modules
+------------------------------------
+
+The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
+module. As its name implies, :mod:`cPickle` is written in C, so it can be up to
+1000 times faster than :mod:`pickle`. However, it does not support subclassing
+of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
+these are functions, not classes. Most applications have no need for this
+functionality, and can benefit from the improved performance of :mod:`cPickle`.
+Other than that, the interfaces of the two modules are nearly identical; the
+common interface is described in this manual and differences are pointed out
+where necessary.
+In the following discussions, we use the term "pickle" to
+collectively describe the :mod:`pickle` and :mod:`cPickle` modules.
+
+The data streams the two modules produce are guaranteed to be interchangeable.
+
+Python has a more primitive serialization module called :mod:`marshal`, but in
+general :mod:`pickle` should always be the preferred way to serialize Python
+objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
+files.
+
+The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:
+
+* The :mod:`pickle` module keeps track of the objects it has already serialized,
+  so that later references to the same object won't be serialized again.
+  :mod:`marshal` doesn't do this.
+
+  This has implications both for recursive objects and object sharing. Recursive
+  objects are objects that contain references to themselves. These are not
+  handled by marshal, and in fact, attempting to marshal recursive objects will
+  crash your Python interpreter. Object sharing happens when there are multiple
+  references to the same object in different places in the object hierarchy being
+  serialized. :mod:`pickle` stores such objects only once, and ensures that all
+  other references point to the master copy. Shared objects remain shared, which
+  can be very important for mutable objects.
+
+* :mod:`marshal` cannot be used to serialize user-defined classes and their
+  instances. :mod:`pickle` can save and restore class instances transparently;
+  however, the class definition must be importable and live in the same module as
+  when the object was stored.
+
+* The :mod:`marshal` serialization format is not guaranteed to be portable
+  across Python versions. Because its primary job in life is to support
+  :file:`.pyc` files, the Python implementers reserve the right to change the
+  serialization format in non-backwards compatible ways should the need arise.
+  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
+  across Python releases.
+
+.. warning::
+
+   The :mod:`pickle` module is not intended to be secure against erroneous or
+   maliciously constructed data. Never unpickle data received from an untrusted or
+   unauthenticated source.
+
+Note that serialization is a more primitive notion than persistence; although
+:mod:`pickle` reads and writes file objects, it does not handle the issue of
+naming persistent objects, nor the (even more complicated) issue of concurrent
+access to persistent objects. The :mod:`pickle` module can transform a complex
+object into a byte stream and it can transform the byte stream into an object
+with the same internal structure. Perhaps the most obvious thing to do with
+these byte streams is to write them onto a file, but it is also conceivable to
+send them across a network or store them in a database. The module
+:mod:`shelve` provides a simple interface to pickle and unpickle objects on
+DBM-style database files.
+
+
+Data stream format
+------------------
+
+.. index::
+   single: XDR
+   single: External Data Representation
+
+The data format used by :mod:`pickle` is Python-specific. This has the
+advantage that there are no restrictions imposed by external standards such as
+XDR (which can't represent pointer sharing); however it means that non-Python
+programs may not be able to reconstruct pickled Python objects.
+
+By default, the :mod:`pickle` data format uses a printable ASCII representation.
+This is slightly more voluminous than a binary representation. The big
+advantage of using printable ASCII (and of some other characteristics of
+:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
+possible for a human to read the pickled file with a standard text editor.
+
+There are currently 3 different protocols which can be used for pickling.
+
+* Protocol version 0 is the original ASCII protocol and is backwards compatible
+  with earlier versions of Python.
+
+* Protocol version 1 is the old binary format which is also compatible with
+  earlier versions of Python.
+
+* Protocol version 2 was introduced in Python 2.3. It provides much more
+  efficient pickling of new-style classes.
+
+Refer to :pep:`307` for more information.
+
+If a *protocol* is not specified, protocol 0 is used. If *protocol* is specified
+as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
+available will be used.
+
+.. versionchanged:: 2.3
+   Introduced the *protocol* parameter.
+
+A binary format, which is slightly more efficient, can be chosen by specifying a
+*protocol* version >= 1.
+
+
+Usage
+-----
+
+To serialize an object hierarchy, you first create a pickler, then you call the
+pickler's :meth:`dump` method. To de-serialize a data stream, you first create
+an unpickler, then you call the unpickler's :meth:`load` method. The
+:mod:`pickle` module provides the following constant:
+
+
+.. data:: HIGHEST_PROTOCOL
+
+   The highest protocol version available. This value can be passed as a
+   *protocol* value.
+
+   .. versionadded:: 2.3
+
+.. note::
+
+   Be sure to always open pickle files created with protocols >= 1 in binary mode.
+   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
+   mode as long as you stay consistent.
+
+   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
+   as line terminators and therefore will look "funny" when viewed in Notepad or
+   other editors which do not support this format.
+
+The :mod:`pickle` module provides the following functions to make the pickling
+process more convenient:
+
+
+.. function:: dump(obj, file[, protocol])
+
+   Write a pickled representation of *obj* to the open file object *file*. This is
+   equivalent to ``Pickler(file, protocol).dump(obj)``.
+
+   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
+   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
+   version will be used.
+
+   .. versionchanged:: 2.3
+      Introduced the *protocol* parameter.
+
+   *file* must have a :meth:`write` method that accepts a single string argument.
+   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
+   any other custom object that meets this interface.
+
+
+.. function:: load(file)
+
+   Read a string from the open file object *file* and interpret it as a pickle data
+   stream, reconstructing and returning the original object hierarchy. This is
+   equivalent to ``Unpickler(file).load()``.
+
+   *file* must have two methods, a :meth:`read` method that takes an integer
+   argument, and a :meth:`readline` method that requires no arguments. Both
+   methods should return a string. Thus *file* can be a file object opened for
+   reading, a :mod:`StringIO` object, or any other custom object that meets this
+   interface.
+
+   This function automatically determines whether the data stream was written in
+   binary mode or not.
+
+
+.. function:: dumps(obj[, protocol])
+
+   Return the pickled representation of the object as a string, instead of writing
+   it to a file.
+
+   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
+   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
+   version will be used.
+
+   .. versionchanged:: 2.3
+      The *protocol* parameter was added.
+
+
+.. function:: loads(string)
+
+   Read a pickled object hierarchy from a string. Characters in the string past
+   the pickled object's representation are ignored.
+
+The :mod:`pickle` module also defines three exceptions:
+
+
+.. exception:: PickleError
+
+   A common base class for the other exceptions defined below. This inherits from
+   :exc:`Exception`.
+
+
+.. exception:: PicklingError
+
+   This exception is raised when an unpicklable object is passed to the
+   :meth:`dump` method.
+
+
+.. exception:: UnpicklingError
+
+   This exception is raised when there is a problem unpickling an object. Note that
+   other exceptions may also be raised during unpickling, including (but not
+   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
+   :exc:`ImportError`, and :exc:`IndexError`.
+
+The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
+:class:`Unpickler`:
+
+
+.. class:: Pickler(file[, protocol])
+
+   This takes a file-like object to which it will write a pickle data stream.
+
+   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
+   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
+   protocol version will be used.
+
+   .. versionchanged:: 2.3
+      Introduced the *protocol* parameter.
+
+   *file* must have a :meth:`write` method that accepts a single string argument.
+   It can thus be an open file object, a :mod:`StringIO` object, or any other
+   custom object that meets this interface.
+
+:class:`Pickler` objects define one (or two) public methods:
+
+
+.. method:: Pickler.dump(obj)
+
+   Write a pickled representation of *obj* to the open file object given in the
+   constructor. Either the binary or ASCII format will be used, depending on the
+   value of the *protocol* argument passed to the constructor.
+
+
+.. method:: Pickler.clear_memo()
+
+   Clears the pickler's "memo". The memo is the data structure that remembers
+   which objects the pickler has already seen, so that shared or recursive objects
+   are pickled by reference and not by value. This method is useful when re-using
+   picklers.
+
+   .. note::
+
+      Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
+      created by :mod:`cPickle`. In the :mod:`pickle` module, picklers have an
+      instance variable called :attr:`memo` which is a Python dictionary.
+      So to clear the memo for a :mod:`pickle` module pickler, you could do the
+      following::
+
+         mypickler.memo.clear()
+
+      Code that does not need to support older versions of Python should simply use
+      :meth:`clear_memo`.
+
+It is possible to make multiple calls to the :meth:`dump` method of the same
+:class:`Pickler` instance. These must then be matched to the same number of
+calls to the :meth:`load` method of the corresponding :class:`Unpickler`
+instance. If the same object is pickled by multiple :meth:`dump` calls, the
+:meth:`load` calls will all yield references to the same object. [#]_
+
+:class:`Unpickler` objects are defined as:
+
+
+.. class:: Unpickler(file)
+
+   This takes a file-like object from which it will read a pickle data stream.
+   This class automatically determines whether the data stream was written in
+   binary mode or not, so it does not need a flag as in the :class:`Pickler`
+   factory.
+
+   *file* must have two methods, a :meth:`read` method that takes an integer
+   argument, and a :meth:`readline` method that requires no arguments. Both
+   methods should return a string. Thus *file* can be a file object opened for
+   reading, a :mod:`StringIO` object, or any other custom object that meets this
+   interface.
+
+:class:`Unpickler` objects have one (or two) public methods:
+
+
+.. method:: Unpickler.load()
+
+   Read a pickled object representation from the open file object given in the
+   constructor, and return the reconstituted object hierarchy specified therein.
+
+   This method automatically determines whether the data stream was written in
+   binary mode or not.
+
+
+.. method:: Unpickler.noload()
+
+   This is just like :meth:`load` except that it doesn't actually create any
+   objects. This is useful primarily for finding what's called "persistent ids"
+   that may be referenced in a pickle data stream. See section
+   :ref:`pickle-protocol` below for more details.
+
+   **Note:** the :meth:`noload` method is currently only available on
+   :class:`Unpickler` objects created with the :mod:`cPickle` module.
+   :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
+   method.
+
+
+What can be pickled and unpickled?
+----------------------------------
+
+The following types can be pickled:
+
+* ``None``, ``True``, and ``False``
+
+* integers, long integers, floating point numbers, complex numbers
+
+* normal and Unicode strings
+
+* tuples, lists, sets, and dictionaries containing only picklable objects
+
+* functions defined at the top level of a module
+
+* built-in functions defined at the top level of a module
+
+* classes that are defined at the top level of a module
+
+* instances of such classes whose :attr:`__dict__` or :meth:`__setstate__` is
+  picklable (see section :ref:`pickle-protocol` for details)
+
+Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
+exception; when this happens, an unspecified number of bytes may have already
+been written to the underlying file. Trying to pickle a highly recursive data
+structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
+raised in this case. You can carefully raise this limit with
+:func:`sys.setrecursionlimit`.
+
+Note that functions (built-in and user-defined) are pickled by "fully qualified"
+name reference, not by value. This means that only the function name is
+pickled, along with the name of the module the function is defined in. Neither
+the function's code, nor any of its function attributes are pickled. Thus the
+defining module must be importable in the unpickling environment, and the module
+must contain the named object, otherwise an exception will be raised. [#]_
+
+Similarly, classes are pickled by named reference, so the same restrictions in
+the unpickling environment apply.
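As a minimal sketch of pickling by name reference: a function defined at the top level of a module round-trips to the very same object, while a lambda, which has no importable name, is rejected. The variable names are illustrative only. The exact exception raised for the lambda is an assumption that may differ between Python versions, so the sketch catches the two likely candidates.

```python
import os.path
import pickle

# A top-level function is pickled by name only (module name plus function name).
payload = pickle.dumps(os.path.join)
same_function = pickle.loads(payload)   # looked up again by name, not rebuilt

# A lambda has no importable name, so it cannot be pickled by reference.
nameless = lambda x: x + 1              # hypothetical example object
try:
    pickle.dumps(nameless)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
```

Because only the name travels in the pickle, the unpickled function is the identical object in this process; in another process it would be whatever that name resolves to there.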
+Note that none of the class's code or data is
+pickled, so in the following example the class attribute ``attr`` is not
+restored in the unpickling environment::
+
+   class Foo:
+       attr = 'a class attr'
+
+   picklestring = pickle.dumps(Foo)
+
+These restrictions are why picklable functions and classes must be defined in
+the top level of a module.
+
+Similarly, when class instances are pickled, their class's code and data are not
+pickled along with them. Only the instance data are pickled. This is done on
+purpose, so you can fix bugs in a class or add methods to the class and still
+load objects that were created with an earlier version of the class. If you
+plan to have long-lived objects that will see many versions of a class, it may
+be worthwhile to put a version number in the objects so that suitable
+conversions can be made by the class's :meth:`__setstate__` method.
+
+
+.. _pickle-protocol:
+
+The pickle protocol
+-------------------
+
+This section describes the "pickling protocol" that defines the interface
+between the pickler/unpickler and the objects that are being serialized. This
+protocol provides a standard way for you to define, customize, and control how
+your objects are serialized and de-serialized. The description in this section
+doesn't cover specific customizations that you can employ to make the unpickling
+environment slightly safer from untrusted pickle data streams; see section
+:ref:`pickle-sub` for more details.
+
+
+.. _pickle-inst:
+
+Pickling and unpickling normal class instances
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. index::
+   single: __getinitargs__() (copy protocol)
+   single: __init__() (instance constructor)
+
+When a pickled class instance is unpickled, its :meth:`__init__` method is
+normally *not* invoked.
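The following sketch illustrates that point under simple assumptions: ``Session`` is a hypothetical class defined at the top level of the running script, and a class-level counter records how often :meth:`__init__` actually runs.

```python
import pickle

class Session(object):          # hypothetical class, for illustration only
    init_calls = 0              # counts how often __init__ actually runs

    def __init__(self, user):
        Session.init_calls += 1
        self.user = user

original = Session("guido")                       # __init__ runs once here
restored = pickle.loads(pickle.dumps(original))   # no second __init__ call
```

After the round trip the counter is still 1, yet ``restored.user`` carries the pickled instance data.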
+If it is desirable that the :meth:`__init__` method be
+called on unpickling, an old-style class can define a method
+:meth:`__getinitargs__`, which should return a *tuple* containing the arguments
+to be passed to the class constructor (:meth:`__init__` for example). The
+:meth:`__getinitargs__` method is called at pickle time; the tuple it returns is
+incorporated in the pickle for the instance.
+
+.. index:: single: __getnewargs__() (copy protocol)
+
+New-style types can provide a :meth:`__getnewargs__` method that is used for
+protocol 2. Implementing this method is needed if the type establishes some
+internal invariants when the instance is created, or if the memory allocation is
+affected by the values passed to the :meth:`__new__` method for the type (as it
+is for tuples and strings). Instances of a new-style type :class:`C` are
+created using ::
+
+   obj = C.__new__(C, *args)
+
+
+where *args* is the result of calling :meth:`__getnewargs__` on the original
+object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.
+
+.. index::
+   single: __getstate__() (copy protocol)
+   single: __setstate__() (copy protocol)
+   single: __dict__ (instance attribute)
+
+Classes can further influence how their instances are pickled; if the class
+defines the method :meth:`__getstate__`, it is called and the returned state is
+pickled as the contents for the instance, instead of the contents of the
+instance's dictionary. If there is no :meth:`__getstate__` method, the
+instance's :attr:`__dict__` is pickled.
+
+Upon unpickling, if the class also defines the method :meth:`__setstate__`, it
+is called with the unpickled state. [#]_ If there is no :meth:`__setstate__`
+method, the pickled state must be a dictionary and its items are assigned to the
+new instance's dictionary. If a class defines both :meth:`__getstate__` and
+:meth:`__setstate__`, the state object needn't be a dictionary and these methods
+can do what they want. [#]_
+
+.. warning::
+
+   For new-style classes, if :meth:`__getstate__` returns a false value, the
+   :meth:`__setstate__` method will not be called.
+
+
+Pickling and unpickling extension types
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When the :class:`Pickler` encounters an object of a type it knows nothing about
+--- such as an extension type --- it looks in two places for a hint of how to
+pickle it. One alternative is for the object to implement a :meth:`__reduce__`
+method. If provided, at pickling time :meth:`__reduce__` will be called with no
+arguments, and it must return either a string or a tuple.
+
+If a string is returned, it names a global variable whose contents are pickled
+as normal. The string returned by :meth:`__reduce__` should be the object's
+local name relative to its module; the pickle module searches the module
+namespace to determine the object's module.
+
+When a tuple is returned, it must be between two and five elements long.
+Optional elements can either be omitted, or ``None`` can be provided as their
+value. The semantics of each element are:
+
+* A callable object that will be called to create the initial version of the
+  object. The next element of the tuple will provide arguments for this callable,
+  and later elements provide additional state information that will subsequently
+  be used to fully reconstruct the pickled data.
+
+  In the unpickling environment this object must be either a class, a callable
+  registered as a "safe constructor" (see below), or it must have an attribute
+  :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
+  :exc:`UnpicklingError` will be raised in the unpickling environment. Note that
+  as usual, the callable itself is pickled by name.
+
+* A tuple of arguments for the callable object.
+
+  .. versionchanged:: 2.5
+     Formerly, this argument could also be ``None``.
+
+* Optionally, the object's state, which will be passed to the object's
+  :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If the
+  object has no :meth:`__setstate__` method, then, as above, the value must be a
+  dictionary and it will be added to the object's :attr:`__dict__`.
+
+* Optionally, an iterator (and not a sequence) yielding successive list items.
+  These list items will be pickled, and appended to the object using either
+  ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is primarily used
+  for list subclasses, but may be used by other classes as long as they have
+  :meth:`append` and :meth:`extend` methods with the appropriate signature.
+  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
+  protocol version is used as well as the number of items to append, so both must
+  be supported.)
+
+* Optionally, an iterator (not a sequence) yielding successive dictionary items,
+  which should be tuples of the form ``(key, value)``. These items will be
+  pickled and stored to the object using ``obj[key] = value``. This is primarily
+  used for dictionary subclasses, but may be used by other classes as long as they
+  implement :meth:`__setitem__`.
+
+It is sometimes useful to know the protocol version when implementing
+:meth:`__reduce__`. This can be done by implementing a method named
+:meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`, when
+it exists, is called in preference over :meth:`__reduce__` (you may still
+provide :meth:`__reduce__` for backwards compatibility). The
+:meth:`__reduce_ex__` method will be called with a single integer argument, the
+protocol version.
+
+The :class:`object` class implements both :meth:`__reduce__` and
+:meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__` but
+not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation detects this
+and calls :meth:`__reduce__`.
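A minimal sketch of the tuple form of :meth:`__reduce__`, using only the first three elements (callable, argument tuple, state). ``Point`` and its attributes are hypothetical names chosen for illustration, and the class is assumed to live at the top level of the running script so that it can be found again by name.

```python
import pickle

class Point(object):                     # hypothetical class for illustration
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.label = None

    def __reduce__(self):
        # (callable, args, state): rebuild with Point(x, y), then restore the
        # remaining attribute from the state dictionary.
        return (Point, (self.x, self.y), {"label": self.label})

p = Point(1, 2)
p.label = "corner"
q = pickle.loads(pickle.dumps(p))        # Point(1, 2) is called on unpickling
```

Since ``Point`` defines no :meth:`__setstate__`, the state dictionary is merged into the new instance's :attr:`__dict__`, so ``q.label`` is restored after the constructor has run.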
+
+An alternative to implementing a :meth:`__reduce__` method on the object to be
+pickled is to register the callable with the :mod:`copy_reg` module. This
+module provides a way for programs to register "reduction functions" and
+constructors for user-defined types. Reduction functions have the same
+semantics and interface as the :meth:`__reduce__` method described above, except
+that they are called with a single argument, the object to be pickled.
+
+The registered constructor is deemed a "safe constructor" for purposes of
+unpickling as described above.
+
+
+Pickling and unpickling external objects
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For the benefit of object persistence, the :mod:`pickle` module supports the
+notion of a reference to an object outside the pickled data stream. Such
+objects are referenced by a "persistent id", which is just an arbitrary string
+of printable ASCII characters. The resolution of such names is not defined by
+the :mod:`pickle` module; it will delegate this resolution to user defined
+functions on the pickler and unpickler. [#]_
+
+To define external persistent id resolution, you need to set the
+:attr:`persistent_id` attribute of the pickler object and the
+:attr:`persistent_load` attribute of the unpickler object.
+
+To pickle objects that have an external persistent id, the pickler must have a
+custom :func:`persistent_id` method that takes an object as an argument and
+returns either ``None`` or the persistent id for that object. When ``None`` is
+returned, the pickler simply pickles the object as normal. When a persistent id
+string is returned, the pickler will pickle that string, along with a marker so
+that the unpickler will recognize the string as a persistent id.
+
+To unpickle external objects, the unpickler must have a custom
+:func:`persistent_load` function that takes a persistent id string and returns
+the referenced object.
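The same mechanism can be sketched with subclasses of the :mod:`pickle` module's :class:`Pickler` and :class:`Unpickler` instead of attribute assignment. This is an illustration under stated assumptions, not the module's prescribed usage: ``REGISTRY``, ``RegistryPickler`` and ``RegistryUnpickler`` are hypothetical names, and an in-memory ``io.BytesIO`` buffer stands in for a file.

```python
import io
import pickle

# Hypothetical external store: these objects live outside the pickle stream.
REGISTRY = {"user:42": {"name": "Ada"}}

class RegistryPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return an id for externally stored objects, None for everything else.
        for key, value in REGISTRY.items():
            if obj is value:
                return key
        return None

class RegistryUnpickler(pickle.Unpickler):
    def persistent_load(self, persid):
        # Map the persistent id string back to the externally stored object.
        try:
            return REGISTRY[persid]
        except KeyError:
            raise pickle.UnpicklingError("unknown persistent id: %r" % (persid,))

buf = io.BytesIO()
RegistryPickler(buf).dump(["header", REGISTRY["user:42"]])
buf.seek(0)
loaded = RegistryUnpickler(buf).load()
```

Only the id string ``"user:42"`` is written for the registered dictionary, so the unpickled list refers to the very object held in the registry rather than to a copy.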
+
+Here's a silly example that *might* shed more light::
+
+   import pickle
+   from cStringIO import StringIO
+
+   src = StringIO()
+   p = pickle.Pickler(src)
+
+   def persistent_id(obj):
+       if hasattr(obj, 'x'):
+           return 'the value %d' % obj.x
+       else:
+           return None
+
+   p.persistent_id = persistent_id
+
+   class Integer:
+       def __init__(self, x):
+           self.x = x
+       def __str__(self):
+           return 'My name is integer %d' % self.x
+
+   i = Integer(7)
+   print i
+   p.dump(i)
+
+   datastream = src.getvalue()
+   print repr(datastream)
+   dst = StringIO(datastream)
+
+   up = pickle.Unpickler(dst)
+
+   class FancyInteger(Integer):
+       def __str__(self):
+           return 'I am the integer %d' % self.x
+
+   def persistent_load(persid):
+       if persid.startswith('the value '):
+           value = int(persid.split()[2])
+           return FancyInteger(value)
+       else:
+           raise pickle.UnpicklingError, 'Invalid persistent id'
+
+   up.persistent_load = persistent_load
+
+   j = up.load()
+   print j
+
+In the :mod:`cPickle` module, the unpickler's :attr:`persistent_load` attribute
+can also be set to a Python list, in which case, when the unpickler reaches a
+persistent id, the persistent id string will simply be appended to this list.
+This functionality exists so that a pickle data stream can be "sniffed" for
+object references without actually instantiating all the objects in a pickle.
+[#]_ Setting :attr:`persistent_load` to a list is usually used in conjunction
+with the :meth:`noload` method on the Unpickler.
+
+.. % BAW: Both pickle and cPickle support something called
+.. % inst_persistent_id() which appears to give unknown types a second
+.. % shot at producing a persistent id. Since Jim Fulton can't remember
+.. % why it was added or what it's for, I'm leaving it undocumented.
+
+
+.. _pickle-sub:
+
+Subclassing Unpicklers
+----------------------
+
+By default, unpickling will import any class that it finds in the pickle data.
+You can control exactly what gets unpickled and what gets called by customizing
+your unpickler.
+Unfortunately, exactly how you do this is different depending
+on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_
+
+In the :mod:`pickle` module, you need to derive a subclass from
+:class:`Unpickler`, overriding the :meth:`load_global` method.
+:meth:`load_global` should read two lines from the pickle data stream where the
+first line will be the name of the module containing the class and the second
+line will be the name of the instance's class. It then looks up the class,
+possibly importing the module and digging out the attribute, then it appends
+what it finds to the unpickler's stack. Later on, this class will be assigned to
+the :attr:`__class__` attribute of an empty class, as a way of magically
+creating an instance without calling its class's :meth:`__init__`. Your job
+(should you choose to accept it) would be to have :meth:`load_global` push onto
+the unpickler's stack a known safe version of any class you deem safe to
+unpickle. It is up to you to produce such a class. Or you could raise an error
+if you want to disallow all unpickling of instances. If this sounds like a hack,
+you're right. Refer to the source code to make this work.
+
+Things are a little cleaner with :mod:`cPickle`, but not by much. To control
+what gets unpickled, you can set the unpickler's :attr:`find_global` attribute
+to a function or ``None``. If it is ``None`` then any attempts to unpickle
+instances will raise an :exc:`UnpicklingError`. If it is a function, then it
+should accept a module name and a class name, and return the corresponding class
+object. It is responsible for looking up the class and performing any necessary
+imports, and it may raise an error to prevent instances of the class from being
+unpickled.
+
+The moral of the story is that you should be really careful about the source of
+the strings your application unpickles.
+
+
+.. _pickle-example:
+
+Example
+-------
+
+For the simplest code, use the :func:`dump` and :func:`load` functions.
+Note that a self-referencing list is pickled and restored correctly. ::
+
+   import pickle
+
+   data1 = {'a': [1, 2.0, 3, 4+6j],
+            'b': ('string', u'Unicode string'),
+            'c': None}
+
+   selfref_list = [1, 2, 3]
+   selfref_list.append(selfref_list)
+
+   output = open('data.pkl', 'wb')
+
+   # Pickle dictionary using protocol 0.
+   pickle.dump(data1, output)
+
+   # Pickle the list using the highest protocol available.
+   pickle.dump(selfref_list, output, -1)
+
+   output.close()
+
+The following example reads the resulting pickled data. When reading a
+pickle-containing file, you should open the file in binary mode because you
+can't be sure if the ASCII or binary format was used. ::
+
+   import pprint, pickle
+
+   pkl_file = open('data.pkl', 'rb')
+
+   data1 = pickle.load(pkl_file)
+   pprint.pprint(data1)
+
+   data2 = pickle.load(pkl_file)
+   pprint.pprint(data2)
+
+   pkl_file.close()
+
+Here's a larger example that shows how to modify pickling behavior for a class.
+The :class:`TextReader` class opens a text file, and returns the line number and
+line contents each time its :meth:`readline` method is called. If a
+:class:`TextReader` instance is pickled, all attributes *except* the file object
+member are saved. When the instance is unpickled, the file is reopened, and
+reading resumes from the last location. The :meth:`__setstate__` and
+:meth:`__getstate__` methods are used to implement this behavior.
:: + + #!/usr/local/bin/python + + class TextReader: + """Print and number lines in a text file.""" + def __init__(self, file): + self.file = file + self.fh = open(file) + self.lineno = 0 + + def readline(self): + self.lineno = self.lineno + 1 + line = self.fh.readline() + if not line: + return None + if line.endswith("\n"): + line = line[:-1] + return "%d: %s" % (self.lineno, line) + + def __getstate__(self): + odict = self.__dict__.copy() # copy the dict since we change it + del odict['fh'] # remove filehandle entry + return odict + + def __setstate__(self, dict): + fh = open(dict['file']) # reopen file + count = dict['lineno'] # read from file... + while count: # until line count is restored + fh.readline() + count = count - 1 + self.__dict__.update(dict) # update attributes + self.fh = fh # save the file object + +A sample usage might be something like this:: + + >>> import TextReader + >>> obj = TextReader.TextReader("TextReader.py") + >>> obj.readline() + '1: #!/usr/local/bin/python' + >>> obj.readline() + '2: ' + >>> obj.readline() + '3: class TextReader:' + >>> import pickle + >>> pickle.dump(obj, open('save.p', 'wb')) + +If you want to see that :mod:`pickle` works across Python processes, start +another Python session, before continuing. What follows can happen from either +the same process or a new process. :: + + >>> import pickle + >>> reader = pickle.load(open('save.p', 'rb')) + >>> reader.readline() + '4: """Print and number lines in a text file."""' + + +.. seealso:: + + Module :mod:`copy_reg` + Pickle interface constructor registration for extension types. + + Module :mod:`shelve` + Indexed databases of objects; uses :mod:`pickle`. + + Module :mod:`copy` + Shallow and deep object copying. + + Module :mod:`marshal` + High-performance serialization of built-in types. + + +:mod:`cPickle` --- A faster :mod:`pickle` +========================================= + +.. module:: cPickle + :synopsis: Faster version of pickle, but not subclassable. +.. 
moduleauthor:: Jim Fulton <jim@zope.com> +.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org> + + +.. index:: module: pickle + +The :mod:`cPickle` module supports serialization and de-serialization of Python +objects, providing an interface and functionality nearly identical to the +:mod:`pickle` module. There are several differences, the most important being +performance and subclassability. + +First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because +the former is implemented in C. Second, in the :mod:`cPickle` module the +callables :func:`Pickler` and :func:`Unpickler` are functions, not classes. +This means that you cannot use them to derive custom pickling and unpickling +subclasses. Most applications have no need for this functionality and should +benefit from the greatly improved performance of the :mod:`cPickle` module. + +The pickle data streams produced by :mod:`pickle` and :mod:`cPickle` are +identical, so it is possible to use :mod:`pickle` and :mod:`cPickle` +interchangeably with existing pickles. [#]_ + +There are additional minor differences in API between :mod:`cPickle` and +:mod:`pickle`; however, for most applications they are interchangeable. More +documentation is provided in the :mod:`pickle` module documentation, which +includes a list of the documented differences. + +.. rubric:: Footnotes + +.. [#] Don't confuse this with the :mod:`marshal` module. + +.. [#] In the :mod:`pickle` module these callables are classes, which you could + subclass to customize the behavior. However, in the :mod:`cPickle` module these + callables are factory functions and so cannot be subclassed. One common reason + to subclass is to control what objects can actually be unpickled. See section + :ref:`pickle-sub` for more details. + +.. [#] *Warning*: this is intended for pickling multiple objects without intervening + modifications to the objects or their parts. 
If you modify an object and then + pickle it again using the same :class:`Pickler` instance, the object is not + pickled again --- a reference to it is pickled and the :class:`Unpickler` will + return the old value, not the modified one. There are two problems here: (1) + detecting changes, and (2) marshalling a minimal set of changes. Garbage + collection may also become a problem here. + +.. [#] The exception raised will likely be an :exc:`ImportError` or an + :exc:`AttributeError` but it could be something else. + +.. [#] These methods can also be used to implement copying class instances. + +.. [#] This protocol is also used by the shallow and deep copying operations defined in + the :mod:`copy` module. + +.. [#] The actual mechanism for associating these user-defined functions is slightly + different for :mod:`pickle` and :mod:`cPickle`. The description given here + works the same for both implementations. Users of the :mod:`pickle` module + could also use subclassing to effect the same results, overriding the + :meth:`persistent_id` and :meth:`persistent_load` methods in the derived + classes. + +.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles + in their living rooms. + +.. [#] A word of caution: the mechanisms described here use internal attributes and + methods, which are subject to change in future versions of Python. We intend to + someday provide a common interface for controlling this behavior, which will + work in either :mod:`pickle` or :mod:`cPickle`. + +.. [#] Since the pickle data format is actually a tiny stack-oriented programming + language, and some freedom is taken in the encodings of certain objects, it is + possible that the two modules produce different data streams for the same input + objects. However, it is guaranteed that they will always be able to read each + other's data streams. + |
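The warning footnote about reusing a :class:`Pickler` instance can be illustrated with a short sketch. This example is not part of the original documentation. The names ``buf``, ``items``, ``first``, ``second`` and ``third`` are chosen here for illustration, and an in-memory buffer stands in for a real file. It assumes the memo behaves as the footnote describes, and that :meth:`clear_memo` is used to force a full re-pickle.

```python
import io
import pickle

buf = io.BytesIO()                 # in-memory stand-in for a file opened in binary mode
pickler = pickle.Pickler(buf)

items = [1, 2, 3]
pickler.dump(items)                # first dump: the list contents are written and memoized
items.append(4)                    # modify the object after it has been pickled
pickler.dump(items)                # second dump: only a reference to the memoized list is written

pickler.clear_memo()               # forget previously pickled objects
pickler.dump(items)                # third dump: the current contents are written in full

buf.seek(0)
unpickler = pickle.Unpickler(buf)  # one Unpickler must read all three pickles
first = unpickler.load()           # the list as it was at the first dump
second = unpickler.load()          # resolves the reference: the same object as `first`
third = unpickler.load()           # a fresh list holding the modified contents
```

The second :meth:`load` returns the old value because only a memo reference was stored. Clearing the memo, or using a fresh :class:`Pickler`, is one way to get the modified object written out again.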