author | Alexandre Vassalotti <alexandre@peadrop.com> | 2008-10-18 19:25:07 (GMT) |
---|---|---|
committer | Alexandre Vassalotti <alexandre@peadrop.com> | 2008-10-18 19:25:07 (GMT) |
commit | 758bca6e36167075fb41a2fc671665506ef0fe0e (patch) | |
tree | be29583e498d752b3422a97b13ac1d2245c658e0 /Doc | |
parent | 87eee631fb1ae38aa15ebd9741a1af82dd7b4ea0 (diff) | |
Improve pickle's documentation.
There is still much to be done, but I am committing my changes
incrementally to avoid losing them again (for a third time now).
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/library/pickle.rst | 240 |
1 files changed, 144 insertions, 96 deletions
diff --git a/Doc/library/pickle.rst b/Doc/library/pickle.rst
index 2e6ea48..4aab5f5 100644
--- a/Doc/library/pickle.rst
+++ b/Doc/library/pickle.rst
@@ -92,11 +92,9 @@ advantage that there are no restrictions imposed by external standards such as
 XDR (which can't represent pointer sharing); however it means that non-Python
 programs may not be able to reconstruct pickled Python objects.
 
-By default, the :mod:`pickle` data format uses a printable ASCII representation.
-This is slightly more voluminous than a binary representation. The big
-advantage of using printable ASCII (and of some other characteristics of
-:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
-possible for a human to read the pickled file with a standard text editor.
+By default, the :mod:`pickle` data format uses a compact binary representation.
+The module :mod:`pickletools` contains tools for analyzing data streams
+generated by :mod:`pickle`.
 
 There are currently 4 different protocols which can be used for pickling.
 
@@ -110,17 +108,15 @@ There are currently 4 different protocols which can be used for pickling.
   efficient pickling of :term:`new-style class`\es.
 
 * Protocol version 3 was added in Python 3.0. It has explicit support for
-  bytes and cannot be unpickled by Python 2.x pickle modules.
+  bytes and cannot be unpickled by Python 2.x pickle modules. This is
+  the current recommended protocol, use it whenever it is possible.
 
 Refer to :pep:`307` for more information.
 
-If a *protocol* is not specified, protocol 3 is used. If *protocol* is
+If a *protocol* is not specified, protocol 3 is used. If *protocol* is
 specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
 protocol version available will be used.
 
-A binary format, which is slightly more efficient, can be chosen by specifying a
-*protocol* version >= 1.
-
 Usage
 -----
 
@@ -146,152 +142,210 @@ an unpickler, then you call the unpickler's :meth:`load` method. The
    as line terminators and therefore will look "funny" when viewed in Notepad or
    other editors which do not support this format.
 
+.. data:: DEFAULT_PROTOCOL
+
+   The default protocol used for pickling. May be less than HIGHEST_PROTOCOL.
+   Currently the default protocol is 3; a backward-incompatible protocol
+   designed for Python 3.0.
+
+
 The :mod:`pickle` module provides the following functions to make the pickling
 process more convenient:
 
-
 .. function:: dump(obj, file[, protocol])
 
-   Write a pickled representation of *obj* to the open file object *file*. This is
-   equivalent to ``Pickler(file, protocol).dump(obj)``.
+   Write a pickled representation of *obj* to the open file object *file*. This
+   is equivalent to ``Pickler(file, protocol).dump(obj)``.
 
-   If the *protocol* parameter is omitted, protocol 3 is used. If *protocol* is
-   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
-   protocol version will be used.
+   The optional *protocol* argument tells the pickler to use the given protocol;
+   supported protocols are 0, 1, 2, 3. The default protocol is 3; a
+   backward-incompatible protocol designed for Python 3.0.
 
-   *file* must have a :meth:`write` method that accepts a single string argument.
-   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
-   any other custom object that meets this interface.
+   Specifying a negative protocol version selects the highest protocol version
+   supported. The higher the protocol used, the more recent the version of
+   Python needed to read the pickle produced.
 
+   The *file* argument must have a write() method that accepts a single bytes
+   argument. It can thus be a file object opened for binary writing, a
+   io.BytesIO instance, or any other custom object that meets this interface.
 
-.. function:: load(file)
+.. function:: dumps(obj[, protocol])
 
-   Read a string from the open file object *file* and interpret it as a pickle data
-   stream, reconstructing and returning the original object hierarchy. This is
-   equivalent to ``Unpickler(file).load()``.
+   Return the pickled representation of the object as a :class:`bytes`
+   object, instead of writing it to a file.
 
-   *file* must have two methods, a :meth:`read` method that takes an integer
-   argument, and a :meth:`readline` method that requires no arguments. Both
-   methods should return a string. Thus *file* can be a file object opened for
-   reading, a :mod:`StringIO` object, or any other custom object that meets this
-   interface.
+   The optional *protocol* argument tells the pickler to use the given protocol;
+   supported protocols are 0, 1, 2, 3. The default protocol is 3; a
+   backward-incompatible protocol designed for Python 3.0.
 
-   This function automatically determines whether the data stream was written in
-   binary mode or not.
+   Specifying a negative protocol version selects the highest protocol version
+   supported. The higher the protocol used, the more recent the version of
+   Python needed to read the pickle produced.
 
+.. function:: load(file, [\*, encoding="ASCII", errors="strict"])
 
-.. function:: dumps(obj[, protocol])
+   Read a pickled object representation from the open file object *file* and
+   return the reconstituted object hierarchy specified therein. This is
+   equivalent to ``Unpickler(file).load()``.
 
-   Return the pickled representation of the object as a :class:`bytes`
-   object, instead of writing it to a file.
+   The protocol version of the pickle is detected automatically, so no protocol
+   argument is needed. Bytes past the pickled object's representation are
+   ignored.
 
-   If the *protocol* parameter is omitted, protocol 3 is used. If *protocol*
-   is specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
-   protocol version will be used.
+   The argument *file* must have two methods, a read() method that takes an
+   integer argument, and a readline() method that requires no arguments. Both
+   methods should return bytes. Thus *file* can be a binary file object opened
+   for reading, a BytesIO object, or any other custom object that meets this
+   interface.
 
+   Optional keyword arguments are encoding and errors, which are used to decode
+   8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
+   'strict', respectively.
 
-.. function:: loads(bytes_object)
+.. function:: loads(bytes_object, [\*, encoding="ASCII", errors="strict"])
 
-   Read a pickled object hierarchy from a :class:`bytes` object.
-   Bytes past the pickled object's representation are ignored.
+   Read a pickled object hierarchy from a :class:`bytes` object and return the
+   reconstituted object hierarchy specified therein
 
-The :mod:`pickle` module also defines three exceptions:
+   The protocol version of the pickle is detected automatically, so no protocol
+   argument is needed. Bytes past the pickled object's representation are
+   ignored.
 
+   Optional keyword arguments are encoding and errors, which are used to decode
+   8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
+   'strict', respectively.
+
+
+The :mod:`pickle` module defines three exceptions:
 
 .. exception:: PickleError
 
-   A common base class for the other exceptions defined below. This inherits from
+   Common base class for the other pickling exceptions. It inherits
    :exc:`Exception`.
 
-
 .. exception:: PicklingError
 
-   This exception is raised when an unpicklable object is passed to the
-   :meth:`dump` method.
-
+   Error raised when an unpicklable object is encountered by :class:`Pickler`.
+   It inherits :exc:`PickleError`.
 
 .. exception:: UnpicklingError
 
-   This exception is raised when there is a problem unpickling an object. Note that
-   other exceptions may also be raised during unpickling, including (but not
-   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
-   :exc:`ImportError`, and :exc:`IndexError`.
+   Error raised when there a problem unpickling an object, such as a data
+   corruption or a security violation. It inherits :exc:`PickleError`.
 
-The :mod:`pickle` module also exports two callables, :class:`Pickler` and
-:class:`Unpickler`:
+   Note that other exceptions may also be raised during unpickling, including
+   (but not necessarily limited to) AttributeError, EOFError, ImportError, and
+   IndexError.
 
 
-.. class:: Pickler(file[, protocol])
+The :mod:`pickle` module exports two classes, :class:`Pickler` and
+:class:`Unpickler`:
 
-   This takes a file-like object to which it will write a pickle data stream.
+.. class:: Pickler(file[, protocol])
 
-   If the *protocol* parameter is omitted, protocol 3 is used. If *protocol* is
-   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
-   protocol version will be used.
+   This takes a binary file for writing a pickle data stream.
 
-   *file* must have a :meth:`write` method that accepts a single string argument.
-   It can thus be an open file object, a :mod:`StringIO` object, or any other
-   custom object that meets this interface.
+   The optional *protocol* argument tells the pickler to use the given protocol;
+   supported protocols are 0, 1, 2, 3. The default protocol is 3; a
+   backward-incompatible protocol designed for Python 3.0.
 
-   :class:`Pickler` objects define one (or two) public methods:
+   Specifying a negative protocol version selects the highest protocol version
+   supported. The higher the protocol used, the more recent the version of
+   Python needed to read the pickle produced.
 
+   The *file* argument must have a write() method that accepts a single bytes
+   argument. It can thus be a file object opened for binary writing, a
+   io.BytesIO instance, or any other custom object that meets this interface.
 
    .. method:: dump(obj)
 
-      Write a pickled representation of *obj* to the open file object given in the
-      constructor. Either the binary or ASCII format will be used, depending on the
-      value of the *protocol* argument passed to the constructor.
+      Write a pickled representation of *obj* to the open file object given in
+      the constructor.
+
+   .. method:: persistent_id(obj)
 
+      Do nothing by default. This exists so a subclass can override it.
+
+      If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual. Any
+      other value causes :class:`Pickler` to emit the returned value as a
+      persistent ID for *obj*. The meaning of this persistent ID should be
+      defined by :meth:`Unpickler.persistent_load`. Note that the value
+      returned by :meth:`persistent_id` cannot itself have a persistent ID.
+
+      See :ref:`pickle-persistent` for details and examples of uses.
 
    .. method:: clear_memo()
 
-      Clears the pickler's "memo". The memo is the data structure that remembers
-      which objects the pickler has already seen, so that shared or recursive objects
-      pickled by reference and not by value. This method is useful when re-using
-      picklers.
+      Deprecated. Use the :meth:`clear` method on the :attr:`memo`. Clear the
+      pickler's memo, useful when reusing picklers.
+
+   .. attribute:: fast
+
+      Enable fast mode if set to a true value. The fast mode disables the usage
+      of memo, therefore speeding the pickling process by not generating
+      superfluous PUT opcodes. It should not be used with self-referential
+      objects, doing otherwise will cause :class:`Pickler` to recurse
+      infinitely.
+
+      Use :func:`pickletools.optimize` if you need more compact pickles.
+
+   .. attribute:: memo
+
+      Dictionary holding previously pickled objects to allow shared or
+      recursive objects to pickled by reference as opposed to by value.
 
 
 It is possible to make multiple calls to the :meth:`dump` method of the same
 :class:`Pickler` instance. These must then be matched to the same number of
 calls to the :meth:`load` method of the corresponding :class:`Unpickler`
 instance. If the same object is pickled by multiple :meth:`dump` calls, the
-:meth:`load` will all yield references to the same object. [#]_
+:meth:`load` will all yield references to the same object.
 
-:class:`Unpickler` objects are defined as:
+Please note, this is intended for pickling multiple objects without intervening
+modifications to the objects or their parts. If you modify an object and then
+pickle it again using the same :class:`Pickler` instance, the object is not
+pickled again --- a reference to it is pickled and the :class:`Unpickler` will
+return the old value, not the modified one.
 
 
-.. class:: Unpickler(file)
+.. class:: Unpickler(file, [\*, encoding="ASCII", errors="strict"])
 
-   This takes a file-like object from which it will read a pickle data stream.
-   This class automatically determines whether the data stream was written in
-   binary mode or not, so it does not need a flag as in the :class:`Pickler`
-   factory.
+   This takes a binary file for reading a pickle data stream.
 
-   *file* must have two methods, a :meth:`read` method that takes an integer
-   argument, and a :meth:`readline` method that requires no arguments. Both
-   methods should return a string. Thus *file* can be a file object opened for
-   reading, a :mod:`StringIO` object, or any other custom object that meets this
-   interface.
+   The protocol version of the pickle is detected automatically, so no
+   protocol argument is needed.
 
-   :class:`Unpickler` objects have one (or two) public methods:
+   The argument *file* must have two methods, a read() method that takes an
+   integer argument, and a readline() method that requires no arguments. Both
+   methods should return bytes. Thus *file* can be a binary file object opened
+   for reading, a BytesIO object, or any other custom object that meets this
+   interface.
 
+   Optional keyword arguments are encoding and errors, which are used to decode
+   8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
+   'strict', respectively.
 
    .. method:: load()
 
       Read a pickled object representation from the open file object given in
       the constructor, and return the reconstituted object hierarchy specified
-      therein.
+      therein. Bytes past the pickled object's representation are ignored.
 
-      This method automatically determines whether the data stream was written
-      in binary mode or not.
+   .. method:: persistent_load(pid)
 
+      Raise an :exc:`UnpickingError` by default.
 
-   .. method:: noload()
+      If defined, :meth:`persistent_load` should return the object specified by
+      the persistent ID *pid*. On errors, such as if an invalid persistent ID is
+      encountered, an :exc:`UnpickingError` should be raised.
 
-      This is just like :meth:`load` except that it doesn't actually create any
-      objects. This is useful primarily for finding what's called "persistent
-      ids" that may be referenced in a pickle data stream. See section
-      :ref:`pickle-protocol` below for more details.
+      See :ref:`pickle-persistent` for details and examples of uses.
+
+   .. method:: find_class(module, name)
+
+      Import *module* if necessary and return the object called *name* from it.
+      Subclasses may override this to gain control over what type of objects can
+      be loaded, potentially reducing security risks.
 
 
 What can be pickled and unpickled?
@@ -506,6 +560,8 @@ The registered constructor is deemed a "safe constructor" for purposes of
 unpickling as described above.
 
 
+.. _pickle-persistent:
+
 Pickling and unpickling external objects
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -747,14 +803,6 @@ the same process or a new process. ::
 
 .. [#] Don't confuse this with the :mod:`marshal` module
 
-.. [#] *Warning*: this is intended for pickling multiple objects without intervening
-   modifications to the objects or their parts. If you modify an object and then
-   pickle it again using the same :class:`Pickler` instance, the object is not
-   pickled again --- a reference to it is pickled and the :class:`Unpickler` will
-   return the old value, not the modified one. There are two problems here: (1)
-   detecting changes, and (2) marshalling a minimal set of changes. Garbage
-   Collection may also become a problem here.
-
 .. [#] The exception raised will likely be an :exc:`ImportError` or an
    :exc:`AttributeError` but it could be something else.
 
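The behaviour documented by this patch can be tried out with a short script. The following is only an illustrative sketch, not part of the commit: the variable names and sample data are made up, and it assumes a current Python 3 interpreter, where protocol 3 is still accepted although newer protocols exist. It shows a bytes round trip, the negative-protocol shortcut, and the caveat about reusing one ``Pickler`` after modifying an object.

```python
import io
import pickle

# Round trip through a bytes object; loads() detects the protocol itself,
# so no protocol argument is passed when reading.
data = {"numbers": [1, 2, 3], "name": "spam"}
blob = pickle.dumps(data, protocol=3)
restored = pickle.loads(blob)

# A negative protocol selects the highest protocol this interpreter supports.
newest = pickle.dumps(data, protocol=-1)

# Reusing one Pickler: the second dump() of the same object writes only a
# reference to the memoized object, so the change made in between is lost.
buffer = io.BytesIO()
pickler = pickle.Pickler(buffer, protocol=3)
shared = [1, 2]
pickler.dump(shared)
shared.append(3)          # modification between the two dump() calls
pickler.dump(shared)

# Each dump() must be matched by one load() on the same Unpickler.
buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()

print(restored == data)   # the round trip preserves the value
print(first, second)      # both loads refer to the old, unmodified list
```

If the modified list has to be written in full, one option is to use a fresh ``Pickler`` (or to clear its memo) before the second ``dump()`` call.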