author: Alexandre Vassalotti <alexandre@peadrop.com> 2008-10-18 19:25:07 (GMT)
committer: Alexandre Vassalotti <alexandre@peadrop.com> 2008-10-18 19:25:07 (GMT)
commit: 758bca6e36167075fb41a2fc671665506ef0fe0e
tree: be29583e498d752b3422a97b13ac1d2245c658e0 /Doc
parent: 87eee631fb1ae38aa15ebd9741a1af82dd7b4ea0
Improve pickle's documentation.
There is still much to be done, but I am committing my changes incrementally to avoid losing them again (for a third time now).
Diffstat (limited to 'Doc')
-rw-r--r-- Doc/library/pickle.rst 240
1 file changed, 144 insertions, 96 deletions
diff --git a/Doc/library/pickle.rst b/Doc/library/pickle.rst
index 2e6ea48..4aab5f5 100644
--- a/Doc/library/pickle.rst
+++ b/Doc/library/pickle.rst
@@ -92,11 +92,9 @@ advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however it means that non-Python
programs may not be able to reconstruct pickled Python objects.
-By default, the :mod:`pickle` data format uses a printable ASCII representation.
-This is slightly more voluminous than a binary representation. The big
-advantage of using printable ASCII (and of some other characteristics of
-:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
-possible for a human to read the pickled file with a standard text editor.
+By default, the :mod:`pickle` data format uses a compact binary representation.
+The module :mod:`pickletools` contains tools for analyzing data streams
+generated by :mod:`pickle`.
There are currently 4 different protocols which can be used for pickling.
@@ -110,17 +108,15 @@ There are currently 4 different protocols which can be used for pickling.
efficient pickling of :term:`new-style class`\es.
* Protocol version 3 was added in Python 3.0. It has explicit support for
- bytes and cannot be unpickled by Python 2.x pickle modules.
+ bytes and cannot be unpickled by Python 2.x pickle modules. This is
+ the current recommended protocol; use it whenever possible.
Refer to :pep:`307` for more information.
-If a *protocol* is not specified, protocol 3 is used. If *protocol* is
+If a *protocol* is not specified, protocol 3 is used. If *protocol* is
specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
protocol version available will be used.
-A binary format, which is slightly more efficient, can be chosen by specifying a
-*protocol* version >= 1.
-
Usage
-----
@@ -146,152 +142,210 @@ an unpickler, then you call the unpickler's :meth:`load` method. The
as line terminators and therefore will look "funny" when viewed in Notepad or
other editors which do not support this format.
+.. data:: DEFAULT_PROTOCOL
+
+ The default protocol used for pickling. May be less than
+ :const:`HIGHEST_PROTOCOL`. Currently the default protocol is 3, a
+ backward-incompatible protocol designed for Python 3.0.
+
+
The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:
-
.. function:: dump(obj, file[, protocol])
- Write a pickled representation of *obj* to the open file object *file*. This is
- equivalent to ``Pickler(file, protocol).dump(obj)``.
+ Write a pickled representation of *obj* to the open file object *file*. This
+ is equivalent to ``Pickler(file, protocol).dump(obj)``.
- If the *protocol* parameter is omitted, protocol 3 is used. If *protocol* is
- specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
- protocol version will be used.
+ The optional *protocol* argument tells the pickler to use the given protocol;
+ supported protocols are 0, 1, 2, and 3. The default protocol is 3, a
+ backward-incompatible protocol designed for Python 3.0.
- *file* must have a :meth:`write` method that accepts a single string argument.
- It can thus be a file object opened for writing, a :mod:`StringIO` object, or
- any other custom object that meets this interface.
+ Specifying a negative protocol version selects the highest protocol version
+ supported. The higher the protocol used, the more recent the version of
+ Python needed to read the pickle produced.
+ The *file* argument must have a :meth:`write` method that accepts a single
+ bytes argument. It can thus be a file object opened for binary writing, an
+ :class:`io.BytesIO` instance, or any other custom object that meets this interface.
-.. function:: load(file)
+.. function:: dumps(obj[, protocol])
- Read a string from the open file object *file* and interpret it as a pickle data
- stream, reconstructing and returning the original object hierarchy. This is
- equivalent to ``Unpickler(file).load()``.
+ Return the pickled representation of the object as a :class:`bytes`
+ object, instead of writing it to a file.
- *file* must have two methods, a :meth:`read` method that takes an integer
- argument, and a :meth:`readline` method that requires no arguments. Both
- methods should return a string. Thus *file* can be a file object opened for
- reading, a :mod:`StringIO` object, or any other custom object that meets this
- interface.
+ The optional *protocol* argument tells the pickler to use the given protocol;
+ supported protocols are 0, 1, 2, and 3. The default protocol is 3, a
+ backward-incompatible protocol designed for Python 3.0.
- This function automatically determines whether the data stream was written in
- binary mode or not.
+ Specifying a negative protocol version selects the highest protocol version
+ supported. The higher the protocol used, the more recent the version of
+ Python needed to read the pickle produced.
+.. function:: load(file, [\*, encoding="ASCII", errors="strict"])
-.. function:: dumps(obj[, protocol])
+ Read a pickled object representation from the open file object *file* and
+ return the reconstituted object hierarchy specified therein. This is
+ equivalent to ``Unpickler(file).load()``.
- Return the pickled representation of the object as a :class:`bytes`
- object, instead of writing it to a file.
+ The protocol version of the pickle is detected automatically, so no protocol
+ argument is needed. Bytes past the pickled object's representation are
+ ignored.
- If the *protocol* parameter is omitted, protocol 3 is used. If *protocol*
- is specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
- protocol version will be used.
+ The argument *file* must have two methods, a read() method that takes an
+ integer argument, and a readline() method that requires no arguments. Both
+ methods should return bytes. Thus *file* can be a binary file object opened
+ for reading, an :class:`io.BytesIO` object, or any other custom object that meets this
+ interface.
+ Optional keyword arguments are *encoding* and *errors*, which are used to decode
+ 8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
+ 'strict', respectively.
-.. function:: loads(bytes_object)
+.. function:: loads(bytes_object, [\*, encoding="ASCII", errors="strict"])
- Read a pickled object hierarchy from a :class:`bytes` object.
- Bytes past the pickled object's representation are ignored.
+ Read a pickled object hierarchy from a :class:`bytes` object and return the
+ reconstituted object hierarchy specified therein.
-The :mod:`pickle` module also defines three exceptions:
+ The protocol version of the pickle is detected automatically, so no protocol
+ argument is needed. Bytes past the pickled object's representation are
+ ignored.
+ Optional keyword arguments are *encoding* and *errors*, which are used to decode
+ 8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
+ 'strict', respectively.
+
+
+The :mod:`pickle` module defines three exceptions:
.. exception:: PickleError
- A common base class for the other exceptions defined below. This inherits from
+ Common base class for the other pickling exceptions. It inherits from
:exc:`Exception`.
-
.. exception:: PicklingError
- This exception is raised when an unpicklable object is passed to the
- :meth:`dump` method.
-
+ Error raised when an unpicklable object is encountered by :class:`Pickler`.
+ It inherits from :exc:`PickleError`.
.. exception:: UnpicklingError
- This exception is raised when there is a problem unpickling an object. Note that
- other exceptions may also be raised during unpickling, including (but not
- necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
- :exc:`ImportError`, and :exc:`IndexError`.
+ Error raised when there is a problem unpickling an object, such as data
+ corruption or a security violation. It inherits from :exc:`PickleError`.
-The :mod:`pickle` module also exports two callables, :class:`Pickler` and
-:class:`Unpickler`:
+ Note that other exceptions may also be raised during unpickling, including
+ (but not necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
+ :exc:`ImportError`, and :exc:`IndexError`.
-.. class:: Pickler(file[, protocol])
+The :mod:`pickle` module exports two classes, :class:`Pickler` and
+:class:`Unpickler`:
- This takes a file-like object to which it will write a pickle data stream.
+.. class:: Pickler(file[, protocol])
- If the *protocol* parameter is omitted, protocol 3 is used. If *protocol* is
- specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
- protocol version will be used.
+ This takes a binary file for writing a pickle data stream.
- *file* must have a :meth:`write` method that accepts a single string argument.
- It can thus be an open file object, a :mod:`StringIO` object, or any other
- custom object that meets this interface.
+ The optional *protocol* argument tells the pickler to use the given protocol;
+ supported protocols are 0, 1, 2, and 3. The default protocol is 3, a
+ backward-incompatible protocol designed for Python 3.0.
- :class:`Pickler` objects define one (or two) public methods:
+ Specifying a negative protocol version selects the highest protocol version
+ supported. The higher the protocol used, the more recent the version of
+ Python needed to read the pickle produced.
+ The *file* argument must have a :meth:`write` method that accepts a single
+ bytes argument. It can thus be a file object opened for binary writing, an
+ :class:`io.BytesIO` instance, or any other custom object that meets this interface.
.. method:: dump(obj)
- Write a pickled representation of *obj* to the open file object given in the
- constructor. Either the binary or ASCII format will be used, depending on the
- value of the *protocol* argument passed to the constructor.
+ Write a pickled representation of *obj* to the open file object given in
+ the constructor.
+
+ .. method:: persistent_id(obj)
+ Do nothing by default. This exists so a subclass can override it.
+
+ If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual. Any
+ other value causes :class:`Pickler` to emit the returned value as a
+ persistent ID for *obj*. The meaning of this persistent ID should be
+ defined by :meth:`Unpickler.persistent_load`. Note that the value
+ returned by :meth:`persistent_id` cannot itself have a persistent ID.
+
+ See :ref:`pickle-persistent` for details and examples of uses.
.. method:: clear_memo()
- Clears the pickler's "memo". The memo is the data structure that remembers
- which objects the pickler has already seen, so that shared or recursive objects
- pickled by reference and not by value. This method is useful when re-using
- picklers.
+ Deprecated. Use the :meth:`clear` method of :attr:`memo` instead. Clears the
+ pickler's memo, which is useful when reusing picklers.
+
+ .. attribute:: fast
+
+ Enable fast mode if set to a true value. Fast mode disables the use
+ of the memo, speeding up pickling by not generating
+ superfluous PUT opcodes. It should not be used with self-referential
+ objects; doing so will cause :class:`Pickler` to recurse
+ infinitely.
+
+ Use :func:`pickletools.optimize` if you need more compact pickles.
+
+ .. attribute:: memo
+
+ Dictionary holding previously pickled objects to allow shared or
+ recursive objects to be pickled by reference as opposed to by value.
It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance. These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance. If the same object is pickled by multiple :meth:`dump` calls, the
-:meth:`load` will all yield references to the same object. [#]_
+:meth:`load` calls will all yield references to the same object.
-:class:`Unpickler` objects are defined as:
+Please note, this is intended for pickling multiple objects without intervening
+modifications to the objects or their parts. If you modify an object and then
+pickle it again using the same :class:`Pickler` instance, the object is not
+pickled again --- a reference to it is pickled and the :class:`Unpickler` will
+return the old value, not the modified one.
-.. class:: Unpickler(file)
+.. class:: Unpickler(file, [\*, encoding="ASCII", errors="strict"])
- This takes a file-like object from which it will read a pickle data stream.
- This class automatically determines whether the data stream was written in
- binary mode or not, so it does not need a flag as in the :class:`Pickler`
- factory.
+ This takes a binary file for reading a pickle data stream.
- *file* must have two methods, a :meth:`read` method that takes an integer
- argument, and a :meth:`readline` method that requires no arguments. Both
- methods should return a string. Thus *file* can be a file object opened for
- reading, a :mod:`StringIO` object, or any other custom object that meets this
- interface.
+ The protocol version of the pickle is detected automatically, so no
+ protocol argument is needed.
- :class:`Unpickler` objects have one (or two) public methods:
+ The argument *file* must have two methods, a read() method that takes an
+ integer argument, and a readline() method that requires no arguments. Both
+ methods should return bytes. Thus *file* can be a binary file object opened
+ for reading, an :class:`io.BytesIO` object, or any other custom object that meets this
+ interface.
+ Optional keyword arguments are *encoding* and *errors*, which are used to decode
+ 8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
+ 'strict', respectively.
.. method:: load()
Read a pickled object representation from the open file object given in
the constructor, and return the reconstituted object hierarchy specified
- therein.
+ therein. Bytes past the pickled object's representation are ignored.
- This method automatically determines whether the data stream was written
- in binary mode or not.
+ .. method:: persistent_load(pid)
+ Raise an :exc:`UnpicklingError` by default.
- .. method:: noload()
+ If defined, :meth:`persistent_load` should return the object specified by
+ the persistent ID *pid*. On errors, such as if an invalid persistent ID is
+ encountered, an :exc:`UnpicklingError` should be raised.
- This is just like :meth:`load` except that it doesn't actually create any
- objects. This is useful primarily for finding what's called "persistent
- ids" that may be referenced in a pickle data stream. See section
- :ref:`pickle-protocol` below for more details.
+ See :ref:`pickle-persistent` for details and examples of uses.
+
+ .. method:: find_class(module, name)
+
+ Import *module* if necessary and return the object called *name* from it.
+ Subclasses may override this to gain control over what type of objects can
+ be loaded, potentially reducing security risks.
What can be pickled and unpickled?
@@ -506,6 +560,8 @@ The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.
+.. _pickle-persistent:
+
Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -747,14 +803,6 @@ the same process or a new process. ::
.. [#] Don't confuse this with the :mod:`marshal` module
-.. [#] *Warning*: this is intended for pickling multiple objects without intervening
- modifications to the objects or their parts. If you modify an object and then
- pickle it again using the same :class:`Pickler` instance, the object is not
- pickled again --- a reference to it is pickled and the :class:`Unpickler` will
- return the old value, not the modified one. There are two problems here: (1)
- detecting changes, and (2) marshalling a minimal set of changes. Garbage
- Collection may also become a problem here.
-
.. [#] The exception raised will likely be an :exc:`ImportError` or an
:exc:`AttributeError` but it could be something else.
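
The behaviour described by the text added in this patch can be sketched as follows. This is a minimal illustration and not part of the commit: the names `data`, `blob`, `items`, and `buffer` are made up for the example, and it assumes a Python 3 interpreter. It shows a `dumps()`/`loads()` round trip with a negative protocol, trailing bytes being ignored, and the caveat about pickling a modified object twice with the same `Pickler`.

```python
import io
import pickle

# Hypothetical sample data, used only for illustration.
data = {"name": "spam", "values": [1, 2, 3]}

# dumps() returns a bytes object; a negative protocol selects the
# highest protocol version supported by this interpreter.
blob = pickle.dumps(data, protocol=-1)

# loads() detects the protocol on its own and ignores any bytes that
# follow the pickled object's representation.
restored = pickle.loads(blob + b"trailing bytes")

# Reusing one Pickler: the second dump() of the same list writes only a
# reference to the memoized object, so the change made in between is
# not recorded in the stream.
buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)
items = [1, 2]
pickler.dump(items)
items.append(3)
pickler.dump(items)

# The matching Unpickler needs one load() call per dump() call.
buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()

print(restored == data)
print(first, second is first)
```

If the modified list has to be written in full, one option is a fresh `Pickler` per object, at the cost of losing reference sharing between the dumps.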