diff options
author | Alexandre Vassalotti <alexandre@peadrop.com> | 2008-10-29 23:32:33 (GMT) |
---|---|---|
committer | Alexandre Vassalotti <alexandre@peadrop.com> | 2008-10-29 23:32:33 (GMT) |
commit | 73b90a8d61898ccde2c083a6e51af6624ec52fc3 (patch) | |
tree | abe4636e2ca647ae1a35763b585fe617c28243f1 | |
parent | 64106fbdaf7d82abe8e829e32614bafd76eea9b9 (diff) | |
download | cpython-73b90a8d61898ccde2c083a6e51af6624ec52fc3.zip cpython-73b90a8d61898ccde2c083a6e51af6624ec52fc3.tar.gz cpython-73b90a8d61898ccde2c083a6e51af6624ec52fc3.tar.bz2 |
Improve pickle's documentation.
Deprecate the previously undocumented Pickler.fast attribute.
Revamp the "Pickling Class Instances" section.
Reorganize sections and subsections.
Clean up TextReader example.
-rw-r--r-- | Doc/library/pickle.rst | 391 |
1 files changed, 182 insertions, 209 deletions
diff --git a/Doc/library/pickle.rst b/Doc/library/pickle.rst index 027a014..b54de90 100644 --- a/Doc/library/pickle.rst +++ b/Doc/library/pickle.rst @@ -115,10 +115,6 @@ Refer to :pep:`307` for information about improvements brought by protocol 2. See :mod:`pickletools`'s source code for extensive comments about opcodes used by pickle protocols. -If a *protocol* is not specified, protocol 3 is used. If *protocol* is -specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest -protocol version available will be used. - Module Interface ---------------- @@ -286,11 +282,11 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and .. attribute:: fast - Enable fast mode if set to a true value. The fast mode disables the usage - of memo, therefore speeding the pickling process by not generating - superfluous PUT opcodes. It should not be used with self-referential - objects, doing otherwise will cause :class:`Pickler` to recurse - infinitely. + Deprecated. Enable fast mode if set to a true value. The fast mode + disables the usage of memo, therefore speeding the pickling process by not + generating superfluous PUT opcodes. It should not be used with + self-referential objects, doing otherwise will cause :class:`Pickler` to + recurse infinitely. Use :func:`pickletools.optimize` if you need more compact pickles. @@ -300,6 +296,8 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and recursive objects to pickled by reference as opposed to by value. +.. XXX Move these comments to somewhere more appropriate. + It is possible to make multiple calls to the :meth:`dump` method of the same :class:`Pickler` instance. These must then be matched to the same number of calls to the :meth:`load` method of the corresponding :class:`Unpickler` @@ -380,7 +378,7 @@ The following types can be pickled: * classes that are defined at the top level of a module * instances of such classes whose :attr:`__dict__` or :meth:`__setstate__` is - picklable (see section :ref:`pickle-protocol` for details) + picklable (see section :ref:`pickle-inst` for details) Attempts to pickle unpicklable objects will raise the :exc:`PicklingError` exception; when this happens, an unspecified number of bytes may have already @@ -418,164 +416,130 @@ be worthwhile to put a version number in the objects so that suitable conversions can be made by the class's :meth:`__setstate__` method. -.. _pickle-protocol: +.. _pickle-inst: -The pickle protocol -------------------- +Pickling Class Instances +------------------------ -This section describes the "pickling protocol" that defines the interface -between the pickler/unpickler and the objects that are being serialized. This -protocol provides a standard way for you to define, customize, and control how -your objects are serialized and de-serialized. The description in this section -doesn't cover specific customizations that you can employ to make the unpickling -environment slightly safer from untrusted pickle data streams; see section -:ref:`pickle-restrict` for more details. +In this section, we describe the general mechanisms available to you to define, +customize, and control how class instances are pickled and unpickled. +In most cases, no additional code is needed to make instances picklable. By +default, pickle will retrieve the class and the attributes of an instance via +introspection. When a class instance is unpickled, its :meth:`__init__` method +is usually *not* invoked. The default behaviour first creates an uninitialized +instance and then restores the saved attributes. The following code shows an +implementation of this behaviour:: -.. _pickle-inst: + def save(obj): + return (obj.__class__, obj.__dict__) -Pickling and unpickling normal class instances -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + def load(cls, attributes): + obj = cls.__new__(cls) + obj.__dict__.update(attributes) + return obj -.. index:: - single: __getinitargs__() (copy protocol) - single: __init__() (instance constructor) +.. index:: single: __getnewargs__() (copy protocol) -.. XXX is __getinitargs__ only used with old-style classes? -.. XXX update w.r.t Py3k's classes +Classes can alter the default behaviour by providing one or severals special +methods. In protocol 2 and newer, classes that implements the +:meth:`__getnewargs__` method can dictate the values passed to the +:meth:`__new__` method upon unpickling. This is often needed for classes +whose :meth:`__new__` method requires arguments. -When a pickled class instance is unpickled, its :meth:`__init__` method is -normally *not* invoked. If it is desirable that the :meth:`__init__` method be -called on unpickling, an old-style class can define a method -:meth:`__getinitargs__`, which should return a *tuple* containing the arguments -to be passed to the class constructor (:meth:`__init__` for example). The -:meth:`__getinitargs__` method is called at pickle time; the tuple it returns is -incorporated in the pickle for the instance. +.. index:: single: __getstate__() (copy protocol) -.. index:: single: __getnewargs__() (copy protocol) +Classes can further influence how their instances are pickled; if the class +defines the method :meth:`__getstate__`, it is called and the returned object is +pickled as the contents for the instance, instead of the contents of the +instance's dictionary. If the :meth:`__getstate__` method is absent, the +instance's :attr:`__dict__` is pickled as usual. -New-style types can provide a :meth:`__getnewargs__` method that is used for -protocol 2. Implementing this method is needed if the type establishes some -internal invariants when the instance is created, or if the memory allocation is -affected by the values passed to the :meth:`__new__` method for the type (as it -is for tuples and strings). Instances of a :term:`new-style class` :class:`C` -are created using :: +.. index:: single: __setstate__() (copy protocol) - obj = C.__new__(C, *args) +Upon unpickling, if the class defines :meth:`__setstate__`, it is called with +the unpickled state. In that case, there is no requirement for the state object +to be a dictionary. Otherwise, the pickled state must be a dictionary and its +items are assigned to the new instance's dictionary. +.. note:: -where *args* is the result of calling :meth:`__getnewargs__` on the original -object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed. + If :meth:`__getstate__` returns a false value, the :meth:`__setstate__` + method will not be called. -.. index:: - single: __getstate__() (copy protocol) - single: __setstate__() (copy protocol) - single: __dict__ (instance attribute) +Refer to the section :ref:`pickle-state` for more information about how to use +the methods :meth:`__getstate__` and :meth:`__setstate__`. -Classes can further influence how their instances are pickled; if the class -defines the method :meth:`__getstate__`, it is called and the return state is -pickled as the contents for the instance, instead of the contents of the -instance's dictionary. If there is no :meth:`__getstate__` method, the -instance's :attr:`__dict__` is pickled. +.. index:: + pair: copy; protocol + single: __reduce__() (copy protocol) -Upon unpickling, if the class also defines the method :meth:`__setstate__`, it -is called with the unpickled state. [#]_ If there is no :meth:`__setstate__` -method, the pickled state must be a dictionary and its items are assigned to the -new instance's dictionary. If a class defines both :meth:`__getstate__` and -:meth:`__setstate__`, the state object needn't be a dictionary and these methods -can do what they want. [#]_ +As we shall see, pickle does not use directly the methods described above. In +fact, these methods are part of the copy protocol which implements the +:meth:`__reduce__` special method. The copy protocol provides a unified +interface for retrieving the data necessary for pickling and copying +objects. [#]_ -.. warning:: +Although powerful, implementing :meth:`__reduce__` directly in your classes is +error prone. For this reason, class designers should use the high-level +interface (i.e., :meth:`__getnewargs__`, :meth:`__getstate__` and +:meth:`__setstate__`) whenever possible. We will show however cases where using +:meth:`__reduce__` is the only option or leads to more efficient pickling or +both. - If :meth:`__getstate__` returns a false value, the :meth:`__setstate__` - method will not be called. +The interface is currently defined as follow. The :meth:`__reduce__` method +takes no argument and shall return either a string or preferably a tuple (the +returned object is often refered as the "reduce value"). +If a string is returned, the string should be interpreted as the name of a +global variable. It should be the object's local name relative to its module; +the pickle module searches the module namespace to determine the object's +module. This behaviour is typically useful for singletons. -Pickling and unpickling extension types -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +When a tuple is returned, it must be between two and five items long. Optional +items can either be omitted, or ``None`` can be provided as their value. The +semantics of each item are in order: -.. index:: - single: __reduce__() (pickle protocol) - single: __reduce_ex__() (pickle protocol) - single: __safe_for_unpickling__ (pickle protocol) - -When the :class:`Pickler` encounters an object of a type it knows nothing about ---- such as an extension type --- it looks in two places for a hint of how to -pickle it. One alternative is for the object to implement a :meth:`__reduce__` -method. If provided, at pickling time :meth:`__reduce__` will be called with no -arguments, and it must return either a string or a tuple. - -If a string is returned, it names a global variable whose contents are pickled -as normal. The string returned by :meth:`__reduce__` should be the object's -local name relative to its module; the pickle module searches the module -namespace to determine the object's module. - -When a tuple is returned, it must be between two and five elements long. -Optional elements can either be omitted, or ``None`` can be provided as their -value. The contents of this tuple are pickled as normal and used to -reconstruct the object at unpickling time. The semantics of each element are: +.. XXX Mention __newobj__ special-case? * A callable object that will be called to create the initial version of the - object. The next element of the tuple will provide arguments for this callable, - and later elements provide additional state information that will subsequently - be used to fully reconstruct the pickled data. + object. - In the unpickling environment this object must be either a class, a callable - registered as a "safe constructor" (see below), or it must have an attribute - :attr:`__safe_for_unpickling__` with a true value. Otherwise, an - :exc:`UnpicklingError` will be raised in the unpickling environment. Note that - as usual, the callable itself is pickled by name. - -* A tuple of arguments for the callable object, not ``None``. +* A tuple of arguments for the callable object. An empty tuple must be given if + the callable does not accept any argument. * Optionally, the object's state, which will be passed to the object's - :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If the - object has no :meth:`__setstate__` method, then, as above, the value must be a - dictionary and it will be added to the object's :attr:`__dict__`. - -* Optionally, an iterator (and not a sequence) yielding successive list items. - These list items will be pickled, and appended to the object using either - ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is primarily used - for list subclasses, but may be used by other classes as long as they have + :meth:`__setstate__` method as previously described. If the object has no + such method then, the value must be a dictionary and it will be added to the + object's :attr:`__dict__` attribute. + +* Optionally, an iterator (and not a sequence) yielding successive items. These + items will be appended to the object either using ``obj.append(item)`` or, in + batch, using ``obj.extend(list_of_items)``. This is primarily used for list + subclasses, but may be used by other classes as long as they have :meth:`append` and :meth:`extend` methods with the appropriate signature. (Whether :meth:`append` or :meth:`extend` is used depends on which pickle - protocol version is used as well as the number of items to append, so both must - be supported.) - -* Optionally, an iterator (not a sequence) yielding successive dictionary items, - which should be tuples of the form ``(key, value)``. These items will be - pickled and stored to the object using ``obj[key] = value``. This is primarily - used for dictionary subclasses, but may be used by other classes as long as they - implement :meth:`__setitem__`. - -It is sometimes useful to know the protocol version when implementing -:meth:`__reduce__`. This can be done by implementing a method named -:meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`, when -it exists, is called in preference over :meth:`__reduce__` (you may still -provide :meth:`__reduce__` for backwards compatibility). The -:meth:`__reduce_ex__` method will be called with a single integer argument, the -protocol version. - -The :class:`object` class implements both :meth:`__reduce__` and -:meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__` but -not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation detects this -and calls :meth:`__reduce__`. - -An alternative to implementing a :meth:`__reduce__` method on the object to be -pickled, is to register the callable with the :mod:`copyreg` module. This -module provides a way for programs to register "reduction functions" and -constructors for user-defined types. Reduction functions have the same -semantics and interface as the :meth:`__reduce__` method described above, except -that they are called with a single argument, the object to be pickled. - -The registered constructor is deemed a "safe constructor" for purposes of -unpickling as described above. + protocol version is used as well as the number of items to append, so both + must be supported.) + +* Optionally, an iterator (not a sequence) yielding successive key-value pairs. + These items will be stored to the object using ``obj[key] = value``. This is + primarily used for dictionary subclasses, but may be used by other classes as + long as they implement :meth:`__setitem__`. + +.. index:: single: __reduce_ex__() (copy protocol) +Alternatively, a :meth:`__reduce_ex__` method may be defined. The only +difference is this method should take a single integer argument, the protocol +version. When defined, pickle will prefer it over the :meth:`__reduce__` +method. In addition, :meth:`__reduce__` automatically becomes a synonym for the +extended version. The main use for this method is to provide +backwards-compatible reduce values for older Python releases. .. _pickle-persistent: -Pickling and unpickling external objects -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Persistence of External Objects +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. index:: single: persistent_id (pickle protocol) @@ -603,17 +567,85 @@ To unpickle external objects, the unpickler must have a custom :meth:`persistent_load` method that takes a persistent ID object and returns the referenced object. -Example: +Here is a comprehensive example presenting how persistent ID can be used to +pickle external objects by reference. .. XXX Work around for some bug in sphinx/pygments. .. highlightlang:: python .. literalinclude:: ../includes/dbpickle.py .. highlightlang:: python3 +.. _pickle-state: + +Handling Stateful Objects +^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. index:: + single: __getstate__() (copy protocol) + single: __setstate__() (copy protocol) + +Here's an example that shows how to modify pickling behavior for a class. +The :class:`TextReader` class opens a text file, and returns the line number and +line contents each time its :meth:`readline` method is called. If a +:class:`TextReader` instance is pickled, all attributes *except* the file object +member are saved. When the instance is unpickled, the file is reopened, and +reading resumes from the last location. The :meth:`__setstate__` and +:meth:`__getstate__` methods are used to implement this behavior. :: + + class TextReader: + """Print and number lines in a text file.""" + + def __init__(self, filename): + self.filename = filename + self.file = open(filename) + self.lineno = 0 + + def readline(self): + self.lineno += 1 + line = self.file.readline() + if not line: + return None + if line.endswith("\n"): + line = line[:-1] + return "%i: %s" % (self.lineno, line) + + def __getstate__(self): + # Copy the object's state from self.__dict__ which contains + # all our instance attributes. Always use the dict.copy() + # method to avoid modifying the original state. + state = self.__dict__.copy() + # Remove the unpicklable entries. + del state['file'] + return state + + def __setstate__(self, state): + # Restore instance attributes (i.e., filename and lineno). + self.__dict__.update(state) + # Restore the previously opened file's state. To do so, we need to + # reopen it and read from it until the line count is restored. + file = open(self.filename) + for _ in range(self.lineno): + file.readline() + # Finally, save the file. + self.file = file + + +A sample usage might be something like this:: + + >>> reader = TextReader("hello.txt") + >>> reader.readline() + '1: Hello world!' + >>> reader.readline() + '2: I am line number two.' + >>> new_reader = pickle.loads(pickle.dumps(reader)) + >>> new_reader.readline() + '3: Goodbye!' + + .. _pickle-restrict: Restricting Globals -^^^^^^^^^^^^^^^^^^^ +------------------- .. index:: single: find_class() (pickle protocol) @@ -653,6 +685,7 @@ Here is an example of an unpickler allowing only few safe classes from the } class RestrictedUnpickler(pickle.Unpickler): + def find_class(self, module, name): # Only allow safe classes from builtins. if module == "builtins" and name in safe_builtins: @@ -680,10 +713,15 @@ A sample usage of our unpickler working has intended:: ... pickle.UnpicklingError: global 'builtins.eval' is forbidden -As our examples shows, you have to be careful with what you allow to -be unpickled. Therefore if security is a concern, you may want to consider -alternatives such as the marshalling API in :mod:`xmlrpc.client` or -third-party solutions. + +.. XXX Add note about how extension codes could evade our protection + mechanism (e.g. cached classes do not invokes find_class()). + +As our examples shows, you have to be careful with what you allow to be +unpickled. Therefore if security is a concern, you may want to consider +alternatives such as the marshalling API in :mod:`xmlrpc.client` or third-party +solutions. + .. _pickle-example: @@ -728,69 +766,6 @@ can't be sure if the ASCII or binary format was used. :: pkl_file.close() -Here's a larger example that shows how to modify pickling behavior for a class. -The :class:`TextReader` class opens a text file, and returns the line number and -line contents each time its :meth:`readline` method is called. If a -:class:`TextReader` instance is pickled, all attributes *except* the file object -member are saved. When the instance is unpickled, the file is reopened, and -reading resumes from the last location. The :meth:`__setstate__` and -:meth:`__getstate__` methods are used to implement this behavior. :: - - #!/usr/local/bin/python - - class TextReader: - """Print and number lines in a text file.""" - def __init__(self, file): - self.file = file - self.fh = open(file) - self.lineno = 0 - - def readline(self): - self.lineno = self.lineno + 1 - line = self.fh.readline() - if not line: - return None - if line.endswith("\n"): - line = line[:-1] - return "%d: %s" % (self.lineno, line) - - def __getstate__(self): - odict = self.__dict__.copy() # copy the dict since we change it - del odict['fh'] # remove filehandle entry - return odict - - def __setstate__(self, dict): - fh = open(dict['file']) # reopen file - count = dict['lineno'] # read from file... - while count: # until line count is restored - fh.readline() - count = count - 1 - self.__dict__.update(dict) # update attributes - self.fh = fh # save the file object - -A sample usage might be something like this:: - - >>> import TextReader - >>> obj = TextReader.TextReader("TextReader.py") - >>> obj.readline() - '1: #!/usr/local/bin/python' - >>> obj.readline() - '2: ' - >>> obj.readline() - '3: class TextReader:' - >>> import pickle - >>> pickle.dump(obj, open('save.p', 'wb')) - -If you want to see that :mod:`pickle` works across Python processes, start -another Python session, before continuing. What follows can happen from either -the same process or a new process. :: - - >>> import pickle - >>> reader = pickle.load(open('save.p', 'rb')) - >>> reader.readline() - '4: """Print and number lines in a text file."""' - - .. seealso:: Module :mod:`copyreg` @@ -813,10 +788,8 @@ the same process or a new process. :: .. [#] The exception raised will likely be an :exc:`ImportError` or an :exc:`AttributeError` but it could be something else. -.. [#] These methods can also be used to implement copying class instances. - -.. [#] This protocol is also used by the shallow and deep copying operations - defined in the :mod:`copy` module. +.. [#] The :mod:`copy` module uses this protocol for shallow and deep copying + operations. .. [#] The limitation on alphanumeric characters is due to the fact the persistent IDs, in protocol 0, are delimited by the newline |