:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.

.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental, but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling," [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".
Relationship to other Python modules
------------------------------------
The :mod:`pickle` module has a transparent optimizer (:mod:`_pickle`) written
in C. It is used whenever available. Otherwise the pure Python implementation is
used.
Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.
The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by :mod:`marshal`, and in fact, attempting to marshal recursive objects
  will crash your Python interpreter. Object sharing happens when there are
  multiple references to the same object in different places in the object
  hierarchy being serialized. :mod:`pickle` stores such objects only once, and
  ensures that all other references point to the master copy. Shared objects
  remain shared, which can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module
  as when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

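
The sharing and recursion behaviour described in the first point can be checked with a minimal sketch like the following. The variable names are illustrative only, and the in-memory :func:`dumps`/:func:`loads` functions are used for brevity.

```python
import pickle

# A list shared by two dictionary entries, plus a list that contains itself.
shared = [1, 2, 3]
data = {"first": shared, "second": shared}
recursive = [0]
recursive.append(recursive)

restored = pickle.loads(pickle.dumps(data))
restored_recursive = pickle.loads(pickle.dumps(recursive))

# Both keys still refer to one list object, so a change made through
# one key is visible through the other.
restored["first"].append(4)
print(restored["second"])                           # [1, 2, 3, 4]
print(restored["first"] is restored["second"])      # True

# The recursive list still contains itself after the round trip.
print(restored_recursive[1] is restored_recursive)  # True
```
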

.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted
   or unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.
Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however it means that non-Python
programs may not be able to reconstruct pickled Python objects.
By default, the :mod:`pickle` data format uses a compact binary representation.
The module :mod:`pickletools` contains tools for analyzing data streams
generated by :mod:`pickle`.
There are currently 4 different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

* Protocol version 3 was added in Python 3.0. It has explicit support for
  bytes and cannot be unpickled by Python 2.x pickle modules. This is
  the current recommended protocol; use it whenever possible.

Refer to :pep:`307` for more information.
If a *protocol* is not specified, protocol 3 is used. If *protocol* is
specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
protocol version available will be used.
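
As a rough illustration of protocol selection, the sketch below pickles one object with several protocol numbers. The ``record`` dictionary is a made-up example value. The highest protocol number depends on the Python version running the code, so the sketch only relies on :const:`HIGHEST_PROTOCOL` rather than on a specific number.

```python
import pickle

record = {"name": "spam", "sizes": [1, 2, 3]}

# Protocol 0 produces ASCII output; protocol 2 is a binary format whose
# stream starts with a protocol marker byte followed by the version.
ascii_pickle = pickle.dumps(record, 0)
binary_pickle = pickle.dumps(record, 2)

# A negative protocol selects the highest version available.
newest_pickle = pickle.dumps(record, -1)
same_as_highest = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)

# The reader never has to be told which protocol was used.
for payload in (ascii_pickle, binary_pickle, newest_pickle):
    assert pickle.loads(payload) == record
```
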
Usage
-----
To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constant:

.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary
   mode. For the old ASCII-based pickle protocol 0 you can use either text mode
   or binary mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone
   linefeeds as line terminators and therefore will look "funny" when viewed in
   Notepad or other editors which do not support this format.

.. data:: DEFAULT_PROTOCOL

   The default protocol used for pickling. May be less than
   :data:`HIGHEST_PROTOCOL`. Currently the default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:

.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This
   is equivalent to ``Pickler(file, protocol).dump(obj)``.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a ``write()`` method that accepts a single
   bytes argument. It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.


.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a :class:`bytes`
   object, instead of writing it to a file.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.


.. function:: load(file, [\*, encoding="ASCII", errors="strict"])

   Read a pickled object representation from the open file object *file* and
   return the reconstituted object hierarchy specified therein. This is
   equivalent to ``Unpickler(file).load()``.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   The argument *file* must have two methods, a ``read()`` method that takes an
   integer argument, and a ``readline()`` method that requires no arguments.
   Both methods should return bytes. Thus *file* can be a binary file object
   opened for reading, an :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   ``'ASCII'`` and ``'strict'``, respectively.


.. function:: loads(bytes_object, [\*, encoding="ASCII", errors="strict"])

   Read a pickled object hierarchy from a :class:`bytes` object and return the
   reconstituted object hierarchy specified therein.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   ``'ASCII'`` and ``'strict'``, respectively.

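
The four convenience functions can be exercised together in a short sketch. The ``payload`` value is made up, and an :class:`io.BytesIO` buffer stands in for a real file opened in binary mode.

```python
import io
import pickle

payload = {"answer": 42, "items": ("a", "b")}

# dumps()/loads() work with bytes objects held in memory.
data = pickle.dumps(payload)
copy_from_bytes = pickle.loads(data)

# dump()/load() work with any binary file-like object.
buffer = io.BytesIO()
pickle.dump(payload, buffer)
buffer.seek(0)  # rewind before reading the pickle back
copy_from_file = pickle.load(buffer)

# Bytes after the end of the pickled object are ignored.
copy_with_trailing = pickle.loads(data + b"trailing bytes")

print(copy_from_bytes == copy_from_file == copy_with_trailing == payload)
```
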
The :mod:`pickle` module defines three exceptions:

.. exception:: PickleError

   Common base class for the other pickling exceptions. It inherits
   :exc:`Exception`.

.. exception:: PicklingError

   Error raised when an unpicklable object is encountered by :class:`Pickler`.
   It inherits :exc:`PickleError`.

.. exception:: UnpicklingError

   Error raised when there is a problem unpickling an object, such as data
   corruption or a security violation. It inherits :exc:`PickleError`.

   Note that other exceptions may also be raised during unpickling, including
   (but not necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.

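
Because unpickling can fail with several exception types, callers often catch a group of them. The helper below, ``safe_loads``, is a hypothetical name used only for this sketch. It catches the exceptions listed above and is not a substitute for trusting the data source.

```python
import pickle

def safe_loads(data):
    """Return (object, None) on success or (None, exception) on failure."""
    try:
        return pickle.loads(data), None
    except (pickle.UnpicklingError, AttributeError, EOFError,
            ImportError, IndexError) as exc:
        return None, exc

good = pickle.dumps([1, 2, 3])

value, error = safe_loads(good)
print(value, error)

# An empty byte string ends before any pickle data could be read.
empty_value, empty_error = safe_loads(b"")
print(type(empty_error).__name__)

# A pickle with its final byte cut off is also reported as an error.
cut_value, cut_error = safe_loads(good[:-1])
print(cut_value, type(cut_error).__name__)
```
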
The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:

.. class:: Pickler(file[, protocol])

   This takes a binary file for writing a pickle data stream.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a ``write()`` method that accepts a single
   bytes argument. It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in
      the constructor.

   .. method:: persistent_id(obj)

      Do nothing by default. This exists so a subclass can override it.

      If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual. Any
      other value causes :class:`Pickler` to emit the returned value as a
      persistent ID for *obj*. The meaning of this persistent ID should be
      defined by :meth:`Unpickler.persistent_load`. Note that the value
      returned by :meth:`persistent_id` cannot itself have a persistent ID.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: clear_memo()

      Deprecated. Use the :meth:`clear` method on :attr:`memo` instead.

      Clears the pickler's memo, which is useful when reusing picklers.

   .. attribute:: fast

      Enable fast mode if set to a true value. The fast mode disables the usage
      of memo, therefore speeding the pickling process by not generating
      superfluous PUT opcodes. It should not be used with self-referential
      objects; doing otherwise will cause :class:`Pickler` to recurse
      infinitely.

      Use :func:`pickletools.optimize` if you need more compact pickles.

   .. attribute:: memo

      Dictionary holding previously pickled objects to allow shared or
      recursive objects to be pickled by reference as opposed to by value.

   It is possible to make multiple calls to the :meth:`dump` method of the same
   :class:`Pickler` instance. These must then be matched to the same number of
   calls to the :meth:`load` method of the corresponding :class:`Unpickler`
   instance. If the same object is pickled by multiple :meth:`dump` calls, the
   :meth:`load` calls will all yield references to the same object.

   Please note, this is intended for pickling multiple objects without
   intervening modifications to the objects or their parts. If you modify an
   object and then pickle it again using the same :class:`Pickler` instance, the
   object is not pickled again --- a reference to it is pickled and the
   :class:`Unpickler` will return the old value, not the modified one.

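
The memo behaviour described above can be observed with a small sketch. It reuses one :class:`Pickler` and one :class:`Unpickler` over an in-memory :class:`io.BytesIO` buffer; the ``shared`` list is an illustrative value.

```python
import io
import pickle

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)

shared = [1, 2, 3]
pickler.dump(shared)   # the list contents are written here
shared.append(4)       # modified between the two dump() calls
pickler.dump(shared)   # only a reference to the memoized list is written

buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()

print(first is second)  # both loads yield the same object
print(second)           # the old value; the appended 4 was never pickled
```

If the modified value is what should be stored, a fresh :class:`Pickler` (or a cleared memo) is needed for the second :meth:`dump` call.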

.. class:: Unpickler(file, [\*, encoding="ASCII", errors="strict"])

   This takes a binary file for reading a pickle data stream.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.

   The argument *file* must have two methods, a ``read()`` method that takes an
   integer argument, and a ``readline()`` method that requires no arguments.
   Both methods should return bytes. Thus *file* can be a binary file object
   opened for reading, an :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   ``'ASCII'`` and ``'strict'``, respectively.

   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein. Bytes past the pickled object's representation are ignored.

   .. method:: persistent_load(pid)

      Raise an :exc:`UnpicklingError` by default.

      If defined, :meth:`persistent_load` should return the object specified by
      the persistent ID *pid*. On errors, such as if an invalid persistent ID is
      encountered, an :exc:`UnpicklingError` should be raised.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: find_class(module, name)

      Import *module* if necessary and return the object called *name* from it.
      Subclasses may override this to gain control over what type of objects can
      be loaded, potentially reducing security risks.

What can be pickled and unpickled?
----------------------------------
The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or the result of calling
  :meth:`__getstate__` is picklable (see section :ref:`pickle-protocol` for
  details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.
Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither the
function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_
Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.
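
Pickling by name reference can be observed directly in the byte stream. The sketch below uses :func:`json.dumps` and :class:`collections.OrderedDict` purely as convenient importable objects; any top-level function or class would behave similarly.

```python
import json
import pickle
from collections import OrderedDict

# Pickling a function or class stores its module and name, not its code.
function_pickle = pickle.dumps(json.dumps)
class_pickle = pickle.dumps(OrderedDict)

print(b"json" in function_pickle, b"dumps" in function_pickle)
print(b"collections" in class_pickle, b"OrderedDict" in class_pickle)

# Unpickling imports the module and looks the name up again, so the
# result is the very same object that lives in that module.
print(pickle.loads(function_pickle) is json.dumps)
print(pickle.loads(class_pickle) is OrderedDict)
```
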
Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.
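
One possible way to carry a version number in the pickled state is sketched below. The ``Account`` class, its ``_version`` key, and the assumed "version 1" layout (a float ``balance`` field) are all invented for illustration; they are not part of the :mod:`pickle` API.

```python
import pickle

class Account:
    """Hypothetical class whose pickled state carries a version number."""

    STATE_VERSION = 2

    def __init__(self, owner, balance_cents=0):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        # Store the instance dictionary plus the current layout version.
        state = self.__dict__.copy()
        state["_version"] = self.STATE_VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("_version", 1)
        if version < 2:
            # Assumed older layout: a float "balance" in whole currency units.
            state["balance_cents"] = round(state.pop("balance", 0) * 100)
        self.__dict__.update(state)

# Current objects round-trip unchanged.
current = pickle.loads(pickle.dumps(Account("ann", 300)))

# Simulate state saved by the assumed older class layout.
legacy = Account.__new__(Account)
legacy.__setstate__({"owner": "bob", "balance": 12.5})
print(current.balance_cents, legacy.balance_cents)
```
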
.. _pickle-protocol:
The pickle protocol
-------------------
This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized. This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized. The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.
.. _pickle-inst:
Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getinitargs__() (copy protocol)
   single: __init__() (instance constructor)

.. XXX is __getinitargs__ only used with old-style classes?
.. XXX update w.r.t Py3k's classes
When a pickled class instance is unpickled, its :meth:`__init__` method is
normally *not* invoked. If it is desirable that the :meth:`__init__` method be
called on unpickling, an old-style class can define a method
:meth:`__getinitargs__`, which should return a *tuple* containing the arguments
to be passed to the class constructor (:meth:`__init__` for example). The
:meth:`__getinitargs__` method is called at pickle time; the tuple it returns is
incorporated in the pickle for the instance.
.. index:: single: __getnewargs__() (copy protocol)
New-style types can provide a :meth:`__getnewargs__` method that is used for
protocol 2. Implementing this method is needed if the type establishes some
internal invariants when the instance is created, or if the memory allocation is
affected by the values passed to the :meth:`__new__` method for the type (as it
is for tuples and strings). Instances of a :term:`new-style class` :class:`C`
are created using ::

   obj = C.__new__(C, *args)

where *args* is the result of calling :meth:`__getnewargs__` on the original
object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.
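
A minimal sketch of :meth:`__getnewargs__` follows, assuming a hypothetical ``Point`` class that subclasses :class:`tuple` and therefore receives its values through :meth:`__new__` rather than :meth:`__init__`.

```python
import pickle

class Point(tuple):
    """Hypothetical immutable 2-D point built on top of tuple."""

    def __new__(cls, x, y):
        # The coordinates must be supplied when the object is created.
        return super().__new__(cls, (x, y))

    def __getnewargs__(self):
        # These arguments are passed to Point.__new__ on unpickling.
        return (self[0], self[1])

original = Point(3, 4)
restored = pickle.loads(pickle.dumps(original, 2))
print(type(restored).__name__, tuple(restored))
```
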

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)
   single: __dict__ (instance attribute)

Classes can further influence how their instances are pickled; if the class
defines the method :meth:`__getstate__`, it is called and the return state is
pickled as the contents for the instance, instead of the contents of the
instance's dictionary. If there is no :meth:`__getstate__` method, the
instance's :attr:`__dict__` is pickled.
Upon unpickling, if the class also defines the method :meth:`__setstate__`, it
is called with the unpickled state. [#]_ If there is no :meth:`__setstate__`
method, the pickled state must be a dictionary and its items are assigned to the
new instance's dictionary. If a class defines both :meth:`__getstate__` and
:meth:`__setstate__`, the state object needn't be a dictionary and these methods
can do what they want. [#]_

.. warning::

   If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
   method will not be called.

Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __reduce__() (pickle protocol)
   single: __reduce_ex__() (pickle protocol)
   single: __safe_for_unpickling__ (pickle protocol)

When the :class:`Pickler` encounters an object of a type it knows nothing about
--- such as an extension type --- it looks in two places for a hint of how to
pickle it. One alternative is for the object to implement a :meth:`__reduce__`
method. If provided, at pickling time :meth:`__reduce__` will be called with no
arguments, and it must return either a string or a tuple.
If a string is returned, it names a global variable whose contents are pickled
as normal. The string returned by :meth:`__reduce__` should be the object's
local name relative to its module; the pickle module searches the module
namespace to determine the object's module.
When a tuple is returned, it must be between two and five elements long.
Optional elements can either be omitted, or ``None`` can be provided as their
value. The contents of this tuple are pickled as normal and used to
reconstruct the object at unpickling time. The semantics of each element are:

* A callable object that will be called to create the initial version of the
  object. The next element of the tuple will provide arguments for this
  callable, and later elements provide additional state information that will
  subsequently be used to fully reconstruct the pickled data.

  In the unpickling environment this object must be either a class, a callable
  registered as a "safe constructor" (see below), or it must have an attribute
  :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
  :exc:`UnpicklingError` will be raised in the unpickling environment. Note that
  as usual, the callable itself is pickled by name.

* A tuple of arguments for the callable object, not ``None``.

* Optionally, the object's state, which will be passed to the object's
  :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If the
  object has no :meth:`__setstate__` method, then, as above, the value must be a
  dictionary and it will be added to the object's :attr:`__dict__`.

* Optionally, an iterator (and not a sequence) yielding successive list items.
  These list items will be pickled, and appended to the object using either
  ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is primarily used
  for list subclasses, but may be used by other classes as long as they have
  :meth:`append` and :meth:`extend` methods with the appropriate signature.
  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
  protocol version is used as well as the number of items to append, so both
  must be supported.)

* Optionally, an iterator (not a sequence) yielding successive dictionary items,
  which should be tuples of the form ``(key, value)``. These items will be
  pickled and stored to the object using ``obj[key] = value``. This is primarily
  used for dictionary subclasses, but may be used by other classes as long as
  they implement :meth:`__setitem__`.

It is sometimes useful to know the protocol version when implementing
:meth:`__reduce__`. This can be done by implementing a method named
:meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`, when
it exists, is called in preference over :meth:`__reduce__` (you may still
provide :meth:`__reduce__` for backwards compatibility). The
:meth:`__reduce_ex__` method will be called with a single integer argument, the
protocol version.
The :class:`object` class implements both :meth:`__reduce__` and
:meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__` but
not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation detects this
and calls :meth:`__reduce__`.
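
The tuple form of :meth:`__reduce__` might be used as in the sketch below. The ``Matrix`` class is hypothetical; it returns a three-element tuple (callable, arguments, state), and because it defines no :meth:`__setstate__`, the state dictionary is merged into the instance's :attr:`__dict__`.

```python
import pickle

class Matrix:
    """Hypothetical 2-D container rebuilt through its constructor."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self.cells = [[0] * cols for _ in range(rows)]
        self.label = None

    def __reduce__(self):
        # (callable, arguments for the callable, state for __dict__)
        state = {"cells": self.cells, "label": self.label}
        return (Matrix, (self.rows, self.cols), state)

original = Matrix(2, 3)
original.cells[1][2] = 7
original.label = "demo"

restored = pickle.loads(pickle.dumps(original))
print(restored.rows, restored.cols, restored.cells, restored.label)
```

Unlike default instance pickling, this approach does call :meth:`__init__` on unpickling, because the class itself is the callable named in the tuple.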
An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled is to register the callable with the :mod:`copyreg` module. This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types. Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.
The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.
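
Registering a reduction function with :func:`copyreg.pickle` could look like the following sketch. The ``Ratio`` class and the ``reduce_ratio`` function are made-up names; the call counter exists only to show that the registered function is actually used.

```python
import copyreg
import pickle

class Ratio:
    """Hypothetical class that defines no pickling methods of its own."""

    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

calls = []

def reduce_ratio(ratio):
    # Same return convention as __reduce__, but the object to be pickled
    # arrives as the single argument.
    calls.append(ratio)
    return Ratio, (ratio.numerator, ratio.denominator)

# Associate the reduction function with the type.
copyreg.pickle(Ratio, reduce_ratio)

restored = pickle.loads(pickle.dumps(Ratio(3, 4)))
print(restored.numerator, restored.denominator, len(calls))
```
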
.. _pickle-persistent:
Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a "persistent id", which is just an arbitrary string
of printable ASCII characters. The resolution of such names is not defined by
the :mod:`pickle` module; it will delegate this resolution to user-defined
functions on the pickler and unpickler.
To define external persistent id resolution, you need to set the
:attr:`persistent_id` attribute of the pickler object and the
:attr:`persistent_load` attribute of the unpickler object.
To pickle objects that have an external persistent id, the pickler must have a
custom :func:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent id for that object. When ``None`` is
returned, the pickler simply pickles the object as normal. When a persistent id
string is returned, the pickler will pickle that string, along with a marker so
that the unpickler will recognize the string as a persistent id.
To unpickle external objects, the unpickler must have a custom
:func:`persistent_load` function that takes a persistent id string and returns
the referenced object.
Here's a silly example that *might* shed more light::

   import pickle
   from io import BytesIO

   # Pickle data is bytes, so a binary buffer is required.
   src = BytesIO()
   p = pickle.Pickler(src)

   def persistent_id(obj):
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print(i)
   p.dump(i)

   datastream = src.getvalue()
   print(repr(datastream))
   dst = BytesIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   def persistent_load(persid):
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError('Invalid persistent id')

   up.persistent_load = persistent_load

   j = up.load()
   print(j)


.. BAW: pickle supports something called inst_persistent_id()
   which appears to give unknown types a second shot at producing a persistent
   id.  Since Jim Fulton can't remember why it was added or what it's for, I'm
   leaving it undocumented.

.. _pickle-sub:
Subclassing Unpicklers
----------------------

.. index::
   single: load_global() (pickle protocol)
   single: find_global() (pickle protocol)

By default, unpickling will import any class that it finds in the pickle data.
You can control exactly what gets unpickled and what gets called by customizing
your unpickler.
You need to derive a subclass from :class:`Unpickler`, overriding the
:meth:`load_global` method. :meth:`load_global` should read two lines from the
pickle data stream where the first line will be the name of the module containing
the class and the second line will be the name of the instance's class. It then
looks up the class, possibly importing the module and digging out the attribute,
then it appends what it finds to the unpickler's stack. Later on, this class
will be assigned to the :attr:`__class__` attribute of an empty class, as a way
of magically creating an instance without calling its class's
:meth:`__init__`. Your job (should you choose to accept it), would be to have
:meth:`load_global` push onto the unpickler's stack, a known safe version of any
class you deem safe to unpickle. It is up to you to produce such a class. Or
you could raise an error if you want to disallow all unpickling of instances.
If this sounds like a hack, you're right. Refer to the source code to make this
work.
The moral of the story is that you should be really careful about the source of
the strings your application unpickles.
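
As an alternative to working with the unpickler's internals, the :meth:`find_class` method documented for :class:`Unpickler` can be overridden to restrict which globals may be loaded. The sketch below is one possible approach, not a complete security measure; the ``SAFE_BUILTINS`` set, the ``RestrictedUnpickler`` class, and the ``restricted_loads`` helper are names chosen for this example.

```python
import builtins
import io
import pickle
from collections import OrderedDict

# Illustrative allow-list; adjust it to the needs of the application.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow a small set of harmless names from builtins.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Refuse everything else.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    """Helper analogous to pickle.loads() using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

allowed = restricted_loads(pickle.dumps([1, 2, range(15)]))

try:
    restricted_loads(pickle.dumps(OrderedDict(a=1)))
    rejected = False
except pickle.UnpicklingError:
    rejected = True

print(allowed, rejected)
```
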
.. _pickle-example:
Example
-------
For the simplest code, use the :func:`dump` and :func:`load` functions. Note
that a self-referencing list is pickled and restored correctly. ::

   import pickle

   data1 = {'a': [1, 2.0, 3, 4+6j],
            'b': ("string", "string using Unicode features \u0394"),
            'c': None}

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   with open('data.pkl', 'wb') as output:
       # Pickle dictionary using protocol 2.
       pickle.dump(data1, output, 2)

       # Pickle the list using the highest protocol available.
       pickle.dump(selfref_list, output, -1)

The following example reads the resulting pickled data. When reading a
pickle-containing file, you should open the file in binary mode because you
can't be sure if the ASCII or binary format was used. ::

   import pprint, pickle

   with open('data.pkl', 'rb') as pkl_file:
       data1 = pickle.load(pkl_file)
       pprint.pprint(data1)

       data2 = pickle.load(pkl_file)
       pprint.pprint(data2)

Here's a larger example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno = self.lineno + 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy() # copy the dict since we change it
           del odict['fh']              # remove filehandle entry
           return odict

       def __setstate__(self, state):
           fh = open(state['file'])     # reopen file
           count = state['lineno']      # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(state)  # update attributes
           self.fh = fh                 # save the file object

A sample usage might be something like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that :mod:`pickle` works across Python processes, start
another Python session, before continuing. What follows can happen from either
the same process or a new process. ::

   >>> import pickle
   >>> reader = pickle.load(open('save.p', 'rb'))
   >>> reader.readline()
   '4:     """Print and number lines in a text file."""'


.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.

.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations
   defined in the :mod:`copy` module.