author     Antoine Pitrou <antoine@python.org>       2019-05-26 15:10:09 (GMT)
committer  GitHub <noreply@github.com>               2019-05-26 15:10:09 (GMT)
commit     91f4380cedbae32b49adbea2518014a5624c6523 (patch)
tree       fbc47b8ee756f9e0a8f6bacf6b055490f2ef9ab3
parent     22ccb0b4902137275960c008ef77b88fa82729ce (diff)
bpo-36785: PEP 574 implementation (GH-7076)
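
For orientation, a minimal sketch of the user-facing API this commit adds (illustrative only; it assumes an interpreter with this patch applied, i.e. Python 3.8+, and uses only names introduced in the diff below)::

    import pickle

    payload = bytearray(b"abcdefgh" * 1000)
    pb = pickle.PickleBuffer(payload)

    buffers = []
    data = pickle.dumps(pb, protocol=5, buffer_callback=buffers.append)
    assert b"abcdefgh" not in data          # payload kept out of the pickle stream

    result = pickle.loads(data, buffers=buffers)  # same buffers, in the same order
    assert bytes(result) == bytes(payload)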
-rw-r--r-- | Doc/library/pickle.rst | 271
-rw-r--r-- | Include/Python.h | 1
-rw-r--r-- | Include/picklebufobject.h | 31
-rw-r--r-- | Lib/pickle.py | 152
-rw-r--r-- | Lib/pickletools.py | 80
-rw-r--r-- | Lib/test/pickletester.py | 423
-rw-r--r-- | Lib/test/test_inspect.py | 11
-rw-r--r-- | Lib/test/test_pickle.py | 12
-rw-r--r-- | Lib/test/test_picklebuffer.py | 154
-rw-r--r-- | Lib/test/test_pickletools.py | 17
-rw-r--r-- | Lib/test/test_pyclbr.py | 2
-rw-r--r-- | Makefile.pre.in | 4
-rw-r--r-- | Misc/NEWS.d/next/Library/2019-05-03-20-47-55.bpo-36785.PQLnPq.rst | 1
-rw-r--r-- | Modules/_pickle.c | 511
-rw-r--r-- | Modules/clinic/_pickle.c.h | 228
-rw-r--r-- | Objects/object.c | 1
-rw-r--r-- | Objects/picklebufobject.c | 219
-rw-r--r-- | PCbuild/pythoncore.vcxproj | 2
-rw-r--r-- | PCbuild/pythoncore.vcxproj.filters | 6
19 files changed, 1886 insertions, 240 deletions
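
As a second sketch (again illustrative, assuming the same patched interpreter), the :class:`PickleBuffer` type added by this commit is itself a buffer provider; its ``raw()`` and ``release()`` methods behave roughly as exercised by the new Lib/test/test_picklebuffer.py::

    from pickle import PickleBuffer

    pb = PickleBuffer(bytearray(b"foo"))
    with pb.raw() as m:              # 1-D, C-contiguous memoryview, format 'B'
        assert m.format == "B" and not m.readonly
        m[0] = ord("b")              # writes through to the underlying bytearray
    assert bytes(pb) == b"boo"

    pb.release()                     # detach from the underlying buffer
    try:
        memoryview(pb)
    except ValueError:
        pass                         # buffer access is forbidden after release()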
diff --git a/Doc/library/pickle.rst b/Doc/library/pickle.rst index f4c41ac..6aa3049 100644 --- a/Doc/library/pickle.rst +++ b/Doc/library/pickle.rst @@ -195,34 +195,29 @@ The :mod:`pickle` module provides the following constants: The :mod:`pickle` module provides the following functions to make the pickling process more convenient: -.. function:: dump(obj, file, protocol=None, \*, fix_imports=True) +.. function:: dump(obj, file, protocol=None, \*, fix_imports=True, buffer_callback=None) Write a pickled representation of *obj* to the open :term:`file object` *file*. This is equivalent to ``Pickler(file, protocol).dump(obj)``. - The optional *protocol* argument, an integer, tells the pickler to use - the given protocol; supported protocols are 0 to :data:`HIGHEST_PROTOCOL`. - If not specified, the default is :data:`DEFAULT_PROTOCOL`. If a negative - number is specified, :data:`HIGHEST_PROTOCOL` is selected. + Arguments *file*, *protocol*, *fix_imports* and *buffer_callback* have + the same meaning as in the :class:`Pickler` constructor. - The *file* argument must have a write() method that accepts a single bytes - argument. It can thus be an on-disk file opened for binary writing, an - :class:`io.BytesIO` instance, or any other custom object that meets this - interface. - - If *fix_imports* is true and *protocol* is less than 3, pickle will try to - map the new Python 3 names to the old module names used in Python 2, so - that the pickle data stream is readable with Python 2. + .. versionchanged:: 3.8 + The *buffer_callback* argument was added. -.. function:: dumps(obj, protocol=None, \*, fix_imports=True) +.. function:: dumps(obj, protocol=None, \*, fix_imports=True, buffer_callback=None) Return the pickled representation of the object as a :class:`bytes` object, instead of writing it to a file. - Arguments *protocol* and *fix_imports* have the same meaning as in - :func:`dump`. + Arguments *protocol*, *fix_imports* and *buffer_callback* have the same + meaning as in the :class:`Pickler` constructor. + + .. versionchanged:: 3.8 + The *buffer_callback* argument was added. -.. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict") +.. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict", buffers=None) Read a pickled object representation from the open :term:`file object` *file* and return the reconstituted object hierarchy specified therein. @@ -232,24 +227,13 @@ process more convenient: protocol argument is needed. Bytes past the pickled object's representation are ignored. - The argument *file* must have two methods, a read() method that takes an - integer argument, and a readline() method that requires no arguments. Both - methods should return bytes. Thus *file* can be an on-disk file opened for - binary reading, an :class:`io.BytesIO` object, or any other custom object - that meets this interface. - - Optional keyword arguments are *fix_imports*, *encoding* and *errors*, - which are used to control compatibility support for pickle stream generated - by Python 2. If *fix_imports* is true, pickle will try to map the old - Python 2 names to the new names used in Python 3. The *encoding* and - *errors* tell pickle how to decode 8-bit string instances pickled by Python - 2; these default to 'ASCII' and 'strict', respectively. The *encoding* can - be 'bytes' to read these 8-bit string instances as bytes objects. 
- Using ``encoding='latin1'`` is required for unpickling NumPy arrays and - instances of :class:`~datetime.datetime`, :class:`~datetime.date` and - :class:`~datetime.time` pickled by Python 2. + Arguments *file*, *fix_imports*, *encoding*, *errors*, *strict* and *buffers* + have the same meaning as in the :class:`Unpickler` constructor. -.. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict") + .. versionchanged:: 3.8 + The *buffers* argument was added. + +.. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict", buffers=None) Read a pickled object hierarchy from a :class:`bytes` object and return the reconstituted object hierarchy specified therein. @@ -258,16 +242,11 @@ process more convenient: protocol argument is needed. Bytes past the pickled object's representation are ignored. - Optional keyword arguments are *fix_imports*, *encoding* and *errors*, - which are used to control compatibility support for pickle stream generated - by Python 2. If *fix_imports* is true, pickle will try to map the old - Python 2 names to the new names used in Python 3. The *encoding* and - *errors* tell pickle how to decode 8-bit string instances pickled by Python - 2; these default to 'ASCII' and 'strict', respectively. The *encoding* can - be 'bytes' to read these 8-bit string instances as bytes objects. - Using ``encoding='latin1'`` is required for unpickling NumPy arrays and - instances of :class:`~datetime.datetime`, :class:`~datetime.date` and - :class:`~datetime.time` pickled by Python 2. + Arguments *file*, *fix_imports*, *encoding*, *errors*, *strict* and *buffers* + have the same meaning as in the :class:`Unpickler` constructor. + + .. versionchanged:: 3.8 + The *buffers* argument was added. The :mod:`pickle` module defines three exceptions: @@ -295,10 +274,10 @@ The :mod:`pickle` module defines three exceptions: IndexError. -The :mod:`pickle` module exports two classes, :class:`Pickler` and -:class:`Unpickler`: +The :mod:`pickle` module exports three classes, :class:`Pickler`, +:class:`Unpickler` and :class:`PickleBuffer`: -.. class:: Pickler(file, protocol=None, \*, fix_imports=True) +.. class:: Pickler(file, protocol=None, \*, fix_imports=True, buffer_callback=None) This takes a binary file for writing a pickle data stream. @@ -316,6 +295,20 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2. + If *buffer_callback* is None (the default), buffer views are + serialized into *file* as part of the pickle stream. + + If *buffer_callback* is not None, then it can be called any number + of times with a buffer view. If the callback returns a false value + (such as None), the given buffer is :ref:`out-of-band <pickle-oob>`; + otherwise the buffer is serialized in-band, i.e. inside the pickle stream. + + It is an error if *buffer_callback* is not None and *protocol* is + None or smaller than 5. + + .. versionchanged:: 3.8 + The *buffer_callback* argument was added. + .. method:: dump(obj) Write a pickled representation of *obj* to the open file object given in @@ -379,26 +372,43 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and Use :func:`pickletools.optimize` if you need more compact pickles. -.. class:: Unpickler(file, \*, fix_imports=True, encoding="ASCII", errors="strict") +.. 
class:: Unpickler(file, \*, fix_imports=True, encoding="ASCII", errors="strict", buffers=None) This takes a binary file for reading a pickle data stream. The protocol version of the pickle is detected automatically, so no protocol argument is needed. - The argument *file* must have two methods, a read() method that takes an - integer argument, and a readline() method that requires no arguments. Both - methods should return bytes. Thus *file* can be an on-disk file object + The argument *file* must have three methods, a read() method that takes an + integer argument, a readinto() method that takes a buffer argument + and a readline() method that requires no arguments, as in the + :class:`io.BufferedIOBase` interface. Thus *file* can be an on-disk file opened for binary reading, an :class:`io.BytesIO` object, or any other custom object that meets this interface. - Optional keyword arguments are *fix_imports*, *encoding* and *errors*, - which are used to control compatibility support for pickle stream generated - by Python 2. If *fix_imports* is true, pickle will try to map the old - Python 2 names to the new names used in Python 3. The *encoding* and - *errors* tell pickle how to decode 8-bit string instances pickled by Python - 2; these default to 'ASCII' and 'strict', respectively. The *encoding* can + The optional arguments *fix_imports*, *encoding* and *errors* are used + to control compatibility support for pickle stream generated by Python 2. + If *fix_imports* is true, pickle will try to map the old Python 2 names + to the new names used in Python 3. The *encoding* and *errors* tell + pickle how to decode 8-bit string instances pickled by Python 2; + these default to 'ASCII' and 'strict', respectively. The *encoding* can be 'bytes' to read these 8-bit string instances as bytes objects. + Using ``encoding='latin1'`` is required for unpickling NumPy arrays and + instances of :class:`~datetime.datetime`, :class:`~datetime.date` and + :class:`~datetime.time` pickled by Python 2. + + If *buffers* is None (the default), then all data necessary for + deserialization must be contained in the pickle stream. This means + that the *buffer_callback* argument was None when a :class:`Pickler` + was instantiated (or when :func:`dump` or :func:`dumps` was called). + + If *buffers* is not None, it should be an iterable of buffer-enabled + objects that is consumed each time the pickle stream references + an :ref:`out-of-band <pickle-oob>` buffer view. Such buffers have been + given in order to the *buffer_callback* of a Pickler object. + + .. versionchanged:: 3.8 + The *buffers* argument was added. .. method:: load() @@ -429,6 +439,34 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and .. audit-event:: pickle.find_class "module name" +.. class:: PickleBuffer(buffer) + + A wrapper for a buffer representing picklable data. *buffer* must be a + :ref:`buffer-providing <bufferobjects>` object, such as a + :term:`bytes-like object` or a N-dimensional array. + + :class:`PickleBuffer` is itself a buffer provider, therefore it is + possible to pass it to other APIs expecting a buffer-providing object, + such as :class:`memoryview`. + + :class:`PickleBuffer` objects can only be serialized using pickle + protocol 5 or higher. They are eligible for + :ref:`out-of-band serialization <pickle-oob>`. + + .. versionadded:: 3.8 + + .. method:: raw() + + Return a :class:`memoryview` of the memory area underlying this buffer. 
+ The returned object is a one-dimensional, C-contiguous memoryview + with format ``B`` (unsigned bytes). :exc:`BufferError` is raised if + the buffer is neither C- nor Fortran-contiguous. + + .. method:: release() + + Release the underlying buffer exposed by the PickleBuffer object. + + .. _pickle-picklable: What can be pickled and unpickled? @@ -864,6 +902,125 @@ a given class:: assert unpickled_class.my_attribute == 1 +.. _pickle-oob: + +Out-of-band Buffers +------------------- + +.. versionadded:: 3.8 + +In some contexts, the :mod:`pickle` module is used to transfer massive amounts +of data. Therefore, it can be important to minimize the number of memory +copies, to preserve performance and resource consumption. However, normal +operation of the :mod:`pickle` module, as it transforms a graph-like structure +of objects into a sequential stream of bytes, intrinsically involves copying +data to and from the pickle stream. + +This constraint can be eschewed if both the *provider* (the implementation +of the object types to be transferred) and the *consumer* (the implementation +of the communications system) support the out-of-band transfer facilities +provided by pickle protocol 5 and higher. + +Provider API +^^^^^^^^^^^^ + +The large data objects to be pickled must implement a :meth:`__reduce_ex__` +method specialized for protocol 5 and higher, which returns a +:class:`PickleBuffer` instance (instead of e.g. a :class:`bytes` object) +for any large data. + +A :class:`PickleBuffer` object *signals* that the underlying buffer is +eligible for out-of-band data transfer. Those objects remain compatible +with normal usage of the :mod:`pickle` module. However, consumers can also +opt-in to tell :mod:`pickle` that they will handle those buffers by +themselves. + +Consumer API +^^^^^^^^^^^^ + +A communications system can enable custom handling of the :class:`PickleBuffer` +objects generated when serializing an object graph. + +On the sending side, it needs to pass a *buffer_callback* argument to +:class:`Pickler` (or to the :func:`dump` or :func:`dumps` function), which +will be called with each :class:`PickleBuffer` generated while pickling +the object graph. Buffers accumulated by the *buffer_callback* will not +see their data copied into the pickle stream, only a cheap marker will be +inserted. + +On the receiving side, it needs to pass a *buffers* argument to +:class:`Unpickler` (or to the :func:`load` or :func:`loads` function), +which is an iterable of the buffers which were passed to *buffer_callback*. +That iterable should produce buffers in the same order as they were passed +to *buffer_callback*. Those buffers will provide the data expected by the +reconstructors of the objects whose pickling produced the original +:class:`PickleBuffer` objects. + +Between the sending side and the receiving side, the communications system +is free to implement its own transfer mechanism for out-of-band buffers. +Potential optimizations include the use of shared memory or datatype-dependent +compression. + +Example +^^^^^^^ + +Here is a trivial example where we implement a :class:`bytearray` subclass +able to participate in out-of-band buffer pickling:: + + class ZeroCopyByteArray(bytearray): + + def __reduce_ex__(self, protocol): + if protocol >= 5: + return type(self)._reconstruct, (PickleBuffer(self),), None + else: + # PickleBuffer is forbidden with pickle protocols <= 4. 
+ return type(self)._reconstruct, (bytearray(self),) + + @classmethod + def _reconstruct(cls, obj): + with memoryview(obj) as m: + # Get a handle over the original buffer object + obj = m.obj + if type(obj) is cls: + # Original buffer object is a ZeroCopyByteArray, return it + # as-is. + return obj + else: + return cls(obj) + +The reconstructor (the ``_reconstruct`` class method) returns the buffer's +providing object if it has the right type. This is an easy way to simulate +zero-copy behaviour on this toy example. + +On the consumer side, we can pickle those objects the usual way, which +when unserialized will give us a copy of the original object:: + + b = ZeroCopyByteArray(b"abc") + data = pickle.dumps(b, protocol=5) + new_b = pickle.loads(data) + print(b == new_b) # True + print(b is new_b) # False: a copy was made + +But if we pass a *buffer_callback* and then give back the accumulated +buffers when unserializing, we are able to get back the original object:: + + b = ZeroCopyByteArray(b"abc") + buffers = [] + data = pickle.dumps(b, protocol=5, buffer_callback=buffers.append) + new_b = pickle.loads(data, buffers=buffers) + print(b == new_b) # True + print(b is new_b) # True: no copy was made + +This example is limited by the fact that :class:`bytearray` allocates its +own memory: you cannot create a :class:`bytearray` instance that is backed +by another object's memory. However, third-party datatypes such as NumPy +arrays do not have this limitation, and allow use of zero-copy pickling +(or making as few copies as possible) when transferring between distinct +processes or systems. + +.. seealso:: :pep:`574` -- Pickle protocol 5 with out-of-band data + + .. _pickle-restrict: Restricting Globals diff --git a/Include/Python.h b/Include/Python.h index 55b06ae..17ad5b3 100644 --- a/Include/Python.h +++ b/Include/Python.h @@ -124,6 +124,7 @@ #include "weakrefobject.h" #include "structseq.h" #include "namespaceobject.h" +#include "picklebufobject.h" #include "codecs.h" #include "pyerrors.h" diff --git a/Include/picklebufobject.h b/Include/picklebufobject.h new file mode 100644 index 0000000..f07e900 --- /dev/null +++ b/Include/picklebufobject.h @@ -0,0 +1,31 @@ +/* PickleBuffer object. This is built-in for ease of use from third-party + * C extensions. + */ + +#ifndef Py_PICKLEBUFOBJECT_H +#define Py_PICKLEBUFOBJECT_H +#ifdef __cplusplus +extern "C" { +#endif + +#ifndef Py_LIMITED_API + +PyAPI_DATA(PyTypeObject) PyPickleBuffer_Type; + +#define PyPickleBuffer_Check(op) (Py_TYPE(op) == &PyPickleBuffer_Type) + +/* Create a PickleBuffer redirecting to the given buffer-enabled object */ +PyAPI_FUNC(PyObject *) PyPickleBuffer_FromObject(PyObject *); +/* Get the PickleBuffer's underlying view to the original object + * (NULL if released) + */ +PyAPI_FUNC(const Py_buffer *) PyPickleBuffer_GetBuffer(PyObject *); +/* Release the PickleBuffer. Returns 0 on success, -1 on error. 
*/ +PyAPI_FUNC(int) PyPickleBuffer_Release(PyObject *); + +#endif /* !Py_LIMITED_API */ + +#ifdef __cplusplus +} +#endif +#endif /* !Py_PICKLEBUFOBJECT_H */ diff --git a/Lib/pickle.py b/Lib/pickle.py index be8e381..cb768b2 100644 --- a/Lib/pickle.py +++ b/Lib/pickle.py @@ -36,8 +36,10 @@ import io import codecs import _compat_pickle +from _pickle import PickleBuffer + __all__ = ["PickleError", "PicklingError", "UnpicklingError", "Pickler", - "Unpickler", "dump", "dumps", "load", "loads"] + "Unpickler", "dump", "dumps", "load", "loads", "PickleBuffer"] # Shortcut for use in isinstance testing bytes_types = (bytes, bytearray) @@ -51,10 +53,11 @@ compatible_formats = ["1.0", # Original protocol 0 "2.0", # Protocol 2 "3.0", # Protocol 3 "4.0", # Protocol 4 + "5.0", # Protocol 5 ] # Old format versions we can read # This is the highest protocol number we know how to read. -HIGHEST_PROTOCOL = 4 +HIGHEST_PROTOCOL = 5 # The protocol we write by default. May be less than HIGHEST_PROTOCOL. # Only bump this if the oldest still supported version of Python already @@ -167,6 +170,7 @@ BINBYTES = b'B' # push bytes; counted binary string argument SHORT_BINBYTES = b'C' # " " ; " " " " < 256 bytes # Protocol 4 + SHORT_BINUNICODE = b'\x8c' # push short string; UTF-8 length < 256 bytes BINUNICODE8 = b'\x8d' # push very long string BINBYTES8 = b'\x8e' # push very long bytes string @@ -178,6 +182,12 @@ STACK_GLOBAL = b'\x93' # same as GLOBAL but using names on the stacks MEMOIZE = b'\x94' # store top of the stack in memo FRAME = b'\x95' # indicate the beginning of a new frame +# Protocol 5 + +BYTEARRAY8 = b'\x96' # push bytearray +NEXT_BUFFER = b'\x97' # push next out-of-band buffer +READONLY_BUFFER = b'\x98' # make top of stack readonly + __all__.extend([x for x in dir() if re.match("[A-Z][A-Z0-9_]+$", x)]) @@ -251,6 +261,23 @@ class _Unframer: self.file_readline = file_readline self.current_frame = None + def readinto(self, buf): + if self.current_frame: + n = self.current_frame.readinto(buf) + if n == 0 and len(buf) != 0: + self.current_frame = None + n = len(buf) + buf[:] = self.file_read(n) + return n + if n < len(buf): + raise UnpicklingError( + "pickle exhausted before end of frame") + return n + else: + n = len(buf) + buf[:] = self.file_read(n) + return n + def read(self, n): if self.current_frame: data = self.current_frame.read(n) @@ -371,7 +398,8 @@ def decode_long(data): class _Pickler: - def __init__(self, file, protocol=None, *, fix_imports=True): + def __init__(self, file, protocol=None, *, fix_imports=True, + buffer_callback=None): """This takes a binary file for writing a pickle data stream. The optional *protocol* argument tells the pickler to use the @@ -393,6 +421,17 @@ class _Pickler: will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2. + + If *buffer_callback* is None (the default), buffer views are + serialized into *file* as part of the pickle stream. + + If *buffer_callback* is not None, then it can be called any number + of times with a buffer view. If the callback returns a false value + (such as None), the given buffer is out-of-band; otherwise the + buffer is serialized in-band, i.e. inside the pickle stream. + + It is an error if *buffer_callback* is not None and *protocol* + is None or smaller than 5. 
""" if protocol is None: protocol = DEFAULT_PROTOCOL @@ -400,6 +439,9 @@ class _Pickler: protocol = HIGHEST_PROTOCOL elif not 0 <= protocol <= HIGHEST_PROTOCOL: raise ValueError("pickle protocol must be <= %d" % HIGHEST_PROTOCOL) + if buffer_callback is not None and protocol < 5: + raise ValueError("buffer_callback needs protocol >= 5") + self._buffer_callback = buffer_callback try: self._file_write = file.write except AttributeError: @@ -756,6 +798,46 @@ class _Pickler: self.memoize(obj) dispatch[bytes] = save_bytes + def save_bytearray(self, obj): + if self.proto < 5: + if not obj: # bytearray is empty + self.save_reduce(bytearray, (), obj=obj) + else: + self.save_reduce(bytearray, (bytes(obj),), obj=obj) + return + n = len(obj) + if n >= self.framer._FRAME_SIZE_TARGET: + self._write_large_bytes(BYTEARRAY8 + pack("<Q", n), obj) + else: + self.write(BYTEARRAY8 + pack("<Q", n) + obj) + dispatch[bytearray] = save_bytearray + + def save_picklebuffer(self, obj): + if self.proto < 5: + raise PicklingError("PickleBuffer can only pickled with " + "protocol >= 5") + with obj.raw() as m: + if not m.contiguous: + raise PicklingError("PickleBuffer can not be pickled when " + "pointing to a non-contiguous buffer") + in_band = True + if self._buffer_callback is not None: + in_band = bool(self._buffer_callback(obj)) + if in_band: + # Write data in-band + # XXX The C implementation avoids a copy here + if m.readonly: + self.save_bytes(m.tobytes()) + else: + self.save_bytearray(m.tobytes()) + else: + # Write data out-of-band + self.write(NEXT_BUFFER) + if m.readonly: + self.write(READONLY_BUFFER) + + dispatch[PickleBuffer] = save_picklebuffer + def save_str(self, obj): if self.bin: encoded = obj.encode('utf-8', 'surrogatepass') @@ -1042,7 +1124,7 @@ class _Pickler: class _Unpickler: def __init__(self, file, *, fix_imports=True, - encoding="ASCII", errors="strict"): + encoding="ASCII", errors="strict", buffers=None): """This takes a binary file for reading a pickle data stream. The protocol version of the pickle is detected automatically, so @@ -1061,7 +1143,17 @@ class _Unpickler: reading, a BytesIO object, or any other custom object that meets this interface. - Optional keyword arguments are *fix_imports*, *encoding* and + If *buffers* is not None, it should be an iterable of buffer-enabled + objects that is consumed each time the pickle stream references + an out-of-band buffer view. Such buffers have been given in order + to the *buffer_callback* of a Pickler object. + + If *buffers* is None (the default), then the buffers are taken + from the pickle stream, assuming they are serialized there. + It is an error for *buffers* to be None if the pickle stream + was produced with a non-None *buffer_callback*. + + Other optional arguments are *fix_imports*, *encoding* and *errors*, which are used to control compatibility support for pickle stream generated by Python 2. If *fix_imports* is True, pickle will try to map the old Python 2 names to the new names @@ -1070,6 +1162,7 @@ class _Unpickler: default to 'ASCII' and 'strict', respectively. *encoding* can be 'bytes' to read theses 8-bit string instances as bytes objects. 
""" + self._buffers = iter(buffers) if buffers is not None else None self._file_readline = file.readline self._file_read = file.read self.memo = {} @@ -1090,6 +1183,7 @@ class _Unpickler: "%s.__init__()" % (self.__class__.__name__,)) self._unframer = _Unframer(self._file_read, self._file_readline) self.read = self._unframer.read + self.readinto = self._unframer.readinto self.readline = self._unframer.readline self.metastack = [] self.stack = [] @@ -1276,6 +1370,34 @@ class _Unpickler: self.append(self.read(len)) dispatch[BINBYTES8[0]] = load_binbytes8 + def load_bytearray8(self): + len, = unpack('<Q', self.read(8)) + if len > maxsize: + raise UnpicklingError("BYTEARRAY8 exceeds system's maximum size " + "of %d bytes" % maxsize) + b = bytearray(len) + self.readinto(b) + self.append(b) + dispatch[BYTEARRAY8[0]] = load_bytearray8 + + def load_next_buffer(self): + if self._buffers is None: + raise UnpicklingError("pickle stream refers to out-of-band data " + "but no *buffers* argument was given") + try: + buf = next(self._buffers) + except StopIteration: + raise UnpicklingError("not enough out-of-band buffers") + self.append(buf) + dispatch[NEXT_BUFFER[0]] = load_next_buffer + + def load_readonly_buffer(self): + buf = self.stack[-1] + with memoryview(buf) as m: + if not m.readonly: + self.stack[-1] = m.toreadonly() + dispatch[READONLY_BUFFER[0]] = load_readonly_buffer + def load_short_binstring(self): len = self.read(1)[0] data = self.read(len) @@ -1600,25 +1722,29 @@ class _Unpickler: # Shorthands -def _dump(obj, file, protocol=None, *, fix_imports=True): - _Pickler(file, protocol, fix_imports=fix_imports).dump(obj) +def _dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None): + _Pickler(file, protocol, fix_imports=fix_imports, + buffer_callback=buffer_callback).dump(obj) -def _dumps(obj, protocol=None, *, fix_imports=True): +def _dumps(obj, protocol=None, *, fix_imports=True, buffer_callback=None): f = io.BytesIO() - _Pickler(f, protocol, fix_imports=fix_imports).dump(obj) + _Pickler(f, protocol, fix_imports=fix_imports, + buffer_callback=buffer_callback).dump(obj) res = f.getvalue() assert isinstance(res, bytes_types) return res -def _load(file, *, fix_imports=True, encoding="ASCII", errors="strict"): - return _Unpickler(file, fix_imports=fix_imports, +def _load(file, *, fix_imports=True, encoding="ASCII", errors="strict", + buffers=None): + return _Unpickler(file, fix_imports=fix_imports, buffers=buffers, encoding=encoding, errors=errors).load() -def _loads(s, *, fix_imports=True, encoding="ASCII", errors="strict"): +def _loads(s, *, fix_imports=True, encoding="ASCII", errors="strict", + buffers=None): if isinstance(s, str): raise TypeError("Can't load pickle from unicode string") file = io.BytesIO(s) - return _Unpickler(file, fix_imports=fix_imports, + return _Unpickler(file, fix_imports=fix_imports, buffers=buffers, encoding=encoding, errors=errors).load() # Use the faster _pickle if possible diff --git a/Lib/pickletools.py b/Lib/pickletools.py index ed8bee3..95706e7 100644 --- a/Lib/pickletools.py +++ b/Lib/pickletools.py @@ -565,6 +565,41 @@ bytes8 = ArgumentDescriptor( the number of bytes, and the second argument is that many bytes. 
""") + +def read_bytearray8(f): + r""" + >>> import io, struct, sys + >>> read_bytearray8(io.BytesIO(b"\x00\x00\x00\x00\x00\x00\x00\x00abc")) + bytearray(b'') + >>> read_bytearray8(io.BytesIO(b"\x03\x00\x00\x00\x00\x00\x00\x00abcdef")) + bytearray(b'abc') + >>> bigsize8 = struct.pack("<Q", sys.maxsize//3) + >>> read_bytearray8(io.BytesIO(bigsize8 + b"abcdef")) #doctest: +ELLIPSIS + Traceback (most recent call last): + ... + ValueError: expected ... bytes in a bytearray8, but only 6 remain + """ + + n = read_uint8(f) + assert n >= 0 + if n > sys.maxsize: + raise ValueError("bytearray8 byte count > sys.maxsize: %d" % n) + data = f.read(n) + if len(data) == n: + return bytearray(data) + raise ValueError("expected %d bytes in a bytearray8, but only %d remain" % + (n, len(data))) + +bytearray8 = ArgumentDescriptor( + name="bytearray8", + n=TAKEN_FROM_ARGUMENT8U, + reader=read_bytearray8, + doc="""A counted bytearray. + + The first argument is an 8-byte little-endian unsigned int giving + the number of bytes, and the second argument is that many bytes. + """) + def read_unicodestringnl(f): r""" >>> import io @@ -970,6 +1005,11 @@ pybytes = StackObject( obtype=bytes, doc="A Python bytes object.") +pybytearray = StackObject( + name='bytearray', + obtype=bytearray, + doc="A Python bytearray object.") + pyunicode = StackObject( name='str', obtype=str, @@ -1005,6 +1045,11 @@ pyfrozenset = StackObject( obtype=set, doc="A Python frozenset object.") +pybuffer = StackObject( + name='buffer', + obtype=object, + doc="A Python buffer-like object.") + anyobject = StackObject( name='any', obtype=object, @@ -1265,7 +1310,7 @@ opcodes = [ object instead. """), - # Bytes (protocol 3 only; older protocols don't support bytes at all) + # Bytes (protocol 3 and higher) I(name='BINBYTES', code='B', @@ -1306,6 +1351,39 @@ opcodes = [ which are taken literally as the string content. """), + # Bytearray (protocol 5 and higher) + + I(name='BYTEARRAY8', + code='\x96', + arg=bytearray8, + stack_before=[], + stack_after=[pybytearray], + proto=5, + doc="""Push a Python bytearray object. + + There are two arguments: the first is an 8-byte unsigned int giving + the number of bytes in the bytearray, and the second is that many bytes, + which are taken literally as the bytearray content. + """), + + # Out-of-band buffer (protocol 5 and higher) + + I(name='NEXT_BUFFER', + code='\x97', + arg=None, + stack_before=[], + stack_after=[pybuffer], + proto=5, + doc="Push an out-of-band buffer object."), + + I(name='READONLY_BUFFER', + code='\x98', + arg=None, + stack_before=[pybuffer], + stack_after=[pybuffer], + proto=5, + doc="Make an out-of-band buffer object read-only."), + # Ways to spell None. 
I(name='NONE', diff --git a/Lib/test/pickletester.py b/Lib/test/pickletester.py index 4f8c294..f6fda9e 100644 --- a/Lib/test/pickletester.py +++ b/Lib/test/pickletester.py @@ -16,6 +16,16 @@ import weakref from textwrap import dedent from http.cookies import SimpleCookie +try: + import _testbuffer +except ImportError: + _testbuffer = None + +try: + import numpy as np +except ImportError: + np = None + from test import support from test.support import ( TestFailed, TESTFN, run_with_locale, no_tracing, @@ -162,6 +172,139 @@ def create_dynamic_class(name, bases): result.reduce_args = (name, bases) return result + +class ZeroCopyBytes(bytes): + readonly = True + c_contiguous = True + f_contiguous = True + zero_copy_reconstruct = True + + def __reduce_ex__(self, protocol): + if protocol >= 5: + return type(self)._reconstruct, (pickle.PickleBuffer(self),), None + else: + return type(self)._reconstruct, (bytes(self),) + + def __repr__(self): + return "{}({!r})".format(self.__class__.__name__, bytes(self)) + + __str__ = __repr__ + + @classmethod + def _reconstruct(cls, obj): + with memoryview(obj) as m: + obj = m.obj + if type(obj) is cls: + # Zero-copy + return obj + else: + return cls(obj) + + +class ZeroCopyBytearray(bytearray): + readonly = False + c_contiguous = True + f_contiguous = True + zero_copy_reconstruct = True + + def __reduce_ex__(self, protocol): + if protocol >= 5: + return type(self)._reconstruct, (pickle.PickleBuffer(self),), None + else: + return type(self)._reconstruct, (bytes(self),) + + def __repr__(self): + return "{}({!r})".format(self.__class__.__name__, bytes(self)) + + __str__ = __repr__ + + @classmethod + def _reconstruct(cls, obj): + with memoryview(obj) as m: + obj = m.obj + if type(obj) is cls: + # Zero-copy + return obj + else: + return cls(obj) + + +if _testbuffer is not None: + + class PicklableNDArray: + # A not-really-zero-copy picklable ndarray, as the ndarray() + # constructor doesn't allow for it + + zero_copy_reconstruct = False + + def __init__(self, *args, **kwargs): + self.array = _testbuffer.ndarray(*args, **kwargs) + + def __getitem__(self, idx): + cls = type(self) + new = cls.__new__(cls) + new.array = self.array[idx] + return new + + @property + def readonly(self): + return self.array.readonly + + @property + def c_contiguous(self): + return self.array.c_contiguous + + @property + def f_contiguous(self): + return self.array.f_contiguous + + def __eq__(self, other): + if not isinstance(other, PicklableNDArray): + return NotImplemented + return (other.array.format == self.array.format and + other.array.shape == self.array.shape and + other.array.strides == self.array.strides and + other.array.readonly == self.array.readonly and + other.array.tobytes() == self.array.tobytes()) + + def __ne__(self, other): + if not isinstance(other, PicklableNDArray): + return NotImplemented + return not (self == other) + + def __repr__(self): + return (f"{type(self)}(shape={self.array.shape}," + f"strides={self.array.strides}, " + f"bytes={self.array.tobytes()})") + + def __reduce_ex__(self, protocol): + if not self.array.contiguous: + raise NotImplementedError("Reconstructing a non-contiguous " + "ndarray does not seem possible") + ndarray_kwargs = {"shape": self.array.shape, + "strides": self.array.strides, + "format": self.array.format, + "flags": (0 if self.readonly + else _testbuffer.ND_WRITABLE)} + pb = pickle.PickleBuffer(self.array) + if protocol >= 5: + return (type(self)._reconstruct, + (pb, ndarray_kwargs)) + else: + # Need to serialize the bytes in physical 
order + with pb.raw() as m: + return (type(self)._reconstruct, + (m.tobytes(), ndarray_kwargs)) + + @classmethod + def _reconstruct(cls, obj, kwargs): + with memoryview(obj) as m: + # For some reason, ndarray() wants a list of integers... + # XXX This only works if format == 'B' + items = list(m.tobytes()) + return cls(items, **kwargs) + + # DATA0 .. DATA4 are the pickles we expect under the various protocols, for # the object returned by create_data(). @@ -888,6 +1031,10 @@ class AbstractUnpickleTests(unittest.TestCase): dumped = b'\x80\x04\x8d\4\0\0\0\0\0\0\0\xe2\x82\xac\x00.' self.assertEqual(self.loads(dumped), '\u20ac\x00') + def test_bytearray8(self): + dumped = b'\x80\x05\x96\x03\x00\x00\x00\x00\x00\x00\x00xxx.' + self.assertEqual(self.loads(dumped), bytearray(b'xxx')) + @requires_32b def test_large_32b_binbytes8(self): dumped = b'\x80\x04\x8e\4\0\0\0\1\0\0\0\xe2\x82\xac\x00.' @@ -895,6 +1042,12 @@ class AbstractUnpickleTests(unittest.TestCase): dumped) @requires_32b + def test_large_32b_bytearray8(self): + dumped = b'\x80\x05\x96\4\0\0\0\1\0\0\0\xe2\x82\xac\x00.' + self.check_unpickling_error((pickle.UnpicklingError, OverflowError), + dumped) + + @requires_32b def test_large_32b_binunicode8(self): dumped = b'\x80\x04\x8d\4\0\0\0\1\0\0\0\xe2\x82\xac\x00.' self.check_unpickling_error((pickle.UnpicklingError, OverflowError), @@ -1171,6 +1324,10 @@ class AbstractUnpickleTests(unittest.TestCase): b'\x8e\x03\x00\x00\x00\x00\x00\x00', b'\x8e\x03\x00\x00\x00\x00\x00\x00\x00', b'\x8e\x03\x00\x00\x00\x00\x00\x00\x00ab', + b'\x96', # BYTEARRAY8 + b'\x96\x03\x00\x00\x00\x00\x00\x00', + b'\x96\x03\x00\x00\x00\x00\x00\x00\x00', + b'\x96\x03\x00\x00\x00\x00\x00\x00\x00ab', b'\x95', # FRAME b'\x95\x02\x00\x00\x00\x00\x00\x00', b'\x95\x02\x00\x00\x00\x00\x00\x00\x00', @@ -1482,6 +1639,25 @@ class AbstractPickleTests(unittest.TestCase): p = self.dumps(s, proto) self.assert_is_copy(s, self.loads(p)) + def test_bytearray(self): + for proto in protocols: + for s in b'', b'xyz', b'xyz'*100: + b = bytearray(s) + p = self.dumps(b, proto) + bb = self.loads(p) + self.assertIsNot(bb, b) + self.assert_is_copy(b, bb) + if proto <= 3: + # bytearray is serialized using a global reference + self.assertIn(b'bytearray', p) + self.assertTrue(opcode_in_pickle(pickle.GLOBAL, p)) + elif proto == 4: + self.assertIn(b'bytearray', p) + self.assertTrue(opcode_in_pickle(pickle.STACK_GLOBAL, p)) + elif proto == 5: + self.assertNotIn(b'bytearray', p) + self.assertTrue(opcode_in_pickle(pickle.BYTEARRAY8, p)) + def test_ints(self): for proto in protocols: n = sys.maxsize @@ -2114,7 +2290,8 @@ class AbstractPickleTests(unittest.TestCase): the following consistency check. 
""" frame_end = frameless_start = None - frameless_opcodes = {'BINBYTES', 'BINUNICODE', 'BINBYTES8', 'BINUNICODE8'} + frameless_opcodes = {'BINBYTES', 'BINUNICODE', 'BINBYTES8', + 'BINUNICODE8', 'BYTEARRAY8'} for op, arg, pos in pickletools.genops(pickled): if frame_end is not None: self.assertLessEqual(pos, frame_end) @@ -2225,19 +2402,20 @@ class AbstractPickleTests(unittest.TestCase): num_frames = 20 # Large byte objects (dict values) intermittent with small objects # (dict keys) - obj = {i: bytes([i]) * frame_size for i in range(num_frames)} + for bytes_type in (bytes, bytearray): + obj = {i: bytes_type([i]) * frame_size for i in range(num_frames)} - for proto in range(4, pickle.HIGHEST_PROTOCOL + 1): - pickled = self.dumps(obj, proto) + for proto in range(4, pickle.HIGHEST_PROTOCOL + 1): + pickled = self.dumps(obj, proto) - frameless_pickle = remove_frames(pickled) - self.assertEqual(count_opcode(pickle.FRAME, frameless_pickle), 0) - self.assertEqual(obj, self.loads(frameless_pickle)) + frameless_pickle = remove_frames(pickled) + self.assertEqual(count_opcode(pickle.FRAME, frameless_pickle), 0) + self.assertEqual(obj, self.loads(frameless_pickle)) - some_frames_pickle = remove_frames(pickled, lambda i: i % 2) - self.assertLess(count_opcode(pickle.FRAME, some_frames_pickle), - count_opcode(pickle.FRAME, pickled)) - self.assertEqual(obj, self.loads(some_frames_pickle)) + some_frames_pickle = remove_frames(pickled, lambda i: i % 2) + self.assertLess(count_opcode(pickle.FRAME, some_frames_pickle), + count_opcode(pickle.FRAME, pickled)) + self.assertEqual(obj, self.loads(some_frames_pickle)) def test_framed_write_sizes_with_delayed_writer(self): class ChunkAccumulator: @@ -2452,6 +2630,186 @@ class AbstractPickleTests(unittest.TestCase): with self.assertRaises((AttributeError, pickle.PicklingError)): pickletools.dis(self.dumps(f, proto)) + # + # PEP 574 tests below + # + + def buffer_like_objects(self): + # Yield buffer-like objects with the bytestring "abcdef" in them + bytestring = b"abcdefgh" + yield ZeroCopyBytes(bytestring) + yield ZeroCopyBytearray(bytestring) + if _testbuffer is not None: + items = list(bytestring) + value = int.from_bytes(bytestring, byteorder='little') + for flags in (0, _testbuffer.ND_WRITABLE): + # 1-D, contiguous + yield PicklableNDArray(items, format='B', shape=(8,), + flags=flags) + # 2-D, C-contiguous + yield PicklableNDArray(items, format='B', shape=(4, 2), + strides=(2, 1), flags=flags) + # 2-D, Fortran-contiguous + yield PicklableNDArray(items, format='B', + shape=(4, 2), strides=(1, 4), + flags=flags) + + def test_in_band_buffers(self): + # Test in-band buffers (PEP 574) + for obj in self.buffer_like_objects(): + for proto in range(0, pickle.HIGHEST_PROTOCOL + 1): + data = self.dumps(obj, proto) + if obj.c_contiguous and proto >= 5: + # The raw memory bytes are serialized in physical order + self.assertIn(b"abcdefgh", data) + self.assertEqual(count_opcode(pickle.NEXT_BUFFER, data), 0) + if proto >= 5: + self.assertEqual(count_opcode(pickle.SHORT_BINBYTES, data), + 1 if obj.readonly else 0) + self.assertEqual(count_opcode(pickle.BYTEARRAY8, data), + 0 if obj.readonly else 1) + # Return a true value from buffer_callback should have + # the same effect + def buffer_callback(obj): + return True + data2 = self.dumps(obj, proto, + buffer_callback=buffer_callback) + self.assertEqual(data2, data) + + new = self.loads(data) + # It's a copy + self.assertIsNot(new, obj) + self.assertIs(type(new), type(obj)) + self.assertEqual(new, obj) + + # XXX Unfortunately 
cannot test non-contiguous array + # (see comment in PicklableNDArray.__reduce_ex__) + + def test_oob_buffers(self): + # Test out-of-band buffers (PEP 574) + for obj in self.buffer_like_objects(): + for proto in range(0, 5): + # Need protocol >= 5 for buffer_callback + with self.assertRaises(ValueError): + self.dumps(obj, proto, + buffer_callback=[].append) + for proto in range(5, pickle.HIGHEST_PROTOCOL + 1): + buffers = [] + buffer_callback = lambda pb: buffers.append(pb.raw()) + data = self.dumps(obj, proto, + buffer_callback=buffer_callback) + self.assertNotIn(b"abcdefgh", data) + self.assertEqual(count_opcode(pickle.SHORT_BINBYTES, data), 0) + self.assertEqual(count_opcode(pickle.BYTEARRAY8, data), 0) + self.assertEqual(count_opcode(pickle.NEXT_BUFFER, data), 1) + self.assertEqual(count_opcode(pickle.READONLY_BUFFER, data), + 1 if obj.readonly else 0) + + if obj.c_contiguous: + self.assertEqual(bytes(buffers[0]), b"abcdefgh") + # Need buffers argument to unpickle properly + with self.assertRaises(pickle.UnpicklingError): + self.loads(data) + + new = self.loads(data, buffers=buffers) + if obj.zero_copy_reconstruct: + # Zero-copy achieved + self.assertIs(new, obj) + else: + self.assertIs(type(new), type(obj)) + self.assertEqual(new, obj) + # Non-sequence buffers accepted too + new = self.loads(data, buffers=iter(buffers)) + if obj.zero_copy_reconstruct: + # Zero-copy achieved + self.assertIs(new, obj) + else: + self.assertIs(type(new), type(obj)) + self.assertEqual(new, obj) + + def test_oob_buffers_writable_to_readonly(self): + # Test reconstructing readonly object from writable buffer + obj = ZeroCopyBytes(b"foobar") + for proto in range(5, pickle.HIGHEST_PROTOCOL + 1): + buffers = [] + buffer_callback = buffers.append + data = self.dumps(obj, proto, buffer_callback=buffer_callback) + + buffers = map(bytearray, buffers) + new = self.loads(data, buffers=buffers) + self.assertIs(type(new), type(obj)) + self.assertEqual(new, obj) + + def test_picklebuffer_error(self): + # PickleBuffer forbidden with protocol < 5 + pb = pickle.PickleBuffer(b"foobar") + for proto in range(0, 5): + with self.assertRaises(pickle.PickleError): + self.dumps(pb, proto) + + def test_buffer_callback_error(self): + def buffer_callback(buffers): + 1/0 + pb = pickle.PickleBuffer(b"foobar") + with self.assertRaises(ZeroDivisionError): + self.dumps(pb, 5, buffer_callback=buffer_callback) + + def test_buffers_error(self): + pb = pickle.PickleBuffer(b"foobar") + for proto in range(5, pickle.HIGHEST_PROTOCOL + 1): + data = self.dumps(pb, proto, buffer_callback=[].append) + # Non iterable buffers + with self.assertRaises(TypeError): + self.loads(data, buffers=object()) + # Buffer iterable exhausts too early + with self.assertRaises(pickle.UnpicklingError): + self.loads(data, buffers=[]) + + @unittest.skipIf(np is None, "Test needs Numpy") + def test_buffers_numpy(self): + def check_no_copy(x, y): + np.testing.assert_equal(x, y) + self.assertEqual(x.ctypes.data, y.ctypes.data) + + def check_copy(x, y): + np.testing.assert_equal(x, y) + self.assertNotEqual(x.ctypes.data, y.ctypes.data) + + def check_array(arr): + # In-band + for proto in range(0, pickle.HIGHEST_PROTOCOL + 1): + data = self.dumps(arr, proto) + new = self.loads(data) + check_copy(arr, new) + for proto in range(5, pickle.HIGHEST_PROTOCOL + 1): + buffer_callback = lambda _: True + data = self.dumps(arr, proto, buffer_callback=buffer_callback) + new = self.loads(data) + check_copy(arr, new) + # Out-of-band + for proto in range(5, pickle.HIGHEST_PROTOCOL + 1): + 
buffers = [] + buffer_callback = buffers.append + data = self.dumps(arr, proto, buffer_callback=buffer_callback) + new = self.loads(data, buffers=buffers) + if arr.flags.c_contiguous or arr.flags.f_contiguous: + check_no_copy(arr, new) + else: + check_copy(arr, new) + + # 1-D + arr = np.arange(6) + check_array(arr) + # 1-D, non-contiguous + check_array(arr[::2]) + # 2-D, C-contiguous + arr = np.arange(12).reshape((3, 4)) + check_array(arr) + # 2-D, F-contiguous + check_array(arr.T) + # 2-D, non-contiguous + check_array(arr[::2]) + class BigmemPickleTests(unittest.TestCase): @@ -2736,7 +3094,7 @@ class AbstractPickleModuleTests(unittest.TestCase): def test_highest_protocol(self): # Of course this needs to be changed when HIGHEST_PROTOCOL changes. - self.assertEqual(pickle.HIGHEST_PROTOCOL, 4) + self.assertEqual(pickle.HIGHEST_PROTOCOL, 5) def test_callapi(self): f = io.BytesIO() @@ -2760,6 +3118,47 @@ class AbstractPickleModuleTests(unittest.TestCase): self.assertRaises(pickle.PicklingError, BadPickler().dump, 0) self.assertRaises(pickle.UnpicklingError, BadUnpickler().load) + def check_dumps_loads_oob_buffers(self, dumps, loads): + # No need to do the full gamut of tests here, just enough to + # check that dumps() and loads() redirect their arguments + # to the underlying Pickler and Unpickler, respectively. + obj = ZeroCopyBytes(b"foo") + + for proto in range(0, 5): + # Need protocol >= 5 for buffer_callback + with self.assertRaises(ValueError): + dumps(obj, protocol=proto, + buffer_callback=[].append) + for proto in range(5, pickle.HIGHEST_PROTOCOL + 1): + buffers = [] + buffer_callback = buffers.append + data = dumps(obj, protocol=proto, + buffer_callback=buffer_callback) + self.assertNotIn(b"foo", data) + self.assertEqual(bytes(buffers[0]), b"foo") + # Need buffers argument to unpickle properly + with self.assertRaises(pickle.UnpicklingError): + loads(data) + new = loads(data, buffers=buffers) + self.assertIs(new, obj) + + def test_dumps_loads_oob_buffers(self): + # Test out-of-band buffers (PEP 574) with top-level dumps() and loads() + self.check_dumps_loads_oob_buffers(self.dumps, self.loads) + + def test_dump_load_oob_buffers(self): + # Test out-of-band buffers (PEP 574) with top-level dump() and load() + def dumps(obj, **kwargs): + f = io.BytesIO() + self.dump(obj, f, **kwargs) + return f.getvalue() + + def loads(data, **kwargs): + f = io.BytesIO(data) + return self.load(f, **kwargs) + + self.check_dumps_loads_oob_buffers(dumps, loads) + class AbstractPersistentPicklerTests(unittest.TestCase): diff --git a/Lib/test/test_inspect.py b/Lib/test/test_inspect.py index be52b38..b3aae3a 100644 --- a/Lib/test/test_inspect.py +++ b/Lib/test/test_inspect.py @@ -2894,16 +2894,15 @@ class TestSignatureObject(unittest.TestCase): @unittest.skipIf(MISSING_C_DOCSTRINGS, "Signature information for builtins requires docstrings") def test_signature_on_builtin_class(self): - self.assertEqual(str(inspect.signature(_pickle.Pickler)), - '(file, protocol=None, fix_imports=True)') + expected = ('(file, protocol=None, fix_imports=True, ' + 'buffer_callback=None)') + self.assertEqual(str(inspect.signature(_pickle.Pickler)), expected) class P(_pickle.Pickler): pass class EmptyTrait: pass class P2(EmptyTrait, P): pass - self.assertEqual(str(inspect.signature(P)), - '(file, protocol=None, fix_imports=True)') - self.assertEqual(str(inspect.signature(P2)), - '(file, protocol=None, fix_imports=True)') + self.assertEqual(str(inspect.signature(P)), expected) + self.assertEqual(str(inspect.signature(P2)), expected) 
class P3(P2): def __init__(self, spam): diff --git a/Lib/test/test_pickle.py b/Lib/test/test_pickle.py index 435c248..5f7a879 100644 --- a/Lib/test/test_pickle.py +++ b/Lib/test/test_pickle.py @@ -57,9 +57,9 @@ class PyPicklerTests(AbstractPickleTests): pickler = pickle._Pickler unpickler = pickle._Unpickler - def dumps(self, arg, proto=None): + def dumps(self, arg, proto=None, **kwargs): f = io.BytesIO() - p = self.pickler(f, proto) + p = self.pickler(f, proto, **kwargs) p.dump(arg) f.seek(0) return bytes(f.read()) @@ -78,8 +78,8 @@ class InMemoryPickleTests(AbstractPickleTests, AbstractUnpickleTests, AttributeError, ValueError, struct.error, IndexError, ImportError) - def dumps(self, arg, protocol=None): - return pickle.dumps(arg, protocol) + def dumps(self, arg, protocol=None, **kwargs): + return pickle.dumps(arg, protocol, **kwargs) def loads(self, buf, **kwds): return pickle.loads(buf, **kwds) @@ -271,7 +271,7 @@ if has_c_implementation: check_sizeof = support.check_sizeof def test_pickler(self): - basesize = support.calcobjsize('6P2n3i2n3i2P') + basesize = support.calcobjsize('7P2n3i2n3i2P') p = _pickle.Pickler(io.BytesIO()) self.assertEqual(object.__sizeof__(p), basesize) MT_size = struct.calcsize('3nP0n') @@ -288,7 +288,7 @@ if has_c_implementation: 0) # Write buffer is cleared after every dump(). def test_unpickler(self): - basesize = support.calcobjsize('2P2n2P 2P2n2i5P 2P3n6P2n2i') + basesize = support.calcobjsize('2P2n2P 2P2n2i5P 2P3n8P2n2i') unpickler = _pickle.Unpickler P = struct.calcsize('P') # Size of memo table entry. n = struct.calcsize('n') # Size of mark table entry. diff --git a/Lib/test/test_picklebuffer.py b/Lib/test/test_picklebuffer.py new file mode 100644 index 0000000..7e72157 --- /dev/null +++ b/Lib/test/test_picklebuffer.py @@ -0,0 +1,154 @@ +"""Unit tests for the PickleBuffer object. + +Pickling tests themselves are in pickletester.py. 
+""" + +import gc +from pickle import PickleBuffer +import sys +import weakref +import unittest + +from test import support + + +class B(bytes): + pass + + +class PickleBufferTest(unittest.TestCase): + + def check_memoryview(self, pb, equiv): + with memoryview(pb) as m: + with memoryview(equiv) as expected: + self.assertEqual(m.nbytes, expected.nbytes) + self.assertEqual(m.readonly, expected.readonly) + self.assertEqual(m.itemsize, expected.itemsize) + self.assertEqual(m.shape, expected.shape) + self.assertEqual(m.strides, expected.strides) + self.assertEqual(m.c_contiguous, expected.c_contiguous) + self.assertEqual(m.f_contiguous, expected.f_contiguous) + self.assertEqual(m.format, expected.format) + self.assertEqual(m.tobytes(), expected.tobytes()) + + def test_constructor_failure(self): + with self.assertRaises(TypeError): + PickleBuffer() + with self.assertRaises(TypeError): + PickleBuffer("foo") + # Released memoryview fails taking a buffer + m = memoryview(b"foo") + m.release() + with self.assertRaises(ValueError): + PickleBuffer(m) + + def test_basics(self): + pb = PickleBuffer(b"foo") + self.assertEqual(b"foo", bytes(pb)) + with memoryview(pb) as m: + self.assertTrue(m.readonly) + + pb = PickleBuffer(bytearray(b"foo")) + self.assertEqual(b"foo", bytes(pb)) + with memoryview(pb) as m: + self.assertFalse(m.readonly) + m[0] = 48 + self.assertEqual(b"0oo", bytes(pb)) + + def test_release(self): + pb = PickleBuffer(b"foo") + pb.release() + with self.assertRaises(ValueError) as raises: + memoryview(pb) + self.assertIn("operation forbidden on released PickleBuffer object", + str(raises.exception)) + # Idempotency + pb.release() + + def test_cycle(self): + b = B(b"foo") + pb = PickleBuffer(b) + b.cycle = pb + wpb = weakref.ref(pb) + del b, pb + gc.collect() + self.assertIsNone(wpb()) + + def test_ndarray_2d(self): + # C-contiguous + ndarray = support.import_module("_testbuffer").ndarray + arr = ndarray(list(range(12)), shape=(4, 3), format='<i') + self.assertTrue(arr.c_contiguous) + self.assertFalse(arr.f_contiguous) + pb = PickleBuffer(arr) + self.check_memoryview(pb, arr) + # Non-contiguous + arr = arr[::2] + self.assertFalse(arr.c_contiguous) + self.assertFalse(arr.f_contiguous) + pb = PickleBuffer(arr) + self.check_memoryview(pb, arr) + # F-contiguous + arr = ndarray(list(range(12)), shape=(3, 4), strides=(4, 12), format='<i') + self.assertTrue(arr.f_contiguous) + self.assertFalse(arr.c_contiguous) + pb = PickleBuffer(arr) + self.check_memoryview(pb, arr) + + # Tests for PickleBuffer.raw() + + def check_raw(self, obj, equiv): + pb = PickleBuffer(obj) + with pb.raw() as m: + self.assertIsInstance(m, memoryview) + self.check_memoryview(m, equiv) + + def test_raw(self): + for obj in (b"foo", bytearray(b"foo")): + with self.subTest(obj=obj): + self.check_raw(obj, obj) + + def test_raw_ndarray(self): + # 1-D, contiguous + ndarray = support.import_module("_testbuffer").ndarray + arr = ndarray(list(range(3)), shape=(3,), format='<h') + equiv = b"\x00\x00\x01\x00\x02\x00" + self.check_raw(arr, equiv) + # 2-D, C-contiguous + arr = ndarray(list(range(6)), shape=(2, 3), format='<h') + equiv = b"\x00\x00\x01\x00\x02\x00\x03\x00\x04\x00\x05\x00" + self.check_raw(arr, equiv) + # 2-D, F-contiguous + arr = ndarray(list(range(6)), shape=(2, 3), strides=(2, 4), + format='<h') + # Note this is different from arr.tobytes() + equiv = b"\x00\x00\x01\x00\x02\x00\x03\x00\x04\x00\x05\x00" + self.check_raw(arr, equiv) + # 0-D + arr = ndarray(456, shape=(), format='<i') + equiv = b'\xc8\x01\x00\x00' + 
self.check_raw(arr, equiv) + + def check_raw_non_contiguous(self, obj): + pb = PickleBuffer(obj) + with self.assertRaisesRegex(BufferError, "non-contiguous"): + pb.raw() + + def test_raw_non_contiguous(self): + # 1-D + ndarray = support.import_module("_testbuffer").ndarray + arr = ndarray(list(range(6)), shape=(6,), format='<i')[::2] + self.check_raw_non_contiguous(arr) + # 2-D + arr = ndarray(list(range(12)), shape=(4, 3), format='<i')[::2] + self.check_raw_non_contiguous(arr) + + def test_raw_released(self): + pb = PickleBuffer(b"foo") + pb.release() + with self.assertRaises(ValueError) as raises: + pb.raw() + + +if __name__ == "__main__": + unittest.main() diff --git a/Lib/test/test_pickletools.py b/Lib/test/test_pickletools.py index e40a958..8cc6ca5 100644 --- a/Lib/test/test_pickletools.py +++ b/Lib/test/test_pickletools.py @@ -6,8 +6,8 @@ import unittest class OptimizedPickleTests(AbstractPickleTests): - def dumps(self, arg, proto=None): - return pickletools.optimize(pickle.dumps(arg, proto)) + def dumps(self, arg, proto=None, **kwargs): + return pickletools.optimize(pickle.dumps(arg, proto, **kwargs)) def loads(self, buf, **kwds): return pickle.loads(buf, **kwds) @@ -71,23 +71,24 @@ class MiscTestCase(unittest.TestCase): 'read_uint8', 'read_stringnl', 'read_stringnl_noescape', 'read_stringnl_noescape_pair', 'read_string1', 'read_string4', 'read_bytes1', 'read_bytes4', - 'read_bytes8', 'read_unicodestringnl', + 'read_bytes8', 'read_bytearray8', 'read_unicodestringnl', 'read_unicodestring1', 'read_unicodestring4', 'read_unicodestring8', 'read_decimalnl_short', 'read_decimalnl_long', 'read_floatnl', 'read_float8', 'read_long1', 'read_long4', 'uint1', 'uint2', 'int4', 'uint4', 'uint8', 'stringnl', 'stringnl_noescape', 'stringnl_noescape_pair', 'string1', - 'string4', 'bytes1', 'bytes4', 'bytes8', + 'string4', 'bytes1', 'bytes4', 'bytes8', 'bytearray8', 'unicodestringnl', 'unicodestring1', 'unicodestring4', 'unicodestring8', 'decimalnl_short', 'decimalnl_long', 'floatnl', 'float8', 'long1', 'long4', 'StackObject', 'pyint', 'pylong', 'pyinteger_or_bool', 'pybool', 'pyfloat', - 'pybytes_or_str', 'pystring', 'pybytes', 'pyunicode', - 'pynone', 'pytuple', 'pylist', 'pydict', 'pyset', - 'pyfrozenset', 'anyobject', 'markobject', 'stackslice', - 'OpcodeInfo', 'opcodes', 'code2op', + 'pybytes_or_str', 'pystring', 'pybytes', 'pybytearray', + 'pyunicode', 'pynone', 'pytuple', 'pylist', 'pydict', + 'pyset', 'pyfrozenset', 'pybuffer', 'anyobject', + 'markobject', 'stackslice', 'OpcodeInfo', 'opcodes', + 'code2op', } support.check__all__(self, pickletools, blacklist=blacklist) diff --git a/Lib/test/test_pyclbr.py b/Lib/test/test_pyclbr.py index 839c58f..0b3934f 100644 --- a/Lib/test/test_pyclbr.py +++ b/Lib/test/test_pyclbr.py @@ -224,7 +224,7 @@ class PyclbrTest(TestCase): # These were once about the 10 longest modules cm('random', ignore=('Random',)) # from _random import Random as CoreGenerator cm('cgi', ignore=('log',)) # set with = in module - cm('pickle', ignore=('partial',)) + cm('pickle', ignore=('partial', 'PickleBuffer')) # TODO(briancurtin): openfp is deprecated as of 3.7. # Update this once it has been removed. 
cm('aifc', ignore=('openfp', '_aifc_params')) # set with = in module diff --git a/Makefile.pre.in b/Makefile.pre.in index ee94ad5..8071a94 100644 --- a/Makefile.pre.in +++ b/Makefile.pre.in @@ -382,6 +382,7 @@ OBJECT_OBJS= \ Objects/bytearrayobject.o \ Objects/bytesobject.o \ Objects/call.o \ + Objects/capsule.o \ Objects/cellobject.o \ Objects/classobject.o \ Objects/codeobject.o \ @@ -406,7 +407,7 @@ OBJECT_OBJS= \ Objects/namespaceobject.o \ Objects/object.o \ Objects/obmalloc.o \ - Objects/capsule.o \ + Objects/picklebufobject.o \ Objects/rangeobject.o \ Objects/setobject.o \ Objects/sliceobject.o \ @@ -1009,6 +1010,7 @@ PYTHON_HEADERS= \ $(srcdir)/Include/osdefs.h \ $(srcdir)/Include/osmodule.h \ $(srcdir)/Include/patchlevel.h \ + $(srcdir)/Include/picklebufobject.h \ $(srcdir)/Include/pyarena.h \ $(srcdir)/Include/pycapsule.h \ $(srcdir)/Include/pyctype.h \ diff --git a/Misc/NEWS.d/next/Library/2019-05-03-20-47-55.bpo-36785.PQLnPq.rst b/Misc/NEWS.d/next/Library/2019-05-03-20-47-55.bpo-36785.PQLnPq.rst new file mode 100644 index 0000000..0a86054 --- /dev/null +++ b/Misc/NEWS.d/next/Library/2019-05-03-20-47-55.bpo-36785.PQLnPq.rst @@ -0,0 +1 @@ +Implement PEP 574 (pickle protocol 5 with out-of-band buffers). diff --git a/Modules/_pickle.c b/Modules/_pickle.c index 24a5d22..a3f02ae 100644 --- a/Modules/_pickle.c +++ b/Modules/_pickle.c @@ -27,7 +27,7 @@ class _pickle.UnpicklerMemoProxy "UnpicklerMemoProxyObject *" "&UnpicklerMemoPro Bump DEFAULT_PROTOCOL only when the oldest still supported version of Python already includes it. */ enum { - HIGHEST_PROTOCOL = 4, + HIGHEST_PROTOCOL = 5, DEFAULT_PROTOCOL = 4 }; @@ -104,7 +104,12 @@ enum opcode { NEWOBJ_EX = '\x92', STACK_GLOBAL = '\x93', MEMOIZE = '\x94', - FRAME = '\x95' + FRAME = '\x95', + + /* Protocol 5 */ + BYTEARRAY8 = '\x96', + NEXT_BUFFER = '\x97', + READONLY_BUFFER = '\x98' }; enum { @@ -643,6 +648,7 @@ typedef struct PicklerObject { int fix_imports; /* Indicate whether Pickler should fix the name of globals for Python 2.x. */ PyObject *fast_memo; + PyObject *buffer_callback; /* Callback for out-of-band buffers, or NULL */ } PicklerObject; typedef struct UnpicklerObject { @@ -667,8 +673,10 @@ typedef struct UnpicklerObject { Py_ssize_t prefetched_idx; /* index of first prefetched byte */ PyObject *read; /* read() method of the input stream. */ + PyObject *readinto; /* readinto() method of the input stream. */ PyObject *readline; /* readline() method of the input stream. */ PyObject *peek; /* peek() method of the input stream, or NULL */ + PyObject *buffers; /* iterable of out-of-band buffers, or NULL */ char *encoding; /* Name of the encoding to be used for decoding strings pickled using Python @@ -1102,6 +1110,7 @@ _Pickler_New(void) self->pers_func = NULL; self->dispatch_table = NULL; + self->buffer_callback = NULL; self->write = NULL; self->proto = 0; self->bin = 0; @@ -1174,6 +1183,23 @@ _Pickler_SetOutputStream(PicklerObject *self, PyObject *file) return 0; } +static int +_Pickler_SetBufferCallback(PicklerObject *self, PyObject *buffer_callback) +{ + if (buffer_callback == Py_None) { + buffer_callback = NULL; + } + if (buffer_callback != NULL && self->proto < 5) { + PyErr_SetString(PyExc_ValueError, + "buffer_callback needs protocol >= 5"); + return -1; + } + + Py_XINCREF(buffer_callback); + self->buffer_callback = buffer_callback; + return 0; +} + /* Returns the size of the input on success, -1 on failure. This takes its own reference to `input`. 
*/ static Py_ssize_t @@ -1198,6 +1224,7 @@ bad_readline(void) return -1; } +/* Skip any consumed data that was only prefetched using peek() */ static int _Unpickler_SkipConsumed(UnpicklerObject *self) { @@ -1305,6 +1332,7 @@ _Unpickler_ReadImpl(UnpicklerObject *self, char **s, Py_ssize_t n) if (!self->read) return bad_readline(); + /* Extend the buffer to satisfy desired size */ num_read = _Unpickler_ReadFromFile(self, n); if (num_read < 0) return -1; @@ -1315,6 +1343,66 @@ _Unpickler_ReadImpl(UnpicklerObject *self, char **s, Py_ssize_t n) return n; } +/* Read `n` bytes from the unpickler's data source, storing the result in `buf`. + * + * This should only be used for non-small data reads where potentially + * avoiding a copy is beneficial. This method does not try to prefetch + * more data into the input buffer. + * + * _Unpickler_Read() is recommended in most cases. + */ +static Py_ssize_t +_Unpickler_ReadInto(UnpicklerObject *self, char *buf, Py_ssize_t n) +{ + assert(n != READ_WHOLE_LINE); + + /* Read from available buffer data, if any */ + Py_ssize_t in_buffer = self->input_len - self->next_read_idx; + if (in_buffer > 0) { + Py_ssize_t to_read = Py_MIN(in_buffer, n); + memcpy(buf, self->input_buffer + self->next_read_idx, to_read); + self->next_read_idx += to_read; + buf += to_read; + n -= to_read; + if (n == 0) { + /* Entire read was satisfied from buffer */ + return n; + } + } + + /* Read from file */ + if (!self->readinto) { + return bad_readline(); + } + if (_Unpickler_SkipConsumed(self) < 0) { + return -1; + } + + /* Call readinto() into user buffer */ + PyObject *buf_obj = PyMemoryView_FromMemory(buf, n, PyBUF_WRITE); + if (buf_obj == NULL) { + return -1; + } + PyObject *read_size_obj = _Pickle_FastCall(self->readinto, buf_obj); + if (read_size_obj == NULL) { + return -1; + } + Py_ssize_t read_size = PyLong_AsSsize_t(read_size_obj); + Py_DECREF(read_size_obj); + + if (read_size < 0) { + if (!PyErr_Occurred()) { + PyErr_SetString(PyExc_ValueError, + "readinto() returned negative size"); + } + return -1; + } + if (read_size < n) { + return bad_readline(); + } + return n; +} + /* Read `n` bytes from the unpickler's data source, storing the result in `*s`. This should be used for all data reads, rather than accessing the unpickler's @@ -1482,8 +1570,10 @@ _Unpickler_New(void) self->next_read_idx = 0; self->prefetched_idx = 0; self->read = NULL; + self->readinto = NULL; self->readline = NULL; self->peek = NULL; + self->buffers = NULL; self->encoding = NULL; self->errors = NULL; self->marks = NULL; @@ -1507,25 +1597,29 @@ _Unpickler_New(void) } /* Returns -1 (with an exception set) on failure, 0 on success. This may - be called once on a freshly created Pickler. */ + be called once on a freshly created Unpickler. 
*/ static int _Unpickler_SetInputStream(UnpicklerObject *self, PyObject *file) { _Py_IDENTIFIER(peek); _Py_IDENTIFIER(read); + _Py_IDENTIFIER(readinto); _Py_IDENTIFIER(readline); if (_PyObject_LookupAttrId(file, &PyId_peek, &self->peek) < 0) { return -1; } (void)_PyObject_LookupAttrId(file, &PyId_read, &self->read); + (void)_PyObject_LookupAttrId(file, &PyId_readinto, &self->readinto); (void)_PyObject_LookupAttrId(file, &PyId_readline, &self->readline); - if (self->readline == NULL || self->read == NULL) { + if (!self->readline || !self->readinto || !self->read) { if (!PyErr_Occurred()) { PyErr_SetString(PyExc_TypeError, - "file must have 'read' and 'readline' attributes"); + "file must have 'read', 'readinto' and " + "'readline' attributes"); } Py_CLEAR(self->read); + Py_CLEAR(self->readinto); Py_CLEAR(self->readline); Py_CLEAR(self->peek); return -1; @@ -1534,7 +1628,7 @@ _Unpickler_SetInputStream(UnpicklerObject *self, PyObject *file) } /* Returns -1 (with an exception set) on failure, 0 on success. This may - be called once on a freshly created Pickler. */ + be called once on a freshly created Unpickler. */ static int _Unpickler_SetInputEncoding(UnpicklerObject *self, const char *encoding, @@ -1554,6 +1648,23 @@ _Unpickler_SetInputEncoding(UnpicklerObject *self, return 0; } +/* Returns -1 (with an exception set) on failure, 0 on success. This may + be called once on a freshly created Unpickler. */ +static int +_Unpickler_SetBuffers(UnpicklerObject *self, PyObject *buffers) +{ + if (buffers == NULL) { + self->buffers = NULL; + } + else { + self->buffers = PyObject_GetIter(buffers); + if (self->buffers == NULL) { + return -1; + } + } + return 0; +} + /* Generate a GET opcode for an object stored in the memo. */ static int memo_get(PicklerObject *self, PyObject *key) @@ -2210,6 +2321,54 @@ _Pickler_write_bytes(PicklerObject *self, } static int +_save_bytes_data(PicklerObject *self, PyObject *obj, const char *data, + Py_ssize_t size) +{ + assert(self->proto >= 3); + + char header[9]; + Py_ssize_t len; + + if (size < 0) + return -1; + + if (size <= 0xff) { + header[0] = SHORT_BINBYTES; + header[1] = (unsigned char)size; + len = 2; + } + else if ((size_t)size <= 0xffffffffUL) { + header[0] = BINBYTES; + header[1] = (unsigned char)(size & 0xff); + header[2] = (unsigned char)((size >> 8) & 0xff); + header[3] = (unsigned char)((size >> 16) & 0xff); + header[4] = (unsigned char)((size >> 24) & 0xff); + len = 5; + } + else if (self->proto >= 4) { + header[0] = BINBYTES8; + _write_size64(header + 1, size); + len = 9; + } + else { + PyErr_SetString(PyExc_OverflowError, + "serializing a bytes object larger than 4 GiB " + "requires pickle protocol 4 or higher"); + return -1; + } + + if (_Pickler_write_bytes(self, header, len, data, size, obj) < 0) { + return -1; + } + + if (memo_put(self, obj) < 0) { + return -1; + } + + return 0; +} + +static int save_bytes(PicklerObject *self, PyObject *obj) { if (self->proto < 3) { @@ -2255,49 +2414,132 @@ save_bytes(PicklerObject *self, PyObject *obj) return status; } else { - Py_ssize_t size; - char header[9]; - Py_ssize_t len; + return _save_bytes_data(self, obj, PyBytes_AS_STRING(obj), + PyBytes_GET_SIZE(obj)); + } +} - size = PyBytes_GET_SIZE(obj); - if (size < 0) +static int +_save_bytearray_data(PicklerObject *self, PyObject *obj, const char *data, + Py_ssize_t size) +{ + assert(self->proto >= 5); + + char header[9]; + Py_ssize_t len; + + if (size < 0) + return -1; + + header[0] = BYTEARRAY8; + _write_size64(header + 1, size); + len = 9; + + if 
(_Pickler_write_bytes(self, header, len, data, size, obj) < 0) { + return -1; + } + + if (memo_put(self, obj) < 0) { + return -1; + } + + return 0; +} + +static int +save_bytearray(PicklerObject *self, PyObject *obj) +{ + if (self->proto < 5) { + /* Older pickle protocols do not have an opcode for pickling + * bytearrays. */ + PyObject *reduce_value = NULL; + int status; + + if (PyByteArray_GET_SIZE(obj) == 0) { + reduce_value = Py_BuildValue("(O())", + (PyObject *) &PyByteArray_Type); + } + else { + PyObject *bytes_obj = PyBytes_FromObject(obj); + if (bytes_obj != NULL) { + reduce_value = Py_BuildValue("(O(O))", + (PyObject *) &PyByteArray_Type, + bytes_obj); + Py_DECREF(bytes_obj); + } + } + if (reduce_value == NULL) return -1; - if (size <= 0xff) { - header[0] = SHORT_BINBYTES; - header[1] = (unsigned char)size; - len = 2; + /* save_reduce() will memoize the object automatically. */ + status = save_reduce(self, reduce_value, obj); + Py_DECREF(reduce_value); + return status; + } + else { + return _save_bytearray_data(self, obj, PyByteArray_AS_STRING(obj), + PyByteArray_GET_SIZE(obj)); + } +} + +static int +save_picklebuffer(PicklerObject *self, PyObject *obj) +{ + if (self->proto < 5) { + PickleState *st = _Pickle_GetGlobalState(); + PyErr_SetString(st->PicklingError, + "PickleBuffer can only pickled with protocol >= 5"); + return -1; + } + const Py_buffer* view = PyPickleBuffer_GetBuffer(obj); + if (view == NULL) { + return -1; + } + if (view->suboffsets != NULL || !PyBuffer_IsContiguous(view, 'A')) { + PickleState *st = _Pickle_GetGlobalState(); + PyErr_SetString(st->PicklingError, + "PickleBuffer can not be pickled when " + "pointing to a non-contiguous buffer"); + return -1; + } + int in_band = 1; + if (self->buffer_callback != NULL) { + PyObject *ret = PyObject_CallFunctionObjArgs(self->buffer_callback, + obj, NULL); + if (ret == NULL) { + return -1; } - else if ((size_t)size <= 0xffffffffUL) { - header[0] = BINBYTES; - header[1] = (unsigned char)(size & 0xff); - header[2] = (unsigned char)((size >> 8) & 0xff); - header[3] = (unsigned char)((size >> 16) & 0xff); - header[4] = (unsigned char)((size >> 24) & 0xff); - len = 5; + in_band = PyObject_IsTrue(ret); + Py_DECREF(ret); + if (in_band == -1) { + return -1; } - else if (self->proto >= 4) { - header[0] = BINBYTES8; - _write_size64(header + 1, size); - len = 9; + } + if (in_band) { + /* Write data in-band */ + if (view->readonly) { + return _save_bytes_data(self, obj, (const char*) view->buf, + view->len); } else { - PyErr_SetString(PyExc_OverflowError, - "cannot serialize a bytes object larger than 4 GiB"); - return -1; /* string too large */ + return _save_bytearray_data(self, obj, (const char*) view->buf, + view->len); } - - if (_Pickler_write_bytes(self, header, len, - PyBytes_AS_STRING(obj), size, obj) < 0) - { + } + else { + /* Write data out-of-band */ + const char next_buffer_op = NEXT_BUFFER; + if (_Pickler_Write(self, &next_buffer_op, 1) < 0) { return -1; } - - if (memo_put(self, obj) < 0) - return -1; - - return 0; + if (view->readonly) { + const char readonly_buffer_op = READONLY_BUFFER; + if (_Pickler_Write(self, &readonly_buffer_op, 1) < 0) { + return -1; + } + } } + return 0; } /* A copy of PyUnicode_EncodeRawUnicodeEscape() that also translates @@ -2417,7 +2659,8 @@ write_unicode_binary(PicklerObject *self, PyObject *obj) } else { PyErr_SetString(PyExc_OverflowError, - "cannot serialize a string larger than 4GiB"); + "serializing a string larger than 4 GiB " + "requires pickle protocol 4 or higher"); 
Py_XDECREF(encoded); return -1; } @@ -4062,6 +4305,14 @@ save(PicklerObject *self, PyObject *obj, int pers_save) status = save_tuple(self, obj); goto done; } + else if (type == &PyByteArray_Type) { + status = save_bytearray(self, obj); + goto done; + } + else if (type == &PyPickleBuffer_Type) { + status = save_picklebuffer(self, obj); + goto done; + } /* Now, check reducer_override. If it returns NotImplemented, * fallback to save_type or save_global, and then perhaps to the @@ -4342,6 +4593,7 @@ Pickler_dealloc(PicklerObject *self) Py_XDECREF(self->dispatch_table); Py_XDECREF(self->fast_memo); Py_XDECREF(self->reducer_override); + Py_XDECREF(self->buffer_callback); PyMemoTable_Del(self->memo); @@ -4356,6 +4608,7 @@ Pickler_traverse(PicklerObject *self, visitproc visit, void *arg) Py_VISIT(self->dispatch_table); Py_VISIT(self->fast_memo); Py_VISIT(self->reducer_override); + Py_VISIT(self->buffer_callback); return 0; } @@ -4368,6 +4621,7 @@ Pickler_clear(PicklerObject *self) Py_CLEAR(self->dispatch_table); Py_CLEAR(self->fast_memo); Py_CLEAR(self->reducer_override); + Py_CLEAR(self->buffer_callback); if (self->memo != NULL) { PyMemoTable *memo = self->memo; @@ -4385,6 +4639,7 @@ _pickle.Pickler.__init__ file: object protocol: object = NULL fix_imports: bool = True + buffer_callback: object = NULL This takes a binary file for writing a pickle data stream. @@ -4404,12 +4659,25 @@ this interface. If *fix_imports* is True and protocol is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2. + +If *buffer_callback* is None (the default), buffer views are +serialized into *file* as part of the pickle stream. + +If *buffer_callback* is not None, then it can be called any number +of times with a buffer view. If the callback returns a false value +(such as None), the given buffer is out-of-band; otherwise the +buffer is serialized in-band, i.e. inside the pickle stream. + +It is an error if *buffer_callback* is not None and *protocol* +is None or smaller than 5. 
+ [clinic start generated code]*/ static int _pickle_Pickler___init___impl(PicklerObject *self, PyObject *file, - PyObject *protocol, int fix_imports) -/*[clinic end generated code: output=b5f31078dab17fb0 input=4faabdbc763c2389]*/ + PyObject *protocol, int fix_imports, + PyObject *buffer_callback) +/*[clinic end generated code: output=0abedc50590d259b input=9a43a1c50ab91652]*/ { _Py_IDENTIFIER(persistent_id); _Py_IDENTIFIER(dispatch_table); @@ -4424,6 +4692,9 @@ _pickle_Pickler___init___impl(PicklerObject *self, PyObject *file, if (_Pickler_SetOutputStream(self, file) < 0) return -1; + if (_Pickler_SetBufferCallback(self, buffer_callback) < 0) + return -1; + /* memo and output_buffer may have already been created in _Pickler_New */ if (self->memo == NULL) { self->memo = PyMemoTable_New(); @@ -5212,18 +5483,101 @@ load_counted_binbytes(UnpicklerObject *self, int nbytes) return -1; } - if (_Unpickler_Read(self, &s, size) < 0) - return -1; - - bytes = PyBytes_FromStringAndSize(s, size); + bytes = PyBytes_FromStringAndSize(NULL, size); if (bytes == NULL) return -1; + if (_Unpickler_ReadInto(self, PyBytes_AS_STRING(bytes), size) < 0) { + Py_DECREF(bytes); + return -1; + } PDATA_PUSH(self->stack, bytes, -1); return 0; } static int +load_counted_bytearray(UnpicklerObject *self) +{ + PyObject *bytearray; + Py_ssize_t size; + char *s; + + if (_Unpickler_Read(self, &s, 8) < 0) { + return -1; + } + + size = calc_binsize(s, 8); + if (size < 0) { + PyErr_Format(PyExc_OverflowError, + "BYTEARRAY8 exceeds system's maximum size of %zd bytes", + PY_SSIZE_T_MAX); + return -1; + } + + bytearray = PyByteArray_FromStringAndSize(NULL, size); + if (bytearray == NULL) { + return -1; + } + if (_Unpickler_ReadInto(self, PyByteArray_AS_STRING(bytearray), size) < 0) { + Py_DECREF(bytearray); + return -1; + } + + PDATA_PUSH(self->stack, bytearray, -1); + return 0; +} + +static int +load_next_buffer(UnpicklerObject *self) +{ + if (self->buffers == NULL) { + PickleState *st = _Pickle_GetGlobalState(); + PyErr_SetString(st->UnpicklingError, + "pickle stream refers to out-of-band data " + "but no *buffers* argument was given"); + return -1; + } + PyObject *buf = PyIter_Next(self->buffers); + if (buf == NULL) { + if (!PyErr_Occurred()) { + PickleState *st = _Pickle_GetGlobalState(); + PyErr_SetString(st->UnpicklingError, + "not enough out-of-band buffers"); + } + return -1; + } + + PDATA_PUSH(self->stack, buf, -1); + return 0; +} + +static int +load_readonly_buffer(UnpicklerObject *self) +{ + Py_ssize_t len = Py_SIZE(self->stack); + if (len <= self->stack->fence) { + return Pdata_stack_underflow(self->stack); + } + + PyObject *obj = self->stack->data[len - 1]; + PyObject *view = PyMemoryView_FromObject(obj); + if (view == NULL) { + return -1; + } + if (!PyMemoryView_GET_BUFFER(view)->readonly) { + /* Original object is writable */ + PyMemoryView_GET_BUFFER(view)->readonly = 1; + self->stack->data[len - 1] = view; + Py_DECREF(obj); + } + else { + /* Original object is read-only, no need to replace it */ + Py_DECREF(view); + } + return 0; +} + +static int load_unicode(UnpicklerObject *self) { PyObject *str; @@ -6511,6 +6865,9 @@ load(UnpicklerObject *self) OP_ARG(SHORT_BINBYTES, load_counted_binbytes, 1) OP_ARG(BINBYTES, load_counted_binbytes, 4) OP_ARG(BINBYTES8, load_counted_binbytes, 8) + OP(BYTEARRAY8, load_counted_bytearray) + OP(NEXT_BUFFER, load_next_buffer) + OP(READONLY_BUFFER, load_readonly_buffer) OP_ARG(SHORT_BINSTRING, load_counted_binstring, 1) OP_ARG(BINSTRING, load_counted_binstring, 4) OP(STRING, 
load_string) @@ -6771,10 +7128,12 @@ Unpickler_dealloc(UnpicklerObject *self) { PyObject_GC_UnTrack((PyObject *)self); Py_XDECREF(self->readline); + Py_XDECREF(self->readinto); Py_XDECREF(self->read); Py_XDECREF(self->peek); Py_XDECREF(self->stack); Py_XDECREF(self->pers_func); + Py_XDECREF(self->buffers); if (self->buffer.buf != NULL) { PyBuffer_Release(&self->buffer); self->buffer.buf = NULL; @@ -6793,10 +7152,12 @@ static int Unpickler_traverse(UnpicklerObject *self, visitproc visit, void *arg) { Py_VISIT(self->readline); + Py_VISIT(self->readinto); Py_VISIT(self->read); Py_VISIT(self->peek); Py_VISIT(self->stack); Py_VISIT(self->pers_func); + Py_VISIT(self->buffers); return 0; } @@ -6804,10 +7165,12 @@ static int Unpickler_clear(UnpicklerObject *self) { Py_CLEAR(self->readline); + Py_CLEAR(self->readinto); Py_CLEAR(self->read); Py_CLEAR(self->peek); Py_CLEAR(self->stack); Py_CLEAR(self->pers_func); + Py_CLEAR(self->buffers); if (self->buffer.buf != NULL) { PyBuffer_Release(&self->buffer); self->buffer.buf = NULL; @@ -6835,6 +7198,7 @@ _pickle.Unpickler.__init__ fix_imports: bool = True encoding: str = 'ASCII' errors: str = 'strict' + buffers: object = NULL This takes a binary file for reading a pickle data stream. @@ -6861,8 +7225,8 @@ string instances as bytes objects. static int _pickle_Unpickler___init___impl(UnpicklerObject *self, PyObject *file, int fix_imports, const char *encoding, - const char *errors) -/*[clinic end generated code: output=e2c8ce748edc57b0 input=f9b7da04f5f4f335]*/ + const char *errors, PyObject *buffers) +/*[clinic end generated code: output=09f0192649ea3f85 input=da4b62d9edb68700]*/ { _Py_IDENTIFIER(persistent_load); @@ -6876,6 +7240,9 @@ _pickle_Unpickler___init___impl(UnpicklerObject *self, PyObject *file, if (_Unpickler_SetInputEncoding(self, encoding, errors) < 0) return -1; + if (_Unpickler_SetBuffers(self, buffers) < 0) + return -1; + self->fix_imports = fix_imports; if (init_method_ref((PyObject *)self, &PyId_persistent_load, @@ -7254,6 +7621,7 @@ _pickle.dump protocol: object = NULL * fix_imports: bool = True + buffer_callback: object = NULL Write a pickled representation of obj to the open file object file. @@ -7277,12 +7645,18 @@ this interface. If *fix_imports* is True and protocol is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2. + +If *buffer_callback* is None (the default), buffer views are serialized +into *file* as part of the pickle stream. It is an error if +*buffer_callback* is not None and *protocol* is None or smaller than 5. + [clinic start generated code]*/ static PyObject * _pickle_dump_impl(PyObject *module, PyObject *obj, PyObject *file, - PyObject *protocol, int fix_imports) -/*[clinic end generated code: output=a4774d5fde7d34de input=93f1408489a87472]*/ + PyObject *protocol, int fix_imports, + PyObject *buffer_callback) +/*[clinic end generated code: output=706186dba996490c input=2f035f02cc0f9547]*/ { PicklerObject *pickler = _Pickler_New(); @@ -7295,6 +7669,9 @@ _pickle_dump_impl(PyObject *module, PyObject *obj, PyObject *file, if (_Pickler_SetOutputStream(pickler, file) < 0) goto error; + if (_Pickler_SetBufferCallback(pickler, buffer_callback) < 0) + goto error; + if (dump(pickler, obj) < 0) goto error; @@ -7317,6 +7694,7 @@ _pickle.dumps protocol: object = NULL * fix_imports: bool = True + buffer_callback: object = NULL Return the pickled representation of the object as a bytes object. 
@@ -7332,12 +7710,17 @@ version of Python needed to read the pickle produced. If *fix_imports* is True and *protocol* is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2. + +If *buffer_callback* is None (the default), buffer views are serialized +into *file* as part of the pickle stream. It is an error if +*buffer_callback* is not None and *protocol* is None or smaller than 5. + [clinic start generated code]*/ static PyObject * _pickle_dumps_impl(PyObject *module, PyObject *obj, PyObject *protocol, - int fix_imports) -/*[clinic end generated code: output=d75d5cda456fd261 input=b6efb45a7d19b5ab]*/ + int fix_imports, PyObject *buffer_callback) +/*[clinic end generated code: output=fbab0093a5580fdf input=001f167df711b9f1]*/ { PyObject *result; PicklerObject *pickler = _Pickler_New(); @@ -7348,6 +7731,9 @@ _pickle_dumps_impl(PyObject *module, PyObject *obj, PyObject *protocol, if (_Pickler_SetProtocol(pickler, protocol, fix_imports) < 0) goto error; + if (_Pickler_SetBufferCallback(pickler, buffer_callback) < 0) + goto error; + if (dump(pickler, obj) < 0) goto error; @@ -7369,6 +7755,7 @@ _pickle.load fix_imports: bool = True encoding: str = 'ASCII' errors: str = 'strict' + buffers: object = NULL Read and return an object from the pickle data stored in a file. @@ -7397,8 +7784,9 @@ string instances as bytes objects. static PyObject * _pickle_load_impl(PyObject *module, PyObject *file, int fix_imports, - const char *encoding, const char *errors) -/*[clinic end generated code: output=69e298160285199e input=01b44dd3fc07afa7]*/ + const char *encoding, const char *errors, + PyObject *buffers) +/*[clinic end generated code: output=250452d141c23e76 input=29fae982fe778156]*/ { PyObject *result; UnpicklerObject *unpickler = _Unpickler_New(); @@ -7412,6 +7800,9 @@ _pickle_load_impl(PyObject *module, PyObject *file, int fix_imports, if (_Unpickler_SetInputEncoding(unpickler, encoding, errors) < 0) goto error; + if (_Unpickler_SetBuffers(unpickler, buffers) < 0) + goto error; + unpickler->fix_imports = fix_imports; result = load(unpickler); @@ -7432,6 +7823,7 @@ _pickle.loads fix_imports: bool = True encoding: str = 'ASCII' errors: str = 'strict' + buffers: object = NULL Read and return an object from the given pickle data. @@ -7451,8 +7843,9 @@ string instances as bytes objects. 
static PyObject * _pickle_loads_impl(PyObject *module, PyObject *data, int fix_imports, - const char *encoding, const char *errors) -/*[clinic end generated code: output=1e7cb2343f2c440f input=70605948a719feb9]*/ + const char *encoding, const char *errors, + PyObject *buffers) +/*[clinic end generated code: output=82ac1e6b588e6d02 input=c6004393f8276867]*/ { PyObject *result; UnpicklerObject *unpickler = _Unpickler_New(); @@ -7466,6 +7859,9 @@ _pickle_loads_impl(PyObject *module, PyObject *data, int fix_imports, if (_Unpickler_SetInputEncoding(unpickler, encoding, errors) < 0) goto error; + if (_Unpickler_SetBuffers(unpickler, buffers) < 0) + goto error; + unpickler->fix_imports = fix_imports; result = load(unpickler); @@ -7558,12 +7954,17 @@ PyInit__pickle(void) if (m == NULL) return NULL; + /* Add types */ Py_INCREF(&Pickler_Type); if (PyModule_AddObject(m, "Pickler", (PyObject *)&Pickler_Type) < 0) return NULL; Py_INCREF(&Unpickler_Type); if (PyModule_AddObject(m, "Unpickler", (PyObject *)&Unpickler_Type) < 0) return NULL; + Py_INCREF(&PyPickleBuffer_Type); + if (PyModule_AddObject(m, "PickleBuffer", + (PyObject *)&PyPickleBuffer_Type) < 0) + return NULL; st = _Pickle_GetState(m); diff --git a/Modules/clinic/_pickle.c.h b/Modules/clinic/_pickle.c.h index 1da2f93..8ac723f 100644 --- a/Modules/clinic/_pickle.c.h +++ b/Modules/clinic/_pickle.c.h @@ -63,7 +63,7 @@ exit: } PyDoc_STRVAR(_pickle_Pickler___init____doc__, -"Pickler(file, protocol=None, fix_imports=True)\n" +"Pickler(file, protocol=None, fix_imports=True, buffer_callback=None)\n" "--\n" "\n" "This takes a binary file for writing a pickle data stream.\n" @@ -83,27 +83,40 @@ PyDoc_STRVAR(_pickle_Pickler___init____doc__, "\n" "If *fix_imports* is True and protocol is less than 3, pickle will try\n" "to map the new Python 3 names to the old module names used in Python\n" -"2, so that the pickle data stream is readable with Python 2."); +"2, so that the pickle data stream is readable with Python 2.\n" +"\n" +"If *buffer_callback* is None (the default), buffer views are\n" +"serialized into *file* as part of the pickle stream.\n" +"\n" +"If *buffer_callback* is not None, then it can be called any number\n" +"of times with a buffer view. If the callback returns a false value\n" +"(such as None), the given buffer is out-of-band; otherwise the\n" +"buffer is serialized in-band, i.e. inside the pickle stream.\n" +"\n" +"It is an error if *buffer_callback* is not None and *protocol*\n" +"is None or smaller than 5."); static int _pickle_Pickler___init___impl(PicklerObject *self, PyObject *file, - PyObject *protocol, int fix_imports); + PyObject *protocol, int fix_imports, + PyObject *buffer_callback); static int _pickle_Pickler___init__(PyObject *self, PyObject *args, PyObject *kwargs) { int return_value = -1; - static const char * const _keywords[] = {"file", "protocol", "fix_imports", NULL}; + static const char * const _keywords[] = {"file", "protocol", "fix_imports", "buffer_callback", NULL}; static _PyArg_Parser _parser = {NULL, _keywords, "Pickler", 0}; - PyObject *argsbuf[3]; + PyObject *argsbuf[4]; PyObject * const *fastargs; Py_ssize_t nargs = PyTuple_GET_SIZE(args); Py_ssize_t noptargs = nargs + (kwargs ? 
PyDict_GET_SIZE(kwargs) : 0) - 1; PyObject *file; PyObject *protocol = NULL; int fix_imports = 1; + PyObject *buffer_callback = NULL; - fastargs = _PyArg_UnpackKeywords(_PyTuple_CAST(args)->ob_item, nargs, kwargs, NULL, &_parser, 1, 3, 0, argsbuf); + fastargs = _PyArg_UnpackKeywords(_PyTuple_CAST(args)->ob_item, nargs, kwargs, NULL, &_parser, 1, 4, 0, argsbuf); if (!fastargs) { goto exit; } @@ -117,12 +130,18 @@ _pickle_Pickler___init__(PyObject *self, PyObject *args, PyObject *kwargs) goto skip_optional_pos; } } - fix_imports = PyObject_IsTrue(fastargs[2]); - if (fix_imports < 0) { - goto exit; + if (fastargs[2]) { + fix_imports = PyObject_IsTrue(fastargs[2]); + if (fix_imports < 0) { + goto exit; + } + if (!--noptargs) { + goto skip_optional_pos; + } } + buffer_callback = fastargs[3]; skip_optional_pos: - return_value = _pickle_Pickler___init___impl((PicklerObject *)self, file, protocol, fix_imports); + return_value = _pickle_Pickler___init___impl((PicklerObject *)self, file, protocol, fix_imports, buffer_callback); exit: return return_value; @@ -272,7 +291,8 @@ exit: } PyDoc_STRVAR(_pickle_Unpickler___init____doc__, -"Unpickler(file, *, fix_imports=True, encoding=\'ASCII\', errors=\'strict\')\n" +"Unpickler(file, *, fix_imports=True, encoding=\'ASCII\', errors=\'strict\',\n" +" buffers=None)\n" "--\n" "\n" "This takes a binary file for reading a pickle data stream.\n" @@ -299,15 +319,15 @@ PyDoc_STRVAR(_pickle_Unpickler___init____doc__, static int _pickle_Unpickler___init___impl(UnpicklerObject *self, PyObject *file, int fix_imports, const char *encoding, - const char *errors); + const char *errors, PyObject *buffers); static int _pickle_Unpickler___init__(PyObject *self, PyObject *args, PyObject *kwargs) { int return_value = -1; - static const char * const _keywords[] = {"file", "fix_imports", "encoding", "errors", NULL}; + static const char * const _keywords[] = {"file", "fix_imports", "encoding", "errors", "buffers", NULL}; static _PyArg_Parser _parser = {NULL, _keywords, "Unpickler", 0}; - PyObject *argsbuf[4]; + PyObject *argsbuf[5]; PyObject * const *fastargs; Py_ssize_t nargs = PyTuple_GET_SIZE(args); Py_ssize_t noptargs = nargs + (kwargs ? 
PyDict_GET_SIZE(kwargs) : 0) - 1; @@ -315,6 +335,7 @@ _pickle_Unpickler___init__(PyObject *self, PyObject *args, PyObject *kwargs) int fix_imports = 1; const char *encoding = "ASCII"; const char *errors = "strict"; + PyObject *buffers = NULL; fastargs = _PyArg_UnpackKeywords(_PyTuple_CAST(args)->ob_item, nargs, kwargs, NULL, &_parser, 1, 1, 0, argsbuf); if (!fastargs) { @@ -351,21 +372,27 @@ _pickle_Unpickler___init__(PyObject *self, PyObject *args, PyObject *kwargs) goto skip_optional_kwonly; } } - if (!PyUnicode_Check(fastargs[3])) { - _PyArg_BadArgument("Unpickler", 4, "str", fastargs[3]); - goto exit; - } - Py_ssize_t errors_length; - errors = PyUnicode_AsUTF8AndSize(fastargs[3], &errors_length); - if (errors == NULL) { - goto exit; - } - if (strlen(errors) != (size_t)errors_length) { - PyErr_SetString(PyExc_ValueError, "embedded null character"); - goto exit; + if (fastargs[3]) { + if (!PyUnicode_Check(fastargs[3])) { + _PyArg_BadArgument("Unpickler", 4, "str", fastargs[3]); + goto exit; + } + Py_ssize_t errors_length; + errors = PyUnicode_AsUTF8AndSize(fastargs[3], &errors_length); + if (errors == NULL) { + goto exit; + } + if (strlen(errors) != (size_t)errors_length) { + PyErr_SetString(PyExc_ValueError, "embedded null character"); + goto exit; + } + if (!--noptargs) { + goto skip_optional_kwonly; + } } + buffers = fastargs[4]; skip_optional_kwonly: - return_value = _pickle_Unpickler___init___impl((UnpicklerObject *)self, file, fix_imports, encoding, errors); + return_value = _pickle_Unpickler___init___impl((UnpicklerObject *)self, file, fix_imports, encoding, errors, buffers); exit: return return_value; @@ -426,7 +453,8 @@ _pickle_UnpicklerMemoProxy___reduce__(UnpicklerMemoProxyObject *self, PyObject * } PyDoc_STRVAR(_pickle_dump__doc__, -"dump($module, /, obj, file, protocol=None, *, fix_imports=True)\n" +"dump($module, /, obj, file, protocol=None, *, fix_imports=True,\n" +" buffer_callback=None)\n" "--\n" "\n" "Write a pickled representation of obj to the open file object file.\n" @@ -450,27 +478,33 @@ PyDoc_STRVAR(_pickle_dump__doc__, "\n" "If *fix_imports* is True and protocol is less than 3, pickle will try\n" "to map the new Python 3 names to the old module names used in Python\n" -"2, so that the pickle data stream is readable with Python 2."); +"2, so that the pickle data stream is readable with Python 2.\n" +"\n" +"If *buffer_callback* is None (the default), buffer views are serialized\n" +"into *file* as part of the pickle stream. It is an error if\n" +"*buffer_callback* is not None and *protocol* is None or smaller than 5."); #define _PICKLE_DUMP_METHODDEF \ {"dump", (PyCFunction)(void(*)(void))_pickle_dump, METH_FASTCALL|METH_KEYWORDS, _pickle_dump__doc__}, static PyObject * _pickle_dump_impl(PyObject *module, PyObject *obj, PyObject *file, - PyObject *protocol, int fix_imports); + PyObject *protocol, int fix_imports, + PyObject *buffer_callback); static PyObject * _pickle_dump(PyObject *module, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames) { PyObject *return_value = NULL; - static const char * const _keywords[] = {"obj", "file", "protocol", "fix_imports", NULL}; + static const char * const _keywords[] = {"obj", "file", "protocol", "fix_imports", "buffer_callback", NULL}; static _PyArg_Parser _parser = {NULL, _keywords, "dump", 0}; - PyObject *argsbuf[4]; + PyObject *argsbuf[5]; Py_ssize_t noptargs = nargs + (kwnames ? 
PyTuple_GET_SIZE(kwnames) : 0) - 2; PyObject *obj; PyObject *file; PyObject *protocol = NULL; int fix_imports = 1; + PyObject *buffer_callback = NULL; args = _PyArg_UnpackKeywords(args, nargs, NULL, kwnames, &_parser, 2, 3, 0, argsbuf); if (!args) { @@ -491,19 +525,26 @@ skip_optional_pos: if (!noptargs) { goto skip_optional_kwonly; } - fix_imports = PyObject_IsTrue(args[3]); - if (fix_imports < 0) { - goto exit; + if (args[3]) { + fix_imports = PyObject_IsTrue(args[3]); + if (fix_imports < 0) { + goto exit; + } + if (!--noptargs) { + goto skip_optional_kwonly; + } } + buffer_callback = args[4]; skip_optional_kwonly: - return_value = _pickle_dump_impl(module, obj, file, protocol, fix_imports); + return_value = _pickle_dump_impl(module, obj, file, protocol, fix_imports, buffer_callback); exit: return return_value; } PyDoc_STRVAR(_pickle_dumps__doc__, -"dumps($module, /, obj, protocol=None, *, fix_imports=True)\n" +"dumps($module, /, obj, protocol=None, *, fix_imports=True,\n" +" buffer_callback=None)\n" "--\n" "\n" "Return the pickled representation of the object as a bytes object.\n" @@ -519,26 +560,31 @@ PyDoc_STRVAR(_pickle_dumps__doc__, "\n" "If *fix_imports* is True and *protocol* is less than 3, pickle will\n" "try to map the new Python 3 names to the old module names used in\n" -"Python 2, so that the pickle data stream is readable with Python 2."); +"Python 2, so that the pickle data stream is readable with Python 2.\n" +"\n" +"If *buffer_callback* is None (the default), buffer views are serialized\n" +"into *file* as part of the pickle stream. It is an error if\n" +"*buffer_callback* is not None and *protocol* is None or smaller than 5."); #define _PICKLE_DUMPS_METHODDEF \ {"dumps", (PyCFunction)(void(*)(void))_pickle_dumps, METH_FASTCALL|METH_KEYWORDS, _pickle_dumps__doc__}, static PyObject * _pickle_dumps_impl(PyObject *module, PyObject *obj, PyObject *protocol, - int fix_imports); + int fix_imports, PyObject *buffer_callback); static PyObject * _pickle_dumps(PyObject *module, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames) { PyObject *return_value = NULL; - static const char * const _keywords[] = {"obj", "protocol", "fix_imports", NULL}; + static const char * const _keywords[] = {"obj", "protocol", "fix_imports", "buffer_callback", NULL}; static _PyArg_Parser _parser = {NULL, _keywords, "dumps", 0}; - PyObject *argsbuf[3]; + PyObject *argsbuf[4]; Py_ssize_t noptargs = nargs + (kwnames ? 
PyTuple_GET_SIZE(kwnames) : 0) - 1; PyObject *obj; PyObject *protocol = NULL; int fix_imports = 1; + PyObject *buffer_callback = NULL; args = _PyArg_UnpackKeywords(args, nargs, NULL, kwnames, &_parser, 1, 2, 0, argsbuf); if (!args) { @@ -558,12 +604,18 @@ skip_optional_pos: if (!noptargs) { goto skip_optional_kwonly; } - fix_imports = PyObject_IsTrue(args[2]); - if (fix_imports < 0) { - goto exit; + if (args[2]) { + fix_imports = PyObject_IsTrue(args[2]); + if (fix_imports < 0) { + goto exit; + } + if (!--noptargs) { + goto skip_optional_kwonly; + } } + buffer_callback = args[3]; skip_optional_kwonly: - return_value = _pickle_dumps_impl(module, obj, protocol, fix_imports); + return_value = _pickle_dumps_impl(module, obj, protocol, fix_imports, buffer_callback); exit: return return_value; @@ -571,7 +623,7 @@ exit: PyDoc_STRVAR(_pickle_load__doc__, "load($module, /, file, *, fix_imports=True, encoding=\'ASCII\',\n" -" errors=\'strict\')\n" +" errors=\'strict\', buffers=None)\n" "--\n" "\n" "Read and return an object from the pickle data stored in a file.\n" @@ -603,20 +655,22 @@ PyDoc_STRVAR(_pickle_load__doc__, static PyObject * _pickle_load_impl(PyObject *module, PyObject *file, int fix_imports, - const char *encoding, const char *errors); + const char *encoding, const char *errors, + PyObject *buffers); static PyObject * _pickle_load(PyObject *module, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames) { PyObject *return_value = NULL; - static const char * const _keywords[] = {"file", "fix_imports", "encoding", "errors", NULL}; + static const char * const _keywords[] = {"file", "fix_imports", "encoding", "errors", "buffers", NULL}; static _PyArg_Parser _parser = {NULL, _keywords, "load", 0}; - PyObject *argsbuf[4]; + PyObject *argsbuf[5]; Py_ssize_t noptargs = nargs + (kwnames ? 
PyTuple_GET_SIZE(kwnames) : 0) - 1; PyObject *file; int fix_imports = 1; const char *encoding = "ASCII"; const char *errors = "strict"; + PyObject *buffers = NULL; args = _PyArg_UnpackKeywords(args, nargs, NULL, kwnames, &_parser, 1, 1, 0, argsbuf); if (!args) { @@ -653,21 +707,27 @@ _pickle_load(PyObject *module, PyObject *const *args, Py_ssize_t nargs, PyObject goto skip_optional_kwonly; } } - if (!PyUnicode_Check(args[3])) { - _PyArg_BadArgument("load", 4, "str", args[3]); - goto exit; - } - Py_ssize_t errors_length; - errors = PyUnicode_AsUTF8AndSize(args[3], &errors_length); - if (errors == NULL) { - goto exit; - } - if (strlen(errors) != (size_t)errors_length) { - PyErr_SetString(PyExc_ValueError, "embedded null character"); - goto exit; + if (args[3]) { + if (!PyUnicode_Check(args[3])) { + _PyArg_BadArgument("load", 4, "str", args[3]); + goto exit; + } + Py_ssize_t errors_length; + errors = PyUnicode_AsUTF8AndSize(args[3], &errors_length); + if (errors == NULL) { + goto exit; + } + if (strlen(errors) != (size_t)errors_length) { + PyErr_SetString(PyExc_ValueError, "embedded null character"); + goto exit; + } + if (!--noptargs) { + goto skip_optional_kwonly; + } } + buffers = args[4]; skip_optional_kwonly: - return_value = _pickle_load_impl(module, file, fix_imports, encoding, errors); + return_value = _pickle_load_impl(module, file, fix_imports, encoding, errors, buffers); exit: return return_value; @@ -675,7 +735,7 @@ exit: PyDoc_STRVAR(_pickle_loads__doc__, "loads($module, /, data, *, fix_imports=True, encoding=\'ASCII\',\n" -" errors=\'strict\')\n" +" errors=\'strict\', buffers=None)\n" "--\n" "\n" "Read and return an object from the given pickle data.\n" @@ -698,20 +758,22 @@ PyDoc_STRVAR(_pickle_loads__doc__, static PyObject * _pickle_loads_impl(PyObject *module, PyObject *data, int fix_imports, - const char *encoding, const char *errors); + const char *encoding, const char *errors, + PyObject *buffers); static PyObject * _pickle_loads(PyObject *module, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames) { PyObject *return_value = NULL; - static const char * const _keywords[] = {"data", "fix_imports", "encoding", "errors", NULL}; + static const char * const _keywords[] = {"data", "fix_imports", "encoding", "errors", "buffers", NULL}; static _PyArg_Parser _parser = {NULL, _keywords, "loads", 0}; - PyObject *argsbuf[4]; + PyObject *argsbuf[5]; Py_ssize_t noptargs = nargs + (kwnames ? 
PyTuple_GET_SIZE(kwnames) : 0) - 1; PyObject *data; int fix_imports = 1; const char *encoding = "ASCII"; const char *errors = "strict"; + PyObject *buffers = NULL; args = _PyArg_UnpackKeywords(args, nargs, NULL, kwnames, &_parser, 1, 1, 0, argsbuf); if (!args) { @@ -748,23 +810,29 @@ _pickle_loads(PyObject *module, PyObject *const *args, Py_ssize_t nargs, PyObjec goto skip_optional_kwonly; } } - if (!PyUnicode_Check(args[3])) { - _PyArg_BadArgument("loads", 4, "str", args[3]); - goto exit; - } - Py_ssize_t errors_length; - errors = PyUnicode_AsUTF8AndSize(args[3], &errors_length); - if (errors == NULL) { - goto exit; - } - if (strlen(errors) != (size_t)errors_length) { - PyErr_SetString(PyExc_ValueError, "embedded null character"); - goto exit; + if (args[3]) { + if (!PyUnicode_Check(args[3])) { + _PyArg_BadArgument("loads", 4, "str", args[3]); + goto exit; + } + Py_ssize_t errors_length; + errors = PyUnicode_AsUTF8AndSize(args[3], &errors_length); + if (errors == NULL) { + goto exit; + } + if (strlen(errors) != (size_t)errors_length) { + PyErr_SetString(PyExc_ValueError, "embedded null character"); + goto exit; + } + if (!--noptargs) { + goto skip_optional_kwonly; + } } + buffers = args[4]; skip_optional_kwonly: - return_value = _pickle_loads_impl(module, data, fix_imports, encoding, errors); + return_value = _pickle_loads_impl(module, data, fix_imports, encoding, errors, buffers); exit: return return_value; } -/*[clinic end generated code: output=8f972562c8f71e2b input=a9049054013a1b77]*/ +/*[clinic end generated code: output=8dc0e862f96c4afe input=a9049054013a1b77]*/ diff --git a/Objects/object.c b/Objects/object.c index f842ab3..6d79165 100644 --- a/Objects/object.c +++ b/Objects/object.c @@ -1839,6 +1839,7 @@ _PyTypes_Init(void) INIT_TYPE(&PyMethodDescr_Type, "method descr"); INIT_TYPE(&PyCallIter_Type, "call iter"); INIT_TYPE(&PySeqIter_Type, "sequence iterator"); + INIT_TYPE(&PyPickleBuffer_Type, "pickle.PickleBuffer"); INIT_TYPE(&PyCoro_Type, "coroutine"); INIT_TYPE(&_PyCoroWrapper_Type, "coroutine wrapper"); INIT_TYPE(&_PyInterpreterID_Type, "interpreter ID"); diff --git a/Objects/picklebufobject.c b/Objects/picklebufobject.c new file mode 100644 index 0000000..a135e55 --- /dev/null +++ b/Objects/picklebufobject.c @@ -0,0 +1,219 @@ +/* PickleBuffer object implementation */ + +#define PY_SSIZE_T_CLEAN +#include "Python.h" +#include <stddef.h> + +typedef struct { + PyObject_HEAD + /* The view exported by the original object */ + Py_buffer view; + PyObject *weakreflist; +} PyPickleBufferObject; + +/* C API */ + +PyObject * +PyPickleBuffer_FromObject(PyObject *base) +{ + PyTypeObject *type = &PyPickleBuffer_Type; + PyPickleBufferObject *self; + + self = (PyPickleBufferObject *) type->tp_alloc(type, 0); + if (self == NULL) { + return NULL; + } + self->view.obj = NULL; + self->weakreflist = NULL; + if (PyObject_GetBuffer(base, &self->view, PyBUF_FULL_RO) < 0) { + Py_DECREF(self); + return NULL; + } + return (PyObject *) self; +} + +const Py_buffer * +PyPickleBuffer_GetBuffer(PyObject *obj) +{ + PyPickleBufferObject *self = (PyPickleBufferObject *) obj; + + if (!PyPickleBuffer_Check(obj)) { + PyErr_Format(PyExc_TypeError, + "expected PickleBuffer, %.200s found", + Py_TYPE(obj)->tp_name); + return NULL; + } + if (self->view.obj == NULL) { + PyErr_SetString(PyExc_ValueError, + "operation forbidden on released PickleBuffer object"); + return NULL; + } + return &self->view; +} + +int +PyPickleBuffer_Release(PyObject *obj) +{ + PyPickleBufferObject *self = (PyPickleBufferObject *) obj; + + if 
(!PyPickleBuffer_Check(obj)) { + PyErr_Format(PyExc_TypeError, + "expected PickleBuffer, %.200s found", + Py_TYPE(obj)->tp_name); + return -1; + } + PyBuffer_Release(&self->view); + return 0; +} + +static PyObject * +picklebuf_new(PyTypeObject *type, PyObject *args, PyObject *kwds) +{ + PyPickleBufferObject *self; + PyObject *base; + char *keywords[] = {"", NULL}; + + if (!PyArg_ParseTupleAndKeywords(args, kwds, "O:PickleBuffer", + keywords, &base)) { + return NULL; + } + + self = (PyPickleBufferObject *) type->tp_alloc(type, 0); + if (self == NULL) { + return NULL; + } + self->view.obj = NULL; + self->weakreflist = NULL; + if (PyObject_GetBuffer(base, &self->view, PyBUF_FULL_RO) < 0) { + Py_DECREF(self); + return NULL; + } + return (PyObject *) self; +} + +static int +picklebuf_traverse(PyPickleBufferObject *self, visitproc visit, void *arg) +{ + Py_VISIT(self->view.obj); + return 0; +} + +static int +picklebuf_clear(PyPickleBufferObject *self) +{ + PyBuffer_Release(&self->view); + return 0; +} + +static void +picklebuf_dealloc(PyPickleBufferObject *self) +{ + PyObject_GC_UnTrack(self); + if (self->weakreflist != NULL) + PyObject_ClearWeakRefs((PyObject *) self); + PyBuffer_Release(&self->view); + Py_TYPE(self)->tp_free((PyObject *) self); +} + +/* Buffer API */ + +static int +picklebuf_getbuf(PyPickleBufferObject *self, Py_buffer *view, int flags) +{ + if (self->view.obj == NULL) { + PyErr_SetString(PyExc_ValueError, + "operation forbidden on released PickleBuffer object"); + return -1; + } + return PyObject_GetBuffer(self->view.obj, view, flags); +} + +static void +picklebuf_releasebuf(PyPickleBufferObject *self, Py_buffer *view) +{ + /* Since our bf_getbuffer redirects to the original object, this + * implementation is never called. It only exists to signal that + * buffers exported by PickleBuffer have non-trivial releasing + * behaviour (see check in Python/getargs.c). + */ +} + +static PyBufferProcs picklebuf_as_buffer = { + .bf_getbuffer = (getbufferproc) picklebuf_getbuf, + .bf_releasebuffer = (releasebufferproc) picklebuf_releasebuf, +}; + +/* Methods */ + +static PyObject * +picklebuf_raw(PyPickleBufferObject *self, PyObject *Py_UNUSED(ignored)) +{ + if (self->view.obj == NULL) { + PyErr_SetString(PyExc_ValueError, + "operation forbidden on released PickleBuffer object"); + return NULL; + } + if (self->view.suboffsets != NULL + || !PyBuffer_IsContiguous(&self->view, 'A')) { + PyErr_SetString(PyExc_BufferError, + "cannot extract raw buffer from non-contiguous buffer"); + return NULL; + } + PyObject *m = PyMemoryView_FromObject((PyObject *) self); + if (m == NULL) { + return NULL; + } + PyMemoryViewObject *mv = (PyMemoryViewObject *) m; + assert(mv->view.suboffsets == NULL); + /* Mutate memoryview instance to make it a "raw" memoryview */ + mv->view.format = "B"; + mv->view.ndim = 1; + mv->view.itemsize = 1; + /* shape = (length,) */ + mv->view.shape = &mv->view.len; + /* strides = (1,) */ + mv->view.strides = &mv->view.itemsize; + /* Fix memoryview state flags */ + /* XXX Expose memoryobject.c's init_flags() instead? 
*/ + mv->flags = _Py_MEMORYVIEW_C | _Py_MEMORYVIEW_FORTRAN; + return m; +} + +PyDoc_STRVAR(picklebuf_raw_doc, +"raw($self, /)\n--\n\ +\n\ +Return a memoryview of the raw memory underlying this buffer.\n\ +Will raise BufferError is the buffer isn't contiguous."); + +static PyObject * +picklebuf_release(PyPickleBufferObject *self, PyObject *Py_UNUSED(ignored)) +{ + PyBuffer_Release(&self->view); + Py_RETURN_NONE; +} + +PyDoc_STRVAR(picklebuf_release_doc, +"release($self, /)\n--\n\ +\n\ +Release the underlying buffer exposed by the PickleBuffer object."); + +static PyMethodDef picklebuf_methods[] = { + {"raw", (PyCFunction) picklebuf_raw, METH_NOARGS, picklebuf_raw_doc}, + {"release", (PyCFunction) picklebuf_release, METH_NOARGS, picklebuf_release_doc}, + {NULL, NULL} +}; + +PyTypeObject PyPickleBuffer_Type = { + PyVarObject_HEAD_INIT(NULL, 0) + .tp_name = "pickle.PickleBuffer", + .tp_doc = "Wrapper for potentially out-of-band buffers", + .tp_basicsize = sizeof(PyPickleBufferObject), + .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC, + .tp_new = picklebuf_new, + .tp_dealloc = (destructor) picklebuf_dealloc, + .tp_traverse = (traverseproc) picklebuf_traverse, + .tp_clear = (inquiry) picklebuf_clear, + .tp_weaklistoffset = offsetof(PyPickleBufferObject, weakreflist), + .tp_as_buffer = &picklebuf_as_buffer, + .tp_methods = picklebuf_methods, +}; diff --git a/PCbuild/pythoncore.vcxproj b/PCbuild/pythoncore.vcxproj index 10f51dd..db691cd 100644 --- a/PCbuild/pythoncore.vcxproj +++ b/PCbuild/pythoncore.vcxproj @@ -197,6 +197,7 @@ <ClInclude Include="..\Include\osmodule.h" /> <ClInclude Include="..\Include\parsetok.h" /> <ClInclude Include="..\Include\patchlevel.h" /> + <ClInclude Include="..\Include\picklebufobject.h" /> <ClInclude Include="..\Include\pyhash.h" /> <ClInclude Include="..\Include\py_curses.h" /> <ClInclude Include="..\Include\pyarena.h" /> @@ -383,6 +384,7 @@ <ClCompile Include="..\Objects\object.c" /> <ClCompile Include="..\Objects\obmalloc.c" /> <ClCompile Include="..\Objects\odictobject.c" /> + <ClCompile Include="..\Objects\picklebufobject.c" /> <ClCompile Include="..\Objects\rangeobject.c" /> <ClCompile Include="..\Objects\setobject.c" /> <ClCompile Include="..\Objects\sliceobject.c" /> diff --git a/PCbuild/pythoncore.vcxproj.filters b/PCbuild/pythoncore.vcxproj.filters index 396d146..dba47e9 100644 --- a/PCbuild/pythoncore.vcxproj.filters +++ b/PCbuild/pythoncore.vcxproj.filters @@ -285,6 +285,9 @@ <ClInclude Include="..\Include\patchlevel.h"> <Filter>Include</Filter> </ClInclude> + <ClInclude Include="..\Include\picklebufobject.h"> + <Filter>Include</Filter> + </ClInclude> <ClInclude Include="..\Include\py_curses.h"> <Filter>Include</Filter> </ClInclude> @@ -818,6 +821,9 @@ <ClCompile Include="..\Objects\obmalloc.c"> <Filter>Objects</Filter> </ClCompile> + <ClCompile Include="..\Objects\picklebufobject.c"> + <Filter>Objects</Filter> + </ClCompile> <ClCompile Include="..\Objects\rangeobject.c"> <Filter>Objects</Filter> </ClCompile> |
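
Usage note (editorial, not part of the commit): the snippet below is a minimal sketch of how the protocol-5 API introduced by this change — pickle.PickleBuffer, the buffer_callback argument to dump()/dumps()/Pickler, and the buffers argument to load()/loads()/Unpickler — is meant to fit together for an out-of-band round trip. It assumes an interpreter built with this patch applied; the example itself is illustrative only.

import pickle

data = bytearray(b"x" * (1 << 20))          # large, writable payload

# Default behaviour: the buffer is serialized in-band, inside the stream.
payload = pickle.dumps(data, protocol=5)
assert pickle.loads(payload) == data

# Out-of-band: buffer_callback collects PickleBuffer views instead of
# embedding their data; loads() receives them back through *buffers*.
collected = []
payload = pickle.dumps(pickle.PickleBuffer(data), protocol=5,
                       buffer_callback=collected.append)
restored = pickle.loads(payload, buffers=collected)
assert bytes(restored) == bytes(data)

Passing buffer_callback with a protocol lower than 5 raises ValueError ("buffer_callback needs protocol >= 5"), and unpickling a stream that contains NEXT_BUFFER without a buffers argument raises UnpicklingError, matching the checks added in Modules/_pickle.c above.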
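
A second sketch, likewise editorial, of the PickleBuffer semantics exercised by the new Lib/test/test_picklebuffer.py tests: raw() returns a one-dimensional, contiguous 'B'-format memoryview of the underlying memory (raising BufferError for non-contiguous sources), and any use after release() raises ValueError. Again this assumes the patch is applied.

import pickle

pb = pickle.PickleBuffer(b"foo")
m = pb.raw()                     # flat, byte-oriented view of the memory
assert m.format == "B" and m.nbytes == 3 and m.readonly

pb.release()                     # detach from the underlying object
try:
    pb.raw()                     # operations on a released buffer fail
except ValueError:
    pass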