summaryrefslogtreecommitdiffstats
path: root/Lib/pickletools.py
diff options
context:
space:
mode:
authorVictor Stinner <victor.stinner@gmail.com>2013-03-26 00:11:54 (GMT)
committerVictor Stinner <victor.stinner@gmail.com>2013-03-26 00:11:54 (GMT)
commit765531d2d083c7a4e9478fcd960eebe04ac6b192 (patch)
tree5bbaa431016613dcbe05081e2ab29aed2f36fd6a /Lib/pickletools.py
parent1f8898a5916b942c86ee8716c37d2db388e7bf2f (diff)
downloadcpython-765531d2d083c7a4e9478fcd960eebe04ac6b192.zip
cpython-765531d2d083c7a4e9478fcd960eebe04ac6b192.tar.gz
cpython-765531d2d083c7a4e9478fcd960eebe04ac6b192.tar.bz2
Issue #17516: use comment syntax for comments, instead of multiline string
Diffstat (limited to 'Lib/pickletools.py')
-rw-r--r--Lib/pickletools.py225
1 files changed, 112 insertions, 113 deletions
diff --git a/Lib/pickletools.py b/Lib/pickletools.py
index 66f4edd..9f90e3e 100644
--- a/Lib/pickletools.py
+++ b/Lib/pickletools.py
@@ -33,119 +33,118 @@ bytes_types = pickle.bytes_types
# by a later GET.
-"""
-"A pickle" is a program for a virtual pickle machine (PM, but more accurately
-called an unpickling machine). It's a sequence of opcodes, interpreted by the
-PM, building an arbitrarily complex Python object.
-
-For the most part, the PM is very simple: there are no looping, testing, or
-conditional instructions, no arithmetic and no function calls. Opcodes are
-executed once each, from first to last, until a STOP opcode is reached.
-
-The PM has two data areas, "the stack" and "the memo".
-
-Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
-integer object on the stack, whose value is gotten from a decimal string
-literal immediately following the INT opcode in the pickle bytestream. Other
-opcodes take Python objects off the stack. The result of unpickling is
-whatever object is left on the stack when the final STOP opcode is executed.
-
-The memo is simply an array of objects, or it can be implemented as a dict
-mapping little integers to objects. The memo serves as the PM's "long term
-memory", and the little integers indexing the memo are akin to variable
-names. Some opcodes pop a stack object into the memo at a given index,
-and others push a memo object at a given index onto the stack again.
-
-At heart, that's all the PM has. Subtleties arise for these reasons:
-
-+ Object identity. Objects can be arbitrarily complex, and subobjects
- may be shared (for example, the list [a, a] refers to the same object a
- twice). It can be vital that unpickling recreate an isomorphic object
- graph, faithfully reproducing sharing.
-
-+ Recursive objects. For example, after "L = []; L.append(L)", L is a
- list, and L[0] is the same list. This is related to the object identity
- point, and some sequences of pickle opcodes are subtle in order to
- get the right result in all cases.
-
-+ Things pickle doesn't know everything about. Examples of things pickle
- does know everything about are Python's builtin scalar and container
- types, like ints and tuples. They generally have opcodes dedicated to
- them. For things like module references and instances of user-defined
- classes, pickle's knowledge is limited. Historically, many enhancements
- have been made to the pickle protocol in order to do a better (faster,
- and/or more compact) job on those.
-
-+ Backward compatibility and micro-optimization. As explained below,
- pickle opcodes never go away, not even when better ways to do a thing
- get invented. The repertoire of the PM just keeps growing over time.
- For example, protocol 0 had two opcodes for building Python integers (INT
- and LONG), protocol 1 added three more for more-efficient pickling of short
- integers, and protocol 2 added two more for more-efficient pickling of
- long integers (before protocol 2, the only ways to pickle a Python long
- took time quadratic in the number of digits, for both pickling and
- unpickling). "Opcode bloat" isn't so much a subtlety as a source of
- wearying complication.
-
-
-Pickle protocols:
-
-For compatibility, the meaning of a pickle opcode never changes. Instead new
-pickle opcodes get added, and each version's unpickler can handle all the
-pickle opcodes in all protocol versions to date. So old pickles continue to
-be readable forever. The pickler can generally be told to restrict itself to
-the subset of opcodes available under previous protocol versions too, so that
-users can create pickles under the current version readable by older
-versions. However, a pickle does not contain its version number embedded
-within it. If an older unpickler tries to read a pickle using a later
-protocol, the result is most likely an exception due to seeing an unknown (in
-the older unpickler) opcode.
-
-The original pickle used what's now called "protocol 0", and what was called
-"text mode" before Python 2.3. The entire pickle bytestream is made up of
-printable 7-bit ASCII characters, plus the newline character, in protocol 0.
-That's why it was called text mode. Protocol 0 is small and elegant, but
-sometimes painfully inefficient.
-
-The second major set of additions is now called "protocol 1", and was called
-"binary mode" before Python 2.3. This added many opcodes with arguments
-consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
-bytes. Binary mode pickles can be substantially smaller than equivalent
-text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
-int as 4 bytes following the opcode, which is cheaper to unpickle than the
-(perhaps) 11-character decimal string attached to INT. Protocol 1 also added
-a number of opcodes that operate on many stack elements at once (like APPENDS
-and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
-
-The third major set of additions came in Python 2.3, and is called "protocol
-2". This added:
-
-- A better way to pickle instances of new-style classes (NEWOBJ).
-
-- A way for a pickle to identify its protocol (PROTO).
-
-- Time- and space- efficient pickling of long ints (LONG{1,4}).
-
-- Shortcuts for small tuples (TUPLE{1,2,3}}.
-
-- Dedicated opcodes for bools (NEWTRUE, NEWFALSE).
-
-- The "extension registry", a vector of popular objects that can be pushed
- efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
- the registry contents are predefined (there's nothing akin to the memo's
- PUT).
-
-Another independent change with Python 2.3 is the abandonment of any
-pretense that it might be safe to load pickles received from untrusted
-parties -- no sufficient security analysis has been done to guarantee
-this and there isn't a use case that warrants the expense of such an
-analysis.
-
-To this end, all tests for __safe_for_unpickling__ or for
-copyreg.safe_constructors are removed from the unpickling code.
-References to these variables in the descriptions below are to be seen
-as describing unpickling in Python 2.2 and before.
-"""
+# "A pickle" is a program for a virtual pickle machine (PM, but more accurately
+# called an unpickling machine). It's a sequence of opcodes, interpreted by the
+# PM, building an arbitrarily complex Python object.
+#
+# For the most part, the PM is very simple: there are no looping, testing, or
+# conditional instructions, no arithmetic and no function calls. Opcodes are
+# executed once each, from first to last, until a STOP opcode is reached.
+#
+# The PM has two data areas, "the stack" and "the memo".
+#
+# Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
+# integer object on the stack, whose value is gotten from a decimal string
+# literal immediately following the INT opcode in the pickle bytestream. Other
+# opcodes take Python objects off the stack. The result of unpickling is
+# whatever object is left on the stack when the final STOP opcode is executed.
+#
+# The memo is simply an array of objects, or it can be implemented as a dict
+# mapping little integers to objects. The memo serves as the PM's "long term
+# memory", and the little integers indexing the memo are akin to variable
+# names. Some opcodes pop a stack object into the memo at a given index,
+# and others push a memo object at a given index onto the stack again.
+#
+# At heart, that's all the PM has. Subtleties arise for these reasons:
+#
+# + Object identity. Objects can be arbitrarily complex, and subobjects
+# may be shared (for example, the list [a, a] refers to the same object a
+# twice). It can be vital that unpickling recreate an isomorphic object
+# graph, faithfully reproducing sharing.
+#
+# + Recursive objects. For example, after "L = []; L.append(L)", L is a
+# list, and L[0] is the same list. This is related to the object identity
+# point, and some sequences of pickle opcodes are subtle in order to
+# get the right result in all cases.
+#
+# + Things pickle doesn't know everything about. Examples of things pickle
+# does know everything about are Python's builtin scalar and container
+# types, like ints and tuples. They generally have opcodes dedicated to
+# them. For things like module references and instances of user-defined
+# classes, pickle's knowledge is limited. Historically, many enhancements
+# have been made to the pickle protocol in order to do a better (faster,
+# and/or more compact) job on those.
+#
+# + Backward compatibility and micro-optimization. As explained below,
+# pickle opcodes never go away, not even when better ways to do a thing
+# get invented. The repertoire of the PM just keeps growing over time.
+# For example, protocol 0 had two opcodes for building Python integers (INT
+# and LONG), protocol 1 added three more for more-efficient pickling of short
+# integers, and protocol 2 added two more for more-efficient pickling of
+# long integers (before protocol 2, the only ways to pickle a Python long
+# took time quadratic in the number of digits, for both pickling and
+# unpickling). "Opcode bloat" isn't so much a subtlety as a source of
+# wearying complication.
+#
+#
+# Pickle protocols:
+#
+# For compatibility, the meaning of a pickle opcode never changes. Instead new
+# pickle opcodes get added, and each version's unpickler can handle all the
+# pickle opcodes in all protocol versions to date. So old pickles continue to
+# be readable forever. The pickler can generally be told to restrict itself to
+# the subset of opcodes available under previous protocol versions too, so that
+# users can create pickles under the current version readable by older
+# versions. However, a pickle does not contain its version number embedded
+# within it. If an older unpickler tries to read a pickle using a later
+# protocol, the result is most likely an exception due to seeing an unknown (in
+# the older unpickler) opcode.
+#
+# The original pickle used what's now called "protocol 0", and what was called
+# "text mode" before Python 2.3. The entire pickle bytestream is made up of
+# printable 7-bit ASCII characters, plus the newline character, in protocol 0.
+# That's why it was called text mode. Protocol 0 is small and elegant, but
+# sometimes painfully inefficient.
+#
+# The second major set of additions is now called "protocol 1", and was called
+# "binary mode" before Python 2.3. This added many opcodes with arguments
+# consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
+# bytes. Binary mode pickles can be substantially smaller than equivalent
+# text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
+# int as 4 bytes following the opcode, which is cheaper to unpickle than the
+# (perhaps) 11-character decimal string attached to INT. Protocol 1 also added
+# a number of opcodes that operate on many stack elements at once (like APPENDS
+# and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
+#
+# The third major set of additions came in Python 2.3, and is called "protocol
+# 2". This added:
+#
+# - A better way to pickle instances of new-style classes (NEWOBJ).
+#
+# - A way for a pickle to identify its protocol (PROTO).
+#
+# - Time- and space- efficient pickling of long ints (LONG{1,4}).
+#
+# - Shortcuts for small tuples (TUPLE{1,2,3}}.
+#
+# - Dedicated opcodes for bools (NEWTRUE, NEWFALSE).
+#
+# - The "extension registry", a vector of popular objects that can be pushed
+# efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
+# the registry contents are predefined (there's nothing akin to the memo's
+# PUT).
+#
+# Another independent change with Python 2.3 is the abandonment of any
+# pretense that it might be safe to load pickles received from untrusted
+# parties -- no sufficient security analysis has been done to guarantee
+# this and there isn't a use case that warrants the expense of such an
+# analysis.
+#
+# To this end, all tests for __safe_for_unpickling__ or for
+# copyreg.safe_constructors are removed from the unpickling code.
+# References to these variables in the descriptions below are to be seen
+# as describing unpickling in Python 2.2 and before.
+
# Meta-rule: Descriptions are stored in instances of descriptor objects,
# with plain constructors. No meta-language is defined from which