Diffstat (limited to 'Objects/lnotab_notes.txt')
-rw-r--r-- Objects/lnotab_notes.txt 108
1 files changed, 100 insertions, 8 deletions
diff --git a/Objects/lnotab_notes.txt b/Objects/lnotab_notes.txt
index 71a2979..046f753 100644
--- a/Objects/lnotab_notes.txt
+++ b/Objects/lnotab_notes.txt
@@ -1,11 +1,103 @@
-All about co_lnotab, the line number table.
-
-Code objects store a field named co_lnotab. This is an array of unsigned bytes
-disguised as a Python bytes object. It is used to map bytecode offsets to
-source code line #s for tracebacks and to identify line number boundaries for
-line tracing. Because of internals of the peephole optimizer, it's possible
-for lnotab to contain bytecode offsets that are no longer valid (for example
-if the optimizer removed the last line in a function).
+Description of the internal format of the line number table
+
+Conceptually, the line number table consists of a sequence of triples:
+ start-offset (inclusive), end-offset (exclusive), line-number.
+
+Note that not all bytecodes have a line number, so we need to handle `None` for the line-number.
+
+However, storing the above sequence directly would be very inefficient as we would need 12 bytes per entry.
+
+First of all, we can note that the end of one entry is the same as the start of the next, so we can overlap entries.
+Secondly we also note that we don't really need arbitrary access to the sequence, so we can store deltas.
+
+We just need to store (end - start, line delta) pairs. The start offset of the first entry is always zero.
+
+Thirdly, most deltas are small, so we can use a single byte for each value, as long as we allow several entries for the same line.
+
+Consider the following table:
+    Start    End     Line
+    0        6       1
+    6        50      2
+    50       350     7
+    350      360     No line number
+    360      376     8
+    376      380     208
+
+Stripping the redundant ends gives:
+
+    End-Start    Line-delta
+    6            +1
+    44           +1
+    300          +5
+    10           No line number
+    16           +1
+    4            +200
+
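The step from the full table to the (end - start, line delta) pairs can be sketched in Python. This is only an illustration: the name `to_deltas`, the tuple representation, and the starting line of 0 (inferred from the first delta of +1 landing on line 1) are assumptions, not part of the notes.

```python
def to_deltas(entries, first_line=0):
    """Convert (start, end, line) triples to (end - start, line delta) pairs.

    A line of None stands for "no line number"; it is passed through as None
    and does not change the running line number.
    """
    pairs = []
    prev_line = first_line
    expected_start = 0  # the first entry always starts at offset zero
    for start, end, line in entries:
        if start != expected_start:
            raise ValueError("entries must be contiguous")
        if line is None:
            pairs.append((end - start, None))
        else:
            pairs.append((end - start, line - prev_line))
            prev_line = line
        expected_start = end
    return pairs


# The example table from the notes.
TABLE = [
    (0, 6, 1),
    (6, 50, 2),
    (50, 350, 7),
    (350, 360, None),
    (360, 376, 8),
    (376, 380, 208),
]
DELTAS = to_deltas(TABLE)
```

With the example table, `DELTAS` holds the six pairs listed above, with `None` standing in for "No line number".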
+
+Note that the end - start value is always positive.
+
+Finally, in order to fit into a single byte, we need to convert start deltas to the range 0 <= delta <= 254,
+and line deltas to the range -127 <= delta <= 127.
+A line delta of -128 is used to indicate no line number.
+A start delta of 255 is used as a sentinel to mark the end of the table.
+Also note that a start delta of zero indicates that there are no bytecodes in the given range,
+which means we can use an invalid line number for that range.
+
+Final form:
+
+    Start delta   Line delta
+    6               +1
+    44              +1
+    254             +5
+    46              0
+    10              -128 (No line number, treated as a delta of zero)
+    16              +1
+    0               +127 (line 135, but the range is empty as no bytecodes are at line 135)
+    4               +73
+    255 (end mark)  ---
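
The splitting rules that produce the final form can be sketched as follows. This is a hedged illustration rather than the CPython implementation: `encode`, the constant names, the `None` placeholder for "no line number", and the `(255, None)` representation of the end mark are all assumptions made for the example. How a "no line number" range longer than 254 should be split is not stated in the notes; the sketch repeats the -128 marker.

```python
MAX_START_DELTA = 254
MAX_LINE_DELTA = 127
NO_LINE = -128
END_MARK = 255


def encode(pairs):
    """Split (end - start, line delta) pairs into single-byte-sized rows."""
    rows = []
    for length, ldelta in pairs:
        if ldelta is None:
            # Assumption: long no-line ranges repeat the -128 marker.
            while length > MAX_START_DELTA:
                rows.append((MAX_START_DELTA, NO_LINE))
                length -= MAX_START_DELTA
            rows.append((length, NO_LINE))
            continue
        # Large line jumps are spread over empty ranges (start delta 0).
        while ldelta > MAX_LINE_DELTA:
            rows.append((0, MAX_LINE_DELTA))
            ldelta -= MAX_LINE_DELTA
        while ldelta < -MAX_LINE_DELTA:
            rows.append((0, -MAX_LINE_DELTA))
            ldelta += MAX_LINE_DELTA
        # Long ranges are split; continuation rows carry a line delta of 0.
        first = min(length, MAX_START_DELTA)
        rows.append((first, ldelta))
        length -= first
        while length > 0:
            chunk = min(length, MAX_START_DELTA)
            rows.append((chunk, 0))
            length -= chunk
    rows.append((END_MARK, None))  # sentinel; its line delta is unspecified
    return rows


PAIRS = [(6, 1), (44, 1), (300, 5), (10, None), (16, 1), (4, 200)]
FINAL = encode(PAIRS)
```

For the example pairs, `FINAL` reproduces the rows of the final-form table, including the (254, +5), (46, 0) split and the empty (0, +127) entry.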
+
+Iterating over the table.
+-------------------------
+
+For the `co_lines` attribute we want to emit the full form, omitting the (350, 360, No line number) and empty entries.
+
+The code is as follows:
+
+def co_lines(code):
+    line = code.co_firstlineno
+    end = 0
+    table_iter = iter(code.internal_line_table)
+    for sdelta, ldelta in table_iter:
+        if sdelta == 255:
+            break
+        if ldelta == 0: # No change to line number, just accumulate changes to end
+            end += sdelta
+            continue
+        start = end
+        end = start + sdelta
+        if ldelta == -128: # No valid line number -- skip entry
+            continue
+        line += ldelta
+        if end == start: # Empty range, omit.
+            continue
+        yield start, end, line
+
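As a worked check, the iteration can be run against the final-form example table. The sketch below is one possible reading of the pseudocode, not the CPython implementation: the table is modelled as a list of (start delta, line delta) tuples, `decode` and `first_line` are hypothetical names, and the starting line of 0 is inferred from the example. It also makes one deliberate choice: a range is held back until the next entry with a non-zero line delta, so that continuation rows (line delta 0) extend the previous range instead of being emitted separately.

```python
def decode(rows, first_line=0):
    """Yield (start, end, line) triples from final-form rows."""
    line = first_line
    end = 0
    pending = None  # (start, line) of a range not yet emitted
    for sdelta, ldelta in rows:
        if sdelta == 255:  # end mark
            break
        if ldelta == 0:  # continuation: extend the current range
            end += sdelta
            continue
        # A new entry begins, so emit the held-back range unless it is empty.
        if pending is not None and end > pending[0]:
            yield pending[0], end, pending[1]
        pending = None
        start = end
        end = start + sdelta
        if ldelta == -128:  # no line number: skip, line stays unchanged
            continue
        line += ldelta
        pending = (start, line)
    if pending is not None and end > pending[0]:
        yield pending[0], end, pending[1]


FINAL_FORM = [
    (6, 1), (44, 1), (254, 5), (46, 0), (10, -128),
    (16, 1), (0, 127), (4, 73), (255, None),
]
LINES = list(decode(FINAL_FORM))
```

Here `LINES` contains the five ranges of the original table that have a line number; the (350, 360) range and the empty entry at line 135 are omitted.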
+
+
+
+The historical co_lnotab format
+-------------------------------
+
+Prior to 3.10, code objects stored a field named co_lnotab.
+This was an array of unsigned bytes disguised as a Python bytes object.
+
+The old co_lnotab did not account for the presence of bytecodes without a line number,
+nor was it well suited to tracing as a number of workarounds were required.
+
+The old format can still be accessed via `code.co_lnotab`, which is lazily computed from the new format.
+
+Below is the description of the old co_lnotab format:
+
The array is conceptually a compressed list of
(bytecode offset increment, line number increment)