Diffstat (limited to 'Objects/frame_layout.md')
| -rw-r--r-- | Objects/frame_layout.md | 122 |
1 files changed, 122 insertions, 0 deletions
diff --git a/Objects/frame_layout.md b/Objects/frame_layout.md
new file mode 100644
index 0000000..11688f6
--- /dev/null
+++ b/Objects/frame_layout.md
@@ -0,0 +1,122 @@
+# The Frame Stack
+
+Each call to a Python function has an activation record,
+commonly known as a "frame".
+Python semantics allow frames to outlive the activation,
+so they have (before 3.11) been allocated on the heap.
+This is expensive as it requires many allocations and
+results in poor locality of reference.
+
+In 3.11, rather than have these frames scattered about memory,
+as happens for heap-allocated objects, frames are allocated
+contiguously in a per-thread stack.
+This improves performance significantly for two reasons:
+* It reduces allocation overhead to a pointer comparison and increment.
+* Stack-allocated data has very good locality and is likely to already be in
+  CPU cache.
+
+Generators and coroutines still need heap-allocated activation records, but
+these can be linked into the per-thread stack so as to not impact performance too much.
+
+## Layout
+
+Each activation record consists of four conceptual sections:
+
+* Local variables (including arguments, cells and free variables)
+* Evaluation stack
+* Specials: The per-frame object references needed by the VM: globals dict,
+  code object, etc.
+* Linkage: Pointer to the previous activation record, stack depth, etc.
+
+### Layout
+
+The specials and linkage sections are a fixed size, so are grouped together.
+
+Each activation record is laid out as:
+* Specials and linkage
+* Locals
+* Stack
+
+This seems to provide the best performance without excessive complexity.
+It needs the interpreter to hold two pointers, a frame pointer and a stack pointer.
+
+#### Alternative layout
+
+An alternative layout that was used for part of 3.11 alpha was:
+
+* Locals
+* Specials and linkage
+* Stack
+
+This has the advantage that no copying is required when making a call,
+as the arguments on the stack are (usually) already in the correct
+location for the parameters. However, it requires the VM to maintain
+an extra pointer for the locals, which can hurt performance.
+
+A variant that needs only two pointers is to reverse the numbering
+of the locals, so that the last one is numbered `0`, and the first in memory
+is numbered `N-1`.
+This allows the locals, specials and linkage to be accessed from the frame pointer.
+We may implement this in the future.
+
+#### Note:
+
+> In a contiguous stack, we would need to save one fewer register, as the
+> top of the caller's activation record would be the same as the base of the
+> callee's. However, since some activation records are kept on the heap we
+> cannot do this.
+
+### Specials
+
+Generators and coroutines contain a `_PyInterpreterFrame`.
+The specials section contains the following pointers:
+
+* Globals dict
+* Builtins dict
+* Locals dict (not the "fast" locals, but the locals for eval and class creation)
+* Code object
+* Heap-allocated `PyFrameObject` for this activation record, if any.
+* The function.
+
+The pointer to the function is not strictly required, but it is cheaper to
+store a strong reference to the function and borrowed references to the globals
+and builtins, than strong references to both globals and builtins.
+
+### Frame objects
+
+When creating a backtrace or when calling `sys._getframe()` the frame becomes
+visible to Python code. When this happens a new `PyFrameObject` is created
+and a strong reference to it is placed in the `frame_obj` field of the specials
+section. The `frame_obj` field is initially `NULL`.
+
+The `PyFrameObject` may outlive a stack-allocated `_PyInterpreterFrame`.
+If it does then the `_PyInterpreterFrame` is copied into the `PyFrameObject`,
+except the evaluation stack which must be empty at this point.
+The linkage section is updated to reflect the new location of the frame.
+
+This mechanism provides the appearance of persistent, heap-allocated
+frames for each activation, but with low runtime overhead.
+
+### Generators and Coroutines
+
+
+Generator objects have a `_PyInterpreterFrame` embedded in them.
+This means that creating a generator requires only a single allocation,
+reducing allocation overhead and improving locality of reference.
+The embedded frame is linked into the per-thread frame stack when iterated or
+awaited.
+
+If a frame object associated with a generator outlives the generator, then
+the embedded `_PyInterpreterFrame` is copied into the frame object.
+
+
+All the above applies to coroutines and async generators as well.
+
+### Field names
+
+Many of the fields in `_PyInterpreterFrame` were copied from the 3.10 `PyFrameObject`.
+Thus, some of the field names may be a bit misleading.
+
+For example the `f_globals` field has an `f_` prefix implying it belongs to the
+`PyFrameObject` struct, although it belongs to the `_PyInterpreterFrame` struct.
+We may rationalize this naming scheme for 3.12.
\ No newline at end of file
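
Reading aid, not part of the commit: the added file describes each activation record as specials and linkage first, then locals, then the evaluation stack, and says allocation reduces to a pointer comparison and increment. A minimal C sketch of that idea might look like the following. Every name here (`sketch_frame`, `sketch_stack`, `sketch_push`, `sketch_pop`, `sketch_stack_base`) is hypothetical, the field set is simplified, and none of this is CPython's actual `_PyInterpreterFrame` definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified activation record; NOT CPython's real struct. */
typedef struct sketch_frame {
    /* Specials: per-frame references needed by the VM. */
    void *globals;
    void *builtins;
    void *code;
    void *frame_obj;               /* NULL until a frame object exists */
    /* Linkage. */
    struct sketch_frame *previous; /* caller's activation record */
    int nlocals;
    int stack_depth;
    /* Locals come first in this array; the evaluation stack follows. */
    void *localsplus[];
} sketch_frame;

/* Hypothetical per-thread chunk of memory that frames are carved from. */
typedef struct {
    char *top;    /* first free byte */
    char *limit;  /* one past the last usable byte */
} sketch_stack;

static size_t sketch_frame_size(int nlocals, int max_stack)
{
    return sizeof(sketch_frame)
         + ((size_t)nlocals + (size_t)max_stack) * sizeof(void *);
}

/* Allocation is one comparison plus one pointer increment. */
static sketch_frame *sketch_push(sketch_stack *s, sketch_frame *caller,
                                 int nlocals, int max_stack)
{
    size_t size = sketch_frame_size(nlocals, max_stack);
    if ((size_t)(s->limit - s->top) < size) {
        return NULL;  /* out of space; a real VM needs a fallback here */
    }
    sketch_frame *f = (sketch_frame *)s->top;
    s->top += size;
    f->globals = f->builtins = f->code = f->frame_obj = NULL;
    f->previous = caller;
    f->nlocals = nlocals;
    f->stack_depth = 0;
    return f;
}

/* The evaluation stack starts right after the last local. */
static void **sketch_stack_base(sketch_frame *f)
{
    return &f->localsplus[f->nlocals];
}

/* Freeing the newest frame just resets the top pointer. */
static sketch_frame *sketch_pop(sketch_stack *s, sketch_frame *f)
{
    s->top = (char *)f;
    return f->previous;
}
```

With this shape, locals and the evaluation stack are both reachable from the frame pointer, so only a frame pointer and a stack pointer need to be kept while the frame runs.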

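Reading aid, not part of the commit: the "Frame objects" section of the added file says the `frame_obj` field starts as `NULL`, a frame object is created only when the frame becomes visible to Python code, and the frame data is copied into that object if the object outlives the activation. The C sketch below illustrates that lifecycle under stated assumptions. All names (`sketch_iframe`, `sketch_frame_obj`, `sketch_get_frame_obj`, `sketch_leave_frame`) are hypothetical, reference counting is reduced to a plain integer, the locals array is fixed-size, and the evaluation stack is omitted because the text says it must be empty at the point of copying.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sketch_iframe sketch_iframe;
typedef struct sketch_frame_obj sketch_frame_obj;

/* Hypothetical stand-in for a stack-allocated interpreter frame. */
struct sketch_iframe {
    void *globals;
    void *code;
    sketch_frame_obj *frame_obj;  /* initially NULL */
    sketch_iframe *previous;
    int nlocals;
    void *locals[4];              /* fixed size to keep the sketch short */
};

/* Hypothetical stand-in for the heap-allocated frame object. */
struct sketch_frame_obj {
    int refcount;
    sketch_iframe *frame;         /* where the frame data currently lives */
    sketch_iframe owned;          /* used once the data leaves the stack */
};

/* Create the frame object lazily, the first time it is asked for. */
static sketch_frame_obj *sketch_get_frame_obj(sketch_iframe *f)
{
    if (f->frame_obj == NULL) {
        sketch_frame_obj *o = calloc(1, sizeof(*o));
        if (o == NULL) {
            return NULL;
        }
        o->refcount = 1;          /* the reference held in f->frame_obj */
        o->frame = f;
        f->frame_obj = o;
    }
    return f->frame_obj;
}

/* Called when the activation ends and its stack space will be reused. */
static void sketch_leave_frame(sketch_iframe *f)
{
    sketch_frame_obj *o = f->frame_obj;
    if (o == NULL) {
        return;                   /* never exposed: nothing to do */
    }
    f->frame_obj = NULL;
    if (o->refcount == 1) {
        free(o);                  /* only the frame itself held a reference */
        return;
    }
    o->owned = *f;                /* copy specials, linkage and locals */
    o->owned.previous = NULL;     /* simplification: the copy is unlinked */
    o->frame = &o->owned;         /* record the new location of the data */
    o->refcount--;                /* drop the reference the frame held */
}
```

The common case, where no frame object is ever requested, costs one `NULL` check on exit; the copy happens only for frames that something else still references.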