diff options
author | Mark Shannon <mark@hotpy.org> | 2022-04-11 15:05:20 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-04-11 15:05:20 (GMT) |
commit | f6e43e834c9712d548ce9dd2f7a1d9033e45ce51 (patch) | |
tree | 35d08fe781204544a1c79a065ed4cd144cb0d148 | |
parent | 5f2abae61ec69264b835dcabe2cdabe57b9a990e (diff) | |
download | cpython-f6e43e834c9712d548ce9dd2f7a1d9033e45ce51.zip cpython-f6e43e834c9712d548ce9dd2f7a1d9033e45ce51.tar.gz cpython-f6e43e834c9712d548ce9dd2f7a1d9033e45ce51.tar.bz2 |
GH-89480: Document motivation, design and implementation of 3.11 frame stack. (GH-32304)
-rw-r--r-- | Include/internal/pycore_frame.h | 8 | ||||
-rw-r--r-- | Misc/NEWS.d/next/Core and Builtins/2022-04-04-15-12-38.bpo-45317.UDLOt8.rst | 1 | ||||
-rw-r--r-- | Objects/frame_layout.md | 122 |
3 files changed, 131 insertions, 0 deletions
diff --git a/Include/internal/pycore_frame.h b/Include/internal/pycore_frame.h index 49bdc63..405afd6 100644 --- a/Include/internal/pycore_frame.h +++ b/Include/internal/pycore_frame.h @@ -7,6 +7,11 @@ extern "C" { #include <stdbool.h> #include <stddef.h> +/* See Objects/frame_layout.md for an explanation of the frame stack + * including explanation of the PyFrameObject and _PyInterpreterFrame + * structs. */ + + struct _frame { PyObject_HEAD PyFrameObject *f_back; /* previous frame, or NULL */ @@ -40,12 +45,14 @@ enum _frameowner { }; typedef struct _PyInterpreterFrame { + /* "Specials" section */ PyFunctionObject *f_func; /* Strong reference */ PyObject *f_globals; /* Borrowed reference */ PyObject *f_builtins; /* Borrowed reference */ PyObject *f_locals; /* Strong reference, may be NULL */ PyCodeObject *f_code; /* Strong reference */ PyFrameObject *frame_obj; /* Strong reference, may be NULL */ + /* Linkage section */ struct _PyInterpreterFrame *previous; // NOTE: This is not necessarily the last instruction started in the given // frame. Rather, it is the code unit *prior to* the *next* instruction. For @@ -55,6 +62,7 @@ typedef struct _PyInterpreterFrame { int stacktop; /* Offset of TOS from localsplus */ bool is_entry; // Whether this is the "root" frame for the current _PyCFrame. char owner; + /* Locals and stack */ PyObject *localsplus[1]; } _PyInterpreterFrame; diff --git a/Misc/NEWS.d/next/Core and Builtins/2022-04-04-15-12-38.bpo-45317.UDLOt8.rst b/Misc/NEWS.d/next/Core and Builtins/2022-04-04-15-12-38.bpo-45317.UDLOt8.rst new file mode 100644 index 0000000..dd2da46 --- /dev/null +++ b/Misc/NEWS.d/next/Core and Builtins/2022-04-04-15-12-38.bpo-45317.UDLOt8.rst @@ -0,0 +1 @@ +Add internal documentation explaining design of new (for 3.11) frame stack. diff --git a/Objects/frame_layout.md b/Objects/frame_layout.md new file mode 100644 index 0000000..11688f6 --- /dev/null +++ b/Objects/frame_layout.md @@ -0,0 +1,122 @@ +# The Frame Stack + +Each call to a Python function has an activation record, +commonly known as a "frame". +Python semantics allows frames to outlive the activation, +so they have (before 3.11) been allocated on the heap. +This is expensive as it requires many allocations and +results in poor locality of reference. + +In 3.11, rather than have these frames scattered about memory, +as happens for heap-allocated objects, frames are allocated +contiguously in a per-thread stack. +This improves performance significantly for two reasons: +* It reduces allocation overhead to a pointer comparison and increment. +* Stack allocated data has the best possible locality and will always be in + CPU cache. + +Generator and coroutines still need heap allocated activation records, but +can be linked into the per-thread stack so as to not impact performance too much. + +## Layout + +Each activation record consists of four conceptual sections: + +* Local variables (including arguments, cells and free variables) +* Evaluation stack +* Specials: The per-frame object references needed by the VM: globals dict, + code object, etc. +* Linkage: Pointer to the previous activation record, stack depth, etc. + +### Layout + +The specials and linkage sections are a fixed size, so are grouped together. + +Each activation record is laid out as: +* Specials and linkage +* Locals +* Stack + +This seems to provide the best performance without excessive complexity. +It needs the interpreter to hold two pointers, a frame pointer and a stack pointer. + +#### Alternative layout + +An alternative layout that was used for part of 3.11 alpha was: + +* Locals +* Specials and linkage +* Stack + +This has the advantage that no copying is required when making a call, +as the arguments on the stack are (usually) already in the correct +location for the parameters. However, it requires the VM to maintain +an extra pointer for the locals, which can hurt performance. + +A variant that only needs the need two pointers is to reverse the numbering +of the locals, so that the last one is numbered `0`, and the first in memory +is numbered `N-1`. +This allows the locals, specials and linkage to accessed from the frame pointer. +We may implement this in the future. + +#### Note: + +> In a contiguous stack, we would need to save one fewer registers, as the +> top of the caller's activation record would be the same at the base of the +> callee's. However, since some activation records are kept on the heap we +> cannot do this. + +### Generators and Coroutines + +Generators and coroutines contain a `_PyInterpreterFrame` +The specials sections contains the following pointers: + +* Globals dict +* Builtins dict +* Locals dict (not the "fast" locals, but the locals for eval and class creation) +* Code object +* Heap allocated `PyFrameObject` for this activation record, if any. +* The function. + +The pointer to the function is not strictly required, but it is cheaper to +store a strong reference to the function and borrowed references to the globals +and builtins, than strong references to both globals and builtins. + +### Frame objects + +When creating a backtrace or when calling `sys._getframe()` the frame becomes +visible to Python code. When this happens a new `PyFrameObject` is created +and a strong reference to it placed in the `frame_obj` field of the specials +section. The `frame_obj` field is initially `NULL`. + +The `PyFrameObject` may outlive a stack-allocated `_PyInterpreterFrame`. +If it does then `_PyInterpreterFrame` is copied into the `PyFrameObject`, +except the evaluation stack which must be empty at this point. +The linkage section is updated to reflect the new location of the frame. + +This mechanism provides the appearance of persistent, heap-allocated +frames for each activation, but with low runtime overhead. + +### Generators and Coroutines + + +Generator objects have a `_PyInterpreterFrame` embedded in them. +This means that creating a generator requires only a single allocation, +reducing allocation overhead and improving locality of reference. +The embedded frame is linked into the per-thread frame when iterated or +awaited. + +If a frame object associated with a generator outlives the generator, then +the embedded `_PyInterpreterFrame` is copied into the frame object. + + +All the above applies to coroutines and async generators as well. + +### Field names + +Many of the fields in `_PyInterpreterFrame` were copied from the 3.10 `PyFrameObject`. +Thus, some of the field names may be a bit misleading. + +For example the `f_globals` field has a `f_` prefix implying it belongs to the +`PyFrameObject` struct, although it belongs to the `_PyInterpreterFrame` struct. +We may rationalize this naming scheme for 3.12.
\ No newline at end of file |