path: root/Python/specialize.c
Commit message | Author | Age | Files | Lines
* gh-115999: Specialize `CALL_KW` in free-threaded builds (#127713) | mpage | 2024-12-11 | 1 | -13/+4
    * Enable specialization of CALL_KW
    * Fix bug pushing frame in _PY_FRAME_KW
    `_PY_FRAME_KW` pushes a pointer to the new frame onto the stack for consumption by the next uop. When pushing the frame fails, we do not want to push the result, `NULL`, to the stack because it is not a valid stackref. This works in the default build because `PyStackRef_NULL` and `NULL` are the same value, so the `PyStackRef_XCLOSE()` in the error handler ignores it. In the free-threaded build the values are not the same; `PyStackRef_XCLOSE()` will attempt to decref a null pointer.
* gh-125610: Fix `STORE_ATTR_INSTANCE_VALUE` specialization check (GH-125612) | Sam Gross | 2024-12-06 | 1 | -1/+4
    The `STORE_ATTR_INSTANCE_VALUE` opcode doesn't support objects with non-NULL managed dictionaries, so don't specialize to that op in that case.
* gh-115999: Enable specialization of `CALL` instructions in free-threaded builds (#127123) | mpage | 2024-12-03 | 1 | -46/+66
    The CALL family of instructions were mostly thread-safe already and only required a small number of changes, which are documented below.
    A few changes were needed to make CALL_ALLOC_AND_ENTER_INIT thread-safe:
    - Added _PyType_LookupRefAndVersion, which returns the type version corresponding to the returned ref.
    - Added _PyType_CacheInitForSpecialization, which takes an init method and the corresponding type version and only populates the specialization cache if the current type version matches the supplied version. This prevents potentially caching a stale value in free-threaded builds if we race with an update to __init__.
    - Only cache __init__ functions that are deferred in free-threaded builds. This ensures that the reference to __init__ that is stored in the specialization cache is valid if the type version guard in _CHECK_AND_ALLOCATE_OBJECT passes.
    - Fix a bug in _CREATE_INIT_FRAME where the frame is pushed to the stack on failure.
    A few other miscellaneous changes were also needed:
    - Use {LOCK,UNLOCK}_OBJECT in LIST_APPEND. This ensures that the list's per-object lock is held while we are appending to it.
    - Add missing co_tlbc for _Py_InitCleanup.
    - Stop/start the world around setting the eval frame hook. This allows us to read interp->eval_frame non-atomically and preserves the behavior of _CHECK_PEP_523 documented below.
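The version check described for `_PyType_CacheInitForSpecialization` can be illustrated with a small Python model. This is only a sketch of the idea (look up a value together with its version, then cache it only if the version is unchanged); `TypeModel` and its methods are hypothetical names, and a plain lock stands in for whatever synchronization CPython actually uses.

```python
import threading


class TypeModel:
    """Toy stand-in for a type object with a version tag and an __init__ cache."""

    def __init__(self, init):
        self._lock = threading.Lock()
        self.version = 1
        self.init = init
        self.cached_init = None

    def lookup_ref_and_version(self):
        # Return the method and the version it belongs to as one consistent pair.
        with self._lock:
            return self.init, self.version

    def set_init(self, new_init):
        # Updating __init__ bumps the version and drops any cached value.
        with self._lock:
            self.init = new_init
            self.version += 1
            self.cached_init = None

    def cache_init_for_specialization(self, init, version):
        # Populate the cache only if the type has not changed since the lookup.
        with self._lock:
            if self.version != version:
                return False
            self.cached_init = init
            return True


def init_a(self):
    pass


def init_b(self):
    pass


t = TypeModel(init_a)
init, version = t.lookup_ref_and_version()
t.set_init(init_b)  # simulate another thread racing with the specializer
stale = t.cache_init_for_specialization(init, version)
fresh = t.cache_init_for_specialization(*t.lookup_ref_and_version())
```

In this model the first caching attempt is rejected because the version moved on after the lookup, so the stale `init_a` never reaches the cache.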
* gh-115999: Add free-threaded specialization for `SEND` (gh-127426) | Neil Schemenauer | 2024-12-03 | 1 | -11/+4
    No additional thread safety changes are required. Note that sending to a generator that is shared between threads is currently not safe in the free-threaded build.
* gh-115999: Specialize `LOAD_SUPER_ATTR` in free-threaded builds (gh-127128) | Neil Schemenauer | 2024-12-03 | 1 | -14/+5
    Use existing helpers to atomically modify the bytecode. Add unit tests to ensure specializing is happening as expected. Add test_specialize.py that can be used with ThreadSanitizer to detect data races. Fix thread safety issue with cell_set_contents().
* gh-127518: Fix pystats build after #127169 (#127526) | Michael Droettboom | 2024-12-02 | 1 | -2/+3
    gh-127518: Fix pystats build after #127169
* GH-126491: GC: Mark objects reachable from roots before doing cycle collection (GH-127110) | Mark Shannon | 2024-12-02 | 1 | -0/+2
    * Mark almost all reachable objects before doing collection phase
    * Add stats for objects marked
    * Visit new frames before each increment
    * Update docs
    * Clearer calculation of work to do.
* gh-115999: Add partial free-thread specialization for BINARY_SUBSCR (gh-127227) | Donghee Na | 2024-12-02 | 1 | -14/+11
* gh-115999: Add free-threaded specialization for `STORE_SUBSCR` (#127169) | Sam Gross | 2024-11-26 | 1 | -62/+60
    The specialization only depends on the type, so no special thread-safety considerations there.
    STORE_SUBSCR_LIST_INT needs to lock the list before modifying it.
    `_PyDict_SetItem_Take2` already internally locks the dictionary using a critical section.
* gh-115999: Record success in `specialize` (#127167) | mpage | 2024-11-22 | 1 | -0/+1
    Record success in `specialize`
    This matches the existing behavior where we increment the success stat for the generic opcode each time we successfully specialize an instruction.
* gh-115999: Add free-threaded specialization for `UNPACK_SEQUENCE` (#126600) | Kirill Podoprigora | 2024-11-22 | 1 | -18/+12
    Add free-threaded specialization for `UNPACK_SEQUENCE` opcode.
    `UNPACK_SEQUENCE_TUPLE/UNPACK_SEQUENCE_TWO_TUPLE` are already thread safe since tuples are immutable.
    `UNPACK_SEQUENCE_LIST` is not thread safe because of the nature of lists (there is nothing preventing another thread from adding items to or removing them from the list while the instruction is executing). To achieve thread safety we add a critical section to the implementation of `UNPACK_SEQUENCE_LIST`, especially around the parts where we check the size of the list and push items onto the stack.
    Co-authored-by: Matt Page <mpage@meta.com>
    Co-authored-by: mpage <mpage@cs.stanford.edu>
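The reason the size check and the pushes must sit inside the same critical section can be sketched in Python. This is a simplified model, not the C implementation: `LockedList` and `unpack_sequence_list` are hypothetical names, and an ordinary `threading.Lock` stands in for the list's per-object lock.

```python
import threading


class LockedList:
    """Toy list with an explicit lock standing in for the per-object lock."""

    def __init__(self, items):
        self.lock = threading.Lock()
        self.items = list(items)


def unpack_sequence_list(lst, expected, stack):
    """Push the items onto `stack`; return False to signal 'use the generic path'."""
    with lst.lock:
        # Size check and pushes share one critical section, so no other
        # thread can resize the list between the two steps.
        if len(lst.items) != expected:
            return False
        # Values go onto the stack right-to-left, leaving the first item on top.
        stack.extend(reversed(lst.items))
        return True


stack = []
ok = unpack_sequence_list(LockedList([1, 2, 3]), 3, stack)

other_stack = []
mismatch = unpack_sequence_list(LockedList([1, 2]), 3, other_stack)
```

If the lock were released between the length check and the pushes, another thread could shrink the list and the pushes would read past its end; holding it across both steps rules that out in this model.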
* gh-115999: Add free-threaded specialization for ``TO_BOOL`` (gh-126616) | Donghee Na | 2024-11-21 | 1 | -62/+67
* gh-115999: Specialize `LOAD_GLOBAL` in free-threaded builds (#126607) | mpage | 2024-11-21 | 1 | -22/+24
    Enable specialization of LOAD_GLOBAL in free-threaded builds.
    Thread-safety of specialization in free-threaded builds is provided by the following:
    - A critical section is held on both the globals and builtins objects during specialization. This ensures we get an atomic view of both builtins and globals during specialization.
    - Generation of new keys versions is made atomic in free-threaded builds.
    - Existing helpers are used to atomically modify the opcode.
    Thread-safety of specialized instructions in free-threaded builds is provided by the following:
    - Relaxed atomics are used when loading and storing dict keys versions. This avoids potential data races as the dict keys versions are read without holding the dictionary's per-object lock in version guards.
    - Dict keys objects are passed from keys version guards to the downstream uops. This ensures that we are loading from the correct offset in the keys object. Once a unicode key has been stored in a keys object for a combined dictionary in free-threaded builds, the offset that it is stored in will never be reused for a different key. Once the version guard passes, we know that we are reading from the correct offset.
    - The dictionary read fast-path is used to read values from the dictionary once we know the correct offset.
* gh-115999: Don't take a reason in unspecialize (#127030) | mpage | 2024-11-20 | 1 | -4/+9
    Don't take a reason in unspecialize
    We only want to compute the reason if stats are enabled. Optimizing compilers should optimize this away for us (gcc and clang do), but it's better to be safe than sorry.
* Revert "GH-126491: GC: Mark objects reachable from roots before doing cycle collection (GH-126502)" (#126983) | Hugo van Kemenade | 2024-11-19 | 1 | -2/+0
* GH-126491: GC: Mark objects reachable from roots before doing cycle collection (GH-126502) | Mark Shannon | 2024-11-18 | 1 | -0/+2
    * Mark almost all reachable objects before doing collection phase
    * Add stats for objects marked
    * Visit new frames before each increment
    * Remove lazy dict tracking
    * Update docs
    * Clearer calculation of work to do.
* gh-103951: enable optimization for fast attribute access on module subclasses (GH-126264) | Sergey B Kirpichev | 2024-11-15 | 1 | -1/+1
    Co-authored-by: Nicolas Tessore <n.tessore@ucl.ac.uk>
* gh-126513: Use helpers for `_Py_Specialize_ContainsOp` (#126517) | Kirill Podoprigora | 2024-11-06 | 1 | -17/+7
    * Use helpers for _Py_Specialize_ContainsOp
    * Remove unnecessary variable
* gh-115999: Introduce helpers for (un)specializing instructions (#126414) | mpage | 2024-11-06 | 1 | -48/+84
    Introduce helpers for (un)specializing instructions
    Consolidate the code to specialize/unspecialize instructions into two helper functions and use them in `_Py_Specialize_BinaryOp`. The resulting code is more concise and keeps all of the logic at the point where we decide to specialize/unspecialize an instruction.
* gh-115999: Add free-threaded specialization for CONTAINS_OP (gh-126450) | Donghee Na | 2024-11-06 | 1 | -4/+6
    - The specialization logic determines the appropriate specialization using only the operand's type, which is safe to read non-atomically (changing it requires stopping the world). We are guaranteed that the type will not change in between when it is checked and when we specialize the bytecode because the types involved are immutable (you cannot assign to `__class__` for exact instances of `dict`, `set`, or `frozenset`). The bytecode is mutated atomically using helpers.
    - The specialized instructions rely on the operand type not changing in between the `DEOPT_IF` checks and the calls to the appropriate type-specific helpers (e.g. `_PySet_Contains`). This is a correctness requirement in the default builds and there are no changes to the opcodes in the free-threaded builds that would invalidate this.
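A type-only specialization decision of this kind can be modeled in a few lines of Python. This is an illustrative sketch rather than the C logic in `specialize.c`: `choose_contains_specialization` is a hypothetical helper, and the mapping of exact `dict` to `CONTAINS_OP_DICT` and exact `set`/`frozenset` to `CONTAINS_OP_SET` is stated here as an assumption about the opcode names.

```python
def choose_contains_specialization(container):
    """Pick an opcode name from the exact type of the right-hand operand."""
    t = type(container)
    # Exact-type checks only: subclasses may override __contains__,
    # so they stay on the generic instruction.
    if t is dict:
        return "CONTAINS_OP_DICT"
    if t is set or t is frozenset:
        return "CONTAINS_OP_SET"
    return "CONTAINS_OP"


class MyDict(dict):
    pass


choices = [
    choose_contains_specialization({}),
    choose_contains_specialization({1, 2}),
    choose_contains_specialization(frozenset()),
    choose_contains_specialization(MyDict()),
    choose_contains_specialization([1, 2]),
]
```

Because the decision reads nothing but the operand's type, the model needs no locking, which mirrors the argument in the commit message.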
* gh-115999: Implement thread-local bytecode and enable specialization for `BINARY_OP` (#123926) | mpage | 2024-11-04 | 1 | -19/+49
    Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Each thread reserves a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and releases the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.
    Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.
    Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
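The indexing scheme for thread-local bytecode can be pictured with a toy Python model. Everything here is a sketch under stated assumptions: `TLBCIndexPool` and `CodeModel` are hypothetical names, bytecode is a list of opcode names, and new copies are taken from the unspecialized original (an assumption of the model, not something the commit message states).

```python
class TLBCIndexPool:
    """Hands out globally unique indices; index 0 is reserved for the main copy."""

    def __init__(self):
        self._free = []
        self._next = 1

    def reserve(self):
        # Called at (modeled) thread creation; reuse a released index if possible.
        if self._free:
            return self._free.pop()
        idx = self._next
        self._next += 1
        return idx

    def release(self, idx):
        # Called at (modeled) thread destruction.
        self._free.append(idx)


class CodeModel:
    def __init__(self, bytecode):
        self._original = tuple(bytecode)
        # Entry 0 is the "main" copy, so single-threaded programs never copy.
        self.co_tlbc = [list(bytecode)]

    def bytecode_for(self, idx):
        # Grow the array lazily and create the copy on first use (first RESUME).
        while len(self.co_tlbc) <= idx:
            self.co_tlbc.append(None)
        if self.co_tlbc[idx] is None:
            self.co_tlbc[idx] = list(self._original)
        return self.co_tlbc[idx]


pool = TLBCIndexPool()
code = CodeModel(["RESUME", "BINARY_OP"])

main_copy = code.bytecode_for(0)
worker_idx = pool.reserve()
worker_copy = code.bytecode_for(worker_idx)
worker_copy[1] = "BINARY_OP_ADD_INT"  # the worker specializes only its own copy

pool.release(worker_idx)
reused_idx = pool.reserve()
```

The point of the model is that a specialization made through one index never shows up in another thread's copy, and that a released index becomes available to the next thread.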
* GH-125837: Split `LOAD_CONST` into three. (GH-125972) | Mark Shannon | 2024-10-29 | 1 | -0/+14
    * Add LOAD_CONST_IMMORTAL opcode
    * Add LOAD_SMALL_INT opcode
    * Remove RETURN_CONST opcode
* gh-115999: Stop the world when invalidating function versions (#124997) | mpage | 2024-10-08 | 1 | -4/+4
    Stop the world when invalidating function versions
    The tier1 interpreter specializes `CALL` instructions based on the values of certain function attributes (e.g. `__code__`, `__defaults__`). The tier1 interpreter uses function versions to verify that the attributes of a function during execution of a specialization match those seen during specialization. A function's version is initialized in `MAKE_FUNCTION` and is invalidated when any of the critical function attributes are changed. The tier1 interpreter stores the function version in the inline cache during specialization. A guard is used by the specialized instruction to verify that the version of the function on the operand stack matches the cached version (and therefore has all of the expected attributes). It is assumed that once the guard passes, all attributes will remain unchanged while executing the rest of the specialized instruction.
    Stopping the world when invalidating function versions ensures that all critical function attributes will remain unchanged after the function version guard passes in free-threaded builds. It's important to note that this is only true if the remainder of the specialized instruction does not enter and exit a stop-the-world point.
    We will stop the world the first time any of the following function attributes are mutated:
    - defaults
    - vectorcall
    - kwdefaults
    - closure
    - code
    This should happen rarely and only happens once per function, so the performance impact on the majority of code should be minimal.
    Additionally, refactor the API for manipulating function versions to more clearly match the stated semantics.
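The version-guard protocol described above can be sketched with a toy Python model. `FunctionModel`, `specialize_call`, and `version_guard` are hypothetical names, and the use of `0` as the "invalid" version is an assumption of the model; the stop-the-world step itself is not modeled.

```python
import itertools

_versions = itertools.count(1)
INVALID_VERSION = 0  # assumption: 0 means "no valid version"


class FunctionModel:
    def __init__(self, code, defaults=()):
        self.code = code
        self.defaults = defaults
        # Modeled on the description: the version is assigned at creation.
        self.version = next(_versions)

    def set_code(self, code):
        # Mutating a critical attribute invalidates the version.
        self.code = code
        self.version = INVALID_VERSION


def specialize_call(func):
    """Return the value stored in the (modeled) inline cache, or None."""
    if func.version == INVALID_VERSION:
        return None
    return func.version


def version_guard(func, cached_version):
    """True if the specialized instruction may proceed."""
    return func.version != INVALID_VERSION and func.version == cached_version


f = FunctionModel(code="original")
cached = specialize_call(f)
before = version_guard(f, cached)
f.set_code("patched")
after = version_guard(f, cached)
respecialized = specialize_call(f)
```

In the model, mutating `code` makes every previously cached version fail the guard, which is the single-threaded half of the guarantee; stopping the world is what extends it to the window after the guard has passed in another thread.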
* GH-124284: Add stats for refcount operations on immortal objects (GH-124288) | Mark Shannon | 2024-09-23 | 1 | -4/+8
* GH-123232: Fix "not specialized" stats (GH-123236) | Mark Shannon | 2024-08-23 | 1 | -2/+12
* GH-123040: Specialize shadowed `LOAD_ATTR`. (GH-123219) | Mark Shannon | 2024-08-23 | 1 | -136/+212
* GH-123197: Only count an instruction as deferred if it hasn't deopted first. (GH-123222) | Mark Shannon | 2024-08-22 | 1 | -0/+1
    Only count an instruction as deferred if it hasn't deopted first.
* GH-118093: Specialize calls to non-vectorcall classes as `CALL_NON_PY_GENERAL` (GH-123212) | Brandt Bucher | 2024-08-22 | 1 | -5/+1
    Specialize classes without vectorcall as CALL_NON_PY_GENERAL
* GH-115776: Allow any fixed sized object to have inline values (GH-123192) | Mark Shannon | 2024-08-21 | 1 | -6/+10
* GH-118093: Make `CALL_ALLOC_AND_ENTER_INIT` suitable for tier 2. (GH-123140) | Mark Shannon | 2024-08-20 | 1 | -5/+1
    * Convert CALL_ALLOC_AND_ENTER_INIT to micro-ops such that tier 2 supports it
    * Allow inexact arguments for CALL_ALLOC_AND_ENTER_INIT.
* GH-118093: Specialize `CALL_KW` (GH-123006) | Mark Shannon | 2024-08-16 | 1 | -0/+67
* GH-122390: Replace `_Py_GetbaseOpcode` with `_Py_GetBaseCodeUnit` (GH-122942) | Mark Shannon | 2024-08-13 | 1 | -3/+3
* GH-118093: Add tier two support for LOAD_ATTR_PROPERTY (GH-122283) | Brandt Bucher | 2024-07-25 | 1 | -5/+0
* GH-121583: Remove dependency from pystats.h to internal header file (GH-121587) | Michael Droettboom | 2024-07-16 | 1 | -0/+4
    Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
* gh-121082: Fix build failure when the developer use `--enable-pystats` arguments in configuration command after #118450 (#121083) | Nadeshiko Manju | 2024-06-27 | 1 | -2/+3
    Signed-off-by: Manjusaka <me@manjusaka.me>
    Co-authored-by: Ken Jin <kenjin4096@gmail.com>
* gh-117139: Convert the evaluation stack to stack refs (#118450) | Ken Jin | 2024-06-26 | 1 | -16/+43
    This PR sets up tagged pointers for CPython.
    The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.
    Only for free threading: We tag the low bit if something is deferred - that means we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pan out.
    This implies a strict stack reference discipline is required. ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something then do normal incref/decref ops on it.
    The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs.
    Please read Include/internal/pycore_stackref.h for more information!
    Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
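The low-bit tagging and the dup/close discipline can be illustrated with integers standing in for pointers. This is a toy Python model, not the `_PyStackRef` API: the function names, the `refcounts` dictionary, and the alignment assumption are all hypothetical.

```python
TAG_DEFERRED = 0b1  # low bit; assumes addresses are at least 2-byte aligned


def make_stackref(addr, deferred=False):
    assert addr % 2 == 0, "the low bit must be free for the tag"
    return addr | TAG_DEFERRED if deferred else addr


def is_deferred(ref):
    return (ref & TAG_DEFERRED) != 0


def untag(ref):
    return ref & ~TAG_DEFERRED


def stackref_dup(refcounts, ref):
    # "dup" is the stackref incref: skipped entirely for deferred references.
    if not is_deferred(ref):
        refcounts[untag(ref)] += 1
    return ref


def stackref_close(refcounts, ref):
    # "close" is the stackref decref: also skipped for deferred references.
    if not is_deferred(ref):
        refcounts[untag(ref)] -= 1


refcounts = {0x1000: 1, 0x2000: 1}
normal = make_stackref(0x1000)
deferred = make_stackref(0x2000, deferred=True)

stackref_dup(refcounts, normal)
stackref_dup(refcounts, deferred)
after_dup = dict(refcounts)

stackref_close(refcounts, normal)
stackref_close(refcounts, deferred)
```

The model also shows why mixing APIs is unsafe: untagging `deferred` and then applying an ordinary decref would lower a count that `stackref_dup` never raised.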
* Fix typos in comments (#120481) | Xie Yanbo | 2024-06-20 | 1 | -1/+1
* gh-83754: Use the Py_TYPE() macro (#120599) | Victor Stinner | 2024-06-17 | 1 | -1/+1
    Don't access directly PyObject.ob_type, but use the Py_TYPE() macro instead.
* GH-118095: Use broader specializations of CALL in tier 1, for better tier 2 support of calls. (GH-118322) | Mark Shannon | 2024-05-04 | 1 | -105/+19
    * Add CALL_PY_GENERAL, CALL_BOUND_METHOD_GENERAL and CALL_NON_PY_GENERAL specializations.
    * Remove CALL_PY_WITH_DEFAULTS specialization
    * Use CALL_NON_PY_GENERAL in more cases when otherwise failing to specialize
* GH-118095: Unify the behavior of tier 2 FOR_ITER branch micro-ops (GH-118420) | Mark Shannon | 2024-05-02 | 1 | -2/+4
    * Target _FOR_ITER_TIER_TWO at POP_TOP following the matching END_FOR
    * Modify _GUARD_NOT_EXHAUSTED_RANGE, _GUARD_NOT_EXHAUSTED_LIST and _GUARD_NOT_EXHAUSTED_TUPLE so that they also target the POP_TOP following the matching END_FOR
* gh-112075: Make instance attributes stored in inline "dict" thread safe (#114742) | Dino Viehland | 2024-04-22 | 1 | -2/+1
    Make instance attributes stored in inline "dict" thread safe on free-threaded builds
* gh-115178: Add Counts of UOp Pairs to pystats (GH-115181) | Jeff Glass | 2024-04-16 | 1 | -0/+9
* gh-116968: Reimplement Tier 2 counters (#117144) | Guido van Rossum | 2024-04-04 | 1 | -5/+3
    Introduce a unified 16-bit backoff counter type (``_Py_BackoffCounter``), shared between the Tier 1 adaptive specializer and the Tier 2 optimizer. The API used for adaptive specialization counters is changed but the behavior is (supposed to be) identical.
    The behavior of the Tier 2 counters is changed:
    - There are no longer dynamic thresholds (we never varied these).
    - All counters now use the same exponential backoff.
    - The counter for ``JUMP_BACKWARD`` starts counting down from 16.
    - The ``temperature`` in side exits starts counting down from 64.
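How a 16-bit counter with exponential backoff might behave can be sketched in Python. The bit layout used here (a 12-bit countdown value in the high bits, a 4-bit backoff exponent in the low bits, and a cap of 12 on the exponent) is an assumption of this model, and all function names are hypothetical; only the 16-bit size, the exponential backoff, and the starting value of 16 for ``JUMP_BACKWARD`` come from the commit message.

```python
BACKOFF_BITS = 4   # assumed width of the backoff exponent
VALUE_BITS = 12    # assumed width of the countdown value
MAX_BACKOFF = 12   # assumed cap, so (1 << backoff) - 1 still fits in VALUE_BITS


def make_counter(value, backoff):
    assert 0 <= value < (1 << VALUE_BITS)
    assert 0 <= backoff <= MAX_BACKOFF
    return (value << BACKOFF_BITS) | backoff  # always fits in 16 bits


def value_of(counter):
    return counter >> BACKOFF_BITS


def backoff_of(counter):
    return counter & ((1 << BACKOFF_BITS) - 1)


def triggers(counter):
    return value_of(counter) == 0


def advance(counter):
    # One execution of the instruction: count down by one.
    return make_counter(value_of(counter) - 1, backoff_of(counter))


def restart(counter):
    # After a failed attempt, wait roughly twice as long before the next one.
    backoff = min(backoff_of(counter) + 1, MAX_BACKOFF)
    return make_counter((1 << backoff) - 1, backoff)


def executions_until_trigger(counter):
    n = 0
    while not triggers(counter):
        counter = advance(counter)
        n += 1
    return n


counter = make_counter(16, 4)  # e.g. a counter that starts counting down from 16
waits = []
for _ in range(3):
    waits.append(executions_until_trigger(counter))
    counter = restart(counter)
```

Under these assumptions each restart roughly doubles the wait before the next attempt, which is the exponential backoff the commit message refers to.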
* GH-115776: Embed the values array into the object, for "normal" Python objects. (GH-116115) | Mark Shannon | 2024-04-02 | 1 | -26/+25
* A few minor tweaks to get stats working and compiling cleanly. (#117219) | Mark Shannon | 2024-03-25 | 1 | -1/+0
    Fixes a compilation error when configured with `--enable-pystats`, an array size issue, and an unused variable.
* gh-116996: Add pystats about _Py_uop_analyse_and_optimize (GH-116997) | Michael Droettboom | 2024-03-21 | 1 | -0/+13
* gh-116381: Remove bad specializations, add fail stats (GH-116464) | Ken Jin | 2024-03-07 | 1 | -14/+26
    * Remove bad specializations, add fail stats
* gh-116381: Specialize CONTAINS_OP (GH-116385) | Ken Jin | 2024-03-06 | 1 | -0/+38
    * Specialize CONTAINS_OP
    * 📜🤖 Added by blurb_it.
    * Add PyAPI_FUNC for JIT
    Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
* gh-115168: Add pystats counter for invalidated executors (GH-115169) | Michael Droettboom | 2024-02-26 | 1 | -0/+1
* Tier 2 cleanups and tweaks (#115534) | Guido van Rossum | 2024-02-20 | 1 | -8/+4
    * Rename `_testinternalcapi.get_{uop,counter}_optimizer` to `new_*_optimizer`
    * Use `_PyUOpName()` instead of `_PyOpcode_uop_name[]`
    * Add `target` to executor iterator items -- `list(ex)` now returns `(opcode, oparg, target, operand)` quadruples
    * Add executor methods `get_opcode()` and `get_oparg()` to get `vmdata.opcode`, `vmdata.oparg`
    * Define a helper for printing uops, and unify various places where they are printed
    * Add a hack to summarize_stats.py to fix legacy uop names (e.g. `POP_TOP` -> `_POP_TOP`)
    * Define helpers in `test_opt.py` for accessing the set or list of opnames of an executor