path: root/Python/optimizer_analysis.c
Commit message | Author | Age | Files | Lines
* gh-134584: Remove custom float decref ops (GH-142576) | Ken Jin | 12 days | 1 | -9/+2
|
* gh-142276: Watch attribute loads when promoting JIT constants (GH-142303) | Ken Jin | 2025-12-08 | 1 | -1/+3
| Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
| Co-authored-by: Savannah Ostrowski <savannah@python.org>
* gh-141976: Check stack bounds in JIT optimizer (GH-142201) | Ken Jin | 2025-12-04 | 1 | -4/+21
|
* GH-141794: Limit size of generated machine code. (GH-142228) | Mark Shannon | 2025-12-03 | 1 | -0/+30
| * Factor out bodies of the largest uops, to reduce jit code size.
| * Factor out common assert, also reducing jit code size.
| * Limit size of jitted code for a single executor to 1MB.
* gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) | Ken Jin | 2025-11-13 | 1 | -25/+29
| This PR changes the current JIT model from trace projection to trace recording.
|
| Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. **Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR**. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg
|
| Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as there are a lot of "trace too long" aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md .
|
| This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunders uops were discovered to be broken as part of this work (gh-140277).
|
| The optimizer stack space check is disabled, as it's no longer valid to deal with underflow.
|
| Pros:
| * Ignoring the generated tracer code as it's automatically created, this is only an additional 1k lines of code. The maintenance burden is handled by the DSL and code generator.
| * `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace.
| * The new JIT frontend is able to handle a lot more control-flow than the old one.
| * Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing.
| * Better handling of polymorphism. We leverage the specializing interpreter for this.
|
| Cons:
| * (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962
|
| Design:
| * After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests.
| * The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering.
| * The tracing version behaves nearly identically to the normal interpreter; in fact, it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory.
| * The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force a re-specialization when tracing a deopt.
| * The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as it's not tested.
| * Because we do not share interpreter dispatch, there should be no significant slowdown to the original specializing interpreter on tailcall and computed gotos with JIT disabled. With JIT enabled, there might be a slowdown in the form of the JIT trying to trace.
| * Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.
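The "dual dispatch" idea in the commit above (two dispatch tables, with tracing costing roughly one extra call and a memory write per instruction) can be illustrated with a toy stack machine. This is only a sketch: the opcodes `PUSH` and `ADD`, the `run` function, and the `start_tracing_at`/`stop_tracing_at` parameters are all hypothetical, and the real mechanism lives in the generated C interpreter, not in Python.

```python
# Toy model of dual dispatch. Every name here is hypothetical and chosen
# for illustration; this is not how CPython implements it.

def run(code, start_tracing_at, stop_tracing_at):
    stack, trace = [], []

    def op_push(arg):
        stack.append(arg)

    def op_add(_arg):
        right = stack.pop()
        left = stack.pop()
        stack.append(left + right)

    # Table 1: the normal handlers.
    normal = {"PUSH": op_push, "ADD": op_add}

    def make_tracing(name):
        def handler(arg):
            normal[name](arg)            # execute exactly as the normal table would
            trace.append((name, arg))    # then record the instruction that just ran
        return handler

    # Table 2: same opcodes, but each handler also records into the trace.
    tracing = {name: make_tracing(name) for name in normal}

    table = normal
    for pc, (name, arg) in enumerate(code):
        # Switching modes is just swapping which table is consulted.
        if pc == start_tracing_at:
            table = tracing
        elif pc == stop_tracing_at:
            table = normal
        table[name](arg)
    return stack, trace
```

Calling `run` with tracing enabled for only part of a program yields the same final stack as an untraced run, plus a recording of the instructions executed while the tracing table was active.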
* GH-138378: Move globals-to-consts pass into main optimizer pass (GH-138379) | Mark Shannon | 2025-09-18 | 1 | -207/+31
|
* gh-132732: Clear errors in JIT optimizer on error (GH-136048) | Ken Jin | 2025-09-15 | 1 | -3/+10
|
* gh-137136: Suppress build warnings when build on Windows with --experimental-jit-interpreter (GH-137137) | AN Long | 2025-09-03 | 1 | -1/+1
|
* gh-137728 gh-137762: Fix bugs in the JIT with many local variables (GH-137764) | Ken Jin | 2025-08-20 | 1 | -4/+1
|
* GH-132732: Use pure op machinery to optimize `COMPARE_OP_INT/FLOAT/STR` (#137062) | Savannah Bailey | 2025-07-26 | 1 | -0/+1
| Co-authored-by: Ken Jin <kenjin4096@gmail.com>
* gh-132732: Automatically constant evaluate pure operations (GH-132733) | Ken Jin | 2025-06-27 | 1 | -0/+7
| This adds a "macro" to the optimizer DSL called "REPLACE_OPCODE_IF_EVALUATES_PURE", which allows automatically constant evaluating a bytecode body if certain inputs have no side effects upon evaluation (such as ints, strings, and floats).
|
| Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
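As a rough illustration of what "constant evaluating a pure operation" means, the sketch below folds a binary operation when both of its inputs are known constants of a side-effect-free type. The trace format, the uop names (`_LOAD_CONST`, `_BINARY_ADD`, `_BINARY_MUL`), and the `fold_pure_ops` helper are assumptions made for this example; the actual macro operates inside the optimizer DSL and code generator.

```python
import operator

# Hypothetical uop names and a hypothetical (opcode, operand) trace format.
PURE_BINARY_OPS = {"_BINARY_ADD": operator.add, "_BINARY_MUL": operator.mul}
# Exact types only: evaluating these cannot run user-defined code.
SAFE_CONST_TYPES = (int, float, str)


def fold_pure_ops(trace):
    """Replace `const, const, pure_op` with a single constant load."""
    out = []
    for opcode, operand in trace:
        fn = PURE_BINARY_OPS.get(opcode)
        foldable = (
            fn is not None
            and len(out) >= 2
            and out[-1][0] == "_LOAD_CONST"
            and out[-2][0] == "_LOAD_CONST"
            and type(out[-1][1]) in SAFE_CONST_TYPES
            and type(out[-2][1]) in SAFE_CONST_TYPES
        )
        if foldable:
            right = out.pop()[1]
            left = out.pop()[1]
            try:
                out.append(("_LOAD_CONST", fn(left, right)))
            except Exception:
                # Evaluation failed (for example int + str): keep the
                # original instructions so the error surfaces at run time.
                out.append(("_LOAD_CONST", left))
                out.append(("_LOAD_CONST", right))
                out.append((opcode, operand))
            continue
        out.append((opcode, operand))
    return out
```

Operations whose inputs are not both known constants pass through unchanged, so the transformation only ever removes work that has a fixed, known result.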
* gh-134584: Specialize POP_TOP by reference and type in JIT (GH-135761) | Ken Jin | 2025-06-23 | 1 | -1/+1
|
* gh-135608: Add a null check for attribute promotion to fix a JIT crash (GH-135613) | Ken Jin | 2025-06-20 | 1 | -0/+4
| Co-authored-by: devdanzin <74280297+devdanzin@users.noreply.github.com>
* GH-135379: Specialize int operations for compact ints only (GH-135668) | Mark Shannon | 2025-06-19 | 1 | -0/+3
|
* gh-134584: Decref elimination for float ops in the JIT (GH-134588) | Ken Jin | 2025-06-17 | 1 | -4/+11
| This PR adds a PyJitRef API to the JIT's optimizer that mimics the _PyStackRef API. This allows it to track references and their stack lifetimes properly, opening the door to refcount elimination in the JIT.
* gh-131798: Small improvements to `remove_unneeded_uops` (GH-134554) | Tomas R. | 2025-05-23 | 1 | -1/+2
| Improve remove_unneeded_uops
* GH-131798: Optimize away isinstance calls in the JIT (GH-134369) | Tomas R. | 2025-05-22 | 1 | -8/+29
|
* GH-131798: Optimize cached class attributes and methods in the JIT (GH-134403) | Brandt Bucher | 2025-05-22 | 1 | -1/+30
|
* GH-131798: Narrow types more aggressively in the JIT (GH-134373) | Brandt Bucher | 2025-05-20 | 1 | -41/+33
|
* Remove duplicate includes: Python/{bytecodes,ceval,optimizer_analysis}.c (#132622) | Adam Turner | 2025-05-01 | 1 | -3/+0
|
* GH-130415: Improve the JIT's unneeded uop removal pass (GH-132333) | Brandt Bucher | 2025-04-21 | 1 | -15/+38
|
* GH-131726: Split up _CHECK_VALIDITY_AND_SET_IP (GH-131810) | Brandt Bucher | 2025-04-01 | 1 | -17/+2
|
* gh-130704: Strength reduce `LOAD_FAST{_LOAD_FAST}` (#130708) | mpage | 2025-04-01 | 1 | -0/+1
| Optimize `LOAD_FAST` opcodes into faster versions that load borrowed references onto the operand stack when we can prove that the lifetime of the local outlives the lifetime of the temporary that is loaded onto the stack.
* GH-131498: Remove conditional stack effects (GH-131499) | Mark Shannon | 2025-03-20 | 1 | -0/+1
| * Adds some missing #includes
* GH-130415: Use boolean guards to narrow types to values in the JIT (GH-130659) | Brandt Bucher | 2025-03-02 | 1 | -21/+2
|
* GH-130296: Avoid stack transients in four instructions. (GH-130310) | Mark Shannon | 2025-02-28 | 1 | -50/+37
| * Combine _GUARD_GLOBALS_VERSION_PUSH_KEYS and _LOAD_GLOBAL_MODULE_FROM_KEYS into _LOAD_GLOBAL_MODULE
| * Combine _GUARD_BUILTINS_VERSION_PUSH_KEYS and _LOAD_GLOBAL_BUILTINS_FROM_KEYS into _LOAD_GLOBAL_BUILTINS
| * Combine _CHECK_ATTR_MODULE_PUSH_KEYS and _LOAD_ATTR_MODULE_FROM_KEYS into _LOAD_ATTR_MODULE
| * Remove stack transient in LOAD_ATTR_WITH_HINT
* GH-129715: Remove _DYNAMIC_EXIT (GH-129716) | Brandt Bucher | 2025-02-07 | 1 | -1/+0
|
* GH-128914: Remove all but one conditional stack effects (GH-129226) | Mark Shannon | 2025-01-27 | 1 | -2/+6
| * Remove all 'if (0)' and 'if (1)' conditional stack effects
| * Use array instead of conditional for BUILD_SLICE args
| * Refactor LOAD_GLOBAL to use a common conditional uop
| * Remove conditional stack effects from LOAD_ATTR specializations
| * Replace conditional stack effects in LOAD_ATTR with a 0 or 1 sized array.
| * Remove conditional stack effects from CALL_FUNCTION_EX
* Revert "GH-128914: Remove conditional stack effects from `bytecodes.c` and the code generators (GH-128918)" (GH-129202) | Sam Gross | 2025-01-23 | 1 | -2/+2
| The commit introduced a ~2.5-3% regression in the free threading build.
|
| This reverts commit ab61d3f4303d14a413bc9ae6557c730ffdf7579e.
* GH-128914: Remove conditional stack effects from `bytecodes.c` and the code generators (GH-128918) | Mark Shannon | 2025-01-20 | 1 | -2/+2
|
* GH-128939: Refactor JIT optimize structs (GH-128940) | Mark Shannon | 2025-01-20 | 1 | -6/+10
|
* gh-115999: Specialize loading attributes from modules in free-threaded builds (#127711) | mpage | 2024-12-13 | 1 | -1/+1
| We use the same approach that was used for specialization of LOAD_GLOBAL in free-threaded builds:
|
| _CHECK_ATTR_MODULE is renamed to _CHECK_ATTR_MODULE_PUSH_KEYS; it pushes the keys object for the following _LOAD_ATTR_MODULE_FROM_KEYS (nee _LOAD_ATTR_MODULE). This arrangement avoids having to recheck the keys version.
|
| _LOAD_ATTR_MODULE is renamed to _LOAD_ATTR_MODULE_FROM_KEYS; it loads the value from the keys object pushed by the preceding _CHECK_ATTR_MODULE_PUSH_KEYS at the cached index.
* gh-120619: Strength reduce function guards, support 2-operand uop forms (GH-124846) | Ken Jin | 2024-11-09 | 1 | -14/+14
| Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
* gh-115999: Refactor `LOAD_GLOBAL` specializations to avoid reloading {globals, builtins} keys (gh-124953) | mpage | 2024-10-09 | 1 | -1/+41
| Each of the `LOAD_GLOBAL` specializations is implemented roughly as:
|
| 1. Load keys version.
| 2. Load cached keys version.
| 3. Deopt if (1) and (2) don't match.
| 4. Load keys.
| 5. Load cached index into keys.
| 6. Load object from (4) at offset from (5).
|
| This is not thread-safe in free-threaded builds; the keys object may be replaced in between steps (3) and (4).
|
| This change refactors the specializations to avoid reloading the keys object and instead pass the keys object from guards to be consumed by downstream uops.
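The fix described above amounts to reading the keys object once in the guard and handing that same object to the load, so a concurrent replacement between the check and the load cannot be observed. A minimal sketch of that shape follows; the `Keys`, `Globals`, and `Deopt` classes and both function names are hypothetical stand-ins rather than CPython structures or uops.

```python
# Hypothetical stand-ins for a dict's keys object and its owner.
class Keys:
    def __init__(self, version, entries):
        self.version = version
        self.entries = entries


class Globals:
    def __init__(self, keys):
        self.keys = keys  # another thread may rebind this at any time


class Deopt(Exception):
    """Signals that the specialized path no longer applies."""


def guard_version_push_keys(globals_, cached_version):
    keys = globals_.keys  # read the keys object exactly once
    if keys.version != cached_version:
        raise Deopt
    return keys  # pass the checked object downstream instead of reloading it


def load_from_keys(keys, cached_index):
    # Uses the object the guard validated, never globals_.keys again.
    return keys.entries[cached_index]
```

Even if `globals_.keys` is rebound after the guard returns, `load_from_keys` still indexes the object whose version was checked, which is the property the six-step sequence lacked.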
* gh-120619: Optimize through `_Py_FRAME_GENERAL` (GH-124518) | Ken Jin | 2024-10-02 | 1 | -0/+24
| * Optimize through _Py_FRAME_GENERAL
| * refactor
* gh-124296: Remove private dictionary version tag (PEP 699) (#124472) | Sam Gross | 2024-10-01 | 1 | -2/+2
|
* gh-123923: Defer refcounting for `f_funcobj` in `_PyInterpreterFrame` (#124026) | Sam Gross | 2024-09-24 | 1 | -1/+1
| Use a `_PyStackRef` and defer the reference to `f_funcobj` when possible. This avoids some reference count contention in the common case of executing the same code object from multiple threads concurrently in the free-threaded build.
* GH-118095: Add tier two support for BINARY_SUBSCR_GETITEM (GH-120793) | Mark Shannon | 2024-08-01 | 1 | -9/+2
|
* gh-120642: Move private PyCode APIs to the internal C API (#120643) | Victor Stinner | 2024-06-26 | 1 | -1/+0
| * Move _Py_CODEUNIT and related functions to pycore_code.h.
| * Move _Py_BackoffCounter to pycore_backoff.h.
| * Move Include/cpython/optimizer.h content to pycore_optimizer.h.
| * Remove Include/cpython/optimizer.h.
| * Remove PyUnstable_Replace_Executor().
|
| Rename functions:
|
| * PyUnstable_GetExecutor() => _Py_GetExecutor()
| * PyUnstable_GetOptimizer() => _Py_GetOptimizer()
| * PyUnstable_SetOptimizer() => _Py_SetTier2Optimizer()
| * PyUnstable_Optimizer_NewCounter() => _PyOptimizer_NewCounter()
| * PyUnstable_Optimizer_NewUOpOptimizer() => _PyOptimizer_NewUOpOptimizer()
* GH-120982: Add stack check assertions to generated interpreter code (GH-120992) | Mark Shannon | 2024-06-25 | 1 | -0/+5
|
* GH-120619: Clean up `RETURN_VALUE` instruction (GH-120624) | Mark Shannon | 2024-06-17 | 1 | -3/+3
| * Rename _POP_FRAME to _RETURN_VALUE as it returns a value as well as popping a frame.
| * Remove remaining _POP_FRAMEs
* gh-119258: Eliminate Type Guards in Tier 2 Optimizer with Watcher (GH-119365) | Saul Shanabrook | 2024-06-08 | 1 | -1/+15
| Co-authored-by: parmeggiani <parmeggiani@spaziodati.eu>
| Co-authored-by: dpdani <git@danieleparmeggiani.me>
| Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
| Co-authored-by: Brandt Bucher <brandtbucher@microsoft.com>
| Co-authored-by: Ken Jin <kenjin@python.org>
* GH-118910: Less boilerplate in the tier 2 optimizer (#118913) | Mark Shannon | 2024-05-10 | 1 | -46/+34
|
* gh-118335: Configure Tier 2 interpreter at build time (#118339) | Guido van Rossum | 2024-05-01 | 1 | -0/+4
| The code for Tier 2 is now only compiled when configured with `--enable-experimental-jit[=yes|interpreter]`.
|
| We drop support for `PYTHON_UOPS` and `-Xuops`, but you can disable the interpreter or JIT at runtime by setting `PYTHON_JIT=0`. You can also build it without enabling it by default using `--enable-experimental-jit=yes-off`; enable with `PYTHON_JIT=1`.
|
| On Windows, the `build.bat` script supports `--experimental-jit`, `--experimental-jit-off`, `--experimental-interpreter`.
|
| In the C code, `_Py_JIT` is defined as before when the JIT is enabled; the new variable `_Py_TIER2` is defined when the JIT *or* the interpreter is enabled. It is actually a bitmask: 1: JIT; 2: default-off; 4: interpreter.
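To make the `_Py_TIER2` bitmask concrete, the sketch below decodes a value using the three bit meanings listed in the commit message. The constant names and the `describe_tier2` helper are invented for this example, and which mask value each configure option actually produces is not stated in the message, so none is assumed here.

```python
# Bit meanings as listed in the commit message; the names are hypothetical.
TIER2_JIT = 1
TIER2_DEFAULT_OFF = 2
TIER2_INTERPRETER = 4


def describe_tier2(mask):
    """Decode a _Py_TIER2-style bitmask into its three flags."""
    return {
        "jit": bool(mask & TIER2_JIT),
        "default_off": bool(mask & TIER2_DEFAULT_OFF),
        "interpreter": bool(mask & TIER2_INTERPRETER),
    }
```

Under this reading, a mask with bits 1 and 2 set would describe a build that includes the JIT but leaves it off until enabled at runtime.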
* GH-118095: Handle `RETURN_GENERATOR` in tier 2 (GH-118180) | Mark Shannon | 2024-04-25 | 1 | -1/+1
|
* GH-115419: Move setting the instruction pointer to error exit stubs (GH-118088) | Mark Shannon | 2024-04-24 | 1 | -3/+0
|
* GH-115480: Reduce guard strength for binary ops when type of one operand is known already (GH-118050) | Mark Shannon | 2024-04-22 | 1 | -0/+1
|
* GH-116202: Incorporate invalidation check into _START_EXECUTOR. (GH-118044) | Mark Shannon | 2024-04-19 | 1 | -0/+3
|
* GH-115419: Improve list of escaping functions (GH-118054) | Mark Shannon | 2024-04-19 | 1 | -1/+3
|
* GH-115419: Tidy up tier 2 optimizer. Merge peephole pass into main pass (GH-117997) | Mark Shannon | 2024-04-18 | 1 | -124/+39