path: root/Python/optimizer_symbols.c
Commit message | Author | Age | Files | Lines
* gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) | Ken Jin | 2025-11-13 | 1 | -5/+39
  This PR changes the current JIT model from trace projection to trace recording.

  Benchmarking: better pyperformance geomean (about 1.7% overall) versus the current JIT
  (https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg),
  with Richards, the most improved benchmark, 100% faster than under the current JIT. The worst benchmark slows down by about 10-15% versus the current JIT.
  Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR. The speedup in the merged version is about 1.1%
  (https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg).

  Stats: 50% more uops executed, and 30% more traces entered the last time we ran them. The stats also suggest our trace lengths are too short for a real trace-recording JIT, given the large number of trace-too-long aborts
  (https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md).

  This new JIT frontend is already able to record and execute significantly more instructions than the previous one: we can now trace through custom dunders, simple object creation, generators, etc. None of these were handled by the old JIT frontend. Some custom-dunder uops were discovered to be broken as part of this work (gh-140277). The optimizer stack space check is disabled, as it is no longer valid when dealing with underflow.

  Pros:
  * Ignoring the generated tracer code, which is created automatically, this is only about 1k additional lines of code; the maintenance burden is handled by the DSL and code generator.
  * `optimizer.c` is now significantly simpler, as we no longer have to do strange things to recover the bytecode from a trace.
  * The new JIT frontend is able to handle a lot more control flow than the old one.
  * Tracing is very low overhead. We use the tail-calling/computed-goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing.
  * Better handling of polymorphism, by leveraging the specializing interpreter.

  Cons:
  * (For now) requires the tail-calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows: https://github.com/python/cpython/pull/139962

  Design:
  * After each instruction, the `record_previous_inst` function/label is executed. It does what the name suggests.
  * The tracing interpreter lowers bytecode to uops directly, so that it can obtain "fresh" values at the point of lowering.
  * The tracing interpreter behaves nearly identically to the normal interpreter; in fact, it even specializes. This allows it to run without much of a slowdown while tracing: the actual cost of tracing is only a function call and some writes to memory.
  * The tracing interpreter reuses the specializing interpreter's deopt mechanism to naturally form side-exit chains, without repeating much code. We force a re-specialization when tracing a deopt.
  * The tracing interpreter can even handle goto errors/exceptions, but I chose to disable that for now as it is not tested.
  * Because we do not share interpreter dispatch, there should be no significant slowdown to the original specializing interpreter on tail-calling and computed-goto builds with the JIT disabled. With the JIT enabled, there might be a slowdown in the form of the JIT trying to trace.
  * Anything that could have dynamic instruction-pointer effects is guarded. The guard deopts to a new instruction, `_DYNAMIC_EXIT`.
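  The "dual dispatch" mechanism lends itself to a tiny illustration. The toy C program below is an assumed structure for illustration only: CPython's real handlers are generated and use tail calls or computed gotos, and `record(...)` here is a stand-in for `record_previous_inst`. It shows two dispatch tables over shared instruction bodies, where flipping one pointer switches between tracing and non-tracing execution.

    #include <stdio.h>

    enum { OP_INC, OP_PRINT, OP_HALT };

    typedef struct VM VM;
    typedef void (*Handler)(VM *);

    struct VM {
        const unsigned char *ip;   /* instruction pointer into the bytecode */
        long acc;                  /* single accumulator register           */
        const Handler *table;      /* current dispatch table                */
        int running;
    };

    /* Shared instruction bodies, used by both dispatch tables. */
    static void do_inc(VM *vm)   { vm->acc++; vm->ip++; }
    static void do_print(VM *vm) { printf("acc=%ld\n", vm->acc); vm->ip++; }
    static void do_halt(VM *vm)  { vm->running = 0; }

    /* Tracing wrappers: record the instruction, then run the same body
     * (a stand-in for the generated record_previous_inst handlers). */
    static void record(VM *vm)  { printf("trace: opcode %d\n", *vm->ip); }
    static void t_inc(VM *vm)   { record(vm); do_inc(vm); }
    static void t_print(VM *vm) { record(vm); do_print(vm); }
    static void t_halt(VM *vm)  { record(vm); do_halt(vm); }

    static const Handler normal_table[]  = { do_inc, do_print, do_halt };
    static const Handler tracing_table[] = { t_inc,  t_print,  t_halt  };

    int main(void) {
        const unsigned char code[] = { OP_INC, OP_INC, OP_PRINT, OP_HALT };
        VM vm = { code, 0, normal_table, 1 };
        vm.table = tracing_table;      /* flip one pointer to start tracing  */
        while (vm.running) {
            vm.table[*vm.ip](&vm);     /* dispatch through the current table */
        }
        return 0;
    }

  Keeping specialization enabled in the tracing table is what keeps the overhead low: the traced program runs the same specialized handlers as before, plus one recording call per instruction.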
* gh-111489: Remove _PyTuple_FromArray() alias (#139973) | Victor Stinner | 2025-10-11 | 1 | -2/+1
  Replace _PyTuple_FromArray() with PyTuple_FromArray(). Remove pycore_tuple.h includes.
* GH-138378: Move globals-to-consts pass into main optimizer pass (GH-138379) | Mark Shannon | 2025-09-18 | 1 | -0/+4
* gh-137136: Suppress build warnings when building on Windows with --experimental-jit-interpreter (GH-137137) | AN Long | 2025-09-03 | 1 | -2/+2
* gh-137728 gh-137762: Fix bugs in the JIT with many local variables (GH-137764) | Ken Jin | 2025-08-20 | 1 | -0/+7
* gh-132732: Fix up pure types in JIT (GH-136050) | Ken Jin | 2025-06-28 | 1 | -1/+1
* gh-132732: JIT: Only allow compact ints in pure evaluation (GH-136040) | Ken Jin | 2025-06-27 | 1 | -2/+4
* gh-132732: Automatically constant evaluate pure operations (GH-132733) | Ken Jin | 2025-06-27 | 1 | -0/+39
  This adds a "macro" to the optimizer DSL called "REPLACE_OPCODE_IF_EVALUATES_PURE", which allows automatically constant-evaluating a bytecode body if certain inputs have no side effects upon evaluation (such as ints, strings, and floats).
  Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
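  As a rough illustration, a pure uop in the optimizer DSL might be annotated like the hedged sketch below. Only `REPLACE_OPCODE_IF_EVALUATES_PURE` is named by the commit; the op spelling and the `sym_new_type` fallback are assumptions for illustration.

    /* Hedged sketch: if both inputs are known constants of side-effect-free
     * types, the code generator emits logic that evaluates the add at
     * optimization time and rewrites the uop into a constant load;
     * otherwise the usual abstract interpretation below runs. */
    op(_BINARY_OP_ADD_INT, (left, right -- res)) {
        REPLACE_OPCODE_IF_EVALUATES_PURE(left, right);
        res = sym_new_type(ctx, &PyLong_Type);
    }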
* gh-134584: Specialize POP_TOP by reference and type in JIT (GH-135761) | Ken Jin | 2025-06-23 | 1 | -3/+0
* gh-135379: Move PyLong_CheckCompact to private header and rename it (GH-135707) | Ken Jin | 2025-06-19 | 1 | -3/+3
* GH-135379: Specialize int operations for compact ints only (GH-135668) | Mark Shannon | 2025-06-19 | 1 | -11/+136
* gh-134584: Decref elimination for float ops in the JIT (GH-134588) | Ken Jin | 2025-06-17 | 1 | -150/+182
  This PR adds a PyJitRef API to the JIT's optimizer that mimics the _PyStackRef API. This allows the optimizer to track references and their stack lifetimes properly, opening the door to refcount elimination in the JIT.
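  The shape of such a tagged-reference API might look like the hedged sketch below. The field and helper names, and the low-bit tagging scheme, are assumptions modeled on how _PyStackRef tags object pointers, not necessarily the merged code.

    #include <stdint.h>

    typedef struct JitOptSymbol JitOptSymbol;  /* opaque optimizer symbol */

    /* A symbol pointer with a "borrowed" flag packed into the low bit. */
    typedef struct {
        uintptr_t bits;
    } PyJitRef;

    #define JIT_REF_BORROWED ((uintptr_t)1)

    static inline PyJitRef jit_ref_wrap(JitOptSymbol *sym) {
        return (PyJitRef){ (uintptr_t)sym };
    }
    static inline PyJitRef jit_ref_borrow(PyJitRef ref) {
        return (PyJitRef){ ref.bits | JIT_REF_BORROWED };
    }
    static inline JitOptSymbol *jit_ref_unwrap(PyJitRef ref) {
        return (JitOptSymbol *)(ref.bits & ~JIT_REF_BORROWED);
    }
    static inline int jit_ref_is_borrowed(PyJitRef ref) {
        return (ref.bits & JIT_REF_BORROWED) != 0;
    }

  Tracking which abstract stack entries hold borrowed references is what lets the optimizer prove that a later decref pairs with an earlier incref and drop both.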
* GH-131798: Optimize cached class attributes and methods in the JIT (GH-134403) | Brandt Bucher | 2025-05-22 | 1 | -33/+49
* GH-131798: Narrow types more aggressively in the JIT (GH-134373) | Brandt Bucher | 2025-05-20 | 1 | -9/+31
* GH-131331: Rename "not" to "invert" (GH-131334) | Bénédikt Tran | 2025-03-20 | 1 | -8/+8
* GH-131498: Remove conditional stack effects (GH-131499) | Mark Shannon | 2025-03-20 | 1 | -0/+1
  * Adds some missing #includes.
* GH-131238: More refactoring of core header files (GH-131351) | Mark Shannon | 2025-03-17 | 1 | -0/+1
  Adds a new pycore_stats.h header file to help break dependencies involving the pycore_code.h header.
* GH-130415: Narrow str to "" based on boolean tests (GH-130476) | Amit Lavon | 2025-03-04 | 1 | -0/+3
* GH-130415: Narrow int to 0 based on boolean tests (GH-130772) | Klaus117 | 2025-03-04 | 1 | -0/+3
* GH-130415: Use boolean guards to narrow types to values in the JIT (GH-130659) | Brandt Bucher | 2025-03-02 | 1 | -35/+148
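  The three GH-130415 commits above share one idea: when a boolean test on a value is known to have passed a particular way, the value itself may be pinned down. A hedged, schematic sketch of that narrowing in the abstract interpreter; `guard_proves_falsy`, `const_zero`, and `const_empty_str` are hypothetical names, and the real logic is spread across the TO_BOOL and guard uop cases.

    /* Schematic: a guard that only continues when `value` tested false
     * lets the optimizer replace the symbol with the one possible value. */
    if (guard_proves_falsy(value)) {               /* illustrative helper */
        if (sym_matches_type(value, &PyLong_Type)) {
            sym_set_const(value, const_zero);      /* the only falsy int is 0  */
        }
        else if (sym_matches_type(value, &PyUnicode_Type)) {
            sym_set_const(value, const_empty_str); /* the only falsy str is "" */
        }
    }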
* GH-128939: Refactor JIT optimize structs (GH-128940) | Mark Shannon | 2025-01-20 | 1 | -155/+358
* gh-120642: Move private PyCode APIs to the internal C API (#120643) | Victor Stinner | 2024-06-26 | 1 | -1/+0
  * Move _Py_CODEUNIT and related functions to pycore_code.h.
  * Move _Py_BackoffCounter to pycore_backoff.h.
  * Move Include/cpython/optimizer.h content to pycore_optimizer.h.
  * Remove Include/cpython/optimizer.h.
  * Remove PyUnstable_Replace_Executor().
  Rename functions:
  * PyUnstable_GetExecutor() => _Py_GetExecutor()
  * PyUnstable_GetOptimizer() => _Py_GetOptimizer()
  * PyUnstable_SetOptimizer() => _Py_SetTier2Optimizer()
  * PyUnstable_Optimizer_NewCounter() => _PyOptimizer_NewCounter()
  * PyUnstable_Optimizer_NewUOpOptimizer() => _PyOptimizer_NewUOpOptimizer()
* gh-119258: Eliminate Type Guards in Tier 2 Optimizer with Watcher (GH-119365) | Saul Shanabrook | 2024-06-08 | 1 | -8/+38
  Co-authored-by: parmeggiani <parmeggiani@spaziodati.eu>
  Co-authored-by: dpdani <git@danieleparmeggiani.me>
  Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
  Co-authored-by: Brandt Bucher <brandtbucher@microsoft.com>
  Co-authored-by: Ken Jin <kenjin@python.org>
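  The watcher-based approach trades a per-execution runtime guard for eager invalidation: if a watched dict ever changes, executors built on its contents are thrown away. A hedged sketch using CPython's public dict-watcher API; `invalidate_dependent_executors` is an illustrative stand-in, not the function this commit adds.

    #include <Python.h>

    /* Fires on every mutation of a watched dict. */
    static int
    globals_watcher(PyDict_WatchEvent event, PyObject *dict,
                    PyObject *key, PyObject *new_value)
    {
        /* Any change invalidates assumptions baked into compiled traces. */
        invalidate_dependent_executors(dict);   /* illustrative stand-in */
        return 0;
    }

    /* Registration, e.g. once at interpreter startup:
     *     int wid = PyDict_AddWatcher(globals_watcher);
     *     PyDict_Watch(wid, some_globals_dict);
     * After this, traces may use values cached from some_globals_dict
     * without re-checking a guard on every execution. */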
* GH-118910: Less boilerplate in the tier 2 optimizer (#118913) | Mark Shannon | 2024-05-10 | 1 | -52/+54
* GH-118095: Unify the behavior of tier 2 FOR_ITER branch micro-ops (GH-118420) | Mark Shannon | 2024-05-02 | 1 | -3/+10
  * Target _FOR_ITER_TIER_TWO at the POP_TOP following the matching END_FOR.
  * Modify _GUARD_NOT_EXHAUSTED_RANGE, _GUARD_NOT_EXHAUSTED_LIST and _GUARD_NOT_EXHAUSTED_TUPLE so that they also target the POP_TOP following the matching END_FOR.
* gh-118335: Configure Tier 2 interpreter at build time (#118339) | Guido van Rossum | 2024-05-01 | 1 | -0/+3
  The code for Tier 2 is now only compiled when configured with `--enable-experimental-jit[=yes|interpreter]`.
  We drop support for `PYTHON_UOPS` and `-Xuops`, but you can disable the interpreter or JIT at runtime by setting `PYTHON_JIT=0`. You can also build it without enabling it by default using `--enable-experimental-jit=yes-off`; enable with `PYTHON_JIT=1`.
  On Windows, the `build.bat` script supports `--experimental-jit`, `--experimental-jit-off`, `--experimental-interpreter`.
  In the C code, `_Py_JIT` is defined as before when the JIT is enabled; the new variable `_Py_TIER2` is defined when the JIT *or* the interpreter is enabled. It is actually a bitmask: 1: JIT; 2: default-off; 4: interpreter.
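  The bitmask convention in the last paragraph can be summarized with a preprocessor sketch; the bit meanings come straight from the commit message, while the comments are illustrative.

    /* _Py_TIER2 bitmask, per the commit message:
     *   bit 0 (1): the JIT backend is compiled in
     *   bit 1 (2): tier 2 is off by default (enable with PYTHON_JIT=1)
     *   bit 2 (4): the tier 2 interpreter is compiled in */
    #ifdef _Py_TIER2
    #  if _Py_TIER2 & 1
         /* JIT build: _Py_JIT is also defined */
    #  endif
    #  if _Py_TIER2 & 4
         /* uop interpreter build (--enable-experimental-jit=interpreter) */
    #  endif
    #  if _Py_TIER2 & 2
         /* default-off: tier 2 stays disabled unless PYTHON_JIT=1 */
    #  endif
    #endif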
* GH-115480: Reduce guard strength for binary ops when type of one operand is known already (GH-118050) | Mark Shannon | 2024-04-22 | 1 | -4/+10
* GH-115819: Eliminate Boolean guards when value is known (GH-116355) | Mark Shannon | 2024-03-05 | 1 | -0/+9
* GH-115685: Optimize `TO_BOOL` and variants based on truthiness of input (GH-116311) | Mark Shannon | 2024-03-05 | 1 | -0/+44
* gh-116088: Insert bottom checks after all sym_set_...() calls (#116089) | Guido van Rossum | 2024-02-29 | 1 | -14/+22
  This changes the `sym_set_...()` functions to return a `bool` which is `false` when the symbol is `bottom` after the operation. All calls to such functions now check this result and go to `hit_bottom`, a special error label that prints a different message and then reports that it wasn't able to optimize the trace. No executor will be produced in this case.
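  The resulting call pattern looks roughly like the sketch below; `sym_set_type` and the `hit_bottom` label are named by the commit message, while the surrounding code is illustrative.

    /* Every sym_set_...() result is now checked: */
    if (!sym_set_type(sym, &PyLong_Type)) {
        goto hit_bottom;         /* contradiction: symbol became bottom */
    }
    /* ... more abstract interpretation ... */

    hit_bottom:
        /* Print a distinct message and report failure; the trace is not
         * optimized and no executor is produced. */
        return 0;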
* gh-115859: Re-enable T2 optimizer pass by default (#116062) | Guido van Rossum | 2024-02-28 | 1 | -1/+1
  This undoes the *temporary* default disabling of the T2 optimizer pass in gh-115860.
  - Add a new test that reproduces Brandt's example from gh-115859; it indeed crashes before gh-116028 with PYTHONUOPSOPTIMIZE=1
  - Re-enable the optimizer pass in T2, stop checking PYTHONUOPSOPTIMIZE
  - Rename the env var to disable T2 entirely to PYTHON_UOPS_OPTIMIZE (must be explicitly set to 0 to disable)
  - Fix skipIf conditions on tests in test_opt.py accordingly
  - Export sym_is_bottom() (for debugging)
  - Fix various things in the `_BINARY_OP_` specializations in the abstract interpreter:
    - DECREF(temp)
    - out-of-space check after sym_new_const()
    - add sym_matches_type() checks, so even if we somehow reach a binary op with symbolic constants of the wrong type on the stack we won't trigger the type assert
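  Putting those three fixes together, the hardened constant-folding branch of a `_BINARY_OP_` specialization would look roughly like this hedged sketch; the helper signatures and the `error`/`out_of_space` labels are assumptions based on the names in the commit message.

    if (sym_is_const(left) && sym_is_const(right) &&
        sym_matches_type(left, &PyLong_Type) &&     /* new type checks   */
        sym_matches_type(right, &PyLong_Type))
    {
        PyObject *temp = PyNumber_Add(sym_get_const(left),
                                      sym_get_const(right));
        if (temp == NULL) {
            goto error;
        }
        res = sym_new_const(ctx, temp);
        Py_DECREF(temp);                            /* the missing DECREF */
        if (res == NULL) {
            goto out_of_space;                      /* check after sym_new_const() */
        }
    }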
* gh-115816: Improve internal symbols API in optimizer (#116028) | Guido van Rossum | 2024-02-28 | 1 | -62/+180
  - Any `sym_set_...` call that attempts to set conflicting information causes the symbol to become `bottom` (contradiction).
  - All `sym_is...` and similar calls return false or NULL for `bottom`.
  - Everything's tested.
  - The tests still pass with `PYTHONUOPSOPTIMIZE=1`.
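  In other words, contradictory facts poison the symbol rather than crashing the optimizer. Illustratively (sym_is_bottom() is exported per the gh-115859 entry above; the rest of the names follow the sym_* convention of this file):

    sym_set_type(sym, &PyLong_Type);     /* fact: sym is an int     */
    sym_set_type(sym, &PyFloat_Type);    /* contradiction -> bottom */
    assert(sym_is_bottom(sym));
    /* From here on, sym_is_const(sym), sym_matches_type(sym, ...) and
     * friends all answer false/NULL, so no optimization fires on the
     * contradictory state. */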
* GH-115816: Assorted naming and formatting changes to improve maintainability. (GH-115987) | Mark Shannon | 2024-02-27 | 1 | -57/+48
  * Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol.
  * #define shortened form of _Py_uop_... names for improved readability.
* GH-115816: Make tier2 optimizer symbols testable, and add a few tests. (GH-115953) | Mark Shannon | 2024-02-27 | 1 | -0/+332