path: root/Python/optimizer_cases.c.h
Commit log (newest first); each entry shows the commit message, author, date, and diffstat.
* gh-115999: Specialize loading attributes from modules in free-threaded builds (#127711) [mpage, 2024-12-13, 1 file, -11/+36]
  We use the same approach that was used for specialization of LOAD_GLOBAL in free-threaded builds: _CHECK_ATTR_MODULE is renamed to _CHECK_ATTR_MODULE_PUSH_KEYS; it pushes the keys object for the following _LOAD_ATTR_MODULE_FROM_KEYS (née _LOAD_ATTR_MODULE), which loads the value from that keys object at the cached index. This arrangement avoids having to recheck the keys version.
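  A minimal C sketch of the two-uop shape this describes, with invented types and field names (the real bodies are generated from bytecodes.c):

```c
#include <stddef.h>

/* Invented stand-in for the dict-keys object and its version tag. */
typedef struct {
    unsigned version;
    void *values[32];
} keys_t;

/* _CHECK_ATTR_MODULE_PUSH_KEYS (sketch): guard the keys version once,
 * then hand the checked keys object to the next uop instead of making
 * it re-derive (and re-check) the keys itself. */
static keys_t *check_attr_module_push_keys(keys_t *mod_keys,
                                           unsigned cached_version) {
    if (mod_keys->version != cached_version) {
        return NULL;       /* deopt to the generic LOAD_ATTR path */
    }
    return mod_keys;       /* "pushed" for the following uop */
}

/* _LOAD_ATTR_MODULE_FROM_KEYS (sketch): consume the pushed keys and
 * load at the cached index; no second version check is needed. */
static void *load_attr_module_from_keys(keys_t *pushed_keys, size_t index) {
    return pushed_keys->values[index];
}
```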
* gh-120619: Strength reduce function guards, support 2-operand uop forms (GH-124846) [Ken Jin, 2024-11-09, 1 file, -22/+47]
  Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
* gh-115999: Implement thread-local bytecode and enable specialization for `BINARY_OP` (#123926) [mpage, 2024-11-04, 1 file, -0/+2]
  In free-threaded builds, each thread specializes a thread-local copy of the bytecode, created on the first RESUME. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Each thread reserves a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation, and releases the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode, stored at the end of the code object; this ensures that no bytecode is copied for programs that do not use threads.
  Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0; disabling thread-local bytecode also disables specialization. Concurrent modifications to the bytecode made by the specializing interpreter and by instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
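  A rough, self-contained sketch of the per-thread lookup described here; the struct and helper are invented stand-ins for the real co_tlbc machinery:

```c
#include <stdlib.h>
#include <string.h>

/* Invented stand-in for a code object: slot 0 of `tlbc` is the "main"
 * copy of the bytecode; other slots hold lazily created per-thread
 * copies that each thread specializes privately. */
typedef struct {
    unsigned char **tlbc;   /* indexed by a thread's globally unique id */
    size_t nbytes;          /* size of the bytecode */
} code_t;

static unsigned char *bytecode_for_thread(code_t *co, size_t thread_index) {
    if (co->tlbc[thread_index] == NULL) {
        /* First RESUME in this thread: clone the main copy, then let the
         * specializing interpreter rewrite it without racing other threads. */
        unsigned char *copy = malloc(co->nbytes);
        memcpy(copy, co->tlbc[0], co->nbytes);
        co->tlbc[thread_index] = copy;
    }
    return co->tlbc[thread_index];
}
```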
* GH-125837: Split `LOAD_CONST` into three. (GH-125972) [Mark Shannon, 2024-10-29, 1 file, -0/+21]
  - Add LOAD_CONST_IMMORTAL opcode
  - Add LOAD_SMALL_INT opcode
  - Remove RETURN_CONST opcode
* GH-125912: Teach the JIT's optimizer about _BINARY_OP_INPLACE_ADD_UNICODE (GH-125935) [Brandt Bucher, 2024-10-28, 1 file, -0/+19]
* gh-115999: Refactor `LOAD_GLOBAL` specializations to avoid reloading {globals, builtins} keys (gh-124953) [mpage, 2024-10-09, 1 file, -9/+51]
  Each of the `LOAD_GLOBAL` specializations is implemented roughly as:
  1. Load the keys version.
  2. Load the cached keys version.
  3. Deopt if (1) and (2) don't match.
  4. Load the keys.
  5. Load the cached index into the keys.
  6. Load the object from (4) at the offset from (5).
  This is not thread-safe in free-threaded builds; the keys object may be replaced between steps (3) and (4). This change refactors the specializations to avoid reloading the keys object, instead passing it from the guards to be consumed by the downstream uops.
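  A condensed sketch of the race and the fix, with invented types; `load_global_racy` refetches the keys after the guard (steps 3 and 4 above), while `load_global_fixed` passes the guarded keys object straight to the load:

```c
#include <stddef.h>

typedef struct { unsigned version; void *values[32]; } keys_t;  /* invented */
typedef struct { keys_t *keys; } dict_t;                        /* invented */

/* Racy shape: the keys object is fetched twice, so another thread may
 * replace d->keys between the guard and the load. */
static void *load_global_racy(dict_t *d, unsigned cached_version, size_t i) {
    if (d->keys->version != cached_version) {
        return NULL;                    /* deopt */
    }
    return d->keys->values[i];          /* second fetch: may be new keys! */
}

/* Refactored shape: the guard loads the keys once and the load consumes
 * that same object, so there is nothing to refetch. */
static void *load_global_fixed(dict_t *d, unsigned cached_version, size_t i) {
    keys_t *keys = d->keys;             /* single fetch */
    if (keys->version != cached_version) {
        return NULL;                    /* deopt */
    }
    return keys->values[i];             /* exactly the object we guarded */
}
```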
* GH-119866: Spill the stack around escaping calls. (GH-124392) [Mark Shannon, 2024-10-07, 1 file, -57/+119]
  - Spill the evaluation stack around escaping calls in the generated interpreter and JIT.
  - The code generator tracks live, cached values so they can be saved to memory when needed.
  - Spill the stack pointer around escaping calls, so that the exact stack is visible to the cycle GC.
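  A small sketch of the discipline with an invented frame layout; in the real change the code generator emits these spills automatically around escaping calls:

```c
/* Invented frame: `stack_pointer` is the canonical in-memory stack top,
 * while hot code keeps the top-of-stack value cached in a C local. */
typedef struct {
    void **stack_base;
    void **stack_pointer;
} frame_t;

static void *do_escaping_call(frame_t *f, void *cached_tos, void (*fn)(void)) {
    *f->stack_pointer++ = cached_tos;  /* spill: make the stack exact */
    fn();                              /* may trigger GC, callbacks, etc. */
    return *--f->stack_pointer;        /* reload the cached value */
}
```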
* gh-120619: Optimize through `_Py_FRAME_GENERAL` (GH-124518) [Ken Jin, 2024-10-02, 1 file, -24/+14]
  - Optimize through _Py_FRAME_GENERAL
  - Refactor
* GH-123516: Improve JIT memory consumption by invalidating cold executors (GH-124443) [Savannah Ostrowski, 2024-09-27, 1 file, -0/+4]
  Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
* gh-121459: Deferred LOAD_GLOBAL (GH-123128) [Ken Jin, 2024-09-13, 1 file, -3/+5]
  Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
  Co-authored-by: Sam Gross <655866+colesbury@users.noreply.github.com>
* GH-123996: Explicitly mark 'self_or_null' as an array of size 1 to ensure that it is kept in memory for calls (GH-124003) [Mark Shannon, 2024-09-12, 1 file, -6/+10]
* GH-115776: Allow any fixed sized object to have inline values (GH-123192) [Mark Shannon, 2024-08-21, 1 file, -2/+2]
* GH-118093: Make `CALL_ALLOC_AND_ENTER_INIT` suitable for tier 2. (GH-123140) [Mark Shannon, 2024-08-20, 1 file, -1/+40]
  - Convert CALL_ALLOC_AND_ENTER_INIT to micro-ops such that tier 2 supports it
  - Allow inexact arguments for CALL_ALLOC_AND_ENTER_INIT
* GH-118093: Specialize `CALL_KW` (GH-123006) [Mark Shannon, 2024-08-16, 1 file, -0/+56]
* GH-120024: Remove `CHECK_EVAL_BREAKER` macro. (GH-122968) [Mark Shannon, 2024-08-14, 1 file, -8/+14]
  - Factor some instructions into micro-ops to isolate CHECK_EVAL_BREAKER for escape analysis
  - Eliminate the CHECK_EVAL_BREAKER macro
* GH-122869: Add missing tier two optimizer cases (GH-122936) [Mark Shannon, 2024-08-12, 1 file, -4/+17]
* GH-118095: Add tier two support for BINARY_SUBSCR_GETITEM (GH-120793) [Mark Shannon, 2024-08-01, 1 file, -1/+12]
* GH-122155: Track local variables between pops and pushes in cases generator (GH-122286) [Mark Shannon, 2024-08-01, 1 file, -1/+2]
* Manually override bytecode definition in optimizer, to avoid build error (GH-122316) [Mark Shannon, 2024-07-26, 1 file, -4/+9]
* GH-122294: Burn in the addresses of side exits (GH-122295) [Brandt Bucher, 2024-07-26, 1 file, -0/+2]
* GH-122029: Break INSTRUMENTED_CALL into micro-ops, so that its behavior is consistent with CALL (GH-122177) [Mark Shannon, 2024-07-26, 1 file, -2/+16]
* GH-121131: Clean up and fix some instrumented instructions. (GH-121132) [Mark Shannon, 2024-07-26, 1 file, -6/+2]
  - Add support for 'prev_instr' to the code generator and refactor some INSTRUMENTED instructions
* GH-118093: Add tier two support for BINARY_OP_INPLACE_ADD_UNICODE (GH-122253) [Brandt Bucher, 2024-07-25, 1 file, -0/+6]
* GH-118093: Add tier two support for LOAD_ATTR_PROPERTY (GH-122283) [Brandt Bucher, 2024-07-25, 1 file, -1/+6]
* GH-122160: Remove BUILD_CONST_KEY_MAP opcode. (GH-122164) [Mark Shannon, 2024-07-25, 1 file, -9/+0]
* GH-118093: Add tier two support to several instructions (GH-121884) [Brandt Bucher, 2024-07-18, 1 file, -3/+45]
* GH-116017: Get rid of _COLD_EXITs (GH-120960) [Brandt Bucher, 2024-07-01, 1 file, -4/+0]
* gh-117139: Convert the evaluation stack to stack refs (#118450) [Ken Jin, 2024-06-26, 1 file, -9/+9]
  This PR sets up tagged pointers for CPython. The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.
  Only for free threading: we tag the low bit if something is deferred, meaning we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pan out.
  This implies that a strict stack reference discipline is required: ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something and then do normal incref/decref ops on it. The new incref and decref variants are called dup and close; they mimic a "handle" API operating on these stackrefs. Please read Include/internal/pycore_stackref.h for more information!
  Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
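  A self-contained sketch of the tagging scheme (the real definitions live in Include/internal/pycore_stackref.h; these names are simplified stand-ins):

```c
#include <stdint.h>

/* The low bit marks a "deferred" reference whose refcount operations
 * are skipped; everything else behaves like an owned reference. */
typedef struct { uintptr_t bits; } stackref_t;
#define TAG_DEFERRED ((uintptr_t)1)

static void *stackref_obj(stackref_t ref) {
    return (void *)(ref.bits & ~TAG_DEFERRED);   /* untag */
}

static stackref_t stackref_dup(stackref_t ref) { /* the "incref" variant */
    if (!(ref.bits & TAG_DEFERRED)) {
        /* Py_INCREF(stackref_obj(ref)) would go here for owned refs. */
    }
    return ref;
}

static void stackref_close(stackref_t ref) {     /* the "decref" variant */
    if (!(ref.bits & TAG_DEFERRED)) {
        /* Py_DECREF(stackref_obj(ref)) would go here. */
    }
}
```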
* GH-120982: Add stack check assertions to generated interpreter code (GH-120992) [Mark Shannon, 2024-06-25, 1 file, -0/+128]
* gh-120437: Fix `_CHECK_STACK_SPACE` optimization problems introduced in gh-118322 (GH-120712) [Nadeshiko Manju, 2024-06-19, 1 file, -1/+0]
  Co-authored-by: Ken Jin <kenjin4096@gmail.com>
* GH-120507: Lower the `BEFORE_WITH` and `BEFORE_ASYNC_WITH` instructions. (#120640) [Mark Shannon, 2024-06-18, 1 file, -3/+13]
  - Remove BEFORE_WITH and BEFORE_ASYNC_WITH instructions
  - Add LOAD_SPECIAL instruction
  - Reimplement `with` and `async with` statements using LOAD_SPECIAL
* GH-120619: Clean up `RETURN_VALUE` instruction (GH-120624) [Mark Shannon, 2024-06-17, 1 file, -1/+1]
  - Rename _POP_FRAME to _RETURN_VALUE, as it returns a value as well as popping a frame
  - Remove remaining _POP_FRAMEs
* gh-119258: Eliminate Type Guards in Tier 2 Optimizer with Watcher (GH-119365) [Saul Shanabrook, 2024-06-08, 1 file, -8/+25]
  Co-authored-by: parmeggiani <parmeggiani@spaziodati.eu>
  Co-authored-by: dpdani <git@danieleparmeggiani.me>
  Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
  Co-authored-by: Brandt Bucher <brandtbucher@microsoft.com>
  Co-authored-by: Ken Jin <kenjin@python.org>
* gh-119821: Support non-dict globals in LOAD_FROM_DICT_OR_GLOBALS (#119822) [Jelle Zijlstra, 2024-05-31, 1 file, -6/+1]
  The implementation basically copies LOAD_GLOBAL. Possibly it could be deduplicated, but that seems like it may get hairy, since the two operations have different operands. This is important to fix in 3.14 for PEP 649, but it's a bug in earlier versions too, and we should backport to 3.13 and 3.12 if possible.
* GH-119258: Handle STORE_ATTR_WITH_HINT in tier two (GH-119481) [Brandt Bucher, 2024-05-28, 1 file, -1/+4]
* GH-119476: Split _CHECK_FUNCTION_VERSION out of _CHECK_FUNCTION_EXACT_ARGS (GH-119510) [Brandt Bucher, 2024-05-28, 1 file, -2/+0]
* gh-119180: Add LOAD_COMMON_CONSTANT opcode (#119321) [Jelle Zijlstra, 2024-05-22, 1 file, -1/+1]
  The PEP 649 implementation will require a way to load NotImplementedError from the bytecode. @markshannon suggested implementing this by converting LOAD_ASSERTION_ERROR into a more general mechanism for loading constants. This PR adds this new opcode. I will work on the rest of the implementation of the PEP separately.
  Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
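  One plausible shape for such a mechanism, with invented names; the oparg indexes a small fixed table of well-known constants:

```c
#include <assert.h>

/* Invented table of well-known constants, filled in at startup. */
enum {
    CONSTANT_ASSERTION_ERROR,
    CONSTANT_NOT_IMPLEMENTED_ERROR,
    NUM_COMMON_CONSTANTS
};

static void *common_constants[NUM_COMMON_CONSTANTS];

static void *load_common_constant(int oparg) {
    assert(0 <= oparg && oparg < NUM_COMMON_CONSTANTS);
    return common_constants[oparg];
}
```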
* GH-118910: Less boilerplate in the tier 2 optimizer (#118913) [Mark Shannon, 2024-05-10, 1 file, -222/+87]
* GH-118095: Use broader specializations of CALL in tier 1, for better tier 2 support of calls. (GH-118322) [Mark Shannon, 2024-05-04, 1 file, -3/+53]
  - Add CALL_PY_GENERAL, CALL_BOUND_METHOD_GENERAL, and CALL_NON_PY_GENERAL specializations
  - Remove the CALL_PY_WITH_DEFAULTS specialization
  - Use CALL_NON_PY_GENERAL in more cases when otherwise failing to specialize
* GH-113464: Remove the extra jump via `_SIDE_EXIT` in `_EXIT_TRACE` (GH-118545) [Mark Shannon, 2024-05-04, 1 file, -4/+0]
* GH-117442: Check eval-breaker at start (rather than end) of tier 2 loops (GH-118482) [Mark Shannon, 2024-05-02, 1 file, -4/+0]
* GH-118095: Add tier 2 support for YIELD_VALUE (GH-118380) [Mark Shannon, 2024-04-30, 1 file, -5/+9]
* GH-118095: Allow a variant of RESUME_CHECK in tier 2 (GH-118286) [Mark Shannon, 2024-04-29, 1 file, -0/+8]
* GH-118095: Add dynamic exit support and FOR_ITER_GEN support to tier 2 (GH-118279) [Mark Shannon, 2024-04-26, 1 file, -1/+12]
* GH-118095: Handle `RETURN_GENERATOR` in tier 2 (GH-118180) [Mark Shannon, 2024-04-25, 1 file, -0/+23]
* GH-115480: Reduce guard strength for binary ops when type of one operand is known already (GH-118050) [Mark Shannon, 2024-04-22, 1 file, -8/+60]
* GH-115419: Tidy up tier 2 optimizer. Merge peephole pass into main pass (GH-117997) [Mark Shannon, 2024-04-18, 1 file, -3/+52]
* gh-116168: Remove extra `_CHECK_STACK_SPACE` uops (#117242) [Peter Lazorchak, 2024-04-03, 1 file, -0/+4]
  This merges all `_CHECK_STACK_SPACE` uops in a trace into a single `_CHECK_STACK_SPACE_OPERAND` uop that checks whether there is enough stack space for all calls included in the entire trace.
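  A sketch of the merged guard; `combined_framesize` stands in for the operand the optimizer computes by summing the frame sizes of every projected call:

```c
#include <stddef.h>
#include <stdint.h>

/* One up-front check replaces N per-call checks (names invented).
 * Returns nonzero when the whole trace's calls are known to fit. */
static int check_stack_space_operand(uintptr_t stack_top,
                                     uintptr_t stack_limit,
                                     size_t combined_framesize) {
    return stack_limit - stack_top >= combined_framesize;
}
```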
* GH-115776: Embed the values array into the object, for "normal" Python objects. (GH-116115) [Mark Shannon, 2024-04-02, 1 file, -1/+1]
* GH-116422: Tier2 hot/cold splitting (GH-116813) [Mark Shannon, 2024-03-26, 1 file, -40/+17]
  Splits the "cold" path (deopts and exits) from the "hot" path, reducing the size of most jitted instructions at the cost of slower exits.
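  A schematic of the split with invented helpers; guard failures branch to a shared out-of-line stub instead of carrying deopt/exit code inline in every jitted instruction:

```c
extern int cold_exit(int exit_index);    /* shared slow-path stub */

static int hot_instruction(long actual, long expected, int exit_index) {
    if (actual != expected) {
        return cold_exit(exit_index);    /* rare: take the cold path */
    }
    /* ... fast-path work stays inline and compact ... */
    return 0;
}
```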