path: root/Objects/codeobject.c
Commit message | Author | Age | Files | Lines
* gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) | Ken Jin | 2025-11-13 | 1 | -0/+1
  This PR changes the current JIT model from trace projection to trace recording.

  Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. **Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR**. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg

  Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as there are a lot of "trace too long" aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md .

  This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunders uops were discovered to be broken as part of this work (gh-140277).

  The optimizer stack space check is disabled, as it's no longer valid to deal with underflow.

  Pros:
  * Ignoring the generated tracer code as it's automatically created, this is only an additional 1k lines of code. The maintenance burden is handled by the DSL and code generator.
  * `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace.
  * The new JIT frontend is able to handle a lot more control-flow than the old one.
  * Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing.
  * Better handling of polymorphism. We leverage the specializing interpreter for this.

  Cons:
  * (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962

  Design:
  * After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests.
  * The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering.
  * The tracing version behaves nearly identically to the normal interpreter, in fact it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory.
  * The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force a re-specialization when tracing a deopt.
  * The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as they are not tested.
  * Because we do not share interpreter dispatch, there should be no significant slowdown to the original specializing interpreter on tailcall and computed gotos with JIT disabled. With JIT enabled, there might be a slowdown in the form of the JIT trying to trace.
  * Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction, `_DYNAMIC_EXIT`.
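The "dual dispatch" idea described in this commit message can be pictured with a toy model. The sketch below is not CPython's interpreter: the two-opcode stack machine, the `run` helper, and its `trace_from`/`trace_len` parameters are all hypothetical, chosen only to show two dispatch tables handing control to each other while one of them records each instruction after it executes.

```python
# Toy model only: opcodes, VM layout and helper names are hypothetical.

def op_push(vm, arg):
    vm["stack"].append(arg)

def op_add(vm, arg):
    right = vm["stack"].pop()
    left = vm["stack"].pop()
    vm["stack"].append(left + right)

# Table used when not tracing.
NORMAL = {"PUSH": op_push, "ADD": op_add}

def make_tracing(handler, name):
    def traced(vm, arg):
        handler(vm, arg)                 # same behavior as the normal handler
        vm["trace"].append((name, arg))  # then record the instruction just run
    return traced

# Table used while tracing: same opcodes, each wrapped with a recording step.
TRACING = {name: make_tracing(h, name) for name, h in NORMAL.items()}

def run(program, trace_from=None, trace_len=0):
    vm = {"stack": [], "trace": []}
    table = NORMAL
    for pc, (name, arg) in enumerate(program):
        if pc == trace_from:
            table = TRACING              # switch to the tracing table
        table[name](vm, arg)
        if table is TRACING and len(vm["trace"]) >= trace_len:
            table = NORMAL               # trace complete: switch back
    return vm

program = [("PUSH", 1), ("PUSH", 2), ("ADD", None), ("PUSH", 10), ("ADD", None)]
result = run(program, trace_from=1, trace_len=2)
print(result["stack"], result["trace"])
```

In this model the cost of tracing is one extra call and one list append per instruction, which loosely mirrors the "function call and writes to memory" cost mentioned above.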
* gh-140815: Fix faulthandler for invalid/freed frame (#140921) | Victor Stinner | 2025-11-04 | 1 | -3/+20
  faulthandler now detects if a frame or a code object is invalid or freed.

  Add helper functions:
  * _PyCode_SafeAddr2Line()
  * _PyFrame_SafeGetCode()
  * _PyFrame_SafeGetLasti()

  _PyMem_IsPtrFreed() now detects pointers in [-0xff, 0xff] range as freed.
* gh-138661: fix data race in `PyCode_Addr2Line` (#138664) | Kumar Aditya | 2025-09-12 | 1 | -1/+11
* GH-137623: Use an AC decorator for docstring line length enforcement (#137690) | Adam Turner | 2025-08-18 | 1 | -1/+2
* gh-137514: Add a free-threading wrapper for mutexes (GH-137515) | Peter Bierma | 2025-08-07 | 1 | -13/+5
  Add `FT_MUTEX_LOCK`/`FT_MUTEX_UNLOCK`, which call `PyMutex_Lock` and `PyMutex_Unlock` on the free-threaded build, and no-op otherwise.
* gh-136396: Include instrumentation when creating new copies of the bytecode (#136525) | mpage | 2025-07-14 | 1 | -1/+18
  Previously, we assumed that instrumentation would happen for all copies of the bytecode if the instrumentation version on the code object didn't match the per-interpreter instrumentation version. That assumption was incorrect: instrumentation will exit early if there are no new "events," even if there is an instrumentation version mismatch.

  To fix this, include the instrumented opcodes when creating new copies of the bytecode, rather than replacing them with their uninstrumented variants. I don't think we have to worry about races between instrumentation and creating new copies of the bytecode: instrumentation and new bytecode creation cannot happen concurrently. Instrumentation requires that either the world is stopped or the code object's per-object lock is held and new bytecode creation requires holding the code object's per-object lock.
* gh-135607: remove null checking of weakref list in dealloc of extension modules and objects (#135614) | Xuanteng Huang | 2025-06-30 | 1 | -3/+2
  Co-authored-by: Kumar Aditya <kumaraditya@python.org>
  Co-authored-by: Victor Stinner <vstinner@python.org>
* GH-133136: Revise QSBR to reduce excess memory held (gh-135473) | Neil Schemenauer | 2025-06-25 | 1 | -1/+1
  The free threading build uses QSBR to delay the freeing of dictionary keys and list arrays when the objects are accessed by multiple threads in order to allow concurrent reads to proceed without holding the object lock. The requests are processed in batches to reduce execution overhead, but for large memory blocks this can lead to excess memory usage.

  Take into account the size of the memory block when deciding when to process QSBR requests.

  Also track the amount of memory being held by QSBR for mimalloc pages. Advance the write sequence if this memory exceeds a limit. Advancing the sequence will allow it to be freed more quickly.

  Process the held QSBR items from the "eval breaker", rather than from `_PyMem_FreeDelayed()`. This gives a higher chance that the global read sequence has advanced enough so that items can be freed.

  Co-authored-by: Sam Gross <colesbury@gmail.com>
* gh-135450: Remove assertion in `_PyCode_CheckNoExternalState` (gh-135466) | Peter Bierma | 2025-06-18 | 1 | -1/+0
  The assertion reflected a misunderstanding of situations where "hidden" variables might exist, namely generator expressions and comprehensions.
* gh-135437: Account For Duplicate Names in _PyCode_SetUnboundVarCounts() (gh-135438) | Eric Snow | 2025-06-13 | 1 | -5/+25
* gh-91048: Refactor and optimize remote debugging module (#134652) | Pablo Galindo Salgado | 2025-05-25 | 1 | -0/+2
  Completely refactor Modules/_remote_debugging_module.c with improved code organization, replacing scattered reference counting and error handling with centralized goto error paths. This cleanup improves maintainability and reduces code duplication throughout the module while preserving the same external API.

  Implement memory page caching optimization in Python/remote_debug.h to avoid repeated reads of the same memory regions during debugging operations. The cache stores previously read memory pages and reuses them for subsequent reads, significantly reducing system calls and improving performance.

  Add code object caching mechanism with a new code_object_generation field in the interpreter state that tracks when code object caches need invalidation. This allows efficient reuse of parsed code object metadata and eliminates redundant processing of the same code objects across debugging sessions.

  Optimize memory operations by replacing multiple individual structure copies with single bulk reads for the same data structures. This reduces the number of memory operations and system calls required to gather debugging information from the target process.

  Update Makefile.pre.in to include Python/remote_debug.h in the headers list, ensuring that changes to the remote debugging header force proper recompilation of dependent modules and maintain build consistency across the codebase.

  Also, make the module compatible with the free threading build as an extra :)

  Co-authored-by: Łukasz Langa <lukasz@langa.pl>
* gh-132775: Unrevert "Add _PyCode_VerifyStateless()" (gh-133528) | Eric Snow | 2025-05-08 | 1 | -17/+151
  This reverts commit 3c73cf5 (gh-133497), which itself reverted the original commit d270bb5 (gh-133221).

  We reverted the original change due to failing android tests. The checks in _PyCode_CheckNoInternalState() were too strict, so we've relaxed them.
* gh-132775: Revert "gh-132775: Add _PyCode_VerifyStateless() (gh-133221)" (#133497) | Petr Viktorin | 2025-05-06 | 1 | -155/+17
* gh-132775: Add _PyCode_VerifyStateless() (gh-133221) | Eric Snow | 2025-05-05 | 1 | -17/+155
  "Stateless" code is a function or code object which does not rely on external state or internal state. It may rely on arguments and builtins, but not globals or a closure. I've left a comment in pycore_code.h that provides more detail.

  We also add _PyFunction_VerifyStateless(). The new functions will be used in several later changes that facilitate "sharing" functions and code objects between interpreters.
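As a rough illustration of the "stateless" definition above, the following Python sketch approximates the idea from the outside using only public code object attributes. It is not what `_PyCode_VerifyStateless()` does internally: the helper name `looks_stateless` is hypothetical, it is deliberately conservative (attribute names also appear in `co_names`, so any attribute access makes it answer `False`), and it does not inspect nested code objects.

```python
import builtins

def looks_stateless(func):
    """Conservative approximation: no closure, and every name is a builtin."""
    code = func.__code__
    if code.co_freevars:          # the function closes over outer variables
        return False
    # co_names holds global and attribute names used by the bytecode.
    return all(hasattr(builtins, name) for name in code.co_names)

def uses_builtin(items):
    return len(items)             # relies only on an argument and a builtin

COUNTER = 0

def uses_global():
    return COUNTER + 1            # relies on module-level state

def make_closure():
    captured = 1
    def inner():
        return captured           # relies on a closure cell
    return inner

for candidate in (uses_builtin, uses_global, make_closure()):
    print(candidate.__name__, looks_stateless(candidate))
```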
* gh-132775: Unrevert "Add _PyCode_GetVarCounts()" (gh-133265) | Eric Snow | 2025-05-05 | 1 | -3/+278
  This reverts commit 811edcf (gh-133232), which itself reverted the original commit 811edcf (gh-133128). We reverted the original change due to failing s390 builds (a big-endian architecture). It ended up that I had not accommodated op caches.
* Revert "gh-132775: Add _PyCode_GetVarCounts() (gh-133128)" (gh-133232) | Eric Snow | 2025-05-01 | 1 | -235/+0
  The change broke the s390 builds, so I'm reverting it while I investigate.

  This reverts commit 94b4fcd806e7b692955173d309ea3b70a193ad96.
* gh-132775: Add _PyCode_GetVarCounts() (gh-133128) | Eric Snow | 2025-04-30 | 1 | -0/+235
  This helper is useful in a variety of ways, including in demonstrating how the different counts relate to one another.

  It will be used in a later change to help identify if a function is "stateless", meaning it doesn't have any free vars or globals.

  Note that a majority of this change is tests.
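To make "how the different counts relate to one another" concrete, here is a small Python sketch that reads the publicly visible counts from a code object. It uses only documented `co_*` attributes and is an illustration of the relationships, not a reimplementation of the internal `_PyCode_GetVarCounts()` helper; the example functions are made up.

```python
def outer(a, b=1, *args, c, **kwargs):
    local = a + b
    def inner():
        return local + c          # 'local' and 'c' become cells in outer
    return inner

code = outer.__code__
inner_code = outer(1, c=2).__code__

counts = {
    "positional args (co_argcount)": code.co_argcount,
    "keyword-only args (co_kwonlyargcount)": code.co_kwonlyargcount,
    "locals incl. args (co_nlocals)": code.co_nlocals,
    "cell vars (co_cellvars)": len(code.co_cellvars),
    "free vars (co_freevars)": len(code.co_freevars),
    "inner free vars": len(inner_code.co_freevars),
}
for label, value in counts.items():
    print(f"{label}: {value}")
```

The variables that `outer` exposes as cells are exactly the ones `inner` sees as free variables, which is the kind of relationship a "stateless" check has to rule out.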
* gh-132775: Add _PyCode_ReturnsOnlyNone() (gh-132981) | Eric Snow | 2025-04-29 | 1 | -0/+43
  The function indicates whether or not the function has a return statement.

  This is used by a later change related to treating some functions like scripts.
* gh-132399: fix invalid function signatures on the free-threaded build (#132400) | Bénédikt Tran | 2025-04-12 | 1 | -2/+3
* gh-131238: Remove pycore_object_deferred.h from pycore_object.h (#131549) | Victor Stinner | 2025-03-21 | 1 | -0/+1
  Remove also pycore_function.h from pycore_typeobject.h.
* GH-131498: Remove conditional stack effects (GH-131499) | Mark Shannon | 2025-03-20 | 1 | -0/+1
  * Adds some missing #includes
* gh-131238: Remove includes from pycore_interp.h (#131495) | Victor Stinner | 2025-03-20 | 1 | -6/+4
  Remove also now unused includes in C files.
* gh-131238: Remove more includes from pycore_interp.h (#131480) | Victor Stinner | 2025-03-19 | 1 | -1/+3
* gh-111178: Fix function signatures to fix undefined behavior (#131191) | Victor Stinner | 2025-03-14 | 1 | -1/+2
* gh-130851: Only intern constants of types generated by the compiler (#130901) | Sam Gross | 2025-03-07 | 1 | -2/+41
  The free-threading build interns and immortalizes most constants generated by the bytecode compiler. However, users can construct their own code objects with arbitrary constants. We should not intern or immortalize these objects if they are not of a type that we know how to handle.

  This change fixes a reference leak failure in the recently added `test_code.test_unusual_constants` test. It also addresses a potential crash that could occur when attempting to destroy an immortalized object during interpreter shutdown.
* gh-130851: Don't crash when deduping unusual code constants (#130853) | Sam Gross | 2025-03-05 | 1 | -6/+12
  The bytecode compiler only generates a few different types of constants, like str, int, tuple, slices, etc. Users can construct code objects with various unusual constants, including ones that are not hashable or not even constant.

  The free threaded build previously crashed with a fatal error when confronted with these constants. Instead, treat distinct objects of otherwise unhandled types as not equal for the purposes of deduplication.
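For readers wondering how a code object ends up with "unusual" constants, the sketch below builds one from Python with `code.replace()`, a public method available since Python 3.8. The `Unhashable` class and the particular constants are made up for illustration, and the resulting code object is only constructed, never executed. It is written with a build that already includes this fix (or a default, non-free-threaded build) in mind.

```python
class Unhashable:
    # Setting __hash__ to None makes instances unhashable.
    __hash__ = None

# Start from an ordinary compiled code object...
template = compile("a = 1", "<sketch>", "exec")

# ...and swap in constants the bytecode compiler would never emit:
# a mutable list and an unhashable instance.
unusual = template.replace(co_consts=(1, [1, 2], Unhashable(), None))

print([type(const).__name__ for const in unusual.co_consts])
```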
* Postpone <stdbool.h> inclusion after Python.h (#130641) | Hugo Beauzée-Luyssen | 2025-02-28 | 1 | -2/+2
  Remove inclusions prior to Python.h.

  <stdbool.h> will cause <features.h> to be included before Python.h can define some macros to enable some additional features, causing multiple types not to be defined down the line.
* GH-128872: Remove unused argument from _PyCode_Quicken (GH-128873) | Yan Yanchii | 2025-02-02 | 1 | -6/+4
  Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
* GH-127953: Make line number lookup O(1) regardless of the size of the code object (GH-128350) | Mark Shannon | 2025-01-21 | 1 | -0/+3
* gh-111178: Generate correct signature for most self converters (#128447) | Erlend E. Aasland | 2025-01-20 | 1 | -15/+15
* gh-128923: Use zero to indicate unassigned unique id (#128925) | Sam Gross | 2025-01-17 | 1 | -1/+1
  In the free threading build, the per thread reference counting uses a unique id for some objects to index into the local reference count table. Use 0 instead of -1 to indicate that the id is not assigned. This avoids bugs where zero-initialized heap type objects look like they have a unique id assigned.
* gh-111178: fix UBSan failures in `Objects/codeobject.c` (GH-128240) | Bénédikt Tran | 2025-01-13 | 1 | -18/+22
* GH-122548: Implement branch taken and not taken events for sys.monitoring (GH-122564) | Mark Shannon | 2024-12-19 | 1 | -0/+7
* gh-127582: Make object resurrection thread-safe for free threading. (GH-127612) | Sam Gross | 2024-12-05 | 1 | -5/+2
  Objects may be temporarily "resurrected" in destructors when calling finalizers or watcher callbacks. We previously undid the resurrection by decrementing the reference count using `Py_SET_REFCNT`. This was not thread-safe because other threads might be accessing the object (modifying its reference count) if it was exposed by the finalizer, watcher callback, or temporarily accessed by a racy dictionary or list access.

  This adds internal-only thread-safe functions for temporary object resurrection during destructors.
* gh-114940: Add _Py_FOR_EACH_TSTATE_UNLOCKED(), and Friends (gh-127077) | Eric Snow | 2024-11-21 | 1 | -2/+4
  This is a precursor to the actual fix for gh-114940, where we will change these macros to use the new lock. This change is almost entirely mechanical; the exceptions are the loops in codeobject.c and ceval.c, which now hold the "head" lock. Note that almost all of the uses of _Py_FOR_EACH_TSTATE_UNLOCKED() here will change to _Py_FOR_EACH_TSTATE_BEGIN() once we add the new per-interpreter lock.
* gh-127020: Make `PyCode_GetCode` thread-safe for free threading (#127043) | Sam Gross | 2024-11-21 | 1 | -27/+51
  Some fields in PyCodeObject are lazily initialized. Use atomics and critical sections to make their initializations and accesses thread-safe.
* gh-126298: Don't deduplicate slice constants based on equality (#126398) | Michael Droettboom | 2024-11-07 | 1 | -1/+34
  * gh-126298: Don't deduplicate slice constants based on equality
  * NULL check for PySlice_New
  * Fix refcounting
  * Fix refcounting some more
  * Fix refcounting
  * Make tests more complete
  * Fix tests
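A short Python sketch of why equality alone is a poor key for deduplicating slice constants: slices whose bounds compare equal are themselves equal, even when the bounds have different types. This only shows the user-visible comparison behavior; how the compiler keys its constant cache is not modeled here.

```python
by_int = slice(0, 1)
by_bool = slice(False, True)
by_float = slice(0.0, 1.0)

# Slices compare like (start, stop, step) tuples, so all three are equal...
are_equal = by_int == by_bool == by_float

# ...yet they carry bounds of different types, so merging them into one
# constant would change what the program observes.
start_types = [type(s.start).__name__ for s in (by_int, by_bool, by_float)]

print(are_equal, start_types)
```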
* gh-115999: Implement thread-local bytecode and enable specialization for `BINARY_OP` (#123926) | mpage | 2024-11-04 | 1 | -4/+309
  Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Each thread reserves a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and releases the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.

  Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.

  Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
* gh-125900: Clean-up logic around immortalization in free-threading (#125901) | Sam Gross | 2024-10-24 | 1 | -14/+6
  * Remove `@suppress_immortalization` decorator
  * Make suppression flag per-thread instead of per-interpreter
  * Suppress immortalization in `eval()` to avoid refleaks in three tests (test_datetime.test_roundtrip, test_logging.test_config8_ok, and test_random.test_after_fork).
  * frozenset() is constant, but not a singleton. When run multiple times, the test could fail due to constant interning.
* gh-124218: Use per-thread refcounts for code objects (#125216) | Sam Gross | 2024-10-15 | 1 | -1/+5
  Use per-thread refcounting for the reference from function objects to their corresponding code object. This can be a source of contention when frequently creating nested functions. Deferred refcounting alone isn't a great fit here because these references are on the heap and may be modified by other libraries.
* gh-111178: Fix function signatures in codeobject.c (#125180) | Victor Stinner | 2024-10-09 | 1 | -32/+48
* gh-125063: Emit slices as constants in the bytecode compiler (#125064) | Michael Droettboom | 2024-10-08 | 1 | -0/+1
  * Make slices marshallable
  * Emit slices as constants
  * Update Python/marshal.c
    Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
  * Refactor codegen_slice into two functions so it always has the same net effect
  * Fix for free-threaded builds
  * Simplify marshal loading of slices
  * Only return SUCCESS/ERROR from codegen_slice

  ---------

  Co-authored-by: Mark Shannon <mark@hotpy.org>
  Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
* gh-122854: Add Py_HashBuffer() function (#122855) | Victor Stinner | 2024-08-30 | 1 | -2/+2
* GH-122390: Replace `_Py_GetbaseOpcode` with `_Py_GetBaseCodeUnit` (GH-122942) | Mark Shannon | 2024-08-13 | 1 | -50/+12
* Replace PyObject_Del with PyObject_Free (#122453) | Victor Stinner | 2024-08-01 | 1 | -2/+2
  PyObject_Del() is just an alias for PyObject_Free() kept for backward compatibility. Use PyObject_Free() directly instead.
* gh-121863: Immortalize names in code objects to avoid crash (GH-121903) | Petr Viktorin | 2024-07-17 | 1 | -1/+1
* Fix typos in comments (#120821) | Xie Yanbo | 2024-06-24 | 1 | -1/+1
* Fixes loop variables to be the same types as their limit (GH-120958) | Steve Dower | 2024-06-24 | 1 | -1/+1
* gh-113993: Allow interned strings to be mortal, and fix related issues (GH-120520) | Petr Viktorin | 2024-06-21 | 1 | -6/+8
  * Add an InternalDocs file describing how interning should work and how to use it.
  * Add internal functions to *explicitly* request what kind of interning is done:
    - `_PyUnicode_InternMortal`
    - `_PyUnicode_InternImmortal`
    - `_PyUnicode_InternStatic`
  * Switch uses of `PyUnicode_InternInPlace` to those.
  * Disallow using `_Py_SetImmortal` on strings directly. You should use `_PyUnicode_InternImmortal` instead:
    - Strings should be interned before immortalization, otherwise you're possibly interning an immortalizing copy.
    - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in backports, as they are now part of public API and version-specific ABI.
  * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
  * Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
    - `_Py_ID`
    - `_Py_STR` (including the empty string)
    - one-character latin-1 singletons
    Now, when you intern a singleton, that exact singleton will be interned.
  * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
  * Intern `_Py_STR` singletons at startup.
  * For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
  * Beef up the tests. Cover internal details (marked with `@cpython_only`).
  * Add lots of assertions

  Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
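The internal functions listed above are C-only, but the basic contract of interning can be observed from Python through the public `sys.intern()`. The sketch below shows just that contract (equal strings map to one canonical object); it says nothing about whether the canonical object is mortal, immortal, or statically allocated, which is what this change is concerned with.

```python
import sys

# Build two equal strings at runtime so that they start out as separate objects.
first = "".join(["code", "object"])
second = "".join(["co", "deobject"])

# Interning maps every equal string to a single canonical object.
interned_first = sys.intern(first)
interned_second = sys.intern(second)

print(first == second, interned_first is interned_second)
```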
* gh-117657: Fix race involving immortalizing objects (#119927) | Sam Gross | 2024-06-03 | 1 | -2/+2
  The free-threaded build currently immortalizes objects that use deferred reference counting (see gh-117783). This typically happens once the first non-main thread is created, but the behavior can be suppressed for tests, in subinterpreters, or during a compile() call.

  This fixes a race condition involving the tracking of whether the behavior is suppressed.