path: root/Python/pystate.c
Commit message | Author | Age | Files | Lines
* gh-142048: Fix lost gc allocations count on thread cleanup (#142233)Kevin Wang2025-12-101-2/+9
|
* gh-138122: Don't sample partial frame chains (#141912)Pablo Galindo Salgado2025-12-071-2/+26
|
* gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310)Ken Jin2025-11-131-8/+19
| This PR changes the current JIT model from trace projection to trace recording.
|
| Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. **Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR**. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg
|
| Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as there are a lot of "trace too long" aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md .
|
| This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunders uops were discovered to be broken as part of this work gh-140277
|
| The optimizer stack space check is disabled, as it's no longer valid to deal with underflow.
|
| Pros:
| * Ignoring the generated tracer code as it's automatically created, this is only an additional 1k lines of code. The maintenance burden is handled by the DSL and code generator.
| * `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace.
| * The new JIT frontend is able to handle a lot more control-flow than the old one.
| * Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing.
| * Better handling of polymorphism. We leverage the specializing interpreter for this.
|
| Cons:
| * (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962
|
| Design:
| * After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests.
| * The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering.
| * The tracing version behaves nearly identically to the normal interpreter, in fact it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory.
| * The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force a re-specialization when tracing a deopt.
| * The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as it's not tested.
| * Because we do not share interpreter dispatch, there should be no significant slowdown to the original specializing interpreter on tailcall and computed goto with JIT disabled. With JIT enabled, there might be a slowdown in the form of the JIT trying to trace.
| * Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.
* gh-139653: Add PyUnstable_ThreadState_SetStackProtection() (#139668)Victor Stinner2025-11-131-0/+3
| | | | | | | | Add PyUnstable_ThreadState_SetStackProtection() and PyUnstable_ThreadState_ResetStackProtection() functions to set the stack base address and stack size of a Python thread state. Co-authored-by: Petr Viktorin <encukou@gmail.com>
* gh-131253: free-threaded build support for pystats (gh-137189)Neil Schemenauer2025-11-031-0/+29
| | | | | | | | Allow the --enable-pystats build option to be used with free-threading. The stats are now stored on a per-interpreter basis, rather than process global. For free-threaded builds, the stats structure is allocated per-thread and then periodically merged into the per-interpreter stats structure (on thread exit or when the reporting function is called). Most of the pystats related code has been moved into the file Python/pystats.c.
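The per-thread-then-merge scheme described in this commit can be pictured with a small sketch. Everything below is hypothetical: `demo_stats`, its two counters, and `demo_stats_merge` are illustration names, not the structures or functions in Python/pystats.c, and a real implementation would need to synchronize access to the per-interpreter totals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stats block; the real pystats structure has many more fields. */
typedef struct {
    unsigned long calls;
    unsigned long specializations;
} demo_stats;

/* Fold one thread's counters into the per-interpreter totals, then reset the
   thread's counters so a later merge does not count them twice.
   Assumption: the caller already holds whatever lock protects `interp_total`. */
static inline void demo_stats_merge(demo_stats *interp_total,
                                    demo_stats *thread_stats)
{
    interp_total->calls += thread_stats->calls;
    interp_total->specializations += thread_stats->specializations;
    thread_stats->calls = 0;
    thread_stats->specializations = 0;
}
```

Resetting the thread-local block after each merge is what makes it safe to merge both periodically and again on thread exit.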
* gh-140544: Always assume that thread locals are available (GH-140690)Peter Bierma2025-10-281-3/+0
| | | Python has required thread local support since 3.12 (see GH-103324). By assuming that thread locals are always supported, we can improve the performance of third-party extensions by allowing them to access the attached thread and interpreter states directly.
* gh-138050: [WIP] JIT - Streamline MAKE_WARM - move coldness check to ↵alm2025-10-271-1/+1
| | | | executor creation (GH-138240)
* gh-140544: store pointer to interpreter state as a thread local for fast ↵Kumar Aditya2025-10-251-3/+9
| | | | access (#140573)
* gh-140544: cleanup `HAVE_THREAD_LOCAL` checks in pystate.c (#140547)Kumar Aditya2025-10-241-21/+3
|
* gh-140301: Fix memory leak in subinterpreter `PyConfig` cleanup (#140303)Shamil2025-10-201-0/+2
| | | Co-authored-by: Kumar Aditya <kumaraditya@python.org>
* gh-140067: Fix memory leak in sub-interpreter creation (#140111) (#140261)Kumar Aditya2025-10-181-6/+9
| | | | | | Fix memory leak in sub-interpreter creation caused by overwriting of the previously used `_malloced` field. Now the pointer is stored in the first word of the memory block to avoid it being overwritten accidentally. Co-authored-by: Kumar Aditya <kumaraditya@python.org>
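One way to picture "the pointer is stored in the first word of the memory block" is an over-allocating aligned allocator that keeps the raw pointer in a header word, where writes to the payload cannot clobber it. This is a sketch under assumptions, not the code in pystate.c: `DEMO_ALIGN` and the `demo_*` helpers are hypothetical names, and plain `calloc`/`free` stand in for CPython's raw allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_ALIGN 64  /* hypothetical alignment requirement (power of two) */

/* Allocate `size` zeroed bytes aligned to DEMO_ALIGN. The pointer returned by
   calloc() is saved in the word just before the aligned payload. */
static inline void *demo_aligned_alloc(size_t size)
{
    size_t total = size + sizeof(void *) + DEMO_ALIGN - 1;
    void *raw = calloc(1, total);
    if (raw == NULL) {
        return NULL;
    }
    /* Leave room for the header word, then round up to the alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + DEMO_ALIGN - 1) & ~(uintptr_t)(DEMO_ALIGN - 1);
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Recover the pointer that must be handed back to free(). */
static inline void *demo_raw_pointer(void *payload)
{
    return ((void **)payload)[-1];
}

static inline void demo_aligned_free(void *payload)
{
    if (payload != NULL) {
        free(demo_raw_pointer(payload));
    }
}
```

Because the saved pointer lives outside the payload, zeroing or reinitializing the payload leaves it intact, which is the property the fix relies on.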
* gh-140257: fix data race on eval_breaker during finalization (#140265)Shamil2025-10-181-3/+4
|
* Revert "gh-140067: Fix memory leak in sub-interpreter creation (#140111)" ↵Peter Bierma2025-10-151-9/+6
| | | | | (#140140) This reverts commit 59547a251f7069dc6e08cb6082dd21872671e381.
* gh-140067: Fix memory leak in sub-interpreter creation (#140111)Shamil2025-10-141-6/+9
| | | | | Fix memory leak in sub-interpreter creation caused by overwriting of the previously used `_malloced` field. Now the pointer is stored in the first word of the memory block to avoid it being overwritten accidentally. Co-authored-by: Kumar Aditya <kumaraditya@python.org>
* gh-126016: Remove bad assertion in `PyThreadState_Clear` (GH-139158)Peter Bierma2025-09-191-1/+5
| | | In the _interpreters module, we use PyEval_EvalCode() to run Python code in another interpreter. However, when the process receives a KeyboardInterrupt, PyEval_EvalCode() will jump straight to finalization rather than returning. This prevents us from cleaning up and marking the thread as "not running main", which triggers an assertion in PyThreadState_Clear() on debug builds. Since everything else works as intended, remove that assertion.
* gh-136003: Execute pre-finalization callbacks in a loop (GH-136004)Peter Bierma2025-09-181-1/+1
|
* gh-137838: Move _PyUOpInstruction buffer to PyInterpreterState (gh-138918)Donghee Na2025-09-171-0/+11
|
* gh-128627: Use __builtin_wasm_test_function_pointer_signature for Emscripten ↵Hood Chatham2025-09-171-6/+0
| | | | | | | trampoline (#137470) With https://github.com/llvm/llvm-project/pull/150201 being merged, there is now a better way to generate the Emscripten trampoline, instead of including hand-generated binary WASM content. Requires Emscripten 4.0.12.
* gh-137433: Fix deadlock with stop-the-world and daemon threads (gh-137735)Sam Gross2025-09-161-2/+4
| | | | | | | | | | | | | | | | | There was a deadlock originally seen by Memray when a daemon thread enabled or disabled profiling while the interpreter was shutting down. I think this could also happen with garbage collection, but I haven't seen that in practice. The daemon thread could be hung while trying to acquire the global rwmutex that prevents overlapping global and per-interpreter stop-the-world events. Since it already held the main interpreter's stop-the-world lock, it also deadlocked the main thread, which is trying to perform interpreter finalization. Swap the order of lock acquisition to prevent this deadlock. Additionally, refactor `_PyParkingLot_Park` so that the global buckets hashtable is left in a clean state if the thread is hung in `PyEval_AcquireThread`.
* gh-137384: fix crash when accessing warnings state late in runtime shutdown ↵Kumar Aditya2025-08-221-1/+4
| | | | (#138027)
* GH-137959: Replace shim code in jitted code with a single trampoline ↵Mark Shannon2025-08-211-0/+5
| | | | function. (GH-137961)
* gh-137400: Fix thread-safety issues when profiling all threads (gh-137518)Sam Gross2025-08-131-11/+2
| | | | | | | | | | There were a few thread-safety issues when profiling or tracing all threads via PyEval_SetProfileAllThreads or PyEval_SetTraceAllThreads: * The loop over thread states could crash if a thread exits concurrently (in both the free threading and default build) * The modification of `c_profilefunc` and `c_tracefunc` wasn't thread-safe on the free threading build.
* gh-137514: Add a free-threading wrapper for mutexes (GH-137515)Peter Bierma2025-08-071-6/+2
| | | Add `FT_MUTEX_LOCK`/`FT_MUTEX_UNLOCK`, which call `PyMutex_Lock` and `PyMutex_Unlock` on the free-threaded build, and no-op otherwise.
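The shape of such a build-conditional wrapper can be sketched as follows. The names here are stand-ins: `DEMO_FREE_THREADED` plays the role of the free-threaded build flag, and `demo_mutex` with its counting lock functions replaces `PyMutex` so the example compiles without Python headers. It is not the definition used in CPython.

```c
#include <assert.h>

/* Stand-in mutex that only counts calls, so the effect of the macros is visible. */
typedef struct {
    int lock_calls;
    int unlock_calls;
} demo_mutex;

static inline void demo_mutex_lock(demo_mutex *m)   { m->lock_calls++; }
static inline void demo_mutex_unlock(demo_mutex *m) { m->unlock_calls++; }

#ifdef DEMO_FREE_THREADED
/* Free-threaded build: really take the lock. */
#  define DEMO_FT_MUTEX_LOCK(m)   demo_mutex_lock(m)
#  define DEMO_FT_MUTEX_UNLOCK(m) demo_mutex_unlock(m)
#else
/* Default build: another lock (the GIL, in CPython's case) already
   serializes access, so the wrapper compiles to nothing. */
#  define DEMO_FT_MUTEX_LOCK(m)   do { (void)(m); } while (0)
#  define DEMO_FT_MUTEX_UNLOCK(m) do { (void)(m); } while (0)
#endif

/* Call sites look identical in both builds. */
static inline int demo_increment(demo_mutex *m, int *counter)
{
    DEMO_FT_MUTEX_LOCK(m);
    int value = ++*counter;
    DEMO_FT_MUTEX_UNLOCK(m);
    return value;
}
```

The benefit is at the call site: code no longer needs its own `#ifdef` around every lock and unlock pair.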
* GH-136410: Faster side exits by using a cold exit stub (GH-136411)Mark Shannon2025-08-011-1/+8
|
* gh-136870: fix data race in `PyThreadState_Clear` on `sys_tracing_threads` ↵Kumar Aditya2025-07-211-0/+9
| | | | | (#136951) In free-threading, multiple threads can be cleared concurrently, so modifications to `sys_tracing_threads` should be done while holding the profile lock; otherwise they can race with other threads setting up profiling.
* gh-132775: Fix Interpreter.call() __main__ Visibility (gh-135595)Eric Snow2025-06-171-0/+1
| | | | | | As noted in the new tests, there are a few situations we must carefully accommodate for functions that get pickled during interp.call(). We do so by running the script from the main interpreter's __main__ module in a hidden module in the other interpreter. That hidden module is used as the function __globals__.
* gh-91048: Refactor and optimize remote debugging module (#134652)Pablo Galindo Salgado2025-05-251-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Completely refactor Modules/_remote_debugging_module.c with improved code organization, replacing scattered reference counting and error handling with centralized goto error paths. This cleanup improves maintainability and reduces code duplication throughout the module while preserving the same external API. Implement memory page caching optimization in Python/remote_debug.h to avoid repeated reads of the same memory regions during debugging operations. The cache stores previously read memory pages and reuses them for subsequent reads, significantly reducing system calls and improving performance. Add code object caching mechanism with a new code_object_generation field in the interpreter state that tracks when code object caches need invalidation. This allows efficient reuse of parsed code object metadata and eliminates redundant processing of the same code objects across debugging sessions. Optimize memory operations by replacing multiple individual structure copies with single bulk reads for the same data structures. This reduces the number of memory operations and system calls required to gather debugging information from the target process. Update Makefile.pre.in to include Python/remote_debug.h in the headers list, ensuring that changes to the remote debugging header force proper recompilation of dependent modules and maintain build consistency across the codebase. Also, make the module compatible with the free threading build as an extra :) Co-authored-by: Łukasz Langa <lukasz@langa.pl>
* gh-131185: Use a proper thread-local for cached thread states (gh-132510)Peter Bierma2025-05-211-156/+43
| | | | | Switches over to a _Py_thread_local in place of autoTssKey, and also fixes a few other checks regarding PyGILState_Ensure after finalization. Note that this doesn't fix concurrent use of PyGILState_Ensure with Py_Finalize; I'm pretty sure zapthreads doesn't work at all, and that needs to be fixed separately.
* Simplify interp_look_up_id() (#134257)Victor Stinner2025-05-191-4/+2
| | | | Don't use PyInterpreterState_GetID(); read the interpreter's 'id' member directly, which cannot fail.
* gh-134144: Fix use-after-free in zapthreads() (#134145)b-pass2025-05-181-2/+7
|
* GH-133231: Changes to executor management to support proposed `sys._jit` ↵Mark Shannon2025-05-041-1/+7
| | | | | | | | module (GH-133287) * Track the current executor, not the previous one, on the thread-state. * Batch executors for deallocation to avoid having to constantly incref executors; this is an ad-hoc form of deferred reference counting.
* GH-124715: Move trashcan mechanism into `Py_Dealloc` (GH-132280)Mark Shannon2025-04-301-1/+4
|
* GH-132508: Use tagged integers on the evaluation stack for the last ↵Mark Shannon2025-04-291-1/+1
| | | | instruction offset (GH-132545)
* gh-133079: Remove Py_C_RECURSION_LIMIT & PyThreadState.c_recursion_remaining ↵Petr Viktorin2025-04-291-1/+0
| | | | | | | | (GH-133080) Both were added in 3.13, are undocumented, and don't make sense in 3.14 due to changes in the stack overflow detection machinery (gh-112282). PEP 387 exception for skipping a deprecation period: https://github.com/python/steering-council/issues/288
* gh-132775: Drop PyUnstable_InterpreterState_GetMainModule() (gh-132978)Eric Snow2025-04-281-6/+32
| | | | | We replace it with _Py_GetMainModule(), and add _Py_CheckMainModule(), but both in the internal-only C-API. We also add _PyImport_GetModulesRef(), which is the equivalent of _PyImport_GetModules(), but which increfs before the lock is released. This is used by a later change related to pickle and handling __main__.
* gh-132399: ensure correct alignment of `PyInterpreterState` (#132428)Bénédikt Tran2025-04-191-3/+14
|
* gh-131238: Add pycore_interpframe_structs.h header (#131553)Victor Stinner2025-03-211-2/+0
| | | | Add an explicit include to pycore_interpframe_structs.h in pycore_runtime_structs.h to fix a dependency cycle.
* gh-131238: Remove includes from pycore_interp.h (#131495)Victor Stinner2025-03-201-8/+7
| | | Also remove now-unused includes in C files.
* gh-131238: Remove more includes from pycore_interp.h (#131480)Victor Stinner2025-03-191-0/+1
|
* gh-131238: Remove many includes from pycore_interp.h (#131472)Victor Stinner2025-03-191-0/+1
|
* gh-131238: Convert pycore_pystate.h static inline to functions (#131352)Victor Stinner2025-03-171-0/+38
| | | | | | | | Convert static inline functions to functions: * _Py_IsMainThread() * _PyInterpreterState_Main() * _Py_IsMainInterpreterFinalizing() * _Py_GetMainConfig()
* GH-131238: Core header refactor (GH-131250)Mark Shannon2025-03-171-0/+3
| | | | | * Moves most structs in pycore_ header files into pycore_structs.h and pycore_runtime_structs.h * Removes many cross-header dependencies
* gh-124878: Fix race conditions during interpreter finalization (#130649)Sam Gross2025-03-061-39/+57
| | | | | | | | | | | | | | | | | | | | The PyThreadState struct gains a reference count field to avoid issues with PyThreadState being a dangling pointer to freed memory. The refcount starts with a value of two: one reference is owned by the interpreter's linked list of thread states and one reference is owned by the OS thread. The reference count is decremented when the thread state is removed from the interpreter's linked list and before the OS thread calls `PyThread_hang_thread()`. The thread that decrements it to zero frees the `PyThreadState` memory. The `holds_gil` field is moved out of the `_status` bit field, to avoid a data race where one thread calls `PyThreadState_Clear()`, modifying the `_status` bit field while the OS thread reads `holds_gil` when attempting to acquire the GIL. The `PyThreadState.state` field now has `_Py_THREAD_SHUTTING_DOWN` as a possible value. This corresponds to the `_PyThreadState_MustExit()` check. This avoids race conditions in the free threading build when checking `_PyThreadState_MustExit()`.
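The two-owner reference count described here can be sketched with C11 atomics. This is an illustration under assumptions, not the pystate.c implementation: `demo_tstate`, `demo_tstate_new`, and `demo_tstate_decref` are hypothetical names, and `demo_freed_count` exists only so the example can observe when the memory is released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a thread state with a shared-ownership count. */
typedef struct {
    atomic_int refcount;
    int id;
} demo_tstate;

/* Demo-only bookkeeping (not thread-safe) to observe the free. */
static int demo_freed_count = 0;

/* The count starts at two: one reference for the interpreter's list of
   thread states, one for the OS thread that uses the state. */
static inline demo_tstate *demo_tstate_new(int id)
{
    demo_tstate *ts = malloc(sizeof(*ts));
    if (ts == NULL) {
        return NULL;
    }
    atomic_init(&ts->refcount, 2);
    ts->id = id;
    return ts;
}

/* Drop one reference. Whichever owner drops the last one frees the memory.
   Returns true if this call performed the free. */
static inline bool demo_tstate_decref(demo_tstate *ts)
{
    if (atomic_fetch_sub(&ts->refcount, 1) == 1) {
        free(ts);
        demo_freed_count++;
        return true;
    }
    return false;
}
```

Neither owner needs to know whether the other has finished: the list removal and the exiting thread each drop their reference, and the memory stays valid until both have done so.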
* GH-127705: better double free message. (GH-130785)Mark Shannon2025-03-051-3/+16
| | | | | * Add location information when accessing already closed stackref * Add #def option to track closed stackrefs to provide precise information for use after free and double frees.
* gh-130091: Reorder `_PyThreadState_Attach` to avoid data race (gh-130092)Sam Gross2025-02-271-2/+1
| | | | | | | | | | | | This moves `tstate_activate()` down to avoid a data race in the free threading build on the `_PyRuntime`'s thread-local `autoTSSkey`. This key is deleted during runtime finalization, which may happen concurrently with a call to `_PyThreadState_Attach`. The earlier `tstate_try/wait_attach` ensures that the thread is blocked before it attempts to access the deleted `autoTSSkey`. This fixes a TSAN reported data race in `test_threading.test_import_from_another_thread`.
* gh-130421: Fix data race on timebase initialization (gh-130592)Sam Gross2025-02-271-0/+6
| | | | | | Windows and macOS require precomputing a "timebase" in order to convert OS timestamps into nanoseconds. Retrieve and compute this value during runtime initialization to avoid data races when accessing the time.
* GH-130396: Use computed stack limits on linux (GH-130398)Mark Shannon2025-02-251-4/+7
| | | | | | | | | | | * Implement C recursion protection with limit pointers for Linux, MacOS and Windows * Remove calls to PyOS_CheckStack * Add stack protection to parser * Make tests more robust to low stacks * Improve error messages for stack overflow
* GH-91079: Revert "GH-91079: Implement C stack limits using addresses, not ↵Petr Viktorin2025-02-241-7/+4
| | | | | | | | | counters. (GH-130007)" for now (GH-130413) Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now Unfortunately, the change broke some buildbots. This reverts commit 2498c22fa0a2b560491bc503fa676585c1a603d0.
* gh-111924: Fix data races when swapping allocators (gh-130287)Sam Gross2025-02-201-1/+1
| | | | | | | | | | | | | | | CPython currently temporarily changes `PYMEM_DOMAIN_RAW` to the default allocator during initialization and shutdown. The motivation is to ensure that core runtime structures are allocated and freed using the same allocator. However, modifying the current allocator changes global state and is not thread-safe even with the GIL. Other threads may be allocating or freeing objects using PYMEM_DOMAIN_RAW; they are not required to hold the GIL to call PyMem_RawMalloc/PyMem_RawFree. This adds new internal-only functions like `_PyMem_DefaultRawMalloc` that aren't affected by calls to `PyMem_SetAllocator()`, so they're appropriate for Python runtime initialization and finalization. Use these calls in places where we previously swapped to the default raw allocator.
* GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)Mark Shannon2025-02-191-4/+7
| | | | | | | | | | | | * Implement C recursion protection with limit pointers * Remove calls to PyOS_CheckStack * Add stack protection to parser * Make tests more robust to low stacks * Improve error messages for stack overflow
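The idea of guarding recursion with a limit address instead of a depth counter can be sketched as below. This is a simplified illustration, not CPython's mechanism: the `demo_*` names are hypothetical, the code assumes a stack that grows toward lower addresses, and it compares addresses of locals as integers, which is implementation-defined behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Lowest stack address the recursion is allowed to reach (hypothetical). */
static uintptr_t demo_stack_limit;

/* Record a limit `budget` bytes below the caller's current stack position.
   Assumption: the stack grows downward, as on common platforms. */
static void demo_set_stack_budget(size_t budget)
{
    volatile char marker = 0;
    demo_stack_limit = (uintptr_t)&marker - budget;
}

/* The check is a single address comparison; no counter is maintained. */
static bool demo_stack_ok(void)
{
    volatile char marker = 0;
    return (uintptr_t)&marker > demo_stack_limit;
}

/* Recurse until the limit pointer is crossed and report the depth reached.
   The volatile padding keeps each frame at a noticeable size and stops the
   compiler from turning the recursion into a loop. */
static int demo_depth(int n)
{
    volatile char pad[256];
    pad[0] = 0;
    if (!demo_stack_ok()) {
        return n;
    }
    return demo_depth(n + 1) + pad[0];
}
```

Unlike a counter, the address comparison accounts for how much stack each frame actually uses, so a few large frames and many small frames are limited by the same byte budget.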