summaryrefslogtreecommitdiffstats
path: root/Include/cpython
diff options
context:
space:
mode:
authormpage <mpage@meta.com>2024-03-16 12:56:30 (GMT)
committerGitHub <noreply@github.com>2024-03-16 12:56:30 (GMT)
commit33da0e844c922b3dcded75fbb9b7be67cb013a17 (patch)
tree940ddc29bb00e450f9a984e01bff7674f929543e /Include/cpython
parent86bc40dd414bceb3f93382cc9f670936de9d68be (diff)
downloadcpython-33da0e844c922b3dcded75fbb9b7be67cb013a17.zip
cpython-33da0e844c922b3dcded75fbb9b7be67cb013a17.tar.gz
cpython-33da0e844c922b3dcded75fbb9b7be67cb013a17.tar.bz2
gh-114271: Fix race in `Thread.join()` (#114839)
There is a race between when `Thread._tstate_lock` is released[^1] in `Thread._wait_for_tstate_lock()` and when `Thread._stop()` asserts[^2] that it is unlocked. Consider the following execution involving threads A, B, and C: 1. A starts. 2. B joins A, blocking on its `_tstate_lock`. 3. C joins A, blocking on its `_tstate_lock`. 4. A finishes and releases its `_tstate_lock`. 5. B acquires A's `_tstate_lock` in `_wait_for_tstate_lock()`, releases it, but is swapped out before calling `_stop()`. 6. C is scheduled, acquires A's `_tstate_lock` in `_wait_for_tstate_lock()` but is swapped out before releasing it. 7. B is scheduled, calls `_stop()`, which asserts that A's `_tstate_lock` is not held. However, C holds it, so the assertion fails. The race can be reproduced[^3] by inserting sleeps at the appropriate points in the threading code. To do so, run the `repro_join_race.py` from the linked repo. There are two main parts to this PR: 1. `_tstate_lock` is replaced with an event that is attached to `PyThreadState`. The event is set by the runtime prior to the thread being cleared (in the same place that `_tstate_lock` was released). `Thread.join()` blocks waiting for the event to be set. 2. `_PyInterpreterState_WaitForThreads()` provides the ability to wait for all non-daemon threads to exit. To do so, an `is_daemon` predicate was added to `PyThreadState`. This field is set each time a thread is created. `threading._shutdown()` now calls into `_PyInterpreterState_WaitForThreads()` instead of waiting on `_tstate_lock`s. [^1]: https://github.com/python/cpython/blob/441affc9e7f419ef0b68f734505fa2f79fe653c7/Lib/threading.py#L1201 [^2]: https://github.com/python/cpython/blob/441affc9e7f419ef0b68f734505fa2f79fe653c7/Lib/threading.py#L1115 [^3]: https://github.com/mpage/cpython/commit/81946532792f938cd6f6ab4c4ff92a4edf61314f --------- Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Antoine Pitrou <antoine@python.org>
Diffstat (limited to 'Include/cpython')
-rw-r--r--Include/cpython/pystate.h26
1 files changed, 0 insertions, 26 deletions
diff --git a/Include/cpython/pystate.h b/Include/cpython/pystate.h
index ac7ff83..38d0897 100644
--- a/Include/cpython/pystate.h
+++ b/Include/cpython/pystate.h
@@ -161,32 +161,6 @@ struct _ts {
*/
uintptr_t critical_section;
- /* Called when a thread state is deleted normally, but not when it
- * is destroyed after fork().
- * Pain: to prevent rare but fatal shutdown errors (issue 18808),
- * Thread.join() must wait for the join'ed thread's tstate to be unlinked
- * from the tstate chain. That happens at the end of a thread's life,
- * in pystate.c.
- * The obvious way doesn't quite work: create a lock which the tstate
- * unlinking code releases, and have Thread.join() wait to acquire that
- * lock. The problem is that we _are_ at the end of the thread's life:
- * if the thread holds the last reference to the lock, decref'ing the
- * lock will delete the lock, and that may trigger arbitrary Python code
- * if there's a weakref, with a callback, to the lock. But by this time
- * _PyRuntime.gilstate.tstate_current is already NULL, so only the simplest
- * of C code can be allowed to run (in particular it must not be possible to
- * release the GIL).
- * So instead of holding the lock directly, the tstate holds a weakref to
- * the lock: that's the value of on_delete_data below. Decref'ing a
- * weakref is harmless.
- * on_delete points to _threadmodule.c's static release_sentinel() function.
- * After the tstate is unlinked, release_sentinel is called with the
- * weakref-to-lock (on_delete_data) argument, and release_sentinel releases
- * the indirectly held lock.
- */
- void (*on_delete)(void *);
- void *on_delete_data;
-
int coroutine_origin_tracking_depth;
PyObject *async_gen_firstiter;