Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in make_pending_calls in free-threaded build #122201

Closed
colesbury opened this issue Jul 23, 2024 · 0 comments
Closed

Race condition in make_pending_calls in free-threaded build #122201

colesbury opened this issue Jul 23, 2024 · 0 comments
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@colesbury
Copy link
Contributor

colesbury commented Jul 23, 2024

Bug report

make_pending_calls uses a mutex and the the handling_thread field to ensure that only one thread per-interpreter is handling pending calls at a time:

cpython/Python/ceval_gil.c

Lines 911 to 928 in 41a91bd

/* Only one thread (per interpreter) may run the pending calls
at once. In the same way, we don't do recursive pending calls. */
PyMutex_Lock(&pending->mutex);
if (pending->handling_thread != NULL) {
/* A pending call was added after another thread was already
handling the pending calls (and had already "unsignaled").
Once that thread is done, it may have taken care of all the
pending calls, or there might be some still waiting.
To avoid all threads constantly stopping on the eval breaker,
we clear the bit for this thread and make sure it is set
for the thread currently handling the pending call. */
_Py_set_eval_breaker_bit(pending->handling_thread, _PY_CALLS_TO_DO_BIT);
_Py_unset_eval_breaker_bit(tstate, _PY_CALLS_TO_DO_BIT);
PyMutex_Unlock(&pending->mutex);
return 0;
}
pending->handling_thread = tstate;
PyMutex_Unlock(&pending->mutex);

However, the clearing of handling_thread is done outside of the mutex:

cpython/Python/ceval_gil.c

Lines 959 to 960 in 41a91bd

pending->handling_thread = NULL;
return 0;

There are two problems with this (for the free-threaded build):

  • It's a data race because there's a read of handling_thread (in the mutex) concurrently with a write (outside the mutex)
  • The logic in that sets _PY_CALLS_TO_DO_BIT on pending->handling_thread is subject to a time-of-check to time-of-use hazard: pending->handling_thread may be non-NULL when evaluating the if-statement, but then cleared before setting the eval breaker bit.

Relevant unit test

TSan catches this race when running test_subthreads_can_handle_pending_calls from test_capi.test_misc.

Suggested Fix

We should set pending->handling_thread = NULL; while holding pending->mutex, at least in the free-threaded build

Linked PRs

@colesbury colesbury added type-bug An unexpected behavior, bug, or error 3.13 bugs and security fixes topic-free-threading 3.14 new features, bugs and security fixes labels Jul 23, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Jul 23, 2024
In the free-threaded build, we need to lock pending->mutex when clearing
the handling_thread in order not to race with a concurrent
make_pending_calls in the same interpreter.
colesbury added a commit that referenced this issue Jul 26, 2024
In the free-threaded build, we need to lock pending->mutex when clearing
the handling_thread in order not to race with a concurrent
make_pending_calls in the same interpreter.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 26, 2024
…honGH-122204)

In the free-threaded build, we need to lock pending->mutex when clearing
the handling_thread in order not to race with a concurrent
make_pending_calls in the same interpreter.
(cherry picked from commit c557ae9)

Co-authored-by: Sam Gross <colesbury@gmail.com>
colesbury added a commit that referenced this issue Jul 26, 2024
…-122204) (#122319)

In the free-threaded build, we need to lock pending->mutex when clearing
the handling_thread in order not to race with a concurrent
make_pending_calls in the same interpreter.
(cherry picked from commit c557ae9)

Co-authored-by: Sam Gross <colesbury@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes topic-free-threading type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants