-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-111964: Implement stop-the-world pauses #112471
Conversation
1ebcb70
to
db1dd03
Compare
@eendebakpt - thanks! |
Mark and I are currently on vacation. I'm not back until after Dec 8. Mark should be back this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change mostly makes sense, but it took me a little while to connect all the pieces in my head. I've left various comments related to that and to a number of other things that might be worth adjusting.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split off the _PyRWMutex
PR here: #112859. (This is now rebased on that one.)
I don't think the stop-the-world functionality belongs in ceval
related files. It's fundamentally about managing PyThreadState
objects (particularly, the "state" field) and is closely tied to the existing _PyThreadState_Attach/Detach
functions. The connection to eval_breaker is only a small piece, similar to how the GC has an eval breaker bit.
Include/internal/pycore_interp.h
Outdated
// Support for stop-the-world events. This exists in both the PyRuntime struct | ||
// for global pauses and in each PyInterpreterState for per-interpreter pauses. | ||
struct _stoptheworld_state { | ||
PyMutex mutex; // Serializes stop-the-world attempts. | ||
|
||
// NOTE: The fields below are protected by HEAD_LOCK(runtime), not by the | ||
// above mutex. | ||
bool requested; // Set when a pause is requested. | ||
bool world_stopped; // Set when the world is stopped. | ||
bool is_global; // Set when contained in PyRuntime struct. | ||
|
||
PyEvent stop_event; // Set when thread_countdown reaches zero. | ||
Py_ssize_t thread_countdown; // Number of threads that must pause. | ||
|
||
PyThreadState *requester; // Thread that requested the pause (may be NULL). | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only loosely related to the eval_breaker. It's fundamentally about PyThreadState.state
: transitioning threads to the _Py_THREAD_GC
state. You could still have this without eval-breaker (it would just be much less responsive), but you can't implement it without PyThreadState.state
.
Include/internal/pycore_pystate.h
Outdated
// Perform a stop-the-world pause for all threads in all interpreters. | ||
// | ||
// Threads in the "attached" state are paused and transitioned to the "GC" | ||
// state. Threads in the "detached" state switch to the "GC" state, preventing | ||
// them from reattaching until the stop-the-world pause is complete. | ||
// | ||
// NOTE: This is a no-op outside of Py_GIL_DISABLED builds. | ||
extern void _PyRuntimeState_StopTheWorld(_PyRuntimeState *runtime); | ||
extern void _PyRuntimeState_StartTheWorld(_PyRuntimeState *runtime); | ||
|
||
// Perform a stop-the-world pause for threads in the specified interpreter. | ||
// | ||
// NOTE: This is a no-op outside of Py_GIL_DISABLED builds. | ||
extern void _PyInterpreterState_StopTheWorld(PyInterpreterState *interp); | ||
extern void _PyInterpreterState_StartTheWorld(PyInterpreterState *interp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I like the common prefix for the global and per-interpreter APIs.
- I don't think we should combine the APIs because you don't need a thread-state or interpreter state to start a global stop the world pause.
fcd642b
to
a6f5ab7
Compare
Include/internal/pycore_pystate.h
Outdated
// (bound thread) (stop-the-world thread) | ||
// [attached] <-> [detached] <-> [gc] | ||
// [attached] <-> [detached] <-> [suspended] | ||
// + ^ | ||
// +--------------------------------------------------------+ | ||
// (bound thread) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the text above makes it sound like thread a thread can go from "suspended" straight to "attached".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, (bound thread)
shows up at the top of the diagram and at the bottom. I'm not sure how to interpret that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By text above do you mean the diagram or the paragraph above it? I've updated both in an attempt to make them more clear.
The "(bound thread)" and "(stop-the-world thread)" labels indicate which thread can perform the transition. It's at the bottom because it's labeling the one-way transition from "attached" to "suspended".
Include/internal/pycore_pystate.h
Outdated
// If there is no stop-the-world pause in progress, then this function is | ||
// a no-op. Returns one if the thread was switched to the "suspended" state and | ||
// zero otherwise. | ||
extern int _PyThreadState_Suspend(PyThreadState *tstate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to fall back to detaching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that's conceptually simpler.
Include/internal/pycore_pystate.h
Outdated
// Perform a stop-the-world pause for all threads in all interpreters. | ||
// | ||
// Threads in the "attached" state are paused and transitioned to the "GC" | ||
// state. Threads in the "detached" state switch to the "GC" state, preventing | ||
// them from reattaching until the stop-the-world pause is complete. | ||
// | ||
// NOTE: This is a no-op outside of Py_GIL_DISABLED builds. | ||
extern void _PyRuntimeState_StopTheWorld(_PyRuntimeState *runtime); | ||
extern void _PyRuntimeState_StartTheWorld(_PyRuntimeState *runtime); | ||
|
||
// Perform a stop-the-world pause for threads in the specified interpreter. | ||
// | ||
// NOTE: This is a no-op outside of Py_GIL_DISABLED builds. | ||
extern void _PyInterpreterState_StopTheWorld(PyInterpreterState *interp); | ||
extern void _PyInterpreterState_StartTheWorld(PyInterpreterState *interp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main reservation is that the PyInterpreterState_
and PyRuntimeState_
prefixes are used for either simple getters/setters or struct lifecycle. For anything beyond that we either use a Py_
prefix or a prefix that relates to a specific feature. (We're a bit freer with the PyThreadState_
prefix, but I'm okay with that.) In this case I got the sense that this is a ceval feature, hence suggesting the _PyEval_
prefix.
Regardless, let's find a prefix other than _Py*State_
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly things are looking good. There's the _PyRWMutex
PR, I have one comment here, and there are a couple still open from my earlier review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed the functions to _PyEval_StopTheWorldAll
, etc. at least as a placeholder. I couldn't think of anything better. We could also go with only the _Py
prefix.
Include/internal/pycore_pystate.h
Outdated
// (bound thread) (stop-the-world thread) | ||
// [attached] <-> [detached] <-> [gc] | ||
// [attached] <-> [detached] <-> [suspended] | ||
// + ^ | ||
// +--------------------------------------------------------+ | ||
// (bound thread) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By text above do you mean the diagram or the paragraph above it? I've updated both in an attempt to make them more clear.
The "(bound thread)" and "(stop-the-world thread)" labels indicate which thread can perform the transition. It's at the bottom because it's labeling the one-way transition from "attached" to "suspended".
Include/internal/pycore_pystate.h
Outdated
// If there is no stop-the-world pause in progress, then this function is | ||
// a no-op. Returns one if the thread was switched to the "suspended" state and | ||
// zero otherwise. | ||
extern int _PyThreadState_Suspend(PyThreadState *tstate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that's conceptually simpler.
The `--disable-gil` builds occasionally need to pause all but one thread. Some examples include: * Cyclic garbage collection, where this is often called a "stop the world event" * Before calling `fork()`, to ensure a consistent state for internal data structures * During interpreter shutdown, to ensure that daemon threads aren't accessing Python objects This adds the following functions to implement global and per-interpreter pauses: * `_PyRuntimeState_StopTheWorld` and `_PyRuntimeState_StartTheWorld` * `_PyInterpreterState_StopTheWorld` and `_PyInterpreterState_StartTheWorld` These functions are no-ops outside of the `--disable-gil` build. This also adds `_PyRWMutex`, a "readers-writer" lock, which is used to serialize global stop-the-world pauses with per-interpreter pauses.
Starting in the "suspended" state blocks any _PyThreadState_Attach() call until the stop-the-world event completes.
_PyThreadState_Suspend now switches to "detached" if there is no active stop-the-world request.
12c6585
to
4f7b6ce
Compare
@ericsnowcurrently, I've rebased the PR onto main to get the changes from gh-112859 (_PyRWMutex). |
Hi @ericsnowcurrently - happy New Year! Would you please look this over and let me know if there are any issues that still need to be addressed? I don't think we came to a decision on the names of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
void | ||
_PyEval_StopTheWorldAll(_PyRuntimeState *runtime) | ||
{ | ||
#ifdef Py_GIL_DISABLED | ||
stop_the_world(&runtime->stoptheworld); | ||
#endif | ||
} | ||
|
||
void | ||
_PyEval_StartTheWorldAll(_PyRuntimeState *runtime) | ||
{ | ||
#ifdef Py_GIL_DISABLED | ||
start_the_world(&runtime->stoptheworld); | ||
#endif | ||
} | ||
|
||
void | ||
_PyEval_StopTheWorld(PyInterpreterState *interp) | ||
{ | ||
#ifdef Py_GIL_DISABLED | ||
stop_the_world(&interp->stoptheworld); | ||
#endif | ||
} | ||
|
||
void | ||
_PyEval_StartTheWorld(PyInterpreterState *interp) | ||
{ | ||
#ifdef Py_GIL_DISABLED | ||
start_the_world(&interp->stoptheworld); | ||
#endif | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think these and the related static functions make more sense in Python/ceval_gil.c, but not enough to hold up this PR.
We can revisit some the discussion (including some of the alternate names I suggested) separately.
Thanks @ericsnowcurrently! Would you please merge the PR? |
Thanks, @colesbury! |
The `--disable-gil` builds occasionally need to pause all but one thread. Some examples include: * Cyclic garbage collection, where this is often called a "stop the world event" * Before calling `fork()`, to ensure a consistent state for internal data structures * During interpreter shutdown, to ensure that daemon threads aren't accessing Python objects This adds the following functions to implement global and per-interpreter pauses: * `_PyEval_StopTheWorldAll()` and `_PyEval_StartTheWorldAll()` (for the global runtime) * `_PyEval_StopTheWorld()` and `_PyEval_StartTheWorld()` (per-interpreter) (The function names may change.) These functions are no-ops outside of the `--disable-gil` build.
The `--disable-gil` builds occasionally need to pause all but one thread. Some examples include: * Cyclic garbage collection, where this is often called a "stop the world event" * Before calling `fork()`, to ensure a consistent state for internal data structures * During interpreter shutdown, to ensure that daemon threads aren't accessing Python objects This adds the following functions to implement global and per-interpreter pauses: * `_PyEval_StopTheWorldAll()` and `_PyEval_StartTheWorldAll()` (for the global runtime) * `_PyEval_StopTheWorld()` and `_PyEval_StartTheWorld()` (per-interpreter) (The function names may change.) These functions are no-ops outside of the `--disable-gil` build.
The
--disable-gil
builds occasionally need to pause all but one thread. Some examples include:fork()
, to ensure a consistent state for internal data structuresThis adds the following functions to implement global and per-interpreter pauses:
_PyRuntimeState_StopTheWorld
and_PyRuntimeState_StartTheWorld
_PyInterpreterState_StopTheWorld
and_PyInterpreterState_StartTheWorld
These functions are no-ops outside of the
--disable-gil
build.This also adds
_PyRWMutex
, a "readers-writer" lock, which is used to serialize global stop-the-world pauses with per-interpreter pauses.--disable-gil
builds) #111964