Yield from the main thread a few times at shutdown #2660
Conversation
we could pick a seed that causes the problems and hardcode it into the test's arguments. When it fails due to rustc changes, we need to find a new seed.
The seed that causes the problem depends on the details of stdlib internals and MIR generation. Nearly all seeds are fine. So the problem we have is that any selection of a seed may not reproduce the problematic scheduling in the future.
The problem would be the test passing with whatever seed we pick, even though the bug is back.
oh 🤦 yeah, we'd need a way to disable this yielding scheme and then run the test twice: once without it, expecting it to fail, and once with it, expecting it to pass... that's annoying and even more noise here.
☔ The latest upstream changes (presumably 3162b7a) made this pull request unmergeable. Please resolve the merge conflicts.
Force-pushed from bd021e2 to 929e80c.
Yeah sadly I don't think we can test for this.
Not this PR, but if you have a good idea for a refactor that'd be great!
src/concurrency/thread.rs (outdated)
&& self.main_thread_yields_at_shutdown_remaining > 0
{
    self.yield_active_thread = true;
    self.main_thread_yields_at_shutdown_remaining -= 1;
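For context, here is a minimal, self-contained sketch of the counter-based yielding the diff above adds. Apart from the two field names visible in the diff, all names and the surrounding structure are assumptions for illustration, not Miri's actual scheduler code.

```rust
// Hypothetical, simplified sketch of the counter-based shutdown yielding;
// only the two field names below appear in the real diff.
struct Scheduler {
    yield_active_thread: bool,
    // Budget of extra yields once the main thread has finished, so other
    // threads still get a chance to run (and report races/UB) before exit.
    main_thread_yields_at_shutdown_remaining: u32,
}

impl Scheduler {
    fn maybe_yield_at_shutdown(&mut self, main_thread_finished: bool) {
        if main_thread_finished && self.main_thread_yields_at_shutdown_remaining > 0 {
            // Pick a different thread on the next scheduling step and
            // consume one unit of the budget.
            self.yield_active_thread = true;
            self.main_thread_yields_at_shutdown_remaining -= 1;
        }
    }
}

fn main() {
    let mut s = Scheduler {
        yield_active_thread: false,
        main_thread_yields_at_shutdown_remaining: 4,
    };
    s.maybe_yield_at_shutdown(true);
    assert!(s.yield_active_thread);
}
```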
Ideally we'd do this yielding after executing main thread TLS dtors (because it'd be UB to access them). Not sure how terrible that would be to implement though, so a FIXME is also fine.
Yeah... I would like to understand how to do this correctly. The naive implementation just shifts around the big check this PR adds so that the original `if should_terminate() { return Ok(ExecuteDtors) }` comes first, but that seems to produce data races whenever this yielding behavior kicks in: we run the main thread's dtors, then schedule another thread on the next step, and that thread reports a data race on one of a few allocations in the Rust runtime.
It wouldn't surprise me if these data races are legitimate. After all, we do have this comment:
// The main thread terminated; stop the program.
// We do *not* run TLS dtors of remaining threads, which seems to match rustc behavior.
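A hedged sketch of the ordering problem described in this comment: the original "run the dtors" check stays first, so the extra yielding only starts after the main thread's TLS destructors have run, and any other thread scheduled during those extra steps observes the runtime's teardown. `ExecuteDtors` is taken from the quoted snippet; everything else is illustrative, not Miri's real code.

```rust
// Illustrative only: models the "naive" ordering, not Miri's scheduler.
enum SchedulingAction {
    ExecuteStep,  // run a step of some (possibly different) thread
    ExecuteDtors, // run the main thread's TLS destructors
}

struct Scheduler {
    main_thread_done: bool,
    main_dtors_done: bool,
    shutdown_yields_remaining: u32,
}

impl Scheduler {
    fn schedule_naive(&mut self) -> SchedulingAction {
        // The original check comes first: as soon as the main thread's user
        // code is done, run its TLS destructors.
        if self.main_thread_done && !self.main_dtors_done {
            return SchedulingAction::ExecuteDtors;
        }
        // Only after the dtors ran does the extra yielding kick in, so any
        // other thread scheduled here runs after the runtime's teardown,
        // which is what shows up as a data race.
        if self.main_dtors_done && self.shutdown_yields_remaining > 0 {
            self.shutdown_yields_remaining -= 1;
        }
        SchedulingAction::ExecuteStep
    }
}

fn main() {
    let mut s = Scheduler {
        main_thread_done: true,
        main_dtors_done: false,
        shutdown_yields_remaining: 2,
    };
    assert!(matches!(s.schedule_naive(), SchedulingAction::ExecuteDtors));
    s.main_dtors_done = true; // destructors finished; yielding starts only now
    assert!(matches!(s.schedule_naive(), SchedulingAction::ExecuteStep));
}
```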
Hard to say more without knowing the details of the race and how you implemented this. But we already automatically yield to other threads while the main thread's dtors are running, so we should already be seeing such races?
The best place to put this yielding behavior seems to be in the `if self.threads[MAIN_THREAD].state == ThreadState::Terminated` check, no? Instead of stopping, we keep going, but with some kind of bound.
Actually a bound on the steps is probably better than yielding... if automatic yielding is disabled, this might otherwise keep going forever even after the main thread stopped.
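A minimal sketch of this "keep going, but with a bound" idea, counting interpreter steps instead of yields so the loop cannot run forever even when automatic yielding is disabled. `ThreadState` and the variant names come from the discussion above; everything else is an assumption rather than Miri's actual types.

```rust
// Illustrative sketch: bound the extra steps taken after the main thread exits.
#[derive(PartialEq)]
enum ThreadState {
    Enabled,
    Terminated,
}

enum SchedulingAction {
    ExecuteStep,
    Stop,
}

struct Scheduler {
    main_thread_state: ThreadState,
    // Budget of extra interpreter steps other threads may take after the
    // main thread has terminated.
    remaining_steps_after_main_exit: u32,
}

impl Scheduler {
    fn schedule(&mut self) -> SchedulingAction {
        if self.main_thread_state == ThreadState::Terminated {
            // Instead of stopping immediately, keep scheduling other threads
            // until the step budget is exhausted.
            if self.remaining_steps_after_main_exit == 0 {
                return SchedulingAction::Stop;
            }
            self.remaining_steps_after_main_exit -= 1;
        }
        SchedulingAction::ExecuteStep
    }
}

fn main() {
    let mut s = Scheduler {
        main_thread_state: ThreadState::Enabled,
        remaining_steps_after_main_exit: 2,
    };
    // While the main thread is running, scheduling proceeds as usual.
    assert!(matches!(s.schedule(), SchedulingAction::ExecuteStep));
    // Once it terminates, other threads get at most two more steps.
    s.main_thread_state = ThreadState::Terminated;
    assert!(matches!(s.schedule(), SchedulingAction::ExecuteStep));
    assert!(matches!(s.schedule(), SchedulingAction::ExecuteStep));
    assert!(matches!(s.schedule(), SchedulingAction::Stop));
}
```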
I pushed a commit that makes the change I was trying to describe above. I've seen two data races reported: one in the test suite, and the other when I run this file:
use std::thread;

fn main() {
    thread::scope(|s| {
        s.spawn(|| {});
    });
}
With `./miri many-seeds ./miri demo.rs`.
I agree that a bound on the extra steps sounds reasonable. Is the concern that this yielding behavior may take programs which currently exit under the non-yielding option and turn them into programs that never exit?
> Is the concern that this yielding behavior may take programs which currently exit under the non-yielding option and turn them into programs that never exit?
Yes.
The TLS shims re-enable terminated threads. Is that a good thing? Do they have to be `Enabled` in order to execute user code?
Yes, threads must definitely be enabled when running TLS dtors.
I wish they wouldn't be disabled and then re-enabled... this entire scheduler and TLS stuff needs a rework at some point, but I haven't yet found a nice way to factor it. I still think a generator would be good: basically `yield`ing a function to run, which is then executed until the stack is empty again, at which point the generator is consulted to figure out whether there is another function to run or whether this thread is done.
Ah sadly using generators for this will not work.
Co-authored-by: Ralf Jung <post@ralfj.de>
if self.active_thread == MAIN_THREAD
    && active_thread.state == ThreadState::Terminated
We already know `self.threads[MAIN_THREAD].state == ThreadState::Terminated` here, so this seems redundant?
Uh, wth... the thread isn't even reading anything?^^
I believe `alloc28` is
Is the active thread getting inconsistent or so? It is pointing at the
Unfortunately the trace isn't very helpful when there are multiple threads... I think this race just happens when the main thread TLS dtors are done (i.e., immediately when the new logic kicks in).
Oh I have a theory... when we are reading the return place here, is the active thread still the other thread? That would look like a race.
@rustbot author
I implemented a scheduler refactoring with an alternative way to do this yielding in #2699.
refactor scheduler (#2699): Refactors the scheduler to use something akin to a generator -- a callback that will be invoked when the stack of a thread is empty, which has the chance to push a new stack frame or do other things and then indicates whether this thread is done, or should be scheduled again. (Unfortunately I think we [cannot use actual generators](https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Generators.20that.20borrow.20on.20each.20resume.3F) here.) The interpreter loop is now a proper infinite loop; the only way to leave it is for some kind of interrupt to be triggered (represented as `InterpError`) -- unifying how we handle 'exit when calling `process::exit`' and 'exit when main thread quits'. The last commit implements an alternative approach to #2660 using this new structure. Fixes #2629.
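For readers who don't follow the link, here is a rough sketch of the callback-on-empty-stack shape that summary describes: a per-thread callback consulted whenever the stack is empty, which may set up more work and says whether the thread should be scheduled again. Names and signatures are illustrative assumptions, not the actual Miri API.

```rust
// Illustrative sketch of a callback-driven thread lifecycle.
enum Poll {
    RunAgain, // the callback pushed more work; schedule this thread again
    Done,     // nothing left to do on this thread
}

// Stand-in for the per-thread callback: consulted whenever the thread's
// stack is empty, it may set up the next piece of work (modeled here as a
// queue of pending jobs) and reports whether the thread should keep running.
struct OnStackEmpty {
    pending_jobs: Vec<&'static str>,
}

impl OnStackEmpty {
    fn call(&mut self) -> Poll {
        match self.pending_jobs.pop() {
            Some(job) => {
                println!("pushing a frame for: {job}");
                Poll::RunAgain
            }
            None => Poll::Done,
        }
    }
}

fn main() {
    // Jobs are popped from the back, so the main function runs first and the
    // TLS destructors run once it is finished.
    let mut cb = OnStackEmpty {
        pending_jobs: vec!["run the main thread's TLS dtors", "run the main function"],
    };
    while let Poll::RunAgain = cb.call() {
        // ... the interpreter would run this thread until its stack is empty again ...
    }
}
```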
Closing in favor of #2699. Thanks for the PR!
Closes #2629
... how should we test this? The naive approach is to run `./miri many-seeds ./miri run example.rs` (see #2629 (comment)). But running many seeds takes an annoyingly long time. How many seeds is enough? And how does such a test fit into our existing testing system?
(I'm also not enthused about the way I keep piling complexity onto the scheduler here; should this PR include a refactor?)