
Yield from the main thread a few times at shutdown #2660

Closed
saethlin wants to merge 5 commits

Conversation

saethlin
Member

@saethlin saethlin commented Nov 9, 2022

Closes #2629

... how should we test this? The naive approach is to run ./miri many-seeds ./miri run example.rs: #2629 (comment)

use std::thread;

fn main() {
  thread::scope(|s| {
    s.spawn(|| {});
  });
}

But running many seeds takes an annoyingly long time. How many seeds is enough? And how does such a test fit into our existing testing system?

(I'm also not enthused about the way I keep piling complexity onto the scheduler here; should this PR include a refactor?)

@oli-obk
Contributor

oli-obk commented Nov 9, 2022

how should we test this?

we could pick a seed that causes the problems and hardcode it into the test's arguments. When it fails due to rustc changes, we need to find a new seed.
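
A minimal sketch of what that could look like, assuming Miri's //@compile-flags test directive and the -Zmiri-seed flag; the seed value below is a made-up placeholder that would have to be found by actually running many seeds:

//@compile-flags: -Zmiri-seed=7
// NOTE: the seed above is a hypothetical placeholder. The seed that reproduces
// the bad scheduling has to be found empirically, and it may stop reproducing
// the bug once rustc or the standard library change.
use std::thread;

fn main() {
    // `thread::scope` makes the main thread wait for the spawned thread,
    // which is exactly the shutdown pattern from #2629.
    thread::scope(|s| {
        s.spawn(|| {});
    });
}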

@saethlin
Member Author

saethlin commented Nov 9, 2022

that causes the problems

The seed that causes the problem depends on the details of stdlib internals and MIR generation. Nearly all seeds are fine. So the problem we have is that any selection of a seed may not reproduce the problematic scheduling in the future.

When it fails due to rustc changes, we need to find a new seed.

The problem would be the test passing with whatever seed we pick, even though the bug is back.

@oli-obk
Contributor

oli-obk commented Nov 9, 2022

The problem would be the test passing with whatever seed we pick, even though the bug is back.

oh 🤦 yeah, we'd need a way to disable this yielding scheme and then run the test once without it, expecting it to fail, and once with it, expecting it to pass... that's annoying and adds even more noise here.

@bors
Contributor

bors commented Nov 17, 2022

☔ The latest upstream changes (presumably 3162b7a) made this pull request unmergeable. Please resolve the merge conflicts.

@RalfJung
Member

Yeah sadly I don't think we can test for this.

(I'm also not enthused about the way I keep piling complexity onto the scheduler here; should this PR include a refactor?)

Not this PR, but if you have a good idea for a refactor that'd be great!
I keep wondering if we can use generators for this somehow, it's kind of a state machine thing... though this applies more to the TLS logic, which is quite terrible too, than to the core scheduler.

src/concurrency/thread.rs
    && self.main_thread_yields_at_shutdown_remaining > 0
{
    self.yield_active_thread = true;
    self.main_thread_yields_at_shutdown_remaining -= 1;
Member

Ideally we'd do this yielding after executing main thread TLS dtors (because it'd be UB to access them). Not sure how terrible that would be to implement though, so a FIXME is also fine.

Member Author

Yeah... I would like to understand how to do this correctly. The naive implementation just shifts the big check this PR adds around so that the original `if should_terminate() { return Ok(ExecuteDtors) }` comes first, but that seems to produce data races whenever this yielding behavior kicks in: we run the main thread's dtors, then schedule another thread on the next step, and that thread reports a data race on one of a few allocations in the Rust runtime.

It wouldn't surprise me if these data races are legitimate. After all, we do have this comment:

                // The main thread terminated; stop the program.
                // We do *not* run TLS dtors of remaining threads, which seems to match rustc behavior.

Member

Hard to say more without knowing the details of the race and how you implemented this. But we currently already yield automatically to other threads while the main thread's dtors are running, so we should already be seeing such races?

The best place to put this yielding behavior seems to be in the `if self.threads[MAIN_THREAD].state == ThreadState::Terminated` check, no? Instead of stopping, we keep going, but with some kind of bound.

Member

Actually a bound on the steps is probably better than yielding... if automatic yielding is disabled, this might otherwise keep going forever even after the main thread stopped.
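
For illustration, a rough sketch of how these two suggestions could fit together (main-thread TLS dtors first, then only a bounded number of extra steps for the remaining threads). All names here are hypothetical stand-ins, not Miri's actual scheduler API:

// Hypothetical scheduler sketch (not Miri's real types): once the main thread
// has terminated, run its TLS dtors, then grant the other threads a bounded
// number of extra steps before stopping the program.
enum SchedulingAction {
    ExecuteStep,  // run one step of the active thread
    ExecuteDtors, // run (a step of) the main thread's TLS destructors
    Stop,         // terminate the program
}

struct Scheduler {
    main_terminated: bool,
    main_dtors_done: bool,
    extra_steps_remaining: u32,
}

impl Scheduler {
    fn schedule(&mut self) -> SchedulingAction {
        if !self.main_terminated {
            return SchedulingAction::ExecuteStep;
        }
        if !self.main_dtors_done {
            // Main-thread TLS dtors come before any extra yielding, as
            // suggested above.
            return SchedulingAction::ExecuteDtors;
        }
        if self.extra_steps_remaining > 0 {
            // Bounded, so this cannot keep the interpreter alive forever.
            self.extra_steps_remaining -= 1;
            return SchedulingAction::ExecuteStep;
        }
        SchedulingAction::Stop
    }
}

fn main() {
    let mut sched = Scheduler {
        main_terminated: true,
        main_dtors_done: true,
        extra_steps_remaining: 3,
    };
    // Prints ExecuteStep three times, then Stop.
    for _ in 0..4 {
        match sched.schedule() {
            SchedulingAction::ExecuteStep => println!("ExecuteStep"),
            SchedulingAction::ExecuteDtors => println!("ExecuteDtors"),
            SchedulingAction::Stop => println!("Stop"),
        }
    }
}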

Member Author

I pushed a commit which does the change I was trying to describe above. I've seen two data races reported, one in the test suite and the other when I run this file:

use std::thread;

fn main() {
    thread::scope(|s| {
        s.spawn(|| {});
    });
}

With ./miri many-seeds ./miri demo.rs

I agree that a bound on the extra steps sounds reasonable. Is the concern that this yielding behavior may take programs which currently exit under the non-yielding option and turn them into programs that never exit?

Member

Is the concern that this yielding behavior may take programs which currently exit under the non-yielding option and turn them into programs that never exit?

Yes.
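
For reference, the kind of program in question looks roughly like the existing threadleak_ignored test: a detached thread that never terminates (the -Zmiri-ignore-leaks directive here is an assumption about that test's setup). Without a bound, a "keep scheduling other threads after main exits" scheme would never finish interpreting this:

//@compile-flags: -Zmiri-ignore-leaks
use std::thread;

fn main() {
    // This detached thread never finishes. Once `main` returns, a scheduler
    // that kept yielding to it without any bound would spin forever instead
    // of exiting the program.
    thread::spawn(|| loop {});
}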

Member Author

The TLS shims re-enable terminated threads. Is that a good thing? Do they have to be Enabled in order to execute user code?

Member
@RalfJung RalfJung Nov 25, 2022

Yes, threads must definitely be enabled when running TLS dtors.

I wish they wouldn't be disabled and then re-enabled... this entire scheduler and TLS stuff needs a rework at some point, but I haven't yet found a nice way to factor this. I still think a generator would be good... basically yielding a function to run, which is then executed until the stack is empty again, at which point the generator is consulted to figure out whether there is another function to run or whether this thread is done.

Member

Ah sadly using generators for this will not work.

src/concurrency/thread.rs
Comment on lines +639 to +640
if self.active_thread == MAIN_THREAD
    && active_thread.state == ThreadState::Terminated
Member

We already know `self.threads[MAIN_THREAD].state == ThreadState::Terminated` here, so this seems redundant?

@RalfJung
Member

Uh, wth... the thread isn't even reading anything?^^

note: tracking was triggered
  |
  = note: created machine-managed memory allocation of 8 bytes (alignment 8 bytes) with id 28
  = note: (no span available)

Dropping 0
error: Undefined Behavior: Data race detected between Read on thread `<unnamed>` and Write on thread `main` at alloc28
  --> tests/pass/threadleak_ignored.rs:29:9
   |
29 |         loop {}
   |         ^^^^^^^ Data race detected between Read on thread `<unnamed>` and Write on thread `main` at alloc28
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
   = note: BACKTRACE:
   = note: inside closure at tests/pass/threadleak_ignored.rs:29:9

@saethlin
Member Author

I believe alloc28 is main's return place

@RalfJung
Member

RalfJung commented Nov 21, 2022

Is the active thread getting inconsistent or something? It is pointing at the loop; the last statements executed before that are:

11031ms  INFO rustc_const_eval::interpret::step return
11031ms  INFO rustc_const_eval::interpret::eval_context popping stack frame (returning from function)
         rustc_const_eval::interpret::eval_context::frame std::sys_common::thread_local_dtor::register_dtor_fallback::run_dtors
         rustc_const_eval::interpret::eval_context::frame main::{closure#1}
32962ms  INFO rustc_const_eval::interpret::step _12 = const ()

Unfortunately the trace isn't very helpful when there are multiple threads...

I think this race just happens when the main thread TLS dtors are done (i.e., immediately when the new logic kicks in).

@RalfJung
Member

Oh I have a theory... when we are reading the return place here, is the active thread still the other thread? That would look like a race.

@RalfJung
Member

@rustbot author

@rustbot rustbot added the S-waiting-on-author Status: Waiting for the PR author to address review comments label Nov 23, 2022
@RalfJung RalfJung mentioned this pull request Nov 27, 2022
@RalfJung
Member

I implemented a scheduler refactoring with an alternative way to do this yielding in #2699.

bors added a commit that referenced this pull request Dec 1, 2022
refactor scheduler

Refactors the scheduler to use something akin to a generator -- a callback that will be invoked when the stack of a thread is empty, which has the chance to push a new stack frame or do other things and then indicates whether this thread is done, or should be scheduled again. (Unfortunately I think we [cannot use actual generators](https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Generators.20that.20borrow.20on.20each.20resume.3F) here.) The interpreter loop is now a proper infinite loop, the only way to leave it is for some kind of interrupt to be triggered (represented as `InterpError`) -- unifying how we handle 'exit when calling `process::exit`' and 'exit when main thread quits'.

The last commit implements an alternative approach to #2660 using this new structure. Fixes #2629.
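
A rough, self-contained sketch of that callback shape, using made-up names rather than the actual types from #2699: when a thread's stack becomes empty, a per-thread callback is consulted; it either sets up more work (for example a TLS destructor call) or reports that the thread is finished.

// Hypothetical sketch of the "callback when the stack is empty" design
// described in the commit message above; names are illustrative only.
enum Continuation {
    KeepRunning, // the callback queued more work (e.g. pushed a new stack frame)
    Done,        // nothing left to do on this thread
}

trait OnStackEmpty {
    fn on_stack_empty(&mut self) -> Continuation;
}

// Example callback for a main-thread-like sequence: first "run" the TLS
// destructors, then report that the thread is done.
struct MainThreadCallback {
    tls_dtors_done: bool,
}

impl OnStackEmpty for MainThreadCallback {
    fn on_stack_empty(&mut self) -> Continuation {
        if !self.tls_dtors_done {
            self.tls_dtors_done = true;
            println!("pushing a frame for a TLS destructor");
            Continuation::KeepRunning
        } else {
            Continuation::Done
        }
    }
}

fn main() {
    let mut cb = MainThreadCallback { tls_dtors_done: false };
    // The interpreter loop: run the thread until its stack is empty, then ask
    // the callback whether there is more to do or the thread is finished.
    while let Continuation::KeepRunning = cb.on_stack_empty() {
        println!("...executing until the stack is empty again...");
    }
    println!("thread finished");
}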
@RalfJung
Member

RalfJung commented Dec 1, 2022

Closing in favor of #2699. Thanks for the PR!

@RalfJung RalfJung closed this Dec 1, 2022
RalfJung pushed a commit to RalfJung/rust that referenced this pull request Dec 3, 2022
refactor scheduler

@saethlin saethlin deleted the yield-before-termination branch January 15, 2023 22:04
Labels
S-waiting-on-author Status: Waiting for the PR author to address review comments

Successfully merging this pull request may close these issues.

scoped threads vs thread leaks
5 participants