Panic with tokio::time on shutdown #2789
I checked, and changing

```diff
-#[tokio::test(threaded_scheduler)]
+#[tokio::test(basic_scheduler)]
```

makes it work.

---
I've gathered some results. Unfortunately, they do not give me any clarity on where the bug(s) is. Note: I added some debug output:

```diff
diff --git a/tokio/src/time/wheel/mod.rs b/tokio/src/time/wheel/mod.rs
index a2ef27fc..dcfd500e 100644
--- a/tokio/src/time/wheel/mod.rs
+++ b/tokio/src/time/wheel/mod.rs
@@ -221,7 +221,7 @@ where
     fn set_elapsed(&mut self, when: u64) {
         assert!(
-            self.elapsed <= when,
+            dbg!(self.elapsed) <= dbg!(when),
             "elapsed={:?}; when={:?}",
             self.elapsed,
             when
```

Test I'm running:

```rust
#[tokio::test(threaded_scheduler)]
// #[tokio::test(basic_scheduler)]
async fn hang_on_shutdown() {
    let (sync_tx, sync_rx) = std::sync::mpsc::channel::<()>();
    tokio::spawn(async move {
        tokio::task::block_in_place(|| sync_rx.recv().ok());
    });
    tokio::spawn(async {
        tokio::time::delay_for(std::time::Duration::from_secs(5)).await;
        drop(sync_tx);
    });
    tokio::time::delay_for(std::time::Duration::from_secs(1)).await;
}
```

Output on different versions:
- …: panics!
- 21f7260: doesn't panic, but hangs.

---
Looked into this a bit. It appears that when the timer wheel shuts down, it is supposed to be completely drained by adding a long "elapsed" time (tokio/src/time/driver/mod.rs, line 323 in 646fbae) via `set_elapsed` (tokio/src/time/wheel/mod.rs, lines 222 to 233 in 4c4699b). However, in the threaded scheduler there is a race such that this "poll" function gets entered again after this, but this time it receives a normal `when` value and the assertion is triggered (not sure why all this happens though).
---

This doesn't happen if …

---
Agree. This works fine:

```rust
#[cfg(test)]
#[tokio::test(threaded_scheduler)]
async fn hang_on_shutdown() {
    let (sync_tx, sync_rx) = std::sync::mpsc::channel::<()>();
    tokio::task::spawn_blocking(move || sync_rx.recv().ok());
    tokio::spawn(async {
        tokio::time::delay_for(std::time::Duration::from_secs(1)).await;
        drop(sync_tx);
    });
    tokio::time::delay_for(std::time::Duration::from_secs(2)).await;
}
```

---
I'm getting this specific panic too. Not on shutdown, though; it happens for no apparent reason. We're not using any … We do have this pattern:

```rust
tokio::task::spawn_blocking(move || {
    tokio::runtime::Handle::current().block_on(async {
        // ...
    })
})
```

but I'm not sure this is the cause.

---
Hello, here is the crash report generated by human-panic:

```
name = 'dezoomify-rs'
operating_system = 'unix:Debian'
crate_version = '2.6.1'
explanation = '''
Panic occurred in file '/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.23/src/time/wheel/mod.rs' at line 223
'''
cause = 'elapsed=18446744073709551615; when=56977'
method = 'Panic'
backtrace = '''
 0: 0x56086d6bea77 - tokio::time::wheel::Wheel<T>::poll::h2116cfbd2fa0846e
 1: 0x56086d6c757a - tokio::time::driver::Driver<T>::process::h3a3a3fe415090c44
 2: 0x56086d6c7d4d - <tokio::time::driver::Driver<T> as tokio::park::Park>::park::hda1fea46742c1431
 3: 0x56086d6c1092 - <tokio::runtime::park::Parker as tokio::park::Park>::park::h4d2a30281af6a33f
 4: 0x56086d6b7e86 - tokio::runtime::thread_pool::worker::Context::run::h1cbcfea6abdf1996
 5: 0x56086d6c5dd3 - tokio::macros::scoped_tls::ScopedKey<T>::set::hef45728da1fd4508
 6: 0x56086d6b7b76 - tokio::runtime::thread_pool::worker::run::h4b98dc581b7ed45e
 7: 0x56086d2beca3 - tokio::runtime::task::core::Core<T,S>::poll::hd0da54221e1d7050
 8: 0x56086d2d37f2 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hd6d9269f37e02876
 9: 0x56086d2ae486 - tokio::runtime::task::harness::Harness<T,S>::poll::h1517c781cc3f9f2c
10: 0x56086d6b5830 - tokio::runtime::blocking::pool::Inner::run::hfd468efbdbf0f354
11: 0x56086d6bffa0 - tokio::runtime::context::enter::ha89c39b700754910
12: 0x56086d6c9316 - std::sys_common::backtrace::__rust_begin_short_backtrace::h4d77fd4853978be2
13: 0x56086d6c4b4b - core::ops::function::FnOnce::call_once{{vtable.shim}}::hf771a3de431ab82d
14: 0x56086d85393a - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h670c50864ac2cb92
                     at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/alloc/src/boxed.rs:1042
                   - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h2511952749086d81
                     at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/alloc/src/boxed.rs:1042
                   - std::sys::unix::thread::Thread::new::thread_start::h5ad4ddffe24373a8
                     at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/sys/unix/thread.rs:87
15: 0x7f31400e2fa3 - start_thread
16: 0x7f313fff74cf - clone
17: 0x0 - <unresolved>
'''
```

It also seems to be happening during shutdown.

---
I don't really have context on this issue, but we recently had #3080 merged, which may or may not fix it. Can someone familiar with the issue test it again?

---
Using: …

The following hangs, and sometimes panics with: …

```rust
#[tokio::test(flavor = "multi_thread")]
async fn hang_on_shutdown() {
    let (sync_tx, sync_rx) = std::sync::mpsc::channel::<()>();
    tokio::spawn(async move {
        tokio::task::block_in_place(|| sync_rx.recv().ok());
    });
    tokio::spawn(async {
        tokio::time::sleep(std::time::Duration::from_secs(2)).await;
        drop(sync_tx);
    });
    tokio::time::sleep(std::time::Duration::from_secs(1)).await;
}
```

The following always completes:

```rust
#[tokio::test]
async fn hang_on_shutdown() {
    let (sync_tx, sync_rx) = std::sync::mpsc::channel::<()>();
    tokio::spawn(async move {
        tokio::task::block_in_place(|| sync_rx.recv().ok());
    });
    tokio::spawn(async {
        tokio::time::sleep(std::time::Duration::from_secs(2)).await;
        drop(sync_tx);
    });
    tokio::time::sleep(std::time::Duration::from_secs(1)).await;
}
```

---
I don't think the PR has been released yet. Can you use a git dependency?

---
Same issue using …

---
See tokio-rs/tokio#2789 for details. We were seeing this panic during normal operation, not just at shutdown.
#3080 appears to make the panic much more frequent. On some machines, we see it every ~20 minutes using tokio 0.3.5 (with #3080), but every ~72 hours using tokio 0.3.4 (without #3080). For more details, see ZcashFoundation/zebra#1452. Unfortunately, the panic is right in the middle of some complex async code using …

---
@bdonlan, this still happens; maybe you have some thoughts?

---
This panic indicates something is trying to roll back the last-evaluated time of the timer wheel. It looks like the expiration deadline is being set to a crazy high value somehow, and then we're not rejecting this in the poll() function? I'll take a more detailed look tomorrow.

---
@teor2345 I believe your case is actually different from this one. This issue is a shutdown issue.

---
@teor2345 could you create a new issue with, at minimum, the features you have enabled and a backtrace?

---
All core threads share the same park, and invoke park() followed by shutdown, in parallel. This is quite possibly a bug in the worker (in that it invokes shutdown when it's not done with the parker), but for now, work around it by bypassing the timer parker once shutdown is called.

Fixes: tokio-rs#2789

---
Previously, the runtime shutdown logic would first hand control over all cores to a single thread, which would sequentially shut down all tasks on the core and then wait for them to complete. This could deadlock when one task is waiting for a later core's task to complete. For example, in the newly added test, we have a `block_in_place` task that is waiting for another task to be dropped. If the latter task adds its core to the shutdown list later than the former, we end up waiting forever for the `block_in_place` task to complete.

Additionally, there also was a bug wherein we'd attempt to park on the parker after shutting it down; that was fixed as part of the refactors above.

This change restructures the code to bring all tasks to a halt (and do any parking needed) before we collapse to a single thread, to avoid this deadlock.

There was also an issue in which cancelled tasks would not unpark the originating thread, due to what appears to be some sort of optimization gone wrong. This has been fixed to be much more conservative in selecting when not to unpark the source thread (this may be too conservative; please take a look at the changes to `release()`).

Fixes: tokio-rs#2789
---

**Version**

Current master (9d58b70)

**Platform**

Linux markus-XPS-15-9570 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

**Description**

I was trying to create a minimal reproduction of an intermittent hang during shutdown that I am seeing, but instead I found this panic. This may or may not be related to my real issue, but I imagine it ought to be fixed as well.