
#[should_panic(expected = ...)] non-functional. #5054

Closed
FraserLee opened this issue Jun 20, 2022 · 8 comments
Labels
A-Diagnostics (logging, crash handling, error reporting and performance analysis), A-Tasks (tools for parallel and async work), C-Bug (an unexpected or incorrect behavior)

Comments

@FraserLee (Contributor)

Bevy version

Release 0.7

What you did

Repeatedly run the following minimal code sample with cargo test:

use bevy::prelude::*;

#[test]
#[should_panic(expected = "expected panic")]
fn mvs_test() {
    App::new()
        .add_plugins(MinimalPlugins)
        .add_system( || {
            panic!("expected panic");
        })
        .run();
}

About 9 times out of 10, the result is as expected:

    Finished test [unoptimized + debuginfo] target(s) in 0.18s
     Running unittests src/main.rs (target/debug/deps/temp_test-698e95098388382a)

running 1 test
test mvs_test - should panic ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

What went wrong

About 1 time in 10, the test fails with the message:

    Finished test [unoptimized + debuginfo] target(s) in 0.18s
     Running unittests src/main.rs (target/debug/deps/temp_test-698e95098388382a)

running 1 test
test mvs_test - should panic ... FAILED

failures:

---- const_change_test stdout ----
thread 'Compute Task Pool (2)' panicked at 'expected panic', src/main.rs:9:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'mvs_test' panicked at 'task has failed', /Users/fraser/.cargo/registry/src/github.com-1ecc6299db9ec823/async-task-4.2.0/src/task.rs:425:45
note: panic did not contain expected string
      panic message: `"task has failed"`,
 expected substring: `"expected panic"`

failures:
    mvs_test

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass '--bin temp_test'

The correct panic is triggered, but the wrong panic message is sent.

Additional information

I've tried both running the tests with cargo test -- --test-threads=1 and using the serial_test crate; neither helped.

FraserLee added the C-Bug and S-Needs-Triage labels on Jun 20, 2022
alice-i-cecile added the A-Diagnostics label and removed the S-Needs-Triage label on Jun 20, 2022
@alice-i-cecile (Member)

This seems like another example of the recent panicking global task pool issue we encountered in our CI.

alice-i-cecile added the A-Tasks label on Jun 20, 2022
@mockersf (Member)

The panic message will differ depending on whether the panic happens on the main thread or on one of the other threads.

@FraserLee (Contributor, Author)

@mockersf is that intentionally the case, and if so is there any way to change it?

I could be wrong, but it seems that, at a bare minimum, there should be some way to lock things down so everything runs on the main thread for testing. I'd go further and say that having the same error randomly show different messages depending on the thread setup is a really unergonomic corner, probably worth addressing even outside of tests.

@bjorn3 (Contributor) commented Jun 20, 2022

I think the thread pool should catch panics using std::panic::catch_unwind and resume them on the main thread using std::panic::resume_unwind with the result of catch_unwind. This prevents showing a backtrace twice and it allows libtest to see the original panic message.
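
For illustration, here is a minimal self-contained sketch of that idea outside of Bevy (hypothetical plumbing, not the TaskPool's actual code): a worker thread catches the panic payload with `catch_unwind`, forwards it over a channel, and the parent thread re-raises it with `resume_unwind`, so `#[should_panic(expected = ...)]` matches the original message.

```rust
use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};
use std::sync::mpsc;
use std::thread;

#[test]
#[should_panic(expected = "expected panic")]
fn forwarded_panic_keeps_its_message() {
    let (tx, rx) = mpsc::channel();

    let worker = thread::spawn(move || {
        // Stand-in for a task running on a pool thread.
        let result = catch_unwind(AssertUnwindSafe(|| {
            panic!("expected panic");
        }));
        if let Err(payload) = result {
            // Forward the original payload instead of panicking again with a
            // generic message like "task has failed".
            let _ = tx.send(payload);
        }
    });
    worker.join().unwrap();

    // Back on the test thread, re-raise the original panic so libtest sees
    // the original message rather than a wrapper.
    if let Ok(payload) = rx.recv() {
        resume_unwind(payload);
    }
}
```

Note that `resume_unwind` does not invoke the panic hook again, which also avoids printing the backtrace twice.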

@james7132 (Member) commented Jun 20, 2022

> This seems like another example of the recent panicking global task pool issue we encountered in our CI.

If this is on 0.7 and not main, this seems to be more of an issue with our general panic-handling story, though the global task pool changes definitely exacerbate it.

I agree with @bjorn3 here that we should definitely use catch/resume unwind, though the behavior, or at least the default behavior, likely needs some more discussion and thought.

If we catch the panic, we need to forward it to the main thread to resume it, but the main thread runs of its own accord and needs to explicitly call into the TaskPool to retrieve these panics. In the meantime, it's unclear what we should then do with the panicking thread.

  • If we kill it, it drops all tasks scheduled to it, including local tasks, which, in the context of the engine in general, may lead to deadlock: our tasks aren't made to be canceled by anything other than dropping the Task itself.
  • We cannot migrate its work to a new thread and kill the original, as migrating thread-local tasks may move non-Send tasks to another thread.
  • For the same reasons, we cannot stall the panicking thread by simply not accepting any new tasks: it would otherwise deadlock while waiting for the main thread to check for panics on the pool, which may never happen if the main thread is itself waiting on local work on the existing thread.
  • We could just abort, though that is potentially an even worse solution.

It may be required to either stem all panics and never forward them to the main thread, force local tasks to be Send so they can be properly migrated after a panic, or have the main thread operate as a worker thread in the pool and regularly yield so it can handle these panics.
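
As an illustration of that last alternative, here is a minimal sketch (hypothetical; not Bevy's actual TaskPool or executor code, and it assumes the async-executor crate that bevy_tasks builds on): the main thread ticks the executor itself and regularly drains a channel of panic payloads forwarded by the workers.

```rust
use std::any::Any;
use std::panic::resume_unwind;
use std::sync::mpsc::Receiver;

// Assumed setup: worker threads catch panics with `catch_unwind` and send the
// payloads over this channel instead of dying silently.
fn main_thread_tick(
    executor: &async_executor::Executor<'_>,
    forwarded_panics: &Receiver<Box<dyn Any + Send>>,
) {
    // Drive some pool work on the main thread so tasks keep making progress
    // even while a worker is stalled.
    while executor.try_tick() {}

    // Regularly check for forwarded panics and re-raise them here, where the
    // application (or libtest) can observe the original message.
    if let Ok(payload) = forwarded_panics.try_recv() {
        resume_unwind(payload);
    }
}
```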

@mockersf (Member)

> @mockersf is that intentionally the case, and if so is there any way to change it?

Unintended consequence of how we handle threads!

As James mentioned, panicking in threads can cause deadlocks (it used to happen quite often with asset loader tasks that panicked and deadlocked all the IO threads). I think we should panic on the main thread when a panic happens somewhere else, at least until we have much better error recovery than we do now.

@mockersf (Member)

Just remembered this related PR: #2307

@hymm (Contributor) commented Jun 20, 2022

@FraserLee as a workaround, you could add the system to a single-threaded stage.
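
For reference, a rough (untested) sketch of that workaround on Bevy 0.7 could look like the following; the stage name is arbitrary, and the point is that SystemStage::single_threaded() runs its systems on the thread driving the schedule, so the panic fires on the main thread and keeps its original message:

```rust
use bevy::prelude::*;

#[test]
#[should_panic(expected = "expected panic")]
fn mvs_test_single_threaded() {
    App::new()
        .add_plugins(MinimalPlugins)
        // Arbitrary stage label; the single-threaded executor runs this
        // system on the main thread, so libtest sees the original panic.
        .add_stage(
            "single_threaded_test",
            SystemStage::single_threaded().with_system(|| {
                panic!("expected panic");
            }),
        )
        .run();
}
```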

The bors bot closed this as completed in 54a1e51 on Nov 2, 2022
ItsDoot pushed a commit to ItsDoot/bevy that referenced this issue on Feb 1, 2023
# Objective
Right now, the `TaskPool` implementation allows a panicking task to permanently kill the worker thread it runs on. This is currently non-recoverable without using a `std::panic::catch_unwind` in every scheduled task, which is poor ergonomics and an even poorer developer experience. This is exacerbated by bevyengine#2250, as these threads are global and cannot be replaced after initialization.

Removes the need for temporary fixes like bevyengine#4998. Fixes bevyengine#4996. Fixes bevyengine#6081. Fixes bevyengine#5285. Fixes bevyengine#5054. Supersedes bevyengine#2307.

## Solution

The current solution is to wrap `Executor::run` in `TaskPool` with a `catch_unwind`, discarding the potential panic. This was taken straight from [smol](https://github.com/smol-rs/smol/blob/404c7bcc0aea59b82d7347058043b8de7133241c/src/spawn.rs#L44)'s current implementation. ~~However, this is not entirely ideal as:~~
 
 - ~~the panic is not signaled to the awaiting task. We would need to change `Task<T>` to use `async_task::FallibleTask` internally, and even then it doesn't signal *why* it panicked, just that it did.~~ (See below).
 - ~~no error is logged of any kind~~ (See below)
 - ~~it's unclear if it drops other tasks in the executor~~ (it does not)
 - ~~This allows the ECS parallel executor to keep chugging even though a system's task has been dropped. This inevitably leads to deadlock in the executor.~~ Assuming we don't catch the unwind in ParallelExecutor, this will naturally kill the main thread.
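
For context, the smol-style approach described above looks roughly like this (a sketch assuming the async-executor and futures-lite crates that bevy_tasks builds on, not the exact TaskPool code): each worker keeps re-entering the executor loop inside `catch_unwind`, so a panicking task can no longer take the whole thread down with it.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

fn worker_thread(executor: &async_executor::Executor<'static>) {
    loop {
        // `Executor::run` drives tasks until the given future completes;
        // `pending()` never completes, so this call only returns if a task
        // panics and the unwind is caught here.
        let _ = catch_unwind(AssertUnwindSafe(|| {
            futures_lite::future::block_on(
                executor.run(futures_lite::future::pending::<()>()),
            )
        }));
        // Discard the panic payload and re-enter the executor so the worker
        // keeps servicing tasks instead of dying permanently.
    }
}
```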

### Alternatives
A final solution likely will incorporate elements of any or all of the following.

#### ~~Log and Ignore~~
~~Log the panic, drop the task, keep chugging. This only addresses the discoverability of the panic. The process will continue to run, probably deadlocking the executor. tokio's detached tasks operate in this fashion.~~

Panics already do this by default, even when caught by `catch_unwind`.

#### ~~`catch_unwind` in `ParallelExecutor`~~
~~Add another layer catching system-level panics into the `ParallelExecutor`. How the executor continues when a core dependency of many systems fails to run is up for debate.~~

`async_task::Task`  bubbles up panics already, this will transitively push panics all the way to the main thread.

#### ~~Emulate/Copy `tokio::JoinHandle` with `Task<T>`~~
~~`tokio::JoinHandle<T>` bubbles up the panic from the underlying task when awaited. This can be transitively applied across other APIs that also use `Task<T>` like `Query::par_for_each` and `TaskPool::scope`, bubbling up the panic until it's either caught or it reaches the main thread.~~

`async_task::Task`  bubbles up panics already, this will transitively push panics all the way to the main thread.

#### Abort on Panic
The nuclear option: log the error and abort the entire process when any thread in the task pool panics. It definitely avoids any additional infrastructure for passing the panic around, and might actually lead to more efficient code as any unwinding is optimized out. However, it gives the developer zero options for dealing with the issue, is a seemingly poor choice for debuggability, and prevents graceful shutdown of the process. Potentially an option for handling very low-level task management (a la bevyengine#4740). Roughly takes the shape of:

```rust
struct AbortOnPanic;

impl Drop for AbortOnPanic {
    fn drop(&mut self) {
        // Dropped while unwinding means the task panicked: take the process down.
        std::process::abort();
    }
}

let guard = AbortOnPanic;
// Run task
// Reached only if the task did not panic: defuse the guard so it never drops.
std::mem::forget(guard);
```

---

## Changelog

Changed: `bevy_tasks::TaskPool`'s threads will no longer terminate permanently when a task scheduled onto them panics.
Changed: `bevy_tasks::Task` and `bevy_tasks::Scope` will propagate panics in the spawned tasks/scopes to the parent thread.