
Panics in one task pool scope can panic a separate scope #6453

Closed
hymm opened this issue Nov 3, 2022 · 3 comments
Labels
A-Tasks Tools for parallel and async work C-Bug An unexpected or incorrect behavior

Comments

@hymm
Contributor

hymm commented Nov 3, 2022

Bevy version

commit: 54a1e51

What you did

This is a continued investigation into #4998, #4996, #6081, #5285, and #5054.

A quick summary: the panic in the panic_when_hierachy_cycle test leaks into other tests and causes them to panic. This was fixed by making the panicking test single threaded so it doesn't use the thread pool. While that is a fine workaround for the tests, it unearthed a problem with how task pool scopes work.

What went wrong

A task_pool.scope runs its own instance of the executor so that tasks can run on the thread the scope is running on. In these tests, each test has its own scope instance running. When we get the spurious failures, the panicking system is actually being run on the scope executor of the failing test, instead of on the scope in the should_panic test or on one of the task pool threads.

It definitely feels incorrect that a panic from one scope's tasks causes problems in a scope whose only relation to it is that they share the same task pool.
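
For illustration, here is a minimal sketch of the setup, not the real test code; it only assumes bevy_tasks' public TaskPool::new/scope/spawn API. Two unrelated scopes share one TaskPool, and while the second scope waits for its own task it ticks the shared executor, so it can end up running the first scope's panicking task on its own thread.

use std::sync::Arc;
use bevy_tasks::TaskPool;

fn main() {
    // One shared pool, as in the test suite.
    let pool = Arc::new(TaskPool::new());

    // Thread A: a scope whose task panics (stand-in for the should_panic test).
    let pool_a = pool.clone();
    let a = std::thread::spawn(move || {
        pool_a.scope(|s| {
            s.spawn(async {
                panic!("hierarchy cycle");
            });
        });
    });

    // Thread B: an unrelated scope on the same pool. While it waits for its own
    // task it ticks the shared executor, so it can end up polling thread A's
    // panicking task, and the panic surfaces here instead.
    let b = std::thread::spawn(move || {
        pool.scope(|s| {
            s.spawn(async { 1 + 1 });
        });
    });

    a.join().ok();
    b.join().ok();
}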

Solutions?

Wrapping the try_tick calls in a catch_unwind seems to prevent this problem.

// Catch panics from tasks ticked here so they can't unwind through this scope.
let res = std::panic::catch_unwind(|| {
    executor.try_tick();
    task_scope_executor.try_tick();
});

But I'm not sure this is correct, and it probably hurts performance since this is a pretty hot loop.

Another solution could be to not allow tasks from one scope to execute on another scope's executor, but this would limit the number of threads available to a scope for running its tasks.

Overall I'm not sure what the correct fix would look like.

@hymm hymm added C-Bug An unexpected or incorrect behavior S-Needs-Triage This issue needs to be labelled A-Tasks Tools for parallel and async work and removed S-Needs-Triage This issue needs to be labelled labels Nov 3, 2022
@hymm hymm changed the title More info on previously flaky bevy_transform tests Panics in one task pool scope can panic a separate scope Nov 3, 2022
@james7132
Member

Could we make Scope a Future instead of a synchronous function? This would allow yielding back to the executor/TaskPool, rely on the catch_unwind added in #6443, and let the panic be carried through transitively via bubbling.

@hymm
Contributor Author

hymm commented Nov 3, 2022

Async scopes have a lot of open questions around how to implement them correctly: https://rust-lang.github.io/wg-async/vision/roadmap/scopes.html. We probably don't want to open that can of worms just to solve this problem.

@james7132
Member

Thinking on this more, I think the solution in the original issue is correct, for the same reasons #6443's changes are correct: dropping the runnable during the panic causes the top-level Task that spawned it to propagate the panic.
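
For reference, the panic-propagation behavior being relied on can be shown with async-executor and async-task directly (the crates bevy_tasks builds on). This is just a sketch of those semantics, not Bevy code: the panic first unwinds out of whichever try_tick polled the future, and because the runnable is dropped, the Task handle then panics when awaited.

use std::panic::{catch_unwind, AssertUnwindSafe};
use async_executor::Executor;
use futures_lite::future;

fn main() {
    let executor = Executor::new();

    // A task whose future panics when it is polled.
    let task = executor.spawn(async {
        panic!("boom");
    });

    // The panic unwinds out of try_tick on whichever thread polled the future...
    let tick = catch_unwind(AssertUnwindSafe(|| {
        while executor.try_tick() {}
    }));
    assert!(tick.is_err());

    // ...and because the runnable was dropped mid-panic, awaiting the Task
    // handle panics as well, propagating the failure to the code that spawned it.
    let awaited = catch_unwind(AssertUnwindSafe(|| future::block_on(task)));
    assert!(awaited.is_err());
}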

@bors bors bot closed this as completed in 210979f Nov 21, 2022
cart pushed a commit that referenced this issue Nov 30, 2022
# Objective
Fix #6453. 

## Solution
Use the solution mentioned in the issue by catching the unwind and dropping the error. Wrap the `executor.try_tick` calls with `std::panic::catch_unwind`.

Ideally this would be moved outside of the hot loop, but the mut ref to the `spawned` future is not `UnwindSafe`.

This PR only addresses the bug, we can address the perf issues (should there be any) later.
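
The fixed scope loop then looks roughly like this. This is a sketch of the shape only, with illustrative names (drive_scope, spawned), not the exact bevy_tasks code: both executors are ticked inside catch_unwind on every iteration, since the mutable reference to the spawned future keeps the whole loop from being wrapped instead.

use std::panic::{catch_unwind, AssertUnwindSafe};
use async_executor::Executor;
use futures_lite::future;

// Drives a scope to completion: poll the scope's own future, and between polls
// tick both executors, catching and dropping panics from foreign tasks so they
// cannot unwind through this scope.
fn drive_scope<T>(
    executor: &Executor<'_>,
    task_scope_executor: &Executor<'_>,
    spawned: impl std::future::Future<Output = T>,
) -> T {
    let mut spawned = std::pin::pin!(spawned);
    loop {
        if let Some(result) = future::block_on(future::poll_once(&mut spawned)) {
            break result;
        }
        let _ = catch_unwind(AssertUnwindSafe(|| {
            executor.try_tick();
            task_scope_executor.try_tick();
        }));
    }
}
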
ItsDoot pushed a commit to ItsDoot/bevy that referenced this issue Feb 1, 2023