Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix code that could lead to a possible deadlock. #1380

Merged
merged 6 commits into from
Mar 20, 2022
Merged

Fix code that could lead to a possible deadlock. #1380

merged 6 commits into from
Mar 20, 2022

Conversation

DusterTheFirst
Copy link
Contributor

@DusterTheFirst DusterTheFirst commented Mar 19, 2022

This pull request aims to address some accidental double-read locks on RwLocks made by egui. When acquiring 2 read locks, if a thread in parallel happened to attempt to acquire a write lock between the two read locks, the second read lock will deadlock. This deadlock occurs due to RwLocks refusing any new read locks on a RwLock that has a pending Write lock which in most cases can lead to better fairness.

Each commit has a detailed description of the deadlock and how the changes prevent it.

Rather than taking one lock and reusing it, it could be possible to use something like parking_lot's read_recursive which would bypass the aforementioned second read lock from blocking if a write lock is queued, but in my opinion it would be less clear and maintainers would need to understand the difference between read() and read_recursive(). Using the already commonly understood scoping/drop system of Rust seems like the better choice.

Note: Each of these deadlocks have been found by accident, I did not do an extensive comb through the egui code for these situations so others are bound to exist in the codebase that I have not encountered. These deadlocks were just encountered in my code where a second thread was calling ctx.request_repaint very frequently.

Note 2: There is no way that I know of to prove that the deadlocks are gone, but the deadlocks have gone away in my application after these changes.

Drop implementations are not called until the end of a statement. The statement changed in this commit therefore took 4 read locks on a RwLock which can lead to problems if a write lock is requested between any of these read locks. The code looks like it would only hold one lock at a time but it does not drop any of the locks until after the arithmatic operations complete, which leads to this situation. See https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=996079046184329f3a9df1cd19c87da8 to see this in action. The fix is to just take one lock and share it between the three calls to num_presses, letting it drop at the end of the scope.
The issue here is related to that in 9673b8f in that the lock is not dropped when it is expected.  Since the `RwLockReadGuard` produced by `ctx.input()` has a reference taken from it (one into `pointer`), the lock cannot be dropped until that reference is no longre valid, which is the end of the scope (in this case this function).  The following `ctx.input()` then attempts to aquire a second lock on the `RwLock`, creating the same situation that was found in the referenced commit.

This has been resolved by holding one lock on the input for the whole function.
@DusterTheFirst DusterTheFirst marked this pull request as ready for review March 19, 2022 18:59
Copy link
Owner

@emilk emilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

egui/src/menu.rs Outdated Show resolved Hide resolved
egui/src/widgets/slider.rs Outdated Show resolved Hide resolved
@emilk
Copy link
Owner

emilk commented Mar 21, 2022

Another solution is to use a standard Mutex instead of a RwLock for the ContextImpl.

@emilk
Copy link
Owner

emilk commented Mar 21, 2022

This problem would be mitigated by

emilk added a commit that referenced this pull request Mar 22, 2022
https://github.com/asny/three-d recently merged a PR adding
`glow` support: asny/three-d#210
This means it is a prime candidate for embedding 3D painting inside
an eframe app.

There are currently a few kinks that need to be figured out:

### Black screen
When reusing the same three_d context over time (as one should),
we only get one frame of egui together with three_d, and then after that
a black screen with just the three_d painting on top.

I need to fix that before merging this PR.

### `Shape: Send + Sync`
`Shape` is `Send + Sync` and `three_d::Context` is not. This means
we cannot store a three_d context and send it to the `Shape::Callback`.

So we either need to recreate the three_d context each frame (obviously
a bad idea), or access it through a `thread_local` hack.
This PR adds both as examples, with a checkbox to switch.

We could consider making `Shape: !Send + !Sync`, but that would mean
`egui::Context` could not be `Send+Sync` either (because the egui
context stores shapes). This could actually be fine. `egui::Context`
should only be used from a background thread for calling request_repaint
and allocating textures. These could be made separate parts of the
egui Context, so that one would do:

``` rust

let repaint_signal = egui_ctx.repaint_signal();
let tex_mngr = egui_ctx.tex_mngr();
std::thread::spawn(move || {
    // We can use repaint_signal and tex_mngr here,
    // but NOT `egui_ctx`.
}):

Related:
* #1379
* #1380
* #1389
emilk added a commit that referenced this pull request Mar 22, 2022
https://github.com/asny/three-d recently merged a PR adding
`glow` support: asny/three-d#210
This means it is a prime candidate for embedding 3D painting inside
an eframe app.

There are currently a few kinks that need to be figured out:

When reusing the same three_d context over time (as one should),
we only get one frame of egui together with three_d, and then after that
a black screen with just the three_d painting on top.

I need to fix that before merging this PR.

`Shape` is `Send + Sync` and `three_d::Context` is not. This means
we cannot store a three_d context and send it to the `Shape::Callback`.

So we either need to recreate the three_d context each frame (obviously
a bad idea), or access it through a `thread_local` hack.
This PR adds both as examples, with a checkbox to switch.

We could consider making `Shape: !Send + !Sync`, but that would mean
`egui::Context` could not be `Send+Sync` either (because the egui
context stores shapes). This could actually be fine. `egui::Context`
should only be used from a background thread for calling request_repaint
and allocating textures. These could be made separate parts of the
egui Context, so that one would do:

``` rust

let repaint_signal = egui_ctx.repaint_signal();
let tex_mngr = egui_ctx.tex_mngr();
std::thread::spawn(move || {
    // We can use repaint_signal and tex_mngr here,
    // but NOT `egui_ctx`.
}):

Related:
* #1379
* #1380
* #1389
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants