Thread pool without work stealing #1174

Open
nhukc opened this issue Jun 14, 2024 · 6 comments

@nhukc

nhukc commented Jun 14, 2024

We are using a rayon thread pool to run a Rust-based userspace driver. A mutex must be acquired before any call can be made to the driver.

We are running into a common deadlock problem with rayon work-stealing.
This issue occurs because the third party code that invokes our driver is also using rayon.

The deadlock results from the following events:

  1. The third party global thread pool is running N jobs
  2. Third party thread A invokes our library
  3. Third party thread A acquires our driver mutex
  4. Third party thread A creates a scope within our driver thread pool
  5. While waiting for that scope to finish, third party thread A steals another job from the third party pool
  6. The stolen job invokes our library again on third party thread A
  7. Third party thread A is now waiting on a lock that it already holds
  8. Deadlock
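
Steps 3 to 7 can be modelled without rayon. The following is only a std-only sketch: `DRIVER` and `call_driver` are hypothetical stand-ins, not our real driver code, and the closure argument stands in for whatever job the thread steals while it waits. It uses `try_lock` instead of `lock` so that the re-entrant acquisition is reported rather than hanging.

```rust
use std::sync::{Mutex, TryLockError};

// Hypothetical stand-in for the driver mutex; it just counts successful calls.
static DRIVER: Mutex<u32> = Mutex::new(0);

/// Hypothetical library entry point.
/// `work_while_waiting` models the job that the calling thread steals while
/// it waits for the driver scope to finish (steps 4 and 5).
/// Returns false if the driver mutex could not be acquired at any level.
fn call_driver<F: FnOnce() -> bool>(work_while_waiting: F) -> bool {
    // Steps 3 and 6: acquire the driver mutex. A real driver would call
    // `lock()`, which would block forever in the re-entrant case.
    let mut calls = match DRIVER.try_lock() {
        Ok(guard) => guard,
        // Step 7: the mutex is already held, here by this very thread.
        Err(TryLockError::WouldBlock) => return false,
        Err(TryLockError::Poisoned(poisoned)) => poisoned.into_inner(),
    };
    *calls += 1;
    // The guard is still alive while the stolen job runs.
    work_while_waiting()
}

fn main() {
    // No stolen job re-enters the library: the call succeeds.
    let plain = call_driver(|| true);
    // The stolen job calls back into the library on the same thread.
    let nested = call_driver(|| call_driver(|| true));
    println!("plain call succeeded: {plain}"); // true
    println!("re-entrant call succeeded: {nested}"); // false
}
```

With `lock()` in place of `try_lock()`, the nested call would never return, which is the deadlock in step 8.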

The rayon thread pool API fits our use-case perfectly, but the implementation detail of work-stealing causes deadlocks.

Is there an easy way to modify the thread pool internals to prevent work-stealing?

I'm not looking to merge with upstream unless this feature is wanted by others. I'd be OK with maintaining a non-work-stealing fork if a maintainer could give me some tips for how to do this.

@nhukc
Author

nhukc commented Jun 14, 2024

It may be worth noting that our driver thread pool executes exactly as many jobs as there are threads in the pool.

@cuviper
Member

cuviper commented Jun 14, 2024

The general issue with Mutex is also described in #592.

One of the ideas in that thread is to have a fully-blocking version of ThreadPool::install, so that cross-pool calls no longer work-steal in the calling pool while they wait. In your scenario, that should block "third party thread A" until your driver is done.
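
As a rough illustration of what "fully-blocking" means here, the sketch below is a std-only model, not rayon's implementation or API. `BlockingPool` and `install_blocking` are hypothetical names. The caller hands a closure to a dedicated worker thread and then simply blocks on a reply channel, so it cannot pick up unrelated work while it waits.

```rust
use std::sync::mpsc;
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical single-worker "pool"; a model only, not rayon's ThreadPool.
struct BlockingPool {
    jobs: mpsc::Sender<Job>,
}

impl BlockingPool {
    fn new() -> Self {
        let (jobs, queue) = mpsc::channel::<Job>();
        // The worker runs jobs until every sender has been dropped.
        thread::spawn(move || {
            for job in queue {
                job();
            }
        });
        BlockingPool { jobs }
    }

    /// Runs `op` on the worker thread and blocks the caller until it is done.
    /// The caller executes nothing else in the meantime.
    fn install_blocking<R, F>(&self, op: F) -> R
    where
        F: FnOnce() -> R + Send + 'static,
        R: Send + 'static,
    {
        let (reply, result) = mpsc::channel();
        self.jobs
            .send(Box::new(move || {
                // Ignore the error if the caller has gone away.
                let _ = reply.send(op());
            }))
            .expect("worker thread has exited");
        // Plain blocking wait: no work-stealing on the calling thread.
        result.recv().expect("worker dropped the job without replying")
    }
}

fn main() {
    let pool = BlockingPool::new();
    let caller = thread::current().id();
    let ran_on = pool.install_blocking(|| thread::current().id());
    println!("ran on a different thread: {}", caller != ran_on);
}
```

The trade-off is that the calling thread sits idle for the duration of the call, which is the price of guaranteeing that it cannot re-enter the driver through a stolen job.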

@nhukc
Author

nhukc commented Jun 14, 2024

Yes, that would solve my problem.

Is there any ongoing work in that direction? Or any tips on how to implement such a feature?

@nhukc
Author

nhukc commented Jun 14, 2024

It looks like the meat of the work would be in and around this function.

unsafe fn in_worker_cross<OP, R>(&self, current_thread: &WorkerThread, op: OP) -> R

@cuviper
Member

cuviper commented Jun 14, 2024

Or rather, avoid that function in this case and call in_worker_cold instead.

@nhukc
Author

nhukc commented Jun 18, 2024

I implemented the suggestion in #1175. I did not call in_worker_cold because that function has a useful-looking debug assert that conflicts with this use case.
