
Lazily create threads in ThreadPool #576

Closed
Jasper-Bekkers opened this issue Jun 7, 2018 · 10 comments

Comments

@Jasper-Bekkers

Hi,

I just ran into an issue where I saw pretty high CPU utilization while doing almost no work on a 56-logical-processor (LP) machine. After profiling, it looked like most of the time was spent on a condition variable, with all 56 threads waiting for more work to consume.

For quite low core-count machines (<8) it might make sense to reserve the whole system for potential parallelism, but on these beefy machines you end up spending a lot of time waiting for nothing and burning cycles in the condition variable. One alternative is to look at what TBB does for this problem, since it seems to lazily allocate threads and deallocate them after a cooldown period, depending on the workload.

  • Jasper
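
For reference, the pool size can already be capped explicitly via rayon's builder API instead of defaulting to every logical processor. A minimal sketch; the cap of 8 is purely illustrative:

```rust
use rayon::prelude::*;

fn main() {
    // Cap the global pool instead of letting it take all 56 logical processors.
    rayon::ThreadPoolBuilder::new()
        .num_threads(8)
        .build_global()
        .expect("failed to build the global thread pool");

    // The RAYON_NUM_THREADS environment variable achieves the same without code changes.
    let total: u32 = (0..1_000u32).into_par_iter().sum();
    println!("sum = {}", total);
}
```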
@randomPoison

I believe amethyst/amethyst#780 may be caused by a similar issue.

I recently noticed that the amethyst game engine is using a surprising amount of CPU when idle (15%-25% CPU usage when no application logic is running). When I profiled my application, I noticed that most of the CPU time was actually being used by rayon worker threads, which were spinning idly while waiting for more work. Rayon has a mechanism for automatically putting worker threads to sleep when there's no work, but I believe this mechanism is being thwarted by the fact that amethyst adds new work each frame. This means that every frame, once all work for the frame is done, every worker thread has to spin-wait for a while before being put back to sleep.
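
To make the cost being described concrete, here is a rough sketch of the spin-then-block idle pattern; the `Sleeper` type and `SPIN_ROUNDS` constant are illustrative assumptions, not rayon's actual internals:

```rust
use std::sync::{Condvar, Mutex};

const SPIN_ROUNDS: u32 = 32; // illustrative; real spin budgets differ

struct Sleeper {
    has_work: Mutex<bool>,
    condvar: Condvar,
}

impl Sleeper {
    fn wait_for_work(&self) {
        // Phase 1: spin in the hope that new work shows up quickly. This is
        // what burns CPU when work arrives once per frame and then dries up.
        for _ in 0..SPIN_ROUNDS {
            if *self.has_work.lock().unwrap() {
                return;
            }
            std::thread::yield_now();
        }
        // Phase 2: give up and block on the condvar until work is posted.
        let mut work = self.has_work.lock().unwrap();
        while !*work {
            work = self.condvar.wait(work).unwrap();
        }
    }

    fn post_work(&self) {
        *self.has_work.lock().unwrap() = true;
        self.condvar.notify_all();
    }
}
```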

@Xaeroxe

Xaeroxe commented Jun 11, 2018

For the Amethyst use case it could also be useful to make the sleep timings configurable. In our case we need the workers to go to sleep much sooner than they do by default.

@cuviper
Member

cuviper commented Jun 11, 2018

RE the subject, I think we've mostly decided we'd prefer not to dynamically adjust the thread count, as this implies a heap of complexity of its own. We've tried to keep the API such that it would be possible though.

I have thought about how we could be more careful about wakeups though. Right now, all sleeping threads in a given thread pool wait on a common condvar, and we mostly wake for two reasons: new work is available, or some stolen work is done. (A third reason is shutdown, but this is uncontroversial -- it must wake all threads to clean up and exit.)

For new work, it seemed like a good idea to wake everyone (notify_all) so we ramp up quickly to anticipate even more work, but maybe we should re-evaluate this and just wake one at a time (notify_one). For stolen work, we just don't have a way to wake up the exact thread that would be waiting for its completion, since everyone is on the same condvar.

We could also consider giving each thread its own condvar to allow fine-grained wake-ups. That makes broadcasts harder, but that should be on the cold path anyway, so maybe this doesn't matter.
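
A sketch of the two wake-up strategies on a single shared condvar; the names here are illustrative, not rayon's actual `Registry` internals:

```rust
use std::sync::{Condvar, Mutex};

struct SharedSleep {
    pending_jobs: Mutex<usize>,
    condvar: Condvar,
}

impl SharedSleep {
    fn push_work(&self, wake_all: bool) {
        *self.pending_jobs.lock().unwrap() += 1;
        if wake_all {
            // Current behaviour described above: ramps up quickly, but on a
            // 56-thread pool it wakes 56 threads for a single new job.
            self.condvar.notify_all();
        } else {
            // Alternative: wake a single sleeper, which can in turn wake
            // another if it finds more work than it can handle.
            self.condvar.notify_one();
        }
    }
}
```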

@Xaeroxe

Xaeroxe commented Jun 11, 2018

> I think we've mostly decided we'd prefer not to dynamically adjust the thread count

Does the complexity arise from the direct act of adding/removing threads, or from the new need to monitor thread utilization? The reason I ask is that, for my use case specifically, I can monitor pretty easily whether or not the thread pool is meeting our targets, and if we'd benefit from adding another thread I could just call something like add_threads(1).

I for one do not mind the idea of controlling this completely manually.
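
Purely as a hypothetical sketch of the caller-driven control being suggested here; `PoolHandle`, `add_threads`, and `rebalance` are stand-ins, not rayon API:

```rust
struct PoolHandle {
    threads: usize,
    max_threads: usize,
}

impl PoolHandle {
    /// Grow the pool by `n` threads, up to a configured maximum.
    fn add_threads(&mut self, n: usize) {
        self.threads = (self.threads + n).min(self.max_threads);
    }

    /// The caller monitors its own frame timings and decides when to grow.
    fn rebalance(&mut self, missed_frame_budget: bool) {
        if missed_frame_budget && self.threads < self.max_threads {
            self.add_threads(1);
        }
    }
}
```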

@cuviper
Member

cuviper commented Jun 11, 2018

Yeah, making decisions about dynamic threads is hard, and I think we'd more likely be open to manual controls to let the pool owner decide. Internally we'd still have to figure out how to safely update shared state, especially Registry.thread_infos, but I'm sure we can solve it.

Ideally, I'd hope we can improve wakeups/etc enough that you don't need to worry about it, but manual controls might be useful for other reasons too.

@Xaeroxe

Xaeroxe commented Jun 11, 2018

@cuviper I'll gladly open a PR for manual thread count control, however the best solution I've got for managing mutation of the thread_infos is an RwLock. Would Rayon consider that acceptable?
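
A sketch of what the proposed RwLock approach might look like; the `Registry` and `ThreadInfo` types here are simplified stand-ins for rayon's actual internals:

```rust
use std::sync::RwLock;

struct ThreadInfo {
    index: usize, // per-thread state (stealable deque, etc.) would live here
}

struct Registry {
    thread_infos: RwLock<Vec<ThreadInfo>>,
}

impl Registry {
    // Hot path (work stealing): many concurrent readers, contended only
    // while the thread count is actually changing.
    fn num_threads(&self) -> usize {
        self.thread_infos.read().unwrap().len()
    }

    // Cold path: growing the pool takes the write lock briefly.
    fn add_thread(&self) {
        let mut infos = self.thread_infos.write().unwrap();
        let index = infos.len();
        infos.push(ThreadInfo { index });
    }
}
```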

@cuviper
Member

cuviper commented Jun 11, 2018

Let's see if the benchmarks can answer whether RwLock is good enough -- at least comparing rayon-demo per https://github.com/rayon-rs/rayon/wiki/Benchmarking-Rayon.

@Jasper-Bekkers
Author

If dynamically scaling up the number of threads isn't an option, could we perhaps at least have a saner default for high-core-count machines? There's really no reason to take all 56 LPs in a machine by default.

> For new work, it seemed like a good idea to wake everyone (notify_all) so we ramp up quickly to anticipate even more work, but maybe we should re-evaluate this and just wake one at a time (notify_one). For stolen work, we just don't have a way to wake up the exact thread that would be waiting for its completion, since everyone is on the same condvar.
>
> We could also consider giving each thread its own condvar to allow fine-grained wake-ups. That makes broadcasts harder, but that should be on the cold path anyway, so maybe this doesn't matter.

I think both of these solutions would go a long way on a high-core-count machine, and I'd personally lean toward having a condvar per thread since it'll reduce contention quite significantly; having all 56 threads wake up seems like overkill.
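
A sketch of the per-thread condvar idea, so a waker can target exactly one sleeper; again, the types are illustrative, not rayon's design:

```rust
use std::sync::{Arc, Condvar, Mutex};

struct WorkerSleep {
    woken: Mutex<bool>,
    condvar: Condvar,
}

struct Pool {
    workers: Vec<Arc<WorkerSleep>>,
}

impl Pool {
    /// Wake exactly one worker (e.g. the one waiting on a stolen job)
    /// without disturbing the other 55.
    fn wake(&self, worker_index: usize) {
        let sleep = &self.workers[worker_index];
        *sleep.woken.lock().unwrap() = true;
        sleep.condvar.notify_one();
    }

    /// Broadcast (e.g. shutdown) becomes a loop over all workers -- the cold path.
    fn wake_all(&self) {
        for i in 0..self.workers.len() {
            self.wake(i);
        }
    }
}
```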

@nikomatsakis
Member

Note: I've opened rayon-rs/rfcs#5, which targets exactly this issue. I've also got a branch that implements the proposed changes (in fact, it implements several variations thereof). In particular, that branch should only start a small-ish number of threads, and it no longer tries to keep all threads awake all of the time.

@Jasper-Bekkers
Author

Closing because the issue has been solved through other means.
