Lazily create threads in ThreadPool #576
Comments
I believe amethyst/amethyst#780 may be caused by a similar issue. I recently noticed that the amethyst game engine uses a surprising amount of CPU when idle (15%-25% CPU usage with no application logic running). When I profiled my application, most of the CPU time was actually being used by rayon worker threads spinning idly while waiting for more work. Rayon has a mechanism for automatically putting worker threads to sleep when there's no work, but I believe this mechanism is being thwarted by the fact that amethyst adds new work every frame. This means that each frame, once all of the frame's work is done, every worker thread spin-waits for a while before being put back to sleep.
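For concreteness, the pattern that triggers this looks roughly like the sketch below (a simplified game loop, not Amethyst's actual dispatcher). Each frame fans a short burst of jobs into the global rayon pool and then goes quiet until the next frame, so the workers repeatedly spin through their idle back-off without ever sleeping for long.

```rust
use std::time::{Duration, Instant};

fn main() {
    // A simplified game loop: each frame fans out a little work to the
    // global rayon pool, then sleeps until the next frame.
    for _frame in 0..600 {
        let frame_start = Instant::now();

        rayon::scope(|s| {
            // Stand-ins for per-frame systems; each finishes quickly.
            for system in 0..4 {
                s.spawn(move |_| {
                    let _ = (0..10_000).fold(system, |a, b| a ^ b);
                });
            }
        });

        // The pool is now idle, but every worker still spins through its
        // back-off before sleeping; the next frame then wakes them all again.
        let elapsed = frame_start.elapsed();
        if elapsed < Duration::from_millis(16) {
            std::thread::sleep(Duration::from_millis(16) - elapsed);
        }
    }
}
```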
For the Amethyst use case it could also be useful to make the sleep timings configurable. In our case we need the workers to go to sleep much sooner than they do by default.
RE the subject, I think we've mostly decided we'd prefer not to dynamically adjust the thread count, as this implies a heap of complexity of its own. We've tried to keep the API such that it would be possible, though.

I have thought about how we could be more careful about wakeups, though. Right now, all sleeping threads in a given thread pool wait on a common condvar, and we mostly wake for two reasons: new work is available, or some stolen work is done. (A third reason is shutdown, but this is uncontroversial -- it must wake all threads to clean up and exit.) For new work, it seemed like a good idea to wake everyone, so idle threads can start stealing right away.

We could also consider giving each thread its own condvar to allow fine-grained wake-ups. That makes broadcasts harder, but that should be on the cold path anyway, so maybe this doesn't matter.
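A rough sketch of the per-thread-condvar idea, with invented names and none of rayon's real sleep bookkeeping: each worker parks on its own mutex/condvar pair, so the hot "new work" path can wake exactly one sleeper, while the cold path (e.g. shutdown) walks the whole list.

```rust
use std::sync::{Arc, Condvar, Mutex};

// One sleep slot per worker thread (hypothetical; not rayon's real internals).
struct WorkerSleep {
    asleep: Mutex<bool>,
    cvar: Condvar,
}

struct Sleepers {
    workers: Vec<Arc<WorkerSleep>>,
}

impl Sleepers {
    /// Wake a single sleeping worker, if any (the hot "new work" path).
    fn wake_one(&self) {
        for w in &self.workers {
            let mut asleep = w.asleep.lock().unwrap();
            if *asleep {
                *asleep = false;
                w.cvar.notify_one();
                return;
            }
        }
    }

    /// Wake everyone (the cold path, e.g. pool shutdown).
    fn wake_all(&self) {
        for w in &self.workers {
            let mut asleep = w.asleep.lock().unwrap();
            *asleep = false;
            w.cvar.notify_one();
        }
    }
}

impl WorkerSleep {
    /// Called by a worker that found nothing to steal.
    fn sleep(&self) {
        let mut asleep = self.asleep.lock().unwrap();
        *asleep = true;
        while *asleep {
            asleep = self.cvar.wait(asleep).unwrap();
        }
    }
}
```

The broadcast becomes a loop over every worker, but as noted above that should only happen on the cold path.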
Does the complexity arise from the direct act of adding/removing threads, or from the new need to monitor thread utilization? The reason I ask is that for my use case specifically I can monitor whether the thread pool is meeting our targets pretty easily, and if we'd benefit from adding another thread I could just call something to add one explicitly. I for one do not mind the idea of controlling this completely manually.
Yeah, making decisions about dynamic threads is hard, and I think we'd more likely be open to manual controls to let the pool owner decide. Internally we'd still have to figure out how to safely update shared state.

Ideally, I'd hope we can improve wakeups/etc. enough that you don't need to worry about it, but manual controls might be useful for other reasons too.
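Under today's API, the closest thing to a manual control is for the pool owner to build a replacement pool with the desired thread count and route new work into it, since an existing rayon pool can't be resized in place. A rough sketch (the `resize_pool` helper is made up; `ThreadPoolBuilder` and `install` are rayon's existing API):

```rust
use rayon::{ThreadPool, ThreadPoolBuilder};

/// Hypothetical "manual control": since a rayon pool can't be resized in
/// place, the owner builds a replacement pool with a new thread count and
/// routes subsequent work into it.
fn resize_pool(_old: ThreadPool, num_threads: usize) -> ThreadPool {
    // The old pool winds down after it is dropped and its outstanding work completes.
    ThreadPoolBuilder::new()
        .num_threads(num_threads)
        .build()
        .expect("failed to build replacement pool")
}

fn main() {
    let pool = ThreadPoolBuilder::new().num_threads(2).build().unwrap();
    // ... monitoring decides we would benefit from more parallelism ...
    let pool = resize_pool(pool, 4);
    pool.install(|| {
        // Work submitted here now runs on the 4-thread pool.
        println!("running on {} threads", rayon::current_num_threads());
    });
}
```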
Let's see if the benchmarks can answer whether that helps.
If dynamically scaling up the number of threads isn't an option, could we at least have a saner default for high-core-count machines? There's really no reason to take all 56 LPs in a machine by default.
I think both of these solutions would go a long way on a high-core-count machine. I'd personally lean toward having a condvar per thread, since it would reduce contention quite significantly - having all 56 threads wake up seems like overkill.
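In the meantime, the pool size can already be capped on the user side with rayon's `ThreadPoolBuilder` (or the `RAYON_NUM_THREADS` environment variable); the cap of 8 below is just an example:

```rust
use rayon::prelude::*;
use rayon::ThreadPoolBuilder;

fn main() {
    // Cap the global pool instead of taking every logical processor.
    // Setting RAYON_NUM_THREADS=8 in the environment has the same effect.
    ThreadPoolBuilder::new()
        .num_threads(8)
        .build_global()
        .expect("global pool already initialized");

    let sum: u64 = (0..1_000_000u64).into_par_iter().sum();
    println!("{}", sum);
}
```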
Note: I've opened rayon-rs/rfcs#5, which targets exactly this issue. I've also got a branch that implements the proposed changes (in fact, it implements several variations thereof). In particular, that branch should only start a small-ish number of threads, and it no longer tries to keep all threads awake all of the time.
Closing because this issue is solved through other means.
Hi,
I just ran into an issue where I hit pretty major CPU utilization while doing almost no work on a 56-LP machine. After profiling, it looked like most of the time was spent on a condition variable, with 56 threads at a time waiting for more work to consume.
For low-core-count machines (<8) it might make sense to reserve the whole system for potential parallelism, but on these beefy machines you end up spending a lot of time waiting for nothing and burning cycles in the condition variable. One alternative is to look at what TBB does for this problem: it seems to lazily allocate threads and deallocate them after a cooldown period, depending on the workload.
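To make the TBB-style behaviour concrete, here is a toy sketch of a pool that only spawns a worker when work arrives and no worker is idle, and retires workers after an idle cooldown. All names are invented and there is no work stealing; it illustrates the lifecycle, not rayon's internals.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

struct State {
    jobs: VecDeque<Job>,
    idle: usize,    // workers currently parked on the condvar
    workers: usize, // workers currently alive
}

struct Shared {
    state: Mutex<State>,
    cvar: Condvar,
}

pub struct LazyPool {
    shared: Arc<Shared>,
    max_threads: usize,
    idle_timeout: Duration,
}

impl LazyPool {
    pub fn new(max_threads: usize, idle_timeout: Duration) -> Self {
        LazyPool {
            shared: Arc::new(Shared {
                state: Mutex::new(State { jobs: VecDeque::new(), idle: 0, workers: 0 }),
                cvar: Condvar::new(),
            }),
            max_threads,
            idle_timeout,
        }
    }

    pub fn spawn<F: FnOnce() + Send + 'static>(&self, job: F) {
        let mut state = self.shared.state.lock().unwrap();
        state.jobs.push_back(Box::new(job));
        if state.idle > 0 {
            // An idle worker exists; wake exactly one instead of broadcasting.
            self.shared.cvar.notify_one();
        } else if state.workers < self.max_threads {
            // No idle worker: lazily start a new one.
            state.workers += 1;
            let shared = Arc::clone(&self.shared);
            let timeout = self.idle_timeout;
            thread::spawn(move || worker_loop(&shared, timeout));
        }
        // Otherwise the job queues until a busy worker finishes something.
    }
}

fn worker_loop(shared: &Shared, idle_timeout: Duration) {
    let mut state = shared.state.lock().unwrap();
    loop {
        if let Some(job) = state.jobs.pop_front() {
            drop(state); // run the job without holding the lock
            job();
            state = shared.state.lock().unwrap();
            continue;
        }
        // No work: park on the condvar, but give up after the cooldown.
        state.idle += 1;
        let (guard, result) = shared.cvar.wait_timeout(state, idle_timeout).unwrap();
        state = guard;
        state.idle -= 1;
        if result.timed_out() && state.jobs.is_empty() {
            // Retire this worker after a full cooldown with nothing to do.
            state.workers -= 1;
            return;
        }
    }
}
```

The key points are that `spawn` wakes at most one sleeper instead of broadcasting, and that a worker only exits after a full `idle_timeout` with an empty queue, which mirrors the cooldown behaviour described above.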