Stealing jobs from an existing pool / scope #1052
Comments
I get the motivation, but I'm not really sure how hard it will be. I mostly worry about how to get the completion notifications working for jobs that were queued within the pool, but now have to deal with an "external" thread.
Is that by chance, because they were idle, or are they deliberately pinned to E cores? Because if you're pinning, I wonder if you could pin one of the rayon threads to the same as the "Web Content", so they naturally exchange the spun-up CPU at the OS level. Wild idea: what if "Web Content" just ran on a rayon thread in the first place, so it naturally participated in work-stealing? You'd want to make sure that's not sitting on top of any other work, but since you control the application, you should be able to ensure that's the first thing into the pool. |
It might also be interesting to make that an explicit opt-in during pool creation, e.g. Let me know if this approach sounds at all palatable to you! |
That seems to be by chance. We're experimenting with pinning / affinity tweaks, which seem like they could help as well indeed.
So, "web content" is the process' "main" thread. In the parent process that'd be the UI thread as well, so it seems like a rather big change.
That seems like an interesting approach that should work indeed! One potential issue is how do Would you be able to post a rayon patch with that |
Here's a proof of concept for |
@cuviper I gave that patch a shot in https://bugzilla.mozilla.org/show_bug.cgi?id=1836948 and it seems to work rather nicely! |
@emilio I'm not familiar with your perf system - which result should I be looking at there? I'd be interested in a new image like @jrmuizel showed too, if that can be recorded. One caution that I want to put in the docs is that the pool-building thread can only involve itself in There are also possible footguns here, like |
@cuviper: @jrmuizel posted some screenshots in the bug linked above. There are some parameters to tweak how much work the main thread should retain for itself, but this screenshot for example seems a lot better than the one above: the "Web Content" thread can wake up after running out of work again, and keeps the fast core running. Actually, benchmark results look promising even on desktop. I want to do some more profiling on Android, and especially try to understand why sometimes the main thread doesn't wake up when the workers aren't getting scheduled (my theory is that the workers are mid-processing some work, such that there's no pending work in the thread pool, but the kernel isn't scheduling them so there's no progress temporarily). But that seems like a side-issue that we can tweak, and overall
Would it be possible to create a rayon thread pool with a fixed number of extra workers that would be explicitly used by one thread at a time? Something along the lines of:
let (pool, mut workers) = ThreadPoolBuilder().with_extra_workers(3);
let worker = workers.pop().unwrap();
worker.scope(|| {
    items.par_iter().map( //...etc
});
The API I came up with on the spot isn't particularly nice (if sound at all), but more generally it would be great to be able to let multiple threads get the benefits of what
@nical I think that's getting into the realm where it might be better with lower-level building blocks. Like maybe with a Are you also imagining that the "worker" in your design could be returned and usable elsewhere after its scope? |
Yes, but I haven't looked into how feasible/expensive it would be to install/uninstall the thread pool during each scope. I feel that a crate could want to rely on the performance characteristics of having its thread(s) be able to steal jobs (in my experience that's very significant) but it's a difficult choice to make because the side effect of having only a single installed pool for any given thread is not innocuous. I'm being a bit nit-picky. My main motivation is that in another part of Firefox (WebRender) we use rayon thread pools and schedule work from multiple threads. I'd like to have them benefit from a "use_current"-like thing that does not require moving the workers between threads in between scopes or having several pools register themselves on the same thread.
Interesting -- I think we could potentially have an API around "floating"
@nical / @cuviper: Given it's not much work to implement (it's basically already there, and useful), and that it helps us fix the scheduling issues we've found, wdyt about getting On that note, if that seems reasonable and the current branch is not landable as-is, what's left? I'd be happy to help move it forward if possible. Thanks.
If there's agreement that it's good enough for now, and maintainable in the long term (seems ok) -- the function should probably be called |
As per suggestion in rayon-rs#1052.
See discussion in rayon-rs#1052. Closes rayon-rs#1052.
In Firefox, we're using scope_fifo to distribute a bunch of work into a ThreadPool. We've found that stopping our current thread to do other work in the thread pool is a regression on a variety of heterogeneous devices, because the scheduler will schedule the thread pool on small cores for a while (see here for an analysis).
Luckily, @rocallahan and @cuviper added in_place_scope / in_place_scope_fifo (#844 and related), which (if I'm reading correctly) should allow us to keep the main thread busy at least for a while (so, spawn work into the thread pool, but keep doing work in the main thread).
It seems, however, there would be cases where the main thread will eventually run out of work to do, and would go to sleep. It'd be nice if the main thread could then keep doing work by stealing work posted by the thread pool until the scope is gone.
It seems that should be doable, at a glance, but it might be significant work... I'm not familiar with Rayon internals all that much but I'd be willing to give it a shot with some guidance.
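To make the request concrete, here is a minimal std-only sketch of the behaviour being asked for: the calling thread hands work to a pool and then, rather than going to sleep, keeps pulling items until nothing is left. This is not how Rayon is implemented (Rayon uses per-thread deques and real work stealing); the single shared queue and the names process, drain and run_with_helping_caller are all made up for illustration.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for one unit of work.
fn process(item: u64) -> u64 {
    item * item
}

/// Pops and runs queued items until the queue is empty.
/// Returns how many items this thread ran.
fn drain(queue: &Mutex<VecDeque<u64>>, sum: &AtomicU64) -> usize {
    let mut ran = 0;
    loop {
        // The lock guard is a temporary, so it is released at the end of
        // this statement and is not held while the item is processed.
        let next = queue.lock().unwrap().pop_front();
        match next {
            Some(item) => {
                sum.fetch_add(process(item), Ordering::Relaxed);
                ran += 1;
            }
            None => return ran,
        }
    }
}

/// Runs all items on `workers` helper threads plus the calling thread.
/// Returns (sum of results, number of items the caller ran itself).
fn run_with_helping_caller(items: Vec<u64>, workers: usize) -> (u64, usize) {
    let queue = Mutex::new(VecDeque::from(items));
    let sum = AtomicU64::new(0);
    let caller_ran = thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| {
                drain(&queue, &sum);
            });
        }
        // Instead of blocking until the workers finish, the caller takes
        // queued work too. The scope joins every worker before returning.
        drain(&queue, &sum)
    });
    (sum.load(Ordering::Relaxed), caller_ran)
}

fn main() {
    let (sum, caller_ran) = run_with_helping_caller((1..=100).collect(), 3);
    println!("sum = {sum}, caller ran {caller_ran} of 100 items");
}
```

How many items the caller ends up running depends on scheduling, which is the point: if the workers are slow to get scheduled, the caller picks up the slack instead of idling.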
@cuviper does this seem like a reasonable use case? Do you know how hard it would be to provide a way of doing this?
Thanks!
cc: @bholley @jrmuizel @smaug----