Make inference profiling thread safe for julia 1.9+ #48051
Conversation
Inference itself does not spawn tasks, but the goal is indeed to allow for concurrent inference, so yes, it is much harder to create a single inference profile.
Mmm good to know! That's exciting! :) But yeah, it will mean we'll need to look at redesigning the inference profiling. 👍 But in the meantime, I guess this PR should be enough to make the profiling in 1.9 thread-safe, so we should merge this and backport it. Then we can start talking about what should be done for the new concurrent type inference. Ideally, the profiling changes would be done together with the compilation changes themselves that you're talking about.
Concurrent inference is probably already occurring; the type inference lock has been removed for some time, so inference results from one thread might get used by another thread, depending on when the cache is updated.
Yeah, but we're still on 1.8 at RAI, so we probably haven't noticed this issue. When you say "for some time," you mean on 1.9+, yeah? That's why I filed this issue. I agree that without the inference lock, 1.9+ could be racing. 👍 Or are you saying that even before 1.9, there can be concurrent inference?
Only in 1.9 will there be concurrent inference, but that means the inference profiler probably has to be redesigned to report concurrent inference in 1.9. I doubt we'd want to re-serialize inference or block the 1.9 release on the profiler being redesigned, so it probably makes sense to note this limitation somewhere in the release documentation.
Okay yeah, cool! :) Well, I think this PR is already enough to correctly support concurrent inference in 1.9! That's what it's meant to do, anyway. The only change is switching away from a global working stack to a task-local storage working stack. I think that's enough to support concurrent inference. The final inference results vector is already thread-safe as of #47615, so it can be pushed to from multiple concurrent tasks. (And they only push right at the end, so I don't think this would have a serious impact on the concurrency. It shouldn't serialize the inference; it would only grab a lock at the very tail end of an entire inference tree, to push a single pointer onto the vector.) The diff for this PR is just that one change. So I think that with this PR, concurrent inference in 1.9 should be safe! Does that hold up in your understanding as well? Anything I'm missing? Thanks for engaging with me on these PRs! 😊
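To make the locking claim above concrete, here is a minimal, runnable sketch of the pattern being described. All names here are hypothetical stand-ins, not the actual Core.Compiler internals: each task builds its inference tree entirely lock-free, and the shared lock is held only long enough to push one finished tree at the very end.

```julia
const TIMINGS_LOCK = ReentrantLock()    # guards only the shared results buffer
const FINISHED_TIMINGS = Vector{Any}()  # one completed inference tree per entry

# Stand-in for a whole depth-first inference run: the expensive part,
# done entirely without holding any lock.
build_inference_tree(work) = (sleep(0.01); (root = work, children = Any[]))

function profile_one_inference(work)
    tree = build_inference_tree(work)   # long-running, lock-free
    # The lock is held only to push a single reference, so concurrent
    # inference tasks contend only for this tiny tail-end step.
    lock(TIMINGS_LOCK) do
        push!(FINISHED_TIMINGS, tree)
    end
end

# Many tasks can profile concurrently without serializing inference itself.
@sync for i in 1:4
    Threads.@spawn profile_one_inference(i)
end
```

The key property is that lock contention scales with the number of finished inference trees, not with the amount of inference work, so the profiling adds essentially no serialization.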
Co-authored-by: Nathan Daly <NHDaly@gmail.com>
Force-pushed from e0b6225 to e161d91.
I'm going to close this PR, then, since it is merged into #47615 now.
Starting in 1.9, Julia does not hold the codegen lock for all of type inference! This means that we cannot use a single shared scratch space for profiling type inference.
This PR changes the scratch space to use task-local storage, so that we build a separate inference profile tree per Task doing inference, and then report them to the global results buffer at the end. (A rough sketch of this scheme follows below.)
This makes things thread-safe as long as type inference itself does not spawn Tasks or perform compilation across concurrent Tasks. If that changes in the future, we will need to change the entire inference profiling model, since it will no longer be a single depth-first inference tree per invocation of type inference. (CC: @pchintalapudi / @vchuravy)
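As a rough illustration of the task-local-storage approach: the sketch below uses Base's `task_local_storage` and made-up helper names (`inference_stack`, `enter_frame!`, `exit_frame!`, `report_finished_tree`); the real change lives inside Core.Compiler's timing code, which has its own mechanism.

```julia
# Before (racy once inference runs concurrently): one global working stack.
# const INFERENCE_STACK = Any[]

# After: each Task lazily gets its own working stack.
function inference_stack()
    return get!(task_local_storage(), :inference_profile_stack) do
        Any[]
    end::Vector{Any}
end

enter_frame!(frame) = push!(inference_stack(), frame)

function exit_frame!()
    st = inference_stack()
    frame = pop!(st)
    if isempty(st)
        # This task's depth-first tree is complete; hand it to the global,
        # lock-protected results buffer (as in the earlier sketch).
        report_finished_tree(frame)
    end
    return frame
end

# Stub for the lock-and-push step shown in the earlier sketch.
report_finished_tree(tree) = nothing
```

Because the working stack is per-Task, each invocation of inference still builds a single depth-first tree, and the only cross-task interaction is the final report.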
This PR is based on top of #47615.
Until that PR merges, you can view a diff of the changes here:
vilterp/julia@pv-compiler-timings-lock...JuliaLang:nhd-inference-profiling-thread-safe-1.9