Make inference profiling thread safe for julia 1.9+ #48051

Closed
wants to merge 2 commits

Conversation

@NHDaly (Member) commented Dec 30, 2022

Starting in 1.9, Julia does not hold the codegen lock for all of type inference! This means that we cannot use a single shared scratch space for profiling type inference.

This PR changes the scratch space to use Task Local Storage, so that we build a separate inference profile tree per Task doing inference, and then report them to the global results buffer at the end.

This makes things thread safe as long as type inference itself does not spawn Tasks, or perform compilation across concurrent Tasks. If that changes in the future, we will need to change the entire inference profiling model, since it will no longer be a single depth-first inference tree per invocation of type inference. (CC: @pchintalapudi / @vchuravy)
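Roughly, the new pattern looks like the following self-contained sketch. This is not the actual compiler code: the `InferenceTimingNode` type and the `enter_inference_frame!` / `exit_inference_frame!` / `report_finished_tree!` helpers are illustrative names I'm using here, while `task_local_storage`, `time_ns`, `ReentrantLock`, and `lock` are real Base APIs.

```julia
# Sketch of the task-local scratch-space pattern (illustrative names only).

# One node per inference frame; nodes form a depth-first tree per Task.
mutable struct InferenceTimingNode
    name::Symbol
    start_ns::UInt64
    stop_ns::UInt64
    children::Vector{InferenceTimingNode}
end

InferenceTimingNode(name::Symbol) =
    InferenceTimingNode(name, time_ns(), UInt64(0), InferenceTimingNode[])

# Global results buffer, shared across Tasks, protected by a lock that is
# only taken when a finished tree is pushed at the very end.
const _timing_results = InferenceTimingNode[]
const _timing_results_lock = ReentrantLock()

function report_finished_tree!(root::InferenceTimingNode)
    lock(_timing_results_lock) do
        push!(_timing_results, root)
    end
    return nothing
end

# Per-Task scratch space: a stack of in-progress nodes kept in task-local
# storage, so concurrent Tasks never share mutable scratch state.
function _profile_stack()
    return get!(task_local_storage(), :inference_profile_stack) do
        InferenceTimingNode[]
    end::Vector{InferenceTimingNode}
end

function enter_inference_frame!(name::Symbol)
    stack = _profile_stack()
    node = InferenceTimingNode(name)
    isempty(stack) || push!(stack[end].children, node)  # attach to the parent frame
    push!(stack, node)
    return node
end

function exit_inference_frame!()
    stack = _profile_stack()
    node = pop!(stack)
    node.stop_ns = time_ns()
    # The outermost frame just finished: report this Task's whole tree.
    isempty(stack) && report_finished_tree!(node)
    return node
end
```

The important property is that only `report_finished_tree!` touches shared state, and it runs once per completed tree.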

This PR is based on top of #47615.

Until that PR merges, you can view a diff of the changes here:
vilterp/julia@pv-compiler-timings-lock...JuliaLang:nhd-inference-profiling-thread-safe-1.9

@vchuravy (Member) commented:

> This makes things thread safe as long as type inference itself does not spawn Tasks, or perform compilation across concurrent Tasks. If that changes in the future, we will need to change the entire inference profiling model, since it will no longer be a single depth-first inference tree per invocation of type inference.

Inference itself does not spawn tasks, but the goal is indeed to allow for concurrent inference, so yes, it is much harder to create a single inference profile.

@NHDaly (Member, Author) commented Dec 31, 2022

Mmm good to know! That's exciting! :)

But yeah, it will mean we'll need to look at redesigning the inference profiling. 👍
In #47615 we're already changing the shape of the inference profiling results into a vector of inference trees, rather than a single inference tree rooted at a fake ROOT node, to better reflect what's actually happening: several invocations of type inference. So maybe it won't be too hard to redesign this so that even a single invocation of type inference is itself a collection of several type inference trees spread over concurrent tasks? Or possibly some other design. I'm looking forward to thinking that through with you all! 😊

But in the meantime, I guess this PR should be enough to make the profiling in 1.9 thread-safe, so we should merge this and backport it. Then we can start talking about what should be done for the new concurrent type inference.

Ideally, the profiling changes would be done together with the compilation changes themselves that you're talking about.

@NHDaly added the backport 1.9 label (Change should be backported to release-1.9) Dec 31, 2022
@KristofferC mentioned this pull request Jan 2, 2023
@pchintalapudi (Member) commented:

Concurrent inference is probably already occurring; the type inference lock has been removed for some time, so inference results from one thread might get used by another thread depending on when the cache is updated.

@NHDaly (Member, Author) commented Jan 4, 2023

Yeah, but we're still on 1.8 at RAI, so we probably haven't noticed this issue. When you say "for some time," you mean on 1.9+, yeah? That's why I filed this issue. I agree that without the inference lock, 1.9+ could be racing. 👍

Or are you saying that even before 1.9, there can be concurrent inference?

@pchintalapudi (Member) commented:

Only in 1.9 will there be concurrent inference, but that means the inference profiler probably has to be redesigned for reporting concurrent inference in 1.9. I doubt we'd want to re-serialize inference or block the 1.9 release on the profiler being redesigned, so it probably makes sense to note this limitation somewhere in the release documentation.

@NHDaly (Member, Author) commented Jan 4, 2023

Okay yeah, cool! :)

Well, I think this PR is already enough to correctly support concurrent inference in 1.9! That's what it's meant to do, anyway. The only change is switching away from a global working stack and onto a working stack kept in task-local storage. I think that's enough to support concurrent inference. The final inference results vector is already thread safe as of #47615, so it can be pushed to from multiple concurrent tasks. (And they only push right at the end, so I don't think this would have a serious impact on the concurrency. It shouldn't serialize the inference; it would only grab a lock for the very tail end of an entire inference tree, pushing the single pointer onto the vector.)

The diff for this PR is just this one change:
vilterp/julia@pv-compiler-timings-lock...JuliaLang:nhd-inference-profiling-thread-safe-1.9

So I think that with this PR, concurrent inference in 1.9 should be safe! Does that hold up in your understanding as well? Anything I'm missing? Thanks for engaging with me on these PRs! 😊
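As a toy illustration of that claim (reusing the hypothetical sketch from the PR description above, so again not real compiler internals), several Tasks can build their trees fully independently and only contend on the final push:

```julia
# Each spawned Task gets its own task-local working stack; the shared
# results vector is only touched once per Task, when its root frame closes.
@sync for i in 1:4
    Threads.@spawn begin
        enter_inference_frame!(Symbol(:root_, i))
        enter_inference_frame!(:callee)
        exit_inference_frame!()  # closes :callee; no lock taken
        exit_inference_frame!()  # closes the root; pushes the tree under the lock
    end
end
@assert length(_timing_results) == 4  # one completed inference tree per Task
```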

Co-authored-by: Nathan Daly <NHDaly@gmail.com>
base/compiler/typeinfer.jl (outdated review comment, resolved)
@KristofferC mentioned this pull request Jan 17, 2023
@NHDaly force-pushed the nhd-inference-profiling-thread-safe-1.9 branch from e0b6225 to e161d91 on January 18, 2023
@NHDaly (Member, Author) commented Jan 18, 2023

I'm going to close this PR, then, since it is merged into #47615 now.

@NHDaly closed this Jan 18, 2023
@giordano deleted the nhd-inference-profiling-thread-safe-1.9 branch January 19, 2023
@KristofferC removed the backport 1.9 label (Change should be backported to release-1.9) Jan 23, 2023