-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimizer: inline abstract union-split callsite #44512
Conversation
@nanosoldier |
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. |
15f74c7
to
1725b3b
Compare
Ok, the @nanosoldier |
if handled_all_cases && revisit_idx !== nothing | ||
# If there's only one case that's not a dispatchtuple, we can | ||
# still unionsplit by visiting all the other cases first. | ||
# This is useful for code like: | ||
# foo(x::Int) = 1 | ||
# foo(@nospecialize(x::Any)) = 2 | ||
# where we where only a small number of specific dispatchable | ||
# cases are split off from an ::Any typed fallback. | ||
(i, j) = revisit_idx | ||
match = infos[i].results[j] | ||
handled_all_cases &= handle_match!(match, argtypes, flag, state, cases, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
bf346fb
to
d76f2a4
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
d76f2a4
to
c72f482
Compare
c72f482
to
466b592
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
Are we sure that this actually works in general? Method matching is very complicated, e.g. when matching against diagonal signatures that I'm pretty sure the current isa-generating code will not handle properly. We should be able to split all cases that inference decided to unionsplit, but we'd have to remember what arguments it split on and then generate an appropriate isa nest for those. Unless I'm missing something. |
E.g., I just tried the simplest possible case I could think of here:
|
5cf29ed
to
b0eca9a
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
b0eca9a
to
2305a52
Compare
Thanks @Keno for pointing out that assertion is not valid.
I also added this comment describing the transformation of
Now I think we can move this forward once we confirm that this PR doesn't come with any regression, which may happen from performance penalty of |
2305a52
to
fb08dcb
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
I'm still quite skeptical that this works, because in general, method matching cannot necessarily be compiled down to an |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
@nanosoldier |
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. |
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. |
@nanosoldier |
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. |
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. |
@nanosoldier |
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. |
The only real failure is from SquidGame.jl. Posted a reproducer here. |
7a30a51
to
c1306b9
Compare
c1306b9
to
7d4cc1f
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
Currently the optimizer handles abstract callsite only when there is a single dispatch candidate (in most cases), and so inlining and static-dispatch are prohibited when the callsite is union-split (in other word, union-split happens only when all the dispatch candidates are concrete). However, there are certain patterns of code (most notably our Julia-level compiler code) that inherently need to deal with abstract callsite. The following example is taken from `Core.Compiler` utility: ```julia julia> @inline isType(@nospecialize t) = isa(t, DataType) && t.name === Type.body.name isType (generic function with 1 method) julia> code_typed((Any,)) do x # abstract, but no union-split, successful inlining isType(x) end |> only CodeInfo( 1 ─ %1 = (x isa Main.DataType)::Bool └── goto #3 if not %1 2 ─ %3 = π (x, DataType) │ %4 = Base.getfield(%3, :name)::Core.TypeName │ %5 = Base.getfield(Type{T}, :name)::Core.TypeName │ %6 = (%4 === %5)::Bool └── goto #4 3 ─ goto #4 4 ┄ %9 = φ (#2 => %6, #3 => false)::Bool └── return %9 ) => Bool julia> code_typed((Union{Type,Nothing},)) do x # abstract, union-split, unsuccessful inlining isType(x) end |> only CodeInfo( 1 ─ %1 = (isa)(x, Nothing)::Bool └── goto #3 if not %1 2 ─ goto #4 3 ─ %4 = Main.isType(x)::Bool └── goto #4 4 ┄ %6 = φ (#2 => false, #3 => %4)::Bool └── return %6 ) => Bool ``` (note that this is a limitation of the inlining algorithm, and so any user-provided hints like callsite inlining annotation doesn't help here) This commit enables inlining and static dispatch for abstract union-split callsite. The core idea here is that we can simulate our dispatch semantics by generating `isa` checks in order of the specialities of dispatch candidates: ```julia julia> code_typed((Union{Type,Nothing},)) do x # union-split, unsuccessful inlining isType(x) end |> only CodeInfo( 1 ─ %1 = (isa)(x, Nothing)::Bool └── goto #3 if not %1 2 ─ goto #9 3 ─ %4 = (isa)(x, Type)::Bool └── goto #8 if not %4 4 ─ %6 = π (x, Type) │ %7 = (%6 isa Main.DataType)::Bool └── goto #6 if not %7 5 ─ %9 = π (%6, DataType) │ %10 = Base.getfield(%9, :name)::Core.TypeName │ %11 = Base.getfield(Type{T}, :name)::Core.TypeName │ %12 = (%10 === %11)::Bool └── goto #7 6 ─ goto #7 7 ┄ %15 = φ (#5 => %12, #6 => false)::Bool └── goto #9 8 ─ Core.throw(ErrorException("fatal error in type inference (type bound)"))::Union{} └── unreachable 9 ┄ %19 = φ (#2 => false, #7 => %15)::Bool └── return %19 ) => Bool ``` Inlining/static-dispatch of abstract union-split callsite will improve the performance in such situations (and so this commit will improve the latency of our JIT compilation). Especially, this commit helps us avoid excessive specializations of `Core.Compiler` code by statically-resolving `@nospecialize`d callsites, and as the result, the # of precompiled statements is now reduced from `1956` ([`master`](dc45d77)) to `1901` (this commit). And also, as a side effect, the implementation of our inlining algorithm gets much simplified now since we no longer need the previous special handlings for abstract callsites. One possible drawback would be increased code size. This change seems to certainly increase the size of sysimage, but I think these numbers are in an acceptable range: > [`master`](dc45d77) ``` ❯ du -sh usr/lib/julia/* 17M usr/lib/julia/corecompiler.ji 188M usr/lib/julia/sys-o.a 164M usr/lib/julia/sys.dylib 23M usr/lib/julia/sys.dylib.dSYM 101M usr/lib/julia/sys.ji ``` > this commit ``` ❯ du -sh usr/lib/julia/* 17M usr/lib/julia/corecompiler.ji 190M usr/lib/julia/sys-o.a 166M usr/lib/julia/sys.dylib 23M usr/lib/julia/sys.dylib.dSYM 102M usr/lib/julia/sys.ji ```
7d4cc1f
to
408c140
Compare
Going to merge this if CI runs successfully. |
Probably worth cross-referencing #45051 (comment) from here. |
Currently the optimizer handles abstract callsite only when there is a
single dispatch candidate (in most cases), and so inlining and static-dispatch
are prohibited when the callsite is union-split (in other word, union-split
happens only when all the dispatch candidates are concrete).
However, there are certain patterns of code (most notably our Julia-level compiler code)
that inherently need to deal with abstract callsite.
The following example is taken from
Core.Compiler
utility:(note that this is a limitation of the inlining algorithm, and so any
user-provided hints like callsite inlining annotation doesn't help here)
This commit enables inlining and static dispatch for abstract union-split callsite.
The core idea here is that we can simulate our dispatch semantics by
generating
isa
checks in order of the specialities of dispatch candidates:Inlining/static-dispatch of abstract union-split callsite will improve
the performance in such situations (and so this commit will improve the
latency of our JIT compilation). Especially, this commit helps us avoid
excessive specializations of
Core.Compiler
code by statically-resolving@nospecialize
d callsites, and as the result, the # of precompiledstatements is now reduced from
1956
(master
) to1901
(this commit).And also, as a side effect, the implementation of our inlining algorithm
gets much simplified now since we no longer need the previous special
handlings for abstract callsites.
One possible drawback would be increased code size.
This change seems to certainly increase the size of sysimage,
but I think these numbers are in an acceptable range: