-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow inlining methods with unmatched type parameters #45062
Conversation
c3da8e6
to
b2ea6d1
Compare
Re: partial reverting of #44512:
Looking further, it appeared that in the following IR, Expr(:call, Main.eltype, Core.Argument(n=2)),
Expr(:call, Main.oneunit, SSAValue(1)),
Expr(:call, Main.sqrt, SSAValue(2)),
Expr(:call, Main.typeof, SSAValue(3)),
Expr(:call, Main.promote_type, SSAValue(4), Main.Float32),
Core.ReturnNode(val=SSAValue(5))] we were choosing to inline This triggered Keno helped me to figure out that this was due to inlining not being able to discern which signature was right due to the By bringing back this branch that was removed in #44512, we were able to be a bit more fine-grained about when we could bypass validating sparams (as well as choosing not to inline when we had a direct match of |
@nanosoldier |
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. |
@nanosoldier |
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. |
Seems similar to the case listed above, but still investigating where a possibly incorrect inlining choice happens |
a8dbd6a
to
cc1f9e0
Compare
@nanosoldier |
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. |
@nanosoldier |
@aviatesk Could I ask for your review here? Specifically on 3a6b36c where I revert some of #44512, as well as on 884e5ea, where I somewhat broadly restrict the cases in which we can perform this inlining. It feels like we could do better, but I'm not familiar enough with this part of the codebase to fully understand |
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. |
Re: test failures in The My idea is that we can do something similar to |
Sure, I will look into this later today or the weekend. |
@nanosoldier |
test/compiler/inline.jl
Outdated
# basic tests for inlining of `apply_type` in the presence of unmatched type parameters | ||
f44656(x) = Val{x}() | ||
|
||
function g44656() | ||
for i = 1:10000 | ||
Base.donotdelete(Val{i}()) | ||
end | ||
end | ||
|
||
let srcs = (code_typed1(f44656, (Any,)), | ||
code_typed1(g44656)) | ||
for src in srcs | ||
@test count(isnew, src.code) == 1 | ||
@test count(iscall((src, Core.apply_type), src.code)) == 0 | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add more test cases. E.g. since this PR adds a change on the inlining pass for constant prop'ed callsite, we should have a corresponding test case.
#44512 removed the previous (somewhat random imho) special casings, and you can recover some if needed. But I'd like to revert it in a way at least we understand which optimizable cases handled by #44512 are now ignored, and which are still optimized. Rather as far as I understand I believe what essentially needed is the |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
3b7c5d8
to
5e92b7d
Compare
nonva_args = argexprs[1:end-1] | ||
va_arg = argexprs[end] | ||
tuple_call = Expr(:call, TOP_TUPLE, def, nonva_args...) | ||
tuple_type = tuple_tfunc(Any[argextype(arg, compact) for arg in nonva_args]) | ||
tupl = insert_node_here!(compact, NewInstruction(tuple_call, tuple_type, topline)) | ||
apply_iter_expr = Expr(:call, Core._apply_iterate, iterate, Core._compute_sparams, tupl, va_arg) | ||
sparam_vals = insert_node_here!(compact, | ||
effect_free(NewInstruction(apply_iter_expr, SimpleVector, topline))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like it would add significant cost (and allocations and new DataType's) that were not present before inlining? are we sure this is profitable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prior to the fix_va_argexprs!
call that you have above, the argexprs list was required to be a known length and did not have or need a va_arg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in #46700.
elseif head === :foreigncall && isa(spvals, SimpleVector) | ||
@assert !isa(spsig, UnionAll) || !isempty(spvals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be an assert? if it managed to get this far, but didn't run this fixup code here, it will generate corrupt code later and possibly segfault later at runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming you're talking about the isa
? If so, I don't think so - there can be :foreigncall
s that got inlined from a callee that don't need the fixup here (though it's not harmful either).
@@ -720,6 +720,97 @@ function perform_lifting!(compact::IncrementalCompact, | |||
return stmt_val # N.B. should never happen | |||
end | |||
|
|||
function lift_svec_ref!(compact::IncrementalCompact, idx::Int, stmt::Expr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be removed and replaced with an inference tfunc
for _svec_ref
, since it is rather unfortunate (generally) when optimization ends up with a more precise answer than inference; it tends to confuse and annoy people, since it is usually less reliable and accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lifts svec_ref(_compute_sparams(...), ...)
which is inserted by inlining and does not exist at inference time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems possible that the optimizer could have already lifted that static_parameter expr into a typeof
call inside the method, before it reached here. Is the optimizer not accurately marking the svec_ref
with the result type constant (if known from inference) when it inserts it? Still seems confusing that inference couldn't determine this static_parameter value when the IR was simple, but optimization could figure it out after the IR was made more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the optimizer knows the type of the static parameter, it shouldn't insert the svec_ref (it currently does on master, but that's a bug fixed in #46703). Note that the optimizer here is not figuring out the value of the static parameters, but merely an SSAValue that will contain that value at runtime, which is a different analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay, I thought this was attempting to forward the type. It makes more sense that this is attempting to lift the computation of T
through a call that just gets the parameter back directly, e.g. eltype(A{T}) -> T
m = m.val | ||
isa(m, Method) || return nothing | ||
# TODO: More general structural analysis of the intersection | ||
length(def.args) >= 3 || return nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this must be earlier in the function, before you access def.args[2]
i = findfirst(j->has_typevar(sig.parameters[j], tvar), 1:length(sig.parameters)) | ||
i === nothing && return nothing | ||
_any(j->has_typevar(sig.parameters[j], tvar), i+1:length(sig.parameters)) && return nothing | ||
|
||
arg = sig.parameters[i] | ||
isa(arg, DataType) || return nothing | ||
|
||
rarg = def.args[2 + i] | ||
isa(rarg, SSAValue) || return nothing | ||
argdef = compact[rarg][:inst] | ||
if isexpr(argdef, :new) | ||
rarg = argdef.args[1] | ||
isa(rarg, SSAValue) || return nothing | ||
argdef = compact[rarg][:inst] | ||
end | ||
|
||
is_known_call(argdef, Core.apply_type, compact) || return nothing | ||
length(argdef.args) == 3 || return nothing | ||
|
||
applyT = argextype(argdef.args[2], compact) | ||
isa(applyT, Const) || return nothing | ||
applyT = applyT.val | ||
|
||
isa(applyT, UnionAll) || return nothing | ||
applyTvar = applyT.var | ||
applyTbody = applyT.body | ||
|
||
isa(applyTbody, DataType) || return nothing | ||
applyTbody.name == arg.name || return nothing | ||
length(applyTbody.parameters) == length(arg.parameters) == 1 || return nothing | ||
applyTbody.parameters[1] === applyTvar || return nothing | ||
arg.parameters[1] === tvar || return nothing | ||
return argdef.args[3] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to think about this logic more, but this code needs substantial comment about the intent here. As written, I don't think the structural assumptions here are valid outside of trivial common cases (though it appears to try to limit itself to only the most trivial of trivial cases). But with so sparse comments, it is hard to review whether it is doing only what it intends and not including cases that it won't handle properly
end | ||
return | ||
elseif is_known_call(def, Core._compute_sparams, compact) | ||
res = _lift_svec_ref(def, compact) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can you lift an svec_ref for valI
without including the valI
for the index of the value that you are referencing as an argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It currently assumes valI == 1
and that the svec is inbounds, but yeah, that should be checked (and generalized in the future).
|
||
if isa(vec, SimpleVector) | ||
if valI <= length(val) | ||
compact[idx] = vec[valI] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing a QuoteNode
`Core._svec_ref` has accepted `boundscheck`-value as the first argument since it was added in #45062. Nonetheless, `Core._svec_ref` simply calls `jl_svec_ref` in either the interpreter or the codegen, and thus the `boundscheck` value isn't utilized in any optimizations. Rather, even worse, this `boundscheck`-argument negatively influences the effect analysis (xref #50167 for details) and has caused type inference regressions as reported in #50544. For these reasons, this commit simply eliminates the `boundscheck` argument from `Core._svec_ref`. Consequently, `getindex(::SimpleVector, ::Int)` is now being concrete-eval eligible. closes #50544
`Core._svec_ref` has accepted `boundscheck`-value as the first argument since it was added in #45062. Nonetheless, `Core._svec_ref` simply calls `jl_svec_ref` in either the interpreter or the codegen, and thus the `boundscheck` value isn't utilized in any optimizations. Rather, even worse, this `boundscheck`-argument negatively influences the effect analysis (xref #50167 for details) and has caused type inference regressions as reported in #50544. For these reasons, this commit simply eliminates the `boundscheck` argument from `Core._svec_ref`. Consequently, `getindex(::SimpleVector, ::Int)` is now being concrete-eval eligible. closes #50544
`Core._svec_ref` has accepted `boundscheck`-value as the first argument since it was added in #45062. Nonetheless, `Core._svec_ref` simply calls `jl_svec_ref` in either the interpreter or the codegen, and thus the `boundscheck` value isn't utilized in any optimizations. Rather, even worse, this `boundscheck`-argument negatively influences the effect analysis (xref #50167 for details) and has caused type inference regressions as reported in #50544. For these reasons, this commit simply eliminates the `boundscheck` argument from `Core._svec_ref`. Consequently, `getindex(::SimpleVector, ::Int)` is now being concrete-eval eligible. closes #50544
`Core._svec_ref` has accepted `boundscheck`-value as the first argument since it was added in #45062. Nonetheless, `Core._svec_ref` simply calls `jl_svec_ref` in either the interpreter or the codegen, and thus the `boundscheck` value isn't utilized in any optimizations. Rather, even worse, this `boundscheck`-argument negatively influences the effect analysis (xref #50167 for details) and has caused type inference regressions as reported in #50544. For these reasons, this commit simply eliminates the `boundscheck` argument from `Core._svec_ref`. Consequently, `getindex(::SimpleVector, ::Int)` is now being concrete-eval eligible. closes #50544
`Core._svec_ref` has accepted `boundscheck`-value as the first argument since it was added in #45062. Nonetheless, `Core._svec_ref` simply calls `jl_svec_ref` in either the interpreter or the codegen, and thus the `boundscheck` value isn't utilized in any optimizations. Rather, even worse, this `boundscheck`-argument negatively influences the effect analysis (xref #50167 for details) and has caused type inference regressions as reported in #50544. For these reasons, this commit simply eliminates the `boundscheck` argument from `Core._svec_ref`. Consequently, `getindex(::SimpleVector, ::Int)` is now being concrete-eval eligible. closes #50544
The deleted branch was added in #45062, although it had not been tested. I tried the following diff to find cases optimized by that, but I just found the handling proved to be in vain in all cases I tried. ```diff diff --git a/base/compiler/ssair/inlining.jl b/base/compiler/ssair/inlining.jl index 318b21b09b..7e42a65aa4 100644 --- a/base/compiler/ssair/inlining.jl +++ b/base/compiler/ssair/inlining.jl @@ -1473,6 +1473,14 @@ function compute_inlining_cases(@nospecialize(info::CallInfo), flag::UInt32, sig handle_any_const_result!(cases, result, match, argtypes, info, flag, state; allow_abstract=true, allow_typevars=true) fully_covered = handled_all_cases = match.fully_covers + if length(cases) == 1 && fully_covered + println("first case: ", only_method) + elseif length(cases) == 1 + atype = argtypes_to_type(sig.argtypes) + if atype isa DataType && cases[1].sig isa DataType + println("second case: ", only_method) + end + end elseif !handled_all_cases # if we've not seen all candidates, union split is valid only for dispatch tuples filter!(case::InliningCase->isdispatchtuple(case.sig), cases) ```
The deleted branch was added in #45062, although it had not been tested. I tried the following diff to find cases optimized by that, but I just found the handling proved to be in vain in all cases I tried. ```diff diff --git a/base/compiler/ssair/inlining.jl b/base/compiler/ssair/inlining.jl index 318b21b09b..7e42a65aa4 100644 --- a/base/compiler/ssair/inlining.jl +++ b/base/compiler/ssair/inlining.jl @@ -1473,6 +1473,14 @@ function compute_inlining_cases(@nospecialize(info::CallInfo), flag::UInt32, sig handle_any_const_result!(cases, result, match, argtypes, info, flag, state; allow_abstract=true, allow_typevars=true) fully_covered = handled_all_cases = match.fully_covers + if length(cases) == 1 && fully_covered + println("first case: ", only_method) + elseif length(cases) == 1 + atype = argtypes_to_type(sig.argtypes) + if atype isa DataType && cases[1].sig isa DataType + println("second case: ", only_method) + end + end elseif !handled_all_cases # if we've not seen all candidates, union split is valid only for dispatch tuples filter!(case::InliningCase->isdispatchtuple(case.sig), cases) ```
The deleted branch was added in #45062, although it had not been tested. I tried the following diff to find cases optimized by that, but I just found the handling proved to be in vain in all cases I tried. ```diff diff --git a/base/compiler/ssair/inlining.jl b/base/compiler/ssair/inlining.jl index 318b21b09b..7e42a65aa4 100644 --- a/base/compiler/ssair/inlining.jl +++ b/base/compiler/ssair/inlining.jl @@ -1473,6 +1473,14 @@ function compute_inlining_cases(@nospecialize(info::CallInfo), flag::UInt32, sig handle_any_const_result!(cases, result, match, argtypes, info, flag, state; allow_abstract=true, allow_typevars=true) fully_covered = handled_all_cases = match.fully_covers + if length(cases) == 1 && fully_covered + println("first case: ", only_method) + elseif length(cases) == 1 + atype = argtypes_to_type(sig.argtypes) + if atype isa DataType && cases[1].sig isa DataType + println("second case: ", only_method) + end + end elseif !handled_all_cases # if we've not seen all candidates, union split is valid only for dispatch tuples filter!(case::InliningCase->isdispatchtuple(case.sig), cases) ```
The deleted branch was added in #45062, although it had not been tested. I tried the following diff to find cases optimized by that, but I just found the handling proved to be in vain in all cases I tried. ```diff diff --git a/base/compiler/ssair/inlining.jl b/base/compiler/ssair/inlining.jl index 318b21b09b..7e42a65aa4 100644 --- a/base/compiler/ssair/inlining.jl +++ b/base/compiler/ssair/inlining.jl @@ -1473,6 +1473,14 @@ function compute_inlining_cases(@nospecialize(info::CallInfo), flag::UInt32, sig handle_any_const_result!(cases, result, match, argtypes, info, flag, state; allow_abstract=true, allow_typevars=true) fully_covered = handled_all_cases = match.fully_covers + if length(cases) == 1 && fully_covered + println("first case: ", only_method) + elseif length(cases) == 1 + atype = argtypes_to_type(sig.argtypes) + if atype isa DataType && cases[1].sig isa DataType + println("second case: ", only_method) + end + end elseif !handled_all_cases # if we've not seen all candidates, union split is valid only for dispatch tuples filter!(case::InliningCase->isdispatchtuple(case.sig), cases) ```
The deleted branch was added in #45062, although it had not been tested. I tried the following diff to find cases optimized by that, but I just found the handling proved to be in vain in all cases I tried. ```diff diff --git a/base/compiler/ssair/inlining.jl b/base/compiler/ssair/inlining.jl index 318b21b09b..7e42a65aa4 100644 --- a/base/compiler/ssair/inlining.jl +++ b/base/compiler/ssair/inlining.jl @@ -1473,6 +1473,14 @@ function compute_inlining_cases(@nospecialize(info::CallInfo), flag::UInt32, sig handle_any_const_result!(cases, result, match, argtypes, info, flag, state; allow_abstract=true, allow_typevars=true) fully_covered = handled_all_cases = match.fully_covers + if length(cases) == 1 && fully_covered + println("first case: ", only_method) + elseif length(cases) == 1 + atype = argtypes_to_type(sig.argtypes) + if atype isa DataType && cases[1].sig isa DataType + println("second case: ", only_method) + end + end elseif !handled_all_cases # if we've not seen all candidates, union split is valid only for dispatch tuples filter!(case::InliningCase->isdispatchtuple(case.sig), cases) ```
Rebase of #44656 to solve some inference regressions in #44803
As Keno predicted, there are some regressions in this PR as well, including quite a few test suite failures that I will be fixing up.
Additionally, some needed future work would be to teach SROA how to fold
svec_ref
more generally, as it is currently very specificThe original description follows below.
Currently we do not allow inlining any methods that have unmatched
type parameters. The original reason for this restriction is that
I didn't really know what to put for an inlined :static_parameter,
so I just had inlining bail. As a result, in code like:
the call to
Val{x}()
would not be inlined unlessx
was knownthrough constant propagation.
This PR attempts to remidy that. A new builtin is added that computes
the static parameters for a given method/argument list. Additionally,
sroa gains the ability to simplify and fold this builtin. As a result,
inlining can insert an expression that computes the correct values
for the inlinees static parameters.
The change benchmarks favorably:
Before:
After:
Note that this particular benchmark could also be fixed by #44654,
but this change is more general.
There is a potential downside, which is that we remove a specialization
barrier here. We already do that in the case when all type parameters
are matched, so it's not eggregious. However, there is anectodal
knowledge in the community that extra type parameters force specialization.
Some of that is due to the handling of type parameters in the specialization
code, but some of it might also be due to inlining's prior refusal
to perform this inlining. We'll have to keep an eye out for any
regressions.