Propagate iteration info to optimizer #36684
Conversation
base/compiler/ssair/inlining.jl (Outdated)

```diff
@@ -596,49 +596,74 @@ function spec_lambda(@nospecialize(atype), sv::OptimizationState, @nospecialize(
 end

 # This assumes the caller has verified that all arguments to the _apply call are Tuples.
-function rewrite_apply_exprargs!(ir::IRCode, idx::Int, argexprs::Vector{Any}, atypes::Vector{Any}, arg_start::Int)
+function rewrite_apply_exprargs!(ir::IRCode, todo, idx::Int, argexprs::Vector{Any}, atypes::Vector{Any}, arginfos::Vector{Any}, arg_start::Int, sv)
```

Review comment: type decls
base/compiler/ssair/inlining.jl (Outdated)

```julia
else
    state = Core.svec()
    for i = 1:length(thisarginfo.each)
        mthd = thisarginfo.each[i]
```

Suggested change:

```diff
-        mthd = thisarginfo.each[i]
+        meth = thisarginfo.each[i]
```

Review comment: Don't think we've ever used this mangling before.
base/compiler/ssair/inlining.jl (Outdated)

```diff
@@ -876,9 +901,22 @@ function call_sig(ir::IRCode, stmt::Expr)
     Signature(f, ft, atypes)
 end

-function inline_apply!(ir::IRCode, idx::Int, sig::Signature, params::OptimizationParams)
+function inline_apply!(ir::IRCode, todo, idx::Int, sig::Signature, params::OptimizationParams, sv)
```

Review comment: type decls
base/compiler/ssair/inlining.jl (Outdated)

```julia
if i != length(thisarginfo.each)
    valT = getfield_tfunc(T, Const(1))
    val_extracted = insert_node!(ir, idx, valT,
        Expr(:call, Core.getfield, state1, 1))
```

Suggested change:

```diff
-        Expr(:call, Core.getfield, state1, 1))
+        Expr(:call, GlobalRef(Core, :getfield), state1, 1))
```
```diff
@@ -945,7 +990,7 @@ end
 # Handles all analysis and inlining of intrinsics and builtins. In particular,
 # this method does not access the method table or otherwise process generic
 # functions.
-function process_simple!(ir::IRCode, idx::Int, params::OptimizationParams, world::UInt)
+function process_simple!(ir::IRCode, todo, idx::Int, params::OptimizationParams, world::UInt, sv)
```

Review comment: type decls
base/compiler/ssair/inlining.jl (Outdated)

```diff
@@ -1010,13 +1055,95 @@ function recompute_method_matches(atype, sv)
     MethodMatchInfo(meth, ambig)
 end

+function analyze_single_call!(ir, todo, idx, stmt, sig, calltype, infos, sv)
```

Review comment: type decls
```julia
meth = info.applicable
if meth === false || info.ambig
    # Too many applicable methods
    # Or there is a (partial?) ambiguity
```

Review comment: What happened to this? (the comment specifically, the rest I know just moved)

Reply: Looks like it got lost in rebase. Will fix.
base/compiler/ssair/inlining.jl (Outdated)

```julia
end
for match in meth::Vector{Any}
    (metharg, methsp, method) = (match[1]::Type, match[2]::SimpleVector, match[3]::Method)
    # TODO: This could be better
```

Review comment: comment should say how
```diff
 function assemble_inline_todo!(ir::IRCode, sv::OptimizationState)
     # todo = (inline_idx, (isva, isinvoke, na), method, spvals, inline_linetable, inline_ir, lie)
     todo = Any[]
     skip = find_throw_blocks(ir.stmts.inst, RefValue(ir))
     for idx in 1:length(ir.stmts)
         idx in skip && continue
-        r = process_simple!(ir, idx, sv.params, sv.world)
+        r = process_simple!(ir, todo, idx, sv.params, sv.world, sv)
         r === nothing && continue

         stmt = ir.stmts[idx][:inst]
```

Review comment: In theory, the purpose of `ir.stmts[idx]` existing as a type now is so that we don't need to pass idx, stmt, type, and so on as separate arguments in new code.
```julia
    end
    length(cases) == 0 && return
    push!(todo, UnionSplit(idx, fully_covered, sig.atype, cases))
end
```

Suggested change:

```diff
-end
+    nothing
+end
```
```julia
    matches::Vector{MethodMatchInfo}
end

struct AbstractIterationInfo
```

Review comment: Can you add docs describing when each of these is legal to appear as the stmtinfo and what it means elsewhere to discover them on a statement?

Review: LGTM, with some minor nits.
This supersedes #36169. Rather than re-implementing the iteration analysis as done there, this uses the new stmtinfo infrastructure to propagate all the analysis done during inference all the way to inlining. As a result, it applies not only to splats of singletons, but also to splats of any other short iterable that inference can analyze. E.g.:

```julia
f(x) = (x...,)
@code_typed f(1=>2)
@benchmark f(1=>2)
```

Before:

```
julia> @code_typed f(1=>2)
CodeInfo(
1 ─ %1 = Core._apply_iterate(Base.iterate, Core.tuple, x)::Tuple{Int64,Int64}
└──      return %1
) => Tuple{Int64,Int64}

julia> @benchmark f(1=>2)
BenchmarkTools.Trial:
  memory estimate:  96 bytes
  allocs estimate:  3
  --------------
  minimum time:     242.659 ns (0.00% GC)
  median time:      246.904 ns (0.00% GC)
  mean time:        255.390 ns (1.08% GC)
  maximum time:     4.415 μs (93.94% GC)
  --------------
  samples:          10000
  evals/sample:     405
```

After:

```
julia> @code_typed f(1=>2)
CodeInfo(
1 ─ %1 = Base.getfield(x, 1)::Int64
│   %2 = Base.getfield(x, 2)::Int64
│   %3 = Core.tuple(%1, %2)::Tuple{Int64,Int64}
└──      return %3
) => Tuple{Int64,Int64}

julia> @benchmark f(1=>2)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.701 ns (0.00% GC)
  median time:      1.925 ns (0.00% GC)
  mean time:        1.904 ns (0.00% GC)
  maximum time:     6.941 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
```

I also implemented the TODO I had left in #36169 to inline the iterate calls themselves, which gives another 3x improvement over the solution in that PR:

```
julia> @code_typed f(1)
CodeInfo(
1 ─ %1 = Core.tuple(x)::Tuple{Int64}
└──      return %1
) => Tuple{Int64}

julia> @benchmark f(1)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.696 ns (0.00% GC)
  median time:      1.699 ns (0.00% GC)
  mean time:        1.702 ns (0.00% GC)
  maximum time:     5.389 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
```

Fixes #36087
Fixes #29114
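As a further illustration of the "any short iterable that inference can analyze" claim, here is a small sketch of my own (the function `g` below is a hypothetical example, not code from the PR): splatting two `Pair`s should likewise reduce to plain `getfield`/`tuple` calls under this change, which can be checked with `@code_typed`.

```julia
# Hypothetical example (not from the PR itself): each Pair is a short
# iterable whose shape inference knows, so with this PR the splats are
# expected to be rewritten into direct field accesses rather than a
# Core._apply_iterate call.
g(p, q) = (p..., q...)

result = g(1 => 2, 3 => 4)
@assert result === (1, 2, 3, 4)   # splatting a Pair yields its two fields
```

Running `@code_typed g(1 => 2, 3 => 4)` before and after the PR is the way to observe the rewrite; the semantics of `g` itself are unchanged.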
Note to self that we can revert #29060 when this merges.
Well I didn't get time to look at this in any detail, but absent any meaningful technical contribution I thought I'd leave a note of appreciation instead: thanks Keno for fixing this! Although we toyed with other ways to fix this in #29114, improving the optimizer does seem like the right thing.
PR JuliaLang#36684 changes `iterate(IncrementalCompact)` to return an extra index, but leaves its arguments unchanged. However, the PR decremented the index argument in a particular recursive call to `iterate`. This caused `iterate` not to recognise that it was done when `allow_cfg_transforms` was turned on.
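The failure mode described in that commit message can be sketched with a toy iterator (my own simplified model with hypothetical names; not the actual `IncrementalCompact` code): if a recursive self-call passes a decremented index instead of an advanced one, the `idx > length` done-check is never reached and iteration cannot terminate.

```julia
# Toy model of the bug class. `skipping_iterate` walks a vector and recurses
# past zero ("deleted") entries. The correct version advances the index in
# the recursive call; the buggy variant effectively decremented it, so the
# termination check on the first line never fired.
struct ToyCompact
    xs::Vector{Int}
end

function skipping_iterate(c::ToyCompact, idx::Int = 1)
    idx > length(c.xs) && return nothing           # done-check requires idx to grow
    x = c.xs[idx]
    x == 0 && return skipping_iterate(c, idx + 1)  # the buggy call passed a smaller idx here
    return (x, idx + 1)
end

function collect_toy(c::ToyCompact)
    out = Int[]
    it = skipping_iterate(c)
    while it !== nothing
        x, st = it
        push!(out, x)
        it = skipping_iterate(c, st)
    end
    return out
end

@assert collect_toy(ToyCompact([1, 0, 2, 0])) == [1, 2]
```

With the decremented index, the recursion at a skipped entry would bounce between the same positions forever instead of reaching the end of the vector.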