better early DCE #27547
Comments
The action item here is a better representation of the CFG and the ability to update the domtree (optional, but we should at least make sure to mark it as invalidated and recompute it if necessary).
I just ran into this when puzzling over some [...] Is the compiler now in a state where this would be fairly easy, given #28978 and other changes since then?
If you want to take a look at this, #37882 is probably the right starting point.
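To make the "mark it as invalidated and recompute it if necessary" idea concrete, here is a minimal sketch. Everything in it (`ToyCFG`, `compute_dominators`, `invalidate!`, `dominators!`, `remove_edge!`, `dominates`) is a hypothetical name made up for illustration, not the compiler's actual data structure, and the dominator computation is the textbook set-based fixed point rather than whatever algorithm the compiler uses.

```julia
# Hypothetical toy CFG: block 1 is the entry; `doms === nothing` means "invalidated".
mutable struct ToyCFG
    succs::Vector{Vector{Int}}
    doms::Union{Nothing,Vector{BitSet}}
end
ToyCFG(succs) = ToyCFG(succs, nothing)

# Textbook iterative dominator sets: dom(b) = {b} ∪ ⋂ dom(p) over predecessors p.
function compute_dominators(succs)
    n = length(succs)
    preds = [Int[] for _ in 1:n]
    for (b, ss) in enumerate(succs), s in ss
        push!(preds[s], b)
    end
    dom = [b == 1 ? BitSet(1) : BitSet(1:n) for b in 1:n]
    changed = true
    while changed
        changed = false
        for b in 2:n
            isempty(preds[b]) && continue  # no predecessors: leave as-is
            # `intersect` on two BitSets returns a fresh set, so `push!` is safe
            updated = mapreduce(p -> dom[p], intersect, preds[b]; init = BitSet(1:n))
            push!(updated, b)
            if updated != dom[b]
                dom[b] = updated
                changed = true
            end
        end
    end
    return dom
end

invalidate!(cfg::ToyCFG) = (cfg.doms = nothing; cfg)

# Recompute lazily, only when a query arrives after an invalidation.
function dominators!(cfg::ToyCFG)
    if cfg.doms === nothing
        cfg.doms = compute_dominators(cfg.succs)
    end
    return cfg.doms
end

# Any CFG edit invalidates the cached dominator information.
function remove_edge!(cfg::ToyCFG, from::Int, to::Int)
    filter!(!=(to), cfg.succs[from])
    return invalidate!(cfg)
end

dominates(cfg::ToyCFG, a::Int, b::Int) = a in dominators!(cfg)[b]

# Diamond: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4
cfg = ToyCFG([[2, 3], [4], [4], Int[]])
dominates(cfg, 2, 4)      # block 4 is also reachable via 3
remove_edge!(cfg, 1, 3)   # cache dropped here
dominates(cfg, 2, 4)      # recomputed on demand
```

The design choice sketched here is the cheap option from the comment: edits never try to patch the tree incrementally, they only drop the cache, and the next query pays for a full recomputation.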
Adds a very simple optimization pass to eliminate `typeassert` calls. The motivation is that when SROA replaces `getfield` calls with scalar values, we can often prove that a `typeassert` whose first operand is a replaced value is a no-op:

```julia
julia> struct Foo; x; end

julia> code_typed((Int,)) do a
           x1 = Foo(a)
           x2 = Foo(x1)
           typeassert(x2.x, Foo).x
       end |> only |> first
CodeInfo(
1 ─ %1 = Main.Foo::Type{Foo}
│   %2 = %new(%1, a)::Foo
│        Main.typeassert(%2, Main.Foo)::Foo # can be nullified
└──      return a
)
```

Nullifying `typeassert` helps the subsequent (simple) DCE eliminate dead allocations, and also allows LLVM to do more aggressive DCE and emit simpler code. Here is a simple benchmark:

> sample target code:

```julia
julia> function compute(T, n)
           r = 0
           for i in 1:n
               x1 = T(i)
               x2 = T(x1)
               r += (x2.x::T).x::Int
           end
           r
       end
compute (generic function with 1 method)

julia> struct Foo; x; end

julia> mutable struct Bar; x; end
```

> on master

```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max): 3.263 μs … 145.828 μs ┊ GC (min … max): 0.00% … 97.14%
 Time (median): 3.516 μs ┊ GC (median): 0.00%
 Time (mean ± σ): 4.015 μs ± 3.726 μs ┊ GC (mean ± σ): 3.16% ± 3.46%

 ▇█▆▄▅▄▄▃▂▁▂▁ ▂
 ▇███████████████▇██▇▇█▇▇▆▇▇▇▇▅▆▅▇▇▅██▇▇▆▇▇▇█▇█▇▇▅▆▆▆▆▅▅▅▅▄▄ █
 3.26 μs Histogram: log(frequency) by time 8.52 μs <

 Memory estimate: 7.64 KiB, allocs estimate: 489.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
 Range (min … max): 6.990 μs … 288.079 μs ┊ GC (min … max): 0.00% … 97.03%
 Time (median): 7.657 μs ┊ GC (median): 0.00%
 Time (mean ± σ): 9.019 μs ± 9.710 μs ┊ GC (mean ± σ): 4.59% ± 4.28%

 ▆█▆▄▃▂▂▁▂▃▂▁ ▁ ▁
 ██████████████████████▇▇▇▇▇▆██████▇▇█▇▇▇▆▆▆▆▅▆▅▄▄▄▅▄▄▃▄▄▂▄▅ █
 6.99 μs Histogram: log(frequency) by time 20.7 μs <

 Memory estimate: 23.27 KiB, allocs estimate: 1489.
```

> on this branch

```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max): 1.234 ns … 116.188 ns ┊ GC (min … max): 0.00% … 0.00%
 Time (median): 1.246 ns ┊ GC (median): 0.00%
 Time (mean ± σ): 1.307 ns ± 1.444 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

 █▇ ▂▂▁ ▂ ▁
 ██████▇█▇▅▄▆▇▆▁▃▄▁▁▁▁▁▃▁▃▁▁▄▇▅▃▃▃▁▃▄▁▃▃▁▃▁▁▃▁▁▁▄▃▁▃▇███▇▇▇▆ █
 1.23 ns Histogram: log(frequency) by time 1.94 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max): 1.234 ns … 33.790 ns ┊ GC (min … max): 0.00% … 0.00%
 Time (median): 1.245 ns ┊ GC (median): 0.00%
 Time (mean ± σ): 1.297 ns ± 0.677 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

 █▇ ▃▂▁ ▁
 ██████▆▆▅▁▄▅▅▄▁▄▄▄▃▄▃▁▃▁▃▄▃▁▃▁▃▁▁▁▃▃▁▃▃▁▁▁▁▁▁▁▃▁▄█████▇▇▇▇ █
 1.23 ns Histogram: log(frequency) by time 1.96 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

We may want to enable this `typeassert` elimination after we implement more aggressive SROA based on [escape analysis](https://github.com/aviatesk/EscapeAnalysis.jl) and [more aggressive Julia-level DCE](#27547), but since this pass is super simple I think it doesn't hurt things to have it for now.
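As a rough illustration of the idea described in that commit message (drop a `typeassert` once the operand's inferred type already satisfies the asserted type, and forward its operand to later uses), here is a toy sketch. `SSA`, `ToyStmt`, `argtype`, and `eliminate_typeasserts` are hypothetical names over a made-up mini IR; this is not the pass from the commit and does not use any `Core.Compiler` API.

```julia
# Hypothetical mini IR: statement i defines SSA value i.
struct SSA
    id::Int
end

struct ToyStmt
    f::Symbol          # e.g. :new, :typeassert, :getfield
    args::Vector{Any}
    typ::Type          # inferred result type (assumed to come from inference)
end

argtype(stmts, x::SSA) = stmts[x.id].typ
argtype(stmts, x) = typeof(x)

function eliminate_typeasserts(stmts::Vector{ToyStmt})
    replacements = Dict{Int,Any}()  # id of a removed typeassert => its operand
    out = Vector{Union{Nothing,ToyStmt}}(undef, length(stmts))
    for (i, stmt) in enumerate(stmts)
        # Rewrite operands that referred to an already-removed typeassert.
        args = Any[a isa SSA && haskey(replacements, a.id) ? replacements[a.id] : a
                   for a in stmt.args]
        if stmt.f === :typeassert && argtype(stmts, args[1]) <: args[2]
            replacements[i] = args[1]  # provably a no-op: forward the operand
            out[i] = nothing
        else
            out[i] = ToyStmt(stmt.f, args, stmt.typ)
        end
    end
    return out
end

struct Foo; x; end

stmts = [
    ToyStmt(:new, Any[Foo, 1], Foo),              # %1 = %new(Foo, 1)::Foo
    ToyStmt(:typeassert, Any[SSA(1), Foo], Foo),  # %2 = typeassert(%1, Foo)
    ToyStmt(:getfield, Any[SSA(2), :x], Any),     # %3 = getfield(%2, :x)
    ToyStmt(:typeassert, Any[SSA(3), Int], Int),  # %4: operand is ::Any, must stay
]
out = eliminate_typeasserts(stmts)
```

In this sketch `%2` is removed and `%3` now reads from `%1` directly, while `%4` is kept because `Any <: Int` does not hold.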
It would be nice to be able to remove more dead code in the Julia-level optimizer. Example case:
In this IR there are a couple of redundant basic blocks (each consisting only of a goto to the next block). There is also unreachable bounds-error code (`goto 10 if not false`). LLVM can remove this code very easily, but it would still be useful for us to remove it (1) to cut down stored IR size, (2) for inlining heuristics, and (3) to spend less time lowering to LLVM.
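The two cleanups mentioned above (dropping blocks that become unreachable once a constant branch condition is folded, and skipping goto-only blocks) can be sketched on a toy CFG. All names here (`ToyGoto`, `ToyGotoIfNot`, `ToyReturn`, `ToyBlock`, `reachable`, `forward`) are hypothetical and only loosely modeled on the IR described in this issue; a real pass would also have to renumber blocks and fix up phi nodes, which this sketch ignores.

```julia
abstract type ToyTerminator end

struct ToyGoto <: ToyTerminator
    target::Int
end

struct ToyGotoIfNot <: ToyTerminator
    cond::Any         # `true`, `false`, or anything else for "unknown"
    target::Int       # taken when `cond` is false
    fallthrough::Int  # taken when `cond` is true
end

struct ToyReturn <: ToyTerminator end

struct ToyBlock
    nstmts::Int       # number of non-terminator statements
    term::ToyTerminator
end

successors(t::ToyGoto) = [t.target]
successors(::ToyReturn) = Int[]
function successors(t::ToyGotoIfNot)
    t.cond === true && return [t.fallthrough]
    t.cond === false && return [t.target]   # `goto target if not false` always jumps
    return [t.fallthrough, t.target]
end

# Mark blocks reachable from the entry block (block 1), folding constant conditions.
function reachable(blocks::Vector{ToyBlock})
    seen = falses(length(blocks))
    worklist = [1]
    while !isempty(worklist)
        b = pop!(worklist)
        seen[b] && continue
        seen[b] = true
        append!(worklist, successors(blocks[b].term))
    end
    return seen
end

# Follow a chain of empty goto-only blocks to the block that does real work.
function forward(blocks::Vector{ToyBlock}, b::Int)
    visited = Set{Int}()
    while blocks[b].nstmts == 0 && blocks[b].term isa ToyGoto && !(b in visited)
        push!(visited, b)
        b = blocks[b].term.target
    end
    return b
end

blocks = [
    ToyBlock(1, ToyGotoIfNot(false, 3, 2)),  # 1: condition is constant, always jumps to 3
    ToyBlock(1, ToyReturn()),                # 2: stands in for the bounds-error path
    ToyBlock(0, ToyGoto(4)),                 # 3: goto-only block
    ToyBlock(1, ToyReturn()),                # 4
]
live = reachable(blocks)   # block 2 is never marked
forward(blocks, 3)         # jumps to block 3 can be retargeted to block 4
```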