better early DCE #27547

JeffBezanson · 2018-06-12T20:59:41Z

It would be nice to be able to remove more dead code in the Julia-level optimizer. Example case:

```julia
julia> function f(A)
           @inbounds for I in 1:length(A)
               view(A, I)
           end
       end
f (generic function with 1 method)

julia> @code_typed f([1])
CodeInfo(
2 1 ── %1  = Base.arraylen(%%A)::Int64                          │╻          length
  │    %2  = Base.sle_int(1, %1)::Bool                          ││╻╷╷╷       Type
  │          Base.sub_int(%1, 1)                                │││╻          unitrange_last
  │    %4  = Base.ifelse(%2, %1, 0)::Int64                      ││││
  │    %5  = Base.slt_int(%4, 1)::Bool                          ││╻╷╷        isempty
  └───       goto 3 if not %5                                   ││
  2 ──       goto 4                                             ││
  3 ──       goto 4                                             ││
  4 ┄─ %9  = φ (2 => true, 3 => false)::Bool                    │
  │    %10 = φ (3 => 1)::Int64                                  │
  │    %11 = φ (3 => 1)::Int64                                  │
  │    %12 = Base.not_int(%9)::Bool                             │
  └───       goto 16 if not %12                                 │
  5 ┄─ %14 = φ (4 => %10, 15 => %35)::Int64                     │
  │    %15 = φ (4 => %11, 15 => %36)::Int64                     │
3 └───       goto 10 if not false                               │╻          view
  6 ── %17 = Core.tuple(%14)::Tuple{Int64}                      ││
  │    %18 = Base.arraysize(%%A, 1)::Int64                      ││╻╷╷╷╷      checkbounds
  │    %19 = Base.slt_int(%18, 0)::Bool                         │││╻╷╷╷       checkbounds
  │    %20 = Base.ifelse(%19, 0, %18)::Int64                    ││││┃││││││    eachindex
  │    %21 = Base.sle_int(1, %14)::Bool                         │││││╻          <=
  │    %22 = Base.sle_int(%14, %20)::Bool                       ││││││
  │    %23 = Base.and_int(%21, %22)::Bool                       │││││╻          &
  └───       goto 8 if not %23                                  │││
  7 ──       goto 9                                             │││
  8 ──       invoke Base.throw_boundserror(%%A::Array{Int64,1}, %17::Tuple{Int64})
  └───       unreachable                                        │││
  9 ──       nothing                                            │
  10 ┄       goto 11                                            │╻          view
  11 ─ %30 = Base.:===(%15, %4)::Bool                           ││╻          ==
  └───       goto 13 if not %30                                 ││
  12 ─       goto 14                                            ││
  13 ─ %33 = Base.add_int(%15, 1)::Int64                        ││╻          +
  └───       goto 14                                            │╻          iterate
  14 ┄ %35 = φ (13 => %33)::Int64                               │
  │    %36 = φ (13 => %33)::Int64                               │
  │    %37 = φ (12 => true, 13 => false)::Bool                  │
  │    %38 = Base.not_int(%37)::Bool                            │
  └───       goto 16 if not %38                                 │
  15 ─       goto 5                                             │
  16 ┄       return nothing                                     │
) => Nothing
```

In this IR there are a couple of redundant basic blocks (consisting only of a `goto` to the next block). There is also unreachable bounds-error code: `goto 10 if not false` always jumps to block 10, so blocks 6 through 9 can never execute. LLVM can remove this code very easily, but it would still be useful for us to remove it (1) to cut down stored IR size, (2) for inlining heuristics, and (3) to spend less time lowering to LLVM.
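
As a rough illustration of the cleanup described above, here is a minimal sketch on a toy CFG. It does not use `Core.Compiler`'s actual data structures: the `Terminator` types and the `fold`, `successors`, and `reachable_blocks` helpers are hypothetical names introduced only for this example. The sketch folds branches whose condition is a literal constant, then walks the graph from the entry block; any block not visited is dead. The `cfg` vector mirrors the terminators of the 16 blocks in the IR above, with symbols such as `:v5` standing in for SSA values.

```julia
# Toy model of block terminators; all names here are hypothetical.
abstract type Terminator end
struct Goto <: Terminator
    target::Int
end
struct GotoIfNot <: Terminator
    cond::Any          # `true`, `false`, or a Symbol standing in for an SSA value
    target::Int        # taken when `cond` is false
    fallthrough::Int   # taken when `cond` is true
end
struct Exit <: Terminator end   # `return` or `unreachable`

# Fold a conditional branch whose condition is a literal Bool.
fold(t::Terminator) = t
function fold(t::GotoIfNot)
    t.cond === false && return Goto(t.target)
    t.cond === true && return Goto(t.fallthrough)
    return t
end

successors(t::Goto) = [t.target]
successors(t::GotoIfNot) = [t.fallthrough, t.target]
successors(::Exit) = Int[]

# Depth-first walk from block 1 over the folded terminators.
function reachable_blocks(cfg::Vector{Terminator})
    seen = Set{Int}()
    worklist = [1]
    while !isempty(worklist)
        b = pop!(worklist)
        b in seen && continue
        push!(seen, b)
        append!(worklist, successors(fold(cfg[b])))
    end
    return sort!(collect(seen))
end

# Terminators of blocks 1 to 16 from the IR above.
cfg = Terminator[
    GotoIfNot(:v5, 3, 2), Goto(4), Goto(4), GotoIfNot(:v12, 16, 5),
    GotoIfNot(false, 10, 6),            # block 5: `goto 10 if not false`
    GotoIfNot(:v23, 8, 7), Goto(9), Exit(), Goto(10),
    Goto(11), GotoIfNot(:v30, 13, 12), Goto(14), Goto(14),
    GotoIfNot(:v38, 16, 15), Goto(5), Exit(),
]

dead = setdiff(1:length(cfg), reachable_blocks(cfg))
println(dead)   # [6, 7, 8, 9]
```

Merging the goto-only blocks is deliberately left out of this sketch: φ-nodes such as `%9` select their value by predecessor block, so threading those jumps would also require rewriting the φ edges.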

Keno · 2018-06-12T21:50:12Z

The action item here is a better representation of the CFG and the ability to update the domtree (optional, but we should at least make sure to mark it as invalidated and recompute it if necessary).
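
One way to picture the "mark it as invalidated and recompute it if necessary" option is a cache that any CFG edit clears. The sketch below is only an illustration under assumptions: `ToyCFG`, `compute_dominators`, `dominators!`, and `remove_edge!` are hypothetical names, and dominator sets are computed with the simple iterative dataflow formulation rather than whatever the compiler's domtree implementation uses.

```julia
# Hypothetical wrapper: successor lists plus a lazily recomputed dominator cache.
mutable struct ToyCFG
    succs::Vector{Vector{Int}}               # succs[b] = successor blocks of b
    doms::Union{Nothing,Vector{Set{Int}}}    # `nothing` means the cache is stale
end
ToyCFG(succs) = ToyCFG(succs, nothing)

# Iterative dataflow: dom(b) = {b} ∪ ⋂ dom(p) over the predecessors p of b.
function compute_dominators(succs)
    n = length(succs)
    preds = [Int[] for _ in 1:n]
    for b in 1:n, s in succs[b]
        push!(preds[s], b)
    end
    doms = [Set(1:n) for _ in 1:n]
    doms[1] = Set([1])                       # block 1 is the entry
    changed = true
    while changed
        changed = false
        for b in 2:n
            isempty(preds[b]) && continue    # no predecessors: leave at the full set
            updated = union(Set([b]), mapreduce(p -> doms[p], intersect, preds[b]))
            if updated != doms[b]
                doms[b] = updated
                changed = true
            end
        end
    end
    return doms
end

# Recompute only when a previous edit invalidated the cache.
function dominators!(cfg::ToyCFG)
    if cfg.doms === nothing
        cfg.doms = compute_dominators(cfg.succs)
    end
    return cfg.doms
end

# Any CFG edit marks the dominator information as stale.
function remove_edge!(cfg::ToyCFG, from::Int, to::Int)
    filter!(!=(to), cfg.succs[from])
    cfg.doms = nothing
    return cfg
end

# Diamond: 1 -> {2, 3}, 2 -> 4, 3 -> 4.
toy = ToyCFG([[2, 3], [4], [4], Int[]])
before = dominators!(toy)[4]    # Set([1, 4])
remove_edge!(toy, 1, 3)         # block 3 becomes unreachable
after = dominators!(toy)[4]     # Set([1, 2, 4])
```

Recomputing from scratch is the simplest policy; incrementally updating the tree after each edit would avoid the repeated work, at the cost of a more involved implementation.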

c42f · 2020-11-04T07:33:20Z

I just ran into this when puzzling over some StaticArrays code. I somehow expected the `@boundscheck checkbounds(v,i)` to be removed on the Julia side when viewing the output of `@code_typed`.

Is the compiler now in a state where this would be fairly easy, given #28978 and other changes since then?
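
For anyone who wants to reproduce this kind of observation, here is a minimal sketch of how one might inspect it. `Wrapper`, `getat`, and `total` are hypothetical names made up for this example, not StaticArrays code. What the printed IR contains depends on the Julia version, so no expected output is shown.

```julia
using InteractiveUtils  # makes the reflection utilities available outside the REPL

struct Wrapper          # hypothetical container, for illustration only
    data::Vector{Int}
end

# The callee must be inlined for the caller's `@inbounds` to reach the check.
@inline function getat(w::Wrapper, i::Int)
    @boundscheck checkbounds(w.data, i)   # the check the caller asks to elide
    return @inbounds w.data[i]
end

function total(w::Wrapper)
    s = 0
    @inbounds for i in eachindex(w.data)
        s += getat(w, i)
    end
    return s
end

# Look at the Julia-level optimized IR and check whether any bounds-check
# code is still present after the optimizer has run.
ir, rettype = only(code_typed(total, (Wrapper,)))
println(ir)
```

Comparing this against `@code_llvm total(Wrapper([1, 2, 3]))` shows what LLVM removes on top of the Julia-level passes.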

vchuravy · 2020-11-04T16:36:36Z

If you want to take a look at this, #37882 is probably the right starting point.

Adds a very simple optimization pass to eliminate `typeassert` calls. The motivation is, when SROA replaces `getfield` calls with scalar values, then we can often prove `typeassert` whose first operand is a replaced value is no-op:

```julia
julia> struct Foo; x; end

julia> code_typed((Int,)) do a
           x1 = Foo(a)
           x2 = Foo(x1)
           typeassert(x2.x, Foo).x
       end |> only |> first
CodeInfo(
1 ─ %1 = Main.Foo::Type{Foo}
│   %2 = %new(%1, a)::Foo
│        Main.typeassert(%2, Main.Foo)::Foo # can be nullified
└──      return a
)
```

Nullifying `typeassert` helps succeeding (simple) DCE to eliminate dead allocations, and also allows LLVM to do more aggressive DCE to emit simpler code.

Here is a simple benchmarking:

> sample target code:

```julia
julia> function compute(T, n)
           r = 0
           for i in 1:n
               x1 = T(i)
               x2 = T(x1)
               r += (x2.x::T).x::Int
           end
           r
       end
compute (generic function with 1 method)

julia> struct Foo; x; end

julia> mutable struct Bar; x; end
```

> on master

```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.263 μs … 145.828 μs  ┊ GC (min … max): 0.00% … 97.14%
 Time  (median):     3.516 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.015 μs ±   3.726 μs  ┊ GC (mean ± σ):  3.16% ±  3.46%

  ▇█▆▄▅▄▄▃▂▁▂▁ ▂
  ▇███████████████▇██▇▇█▇▇▆▇▇▇▇▅▆▅▇▇▅██▇▇▆▇▇▇█▇█▇▇▅▆▆▆▆▅▅▅▅▄▄ █
  3.26 μs      Histogram: log(frequency) by time      8.52 μs <

 Memory estimate: 7.64 KiB, allocs estimate: 489.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
 Range (min … max):  6.990 μs … 288.079 μs  ┊ GC (min … max): 0.00% … 97.03%
 Time  (median):     7.657 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.019 μs ±   9.710 μs  ┊ GC (mean ± σ):  4.59% ±  4.28%

  ▆█▆▄▃▂▂▁▂▃▂▁ ▁ ▁
  ██████████████████████▇▇▇▇▇▆██████▇▇█▇▇▇▆▆▆▆▅▆▅▄▄▄▅▄▄▃▄▄▂▄▅ █
  6.99 μs      Histogram: log(frequency) by time      20.7 μs <

 Memory estimate: 23.27 KiB, allocs estimate: 1489.
```

> on this branch

```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.234 ns … 116.188 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.246 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.307 ns ±   1.444 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇ ▂▂▁ ▂ ▁
  ██████▇█▇▅▄▆▇▆▁▃▄▁▁▁▁▁▃▁▃▁▁▄▇▅▃▃▃▁▃▄▁▃▃▁▃▁▁▃▁▁▁▄▃▁▃▇███▇▇▇▆ █
  1.23 ns      Histogram: log(frequency) by time      1.94 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.234 ns … 33.790 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.245 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.297 ns ±  0.677 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇ ▃▂▁ ▁
  ██████▆▆▅▁▄▅▅▄▁▄▄▄▃▄▃▁▃▁▃▄▃▁▃▁▃▁▁▁▃▃▁▃▃▁▁▁▁▁▁▁▃▁▄█████▇▇▇▇ █
  1.23 ns      Histogram: log(frequency) by time      1.96 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

We may want to enable this `typeassert` elimination after we implement more aggressive SROA based on [escape analysis](https://github.com/aviatesk/EscapeAnalysis.jl) and [more aggressive Julia-level DCE](#27547), but since this pass is super simple I think it doesn't hurt things to have it for now.
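
The core condition behind "can be nullified" can be sketched as a subtype check: if the value's inferred type is already a subtype of the asserted type, the `typeassert` can never throw and just returns its argument. The helper name `typeassert_is_noop` is hypothetical, and this is an assumption-level illustration on plain Julia types, not the optimizer's actual code, which works on inferred lattice elements.

```julia
# Hypothetical helper: a `typeassert(x, T)` is a no-op when the inferred
# type of `x` is already a subtype of `T`.
typeassert_is_noop(inferred::Type, asserted::Type) = inferred <: asserted

struct Foo; x; end

typeassert_is_noop(Foo, Foo)       # true: value forwarded by SROA is known to be a Foo
typeassert_is_noop(Any, Foo)       # false: an untyped field load may hold anything
typeassert_is_noop(Int, Integer)   # true: Int is a subtype of Integer
```

In the example above, `x2.x` is only inferred as `Any` because the field `x` is untyped; it is the SROA replacement of that load with `%2::Foo` that makes the check succeed.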

JeffBezanson added the compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) label Jun 12, 2018

Keno mentioned this issue Jul 20, 2018

new-IR lacks a Basic-DCE (dead code elimination) pass #28212

Closed

Keno mentioned this issue Aug 30, 2018

Allow CFG transforms during compaction and do CFG simplifications on the fly #28978

Merged

aviatesk mentioned this issue Oct 19, 2021

optimizer: eliminate safe typeassert calls #42706

Merged

nsajko mentioned this issue Nov 29, 2023

docs: perf tips: deemphasize assume in favor of UnsafeAssume.jl JuliaGPU/CUDA.jl#2181

Open

nsajko mentioned this issue Jan 11, 2024

An intrinisic/function/macro like assume #52851

Open
