Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

better early DCE #27547

Open
JeffBezanson opened this issue Jun 12, 2018 · 3 comments
Open

better early DCE #27547

JeffBezanson opened this issue Jun 12, 2018 · 3 comments
Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/)

Comments

@JeffBezanson
Copy link
Member

It would be nice to be able to remove more dead code in the julia-level optimizer. Example case:

julia> function f(A)
           @inbounds for I in 1:length(A)
               view(A, I)
           end
       end
f (generic function with 1 method)

julia> @code_typed f([1])
CodeInfo(
2 1 ── %1  = Base.arraylen(%%A)::Int64                          │╻          length
  │    %2  = Base.sle_int(1, %1)::Bool                          ││╻╷╷╷       Type
  │          Base.sub_int(%1, 1)                                │││╻          unitrange_last
  │    %4  = Base.ifelse(%2, %1, 0)::Int64                      ││││       
  │    %5  = Base.slt_int(%4, 1)::Bool                          ││╻╷╷        isempty
  └───       goto 3 if not %5                                   ││         
  2 ──       goto 4                                             ││         
  3 ──       goto 4                                             ││         
  4 ┄─ %9  = φ (2 => true, 3 => false)::Bool                    │          
  │    %10 = φ (3 => 1)::Int64                                  │          
  │    %11 = φ (3 => 1)::Int64                                  │          
  │    %12 = Base.not_int(%9)::Bool                             │          
  └───       goto 16 if not %12                                 │          
  5 ┄─ %14 = φ (4 => %10, 15 => %35)::Int64                     │          
  │    %15 = φ (4 => %11, 15 => %36)::Int64                     │          
3 └───       goto 10 if not false                               │╻          view
  6 ── %17 = Core.tuple(%14)::Tuple{Int64}                      ││         
  │    %18 = Base.arraysize(%%A, 1)::Int64                      ││╻╷╷╷╷      checkbounds
  │    %19 = Base.slt_int(%18, 0)::Bool                         │││╻╷╷╷       checkbounds
  │    %20 = Base.ifelse(%19, 0, %18)::Int64                    ││││┃││││││    eachindex
  │    %21 = Base.sle_int(1, %14)::Bool                         │││││╻          <=
  │    %22 = Base.sle_int(%14, %20)::Bool                       ││││││     
  │    %23 = Base.and_int(%21, %22)::Bool                       │││││╻          &
  └───       goto 8 if not %23                                  │││        
  7 ──       goto 9                                             │││        
  8 ──       invoke Base.throw_boundserror(%%A::Array{Int64,1}, %17::Tuple{Int64})
  └───       unreachable                                        │││        
  9 ──       nothing                                            │          
  10 ┄       goto 11                                            │╻          view
  11 ─ %30 = Base.:===(%15, %4)::Bool                           ││╻          ==
  └───       goto 13 if not %30                                 ││         
  12 ─       goto 14                                            ││         
  13 ─ %33 = Base.add_int(%15, 1)::Int64                        ││╻          +
  └───       goto 14                                            │╻          iterate
  14 ┄ %35 = φ (13 => %33)::Int64                               │          
  │    %36 = φ (13 => %33)::Int64                               │          
  │    %37 = φ (12 => true, 13 => false)::Bool                  │          
  │    %38 = Base.not_int(%37)::Bool                            │          
  └───       goto 16 if not %38                                 │          
  15 ─       goto 5                                             │          
  16 ┄       return nothing                                     │          
) => Nothing

In this IR there are a couple redundant basic blocks (consisting only of a goto to the next block). There is also unreachable bounds error code (goto 10 if not false). LLVM can remove this code very easily, but it would still be useful for us to remove it (1) to cut down stored IR size, (2) for inlining heuristics, and (3) to spend less time lowering to LLVM.

@JeffBezanson JeffBezanson added the compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) label Jun 12, 2018
@Keno
Copy link
Member

Keno commented Jun 12, 2018

The action item here is a better representation of the CFG and the ability to update the domtree (optional, but we should at least make sure to mark it as invalidated and recompute it if necessary).

@c42f
Copy link
Member

c42f commented Nov 4, 2020

I just ran into this when puzzling over some StaticArrays code — I somehow expected the @boundscheck checkbounds(v,i) to be removed on the Julia side when viewing output of @code_typed.

Is the compiler now in a state where this would be fairly easy, given #28978 and other changes since then?

@vchuravy
Copy link
Member

vchuravy commented Nov 4, 2020

If you want to take a look at this #37882 is probably the right starting point.

aviatesk added a commit that referenced this issue Oct 19, 2021
Adds a very simple optimization pass to eliminate `typeassert` calls.
The motivation is, when SROA replaces `getfield` calls with scalar values,
then we can often prove `typeassert` whose first operand is a replaced
value is no-op:
```julia
julia> struct Foo; x; end

julia> code_typed((Int,)) do a
                   x1 = Foo(a)
                   x2 = Foo(x1)
                   typeassert(x2.x, Foo).x
               end |> only |> first
CodeInfo(
1 ─ %1 = Main.Foo::Type{Foo}
│   %2 = %new(%1, a)::Foo
│        Main.typeassert(%2, Main.Foo)::Foo # can be nullified
└──      return a
)
```

Nullifying `typeassert` helps succeeding (simple) DCE to eliminate dead
allocations, and also allows LLVM to do more aggressive DCE to emit simpler code.

Here is a simple benchmarking:
> sample target code:
```julia
julia> function compute(T, n)
           r = 0
           for i in 1:n
               x1 = T(i)
               x2 = T(x1)
               r += (x2.x::T).x::Int
           end
           r
       end
compute (generic function with 1 method)

julia> struct Foo; x; end

julia> mutable struct Bar; x; end
```

> on master
```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.263 μs … 145.828 μs  ┊ GC (min … max): 0.00% … 97.14%
 Time  (median):     3.516 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.015 μs ±   3.726 μs  ┊ GC (mean ± σ):  3.16% ±  3.46%

   ▇█▆▄▅▄▄▃▂▁▂▁                                               ▂
  ▇███████████████▇██▇▇█▇▇▆▇▇▇▇▅▆▅▇▇▅██▇▇▆▇▇▇█▇█▇▇▅▆▆▆▆▅▅▅▅▄▄ █
  3.26 μs      Histogram: log(frequency) by time      8.52 μs <

 Memory estimate: 7.64 KiB, allocs estimate: 489.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
 Range (min … max):  6.990 μs … 288.079 μs  ┊ GC (min … max): 0.00% … 97.03%
 Time  (median):     7.657 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.019 μs ±   9.710 μs  ┊ GC (mean ± σ):  4.59% ±  4.28%

   ▆█▆▄▃▂▂▁▂▃▂▁  ▁                                            ▁
  ██████████████████████▇▇▇▇▇▆██████▇▇█▇▇▇▆▆▆▆▅▆▅▄▄▄▅▄▄▃▄▄▂▄▅ █
  6.99 μs      Histogram: log(frequency) by time      20.7 μs <

 Memory estimate: 23.27 KiB, allocs estimate: 1489.
```

> on this branch
```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.234 ns … 116.188 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.246 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.307 ns ±   1.444 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇ ▂▂▁ ▂                                                    ▁
  ██████▇█▇▅▄▆▇▆▁▃▄▁▁▁▁▁▃▁▃▁▁▄▇▅▃▃▃▁▃▄▁▃▃▁▃▁▁▃▁▁▁▄▃▁▃▇███▇▇▇▆ █
  1.23 ns      Histogram: log(frequency) by time      1.94 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.234 ns … 33.790 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.245 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.297 ns ±  0.677 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇ ▃▂▁                                                     ▁
  ██████▆▆▅▁▄▅▅▄▁▄▄▄▃▄▃▁▃▁▃▄▃▁▃▁▃▁▁▁▃▃▁▃▃▁▁▁▁▁▁▁▃▁▄█████▇▇▇▇ █
  1.23 ns      Histogram: log(frequency) by time     1.96 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

We may want to enable this `typeassert` elimination after we implement more
aggressive SROA based on [escape analysis](https://github.com/aviatesk/EscapeAnalysis.jl)
and [more aggressive Julia-level DCE](#27547),
but since this pass is super simple I think it doesn't hurt things to have
it for now.
aviatesk added a commit that referenced this issue Oct 20, 2021
Adds a very simple optimization pass to eliminate `typeassert` calls.
The motivation is, when SROA replaces `getfield` calls with scalar values,
then we can often prove `typeassert` whose first operand is a replaced
value is no-op:
```julia
julia> struct Foo; x; end

julia> code_typed((Int,)) do a
                   x1 = Foo(a)
                   x2 = Foo(x1)
                   typeassert(x2.x, Foo).x
               end |> only |> first
CodeInfo(
1 ─ %1 = Main.Foo::Type{Foo}
│   %2 = %new(%1, a)::Foo
│        Main.typeassert(%2, Main.Foo)::Foo # can be nullified
└──      return a
)
```

Nullifying `typeassert` helps succeeding (simple) DCE to eliminate dead
allocations, and also allows LLVM to do more aggressive DCE to emit simpler code.

Here is a simple benchmarking:
> sample target code:
```julia
julia> function compute(T, n)
           r = 0
           for i in 1:n
               x1 = T(i)
               x2 = T(x1)
               r += (x2.x::T).x::Int
           end
           r
       end
compute (generic function with 1 method)

julia> struct Foo; x; end

julia> mutable struct Bar; x; end
```

> on master
```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.263 μs … 145.828 μs  ┊ GC (min … max): 0.00% … 97.14%
 Time  (median):     3.516 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.015 μs ±   3.726 μs  ┊ GC (mean ± σ):  3.16% ±  3.46%

   ▇█▆▄▅▄▄▃▂▁▂▁                                               ▂
  ▇███████████████▇██▇▇█▇▇▆▇▇▇▇▅▆▅▇▇▅██▇▇▆▇▇▇█▇█▇▇▅▆▆▆▆▅▅▅▅▄▄ █
  3.26 μs      Histogram: log(frequency) by time      8.52 μs <

 Memory estimate: 7.64 KiB, allocs estimate: 489.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
 Range (min … max):  6.990 μs … 288.079 μs  ┊ GC (min … max): 0.00% … 97.03%
 Time  (median):     7.657 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.019 μs ±   9.710 μs  ┊ GC (mean ± σ):  4.59% ±  4.28%

   ▆█▆▄▃▂▂▁▂▃▂▁  ▁                                            ▁
  ██████████████████████▇▇▇▇▇▆██████▇▇█▇▇▇▆▆▆▆▅▆▅▄▄▄▅▄▄▃▄▄▂▄▅ █
  6.99 μs      Histogram: log(frequency) by time      20.7 μs <

 Memory estimate: 23.27 KiB, allocs estimate: 1489.
```

> on this branch
```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.234 ns … 116.188 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.246 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.307 ns ±   1.444 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇ ▂▂▁ ▂                                                    ▁
  ██████▇█▇▅▄▆▇▆▁▃▄▁▁▁▁▁▃▁▃▁▁▄▇▅▃▃▃▁▃▄▁▃▃▁▃▁▁▃▁▁▁▄▃▁▃▇███▇▇▇▆ █
  1.23 ns      Histogram: log(frequency) by time      1.94 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.234 ns … 33.790 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.245 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.297 ns ±  0.677 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇ ▃▂▁                                                     ▁
  ██████▆▆▅▁▄▅▅▄▁▄▄▄▃▄▃▁▃▁▃▄▃▁▃▁▃▁▁▁▃▃▁▃▃▁▁▁▁▁▁▁▃▁▄█████▇▇▇▇ █
  1.23 ns      Histogram: log(frequency) by time     1.96 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

This `typeassert` elimination would be much more effective if we implement
more aggressive SROA based on strong [alias analysis](https://github.com/aviatesk/EscapeAnalysis.jl)
and/or [more aggressive Julia-level DCE](#27547).
But this change is so simple that I don't think it hurts anything to have
it for now.
aviatesk added a commit that referenced this issue Oct 20, 2021
Adds a very simple optimization pass to eliminate `typeassert` calls.
The motivation is, when SROA replaces `getfield` calls with scalar values,
then we can often prove `typeassert` whose first operand is a replaced
value is no-op:
```julia
julia> struct Foo; x; end

julia> code_typed((Int,)) do a
                   x1 = Foo(a)
                   x2 = Foo(x1)
                   typeassert(x2.x, Foo).x
               end |> only |> first
CodeInfo(
1 ─ %1 = Main.Foo::Type{Foo}
│   %2 = %new(%1, a)::Foo
│        Main.typeassert(%2, Main.Foo)::Foo # can be nullified
└──      return a
)
```

Nullifying `typeassert` helps succeeding (simple) DCE to eliminate dead
allocations, and also allows LLVM to do more aggressive DCE to emit simpler code.

Here is a simple benchmarking:
> sample target code:
```julia
julia> function compute(T, n)
           r = 0
           for i in 1:n
               x1 = T(i)
               x2 = T(x1)
               r += (x2.x::T).x::Int
           end
           r
       end
compute (generic function with 1 method)

julia> struct Foo; x; end

julia> mutable struct Bar; x; end
```

> on master
```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.263 μs … 145.828 μs  ┊ GC (min … max): 0.00% … 97.14%
 Time  (median):     3.516 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.015 μs ±   3.726 μs  ┊ GC (mean ± σ):  3.16% ±  3.46%

   ▇█▆▄▅▄▄▃▂▁▂▁                                               ▂
  ▇███████████████▇██▇▇█▇▇▆▇▇▇▇▅▆▅▇▇▅██▇▇▆▇▇▇█▇█▇▇▅▆▆▆▆▅▅▅▅▄▄ █
  3.26 μs      Histogram: log(frequency) by time      8.52 μs <

 Memory estimate: 7.64 KiB, allocs estimate: 489.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
 Range (min … max):  6.990 μs … 288.079 μs  ┊ GC (min … max): 0.00% … 97.03%
 Time  (median):     7.657 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.019 μs ±   9.710 μs  ┊ GC (mean ± σ):  4.59% ±  4.28%

   ▆█▆▄▃▂▂▁▂▃▂▁  ▁                                            ▁
  ██████████████████████▇▇▇▇▇▆██████▇▇█▇▇▇▆▆▆▆▅▆▅▄▄▄▅▄▄▃▄▄▂▄▅ █
  6.99 μs      Histogram: log(frequency) by time      20.7 μs <

 Memory estimate: 23.27 KiB, allocs estimate: 1489.
```

> on this branch
```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.234 ns … 116.188 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.246 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.307 ns ±   1.444 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇ ▂▂▁ ▂                                                    ▁
  ██████▇█▇▅▄▆▇▆▁▃▄▁▁▁▁▁▃▁▃▁▁▄▇▅▃▃▃▁▃▄▁▃▃▁▃▁▁▃▁▁▁▄▃▁▃▇███▇▇▇▆ █
  1.23 ns      Histogram: log(frequency) by time      1.94 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.234 ns … 33.790 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.245 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.297 ns ±  0.677 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇ ▃▂▁                                                     ▁
  ██████▆▆▅▁▄▅▅▄▁▄▄▄▃▄▃▁▃▁▃▄▃▁▃▁▃▁▁▁▃▃▁▃▃▁▁▁▁▁▁▁▃▁▄█████▇▇▇▇ █
  1.23 ns      Histogram: log(frequency) by time     1.96 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

This `typeassert` elimination would be much more effective if we implement
more aggressive SROA based on strong [alias analysis](https://github.com/aviatesk/EscapeAnalysis.jl)
and/or [more aggressive Julia-level DCE](#27547).
But this change is so simple that I don't think it hurts anything to have
it for now.
aviatesk added a commit that referenced this issue Oct 21, 2021
Adds a very simple optimization pass to eliminate `typeassert` calls.
The motivation is, when SROA replaces `getfield` calls with scalar values,
then we can often prove `typeassert` whose first operand is a replaced
value is no-op:
```julia
julia> struct Foo; x; end

julia> code_typed((Int,)) do a
                   x1 = Foo(a)
                   x2 = Foo(x1)
                   typeassert(x2.x, Foo).x
               end |> only |> first
CodeInfo(
1 ─ %1 = Main.Foo::Type{Foo}
│   %2 = %new(%1, a)::Foo
│        Main.typeassert(%2, Main.Foo)::Foo # can be nullified
└──      return a
)
```

Nullifying `typeassert` helps succeeding (simple) DCE to eliminate dead
allocations, and also allows LLVM to do more aggressive DCE to emit simpler code.

Here is a simple benchmarking:
> sample target code:
```julia
julia> function compute(T, n)
           r = 0
           for i in 1:n
               x1 = T(i)
               x2 = T(x1)
               r += (x2.x::T).x::Int
           end
           r
       end
compute (generic function with 1 method)

julia> struct Foo; x; end

julia> mutable struct Bar; x; end
```

> on master
```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.263 μs … 145.828 μs  ┊ GC (min … max): 0.00% … 97.14%
 Time  (median):     3.516 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.015 μs ±   3.726 μs  ┊ GC (mean ± σ):  3.16% ±  3.46%

   ▇█▆▄▅▄▄▃▂▁▂▁                                               ▂
  ▇███████████████▇██▇▇█▇▇▆▇▇▇▇▅▆▅▇▇▅██▇▇▆▇▇▇█▇█▇▇▅▆▆▆▆▅▅▅▅▄▄ █
  3.26 μs      Histogram: log(frequency) by time      8.52 μs <

 Memory estimate: 7.64 KiB, allocs estimate: 489.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
 Range (min … max):  6.990 μs … 288.079 μs  ┊ GC (min … max): 0.00% … 97.03%
 Time  (median):     7.657 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.019 μs ±   9.710 μs  ┊ GC (mean ± σ):  4.59% ±  4.28%

   ▆█▆▄▃▂▂▁▂▃▂▁  ▁                                            ▁
  ██████████████████████▇▇▇▇▇▆██████▇▇█▇▇▇▆▆▆▆▅▆▅▄▄▄▅▄▄▃▄▄▂▄▅ █
  6.99 μs      Histogram: log(frequency) by time      20.7 μs <

 Memory estimate: 23.27 KiB, allocs estimate: 1489.
```

> on this branch
```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.234 ns … 116.188 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.246 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.307 ns ±   1.444 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇ ▂▂▁ ▂                                                    ▁
  ██████▇█▇▅▄▆▇▆▁▃▄▁▁▁▁▁▃▁▃▁▁▄▇▅▃▃▃▁▃▄▁▃▃▁▃▁▁▃▁▁▁▄▃▁▃▇███▇▇▇▆ █
  1.23 ns      Histogram: log(frequency) by time      1.94 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.234 ns … 33.790 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.245 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.297 ns ±  0.677 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇ ▃▂▁                                                     ▁
  ██████▆▆▅▁▄▅▅▄▁▄▄▄▃▄▃▁▃▁▃▄▃▁▃▁▃▁▁▁▃▃▁▃▃▁▁▁▁▁▁▁▃▁▄█████▇▇▇▇ █
  1.23 ns      Histogram: log(frequency) by time     1.96 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

This `typeassert` elimination would be much more effective if we implement
more aggressive SROA based on strong [alias analysis](https://github.com/aviatesk/EscapeAnalysis.jl)
and/or [more aggressive Julia-level DCE](#27547).
But this change is so simple that I don't think it hurts anything to have
it for now.
LilithHafner pushed a commit to LilithHafner/julia that referenced this issue Feb 22, 2022
Adds a very simple optimization pass to eliminate `typeassert` calls.
The motivation is, when SROA replaces `getfield` calls with scalar values,
then we can often prove `typeassert` whose first operand is a replaced
value is no-op:
```julia
julia> struct Foo; x; end

julia> code_typed((Int,)) do a
                   x1 = Foo(a)
                   x2 = Foo(x1)
                   typeassert(x2.x, Foo).x
               end |> only |> first
CodeInfo(
1 ─ %1 = Main.Foo::Type{Foo}
│   %2 = %new(%1, a)::Foo
│        Main.typeassert(%2, Main.Foo)::Foo # can be nullified
└──      return a
)
```

Nullifying `typeassert` helps succeeding (simple) DCE to eliminate dead
allocations, and also allows LLVM to do more aggressive DCE to emit simpler code.

Here is a simple benchmarking:
> sample target code:
```julia
julia> function compute(T, n)
           r = 0
           for i in 1:n
               x1 = T(i)
               x2 = T(x1)
               r += (x2.x::T).x::Int
           end
           r
       end
compute (generic function with 1 method)

julia> struct Foo; x; end

julia> mutable struct Bar; x; end
```

> on master
```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.263 μs … 145.828 μs  ┊ GC (min … max): 0.00% … 97.14%
 Time  (median):     3.516 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.015 μs ±   3.726 μs  ┊ GC (mean ± σ):  3.16% ±  3.46%

   ▇█▆▄▅▄▄▃▂▁▂▁                                               ▂
  ▇███████████████▇██▇▇█▇▇▆▇▇▇▇▅▆▅▇▇▅██▇▇▆▇▇▇█▇█▇▇▅▆▆▆▆▅▅▅▅▄▄ █
  3.26 μs      Histogram: log(frequency) by time      8.52 μs <

 Memory estimate: 7.64 KiB, allocs estimate: 489.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
 Range (min … max):  6.990 μs … 288.079 μs  ┊ GC (min … max): 0.00% … 97.03%
 Time  (median):     7.657 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.019 μs ±   9.710 μs  ┊ GC (mean ± σ):  4.59% ±  4.28%

   ▆█▆▄▃▂▂▁▂▃▂▁  ▁                                            ▁
  ██████████████████████▇▇▇▇▇▆██████▇▇█▇▇▇▆▆▆▆▅▆▅▄▄▄▅▄▄▃▄▄▂▄▅ █
  6.99 μs      Histogram: log(frequency) by time      20.7 μs <

 Memory estimate: 23.27 KiB, allocs estimate: 1489.
```

> on this branch
```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.234 ns … 116.188 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.246 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.307 ns ±   1.444 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇ ▂▂▁ ▂                                                    ▁
  ██████▇█▇▅▄▆▇▆▁▃▄▁▁▁▁▁▃▁▃▁▁▄▇▅▃▃▃▁▃▄▁▃▃▁▃▁▁▃▁▁▁▄▃▁▃▇███▇▇▇▆ █
  1.23 ns      Histogram: log(frequency) by time      1.94 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.234 ns … 33.790 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.245 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.297 ns ±  0.677 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇ ▃▂▁                                                     ▁
  ██████▆▆▅▁▄▅▅▄▁▄▄▄▃▄▃▁▃▁▃▄▃▁▃▁▃▁▁▁▃▃▁▃▃▁▁▁▁▁▁▁▃▁▄█████▇▇▇▇ █
  1.23 ns      Histogram: log(frequency) by time     1.96 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

This `typeassert` elimination would be much more effective if we implement
more aggressive SROA based on strong [alias analysis](https://github.com/aviatesk/EscapeAnalysis.jl)
and/or [more aggressive Julia-level DCE](JuliaLang#27547).
But this change is so simple that I don't think it hurts anything to have
it for now.
LilithHafner pushed a commit to LilithHafner/julia that referenced this issue Mar 8, 2022
Adds a very simple optimization pass to eliminate `typeassert` calls.
The motivation is, when SROA replaces `getfield` calls with scalar values,
then we can often prove `typeassert` whose first operand is a replaced
value is no-op:
```julia
julia> struct Foo; x; end

julia> code_typed((Int,)) do a
                   x1 = Foo(a)
                   x2 = Foo(x1)
                   typeassert(x2.x, Foo).x
               end |> only |> first
CodeInfo(
1 ─ %1 = Main.Foo::Type{Foo}
│   %2 = %new(%1, a)::Foo
│        Main.typeassert(%2, Main.Foo)::Foo # can be nullified
└──      return a
)
```

Nullifying `typeassert` helps succeeding (simple) DCE to eliminate dead
allocations, and also allows LLVM to do more aggressive DCE to emit simpler code.

Here is a simple benchmarking:
> sample target code:
```julia
julia> function compute(T, n)
           r = 0
           for i in 1:n
               x1 = T(i)
               x2 = T(x1)
               r += (x2.x::T).x::Int
           end
           r
       end
compute (generic function with 1 method)

julia> struct Foo; x; end

julia> mutable struct Bar; x; end
```

> on master
```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.263 μs … 145.828 μs  ┊ GC (min … max): 0.00% … 97.14%
 Time  (median):     3.516 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.015 μs ±   3.726 μs  ┊ GC (mean ± σ):  3.16% ±  3.46%

   ▇█▆▄▅▄▄▃▂▁▂▁                                               ▂
  ▇███████████████▇██▇▇█▇▇▆▇▇▇▇▅▆▅▇▇▅██▇▇▆▇▇▇█▇█▇▇▅▆▆▆▆▅▅▅▅▄▄ █
  3.26 μs      Histogram: log(frequency) by time      8.52 μs <

 Memory estimate: 7.64 KiB, allocs estimate: 489.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
 Range (min … max):  6.990 μs … 288.079 μs  ┊ GC (min … max): 0.00% … 97.03%
 Time  (median):     7.657 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.019 μs ±   9.710 μs  ┊ GC (mean ± σ):  4.59% ±  4.28%

   ▆█▆▄▃▂▂▁▂▃▂▁  ▁                                            ▁
  ██████████████████████▇▇▇▇▇▆██████▇▇█▇▇▇▆▆▆▆▅▆▅▄▄▄▅▄▄▃▄▄▂▄▅ █
  6.99 μs      Histogram: log(frequency) by time      20.7 μs <

 Memory estimate: 23.27 KiB, allocs estimate: 1489.
```

> on this branch
```julia
julia> @benchmark compute(Foo, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.234 ns … 116.188 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.246 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.307 ns ±   1.444 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇ ▂▂▁ ▂                                                    ▁
  ██████▇█▇▅▄▆▇▆▁▃▄▁▁▁▁▁▃▁▃▁▁▄▇▅▃▃▃▁▃▄▁▃▃▁▃▁▁▃▁▁▁▄▃▁▃▇███▇▇▇▆ █
  1.23 ns      Histogram: log(frequency) by time      1.94 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark compute(Bar, 1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.234 ns … 33.790 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.245 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.297 ns ±  0.677 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇ ▃▂▁                                                     ▁
  ██████▆▆▅▁▄▅▅▄▁▄▄▄▃▄▃▁▃▁▃▄▃▁▃▁▃▁▁▁▃▃▁▃▃▁▁▁▁▁▁▁▃▁▄█████▇▇▇▇ █
  1.23 ns      Histogram: log(frequency) by time     1.96 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

This `typeassert` elimination would be much more effective if we implement
more aggressive SROA based on strong [alias analysis](https://github.com/aviatesk/EscapeAnalysis.jl)
and/or [more aggressive Julia-level DCE](JuliaLang#27547).
But this change is so simple that I don't think it hurts anything to have
it for now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/)
Projects
None yet
Development

No branches or pull requests

4 participants