
codegen emits bounds-checking instructions when destructuring tuple even though size is statically known #23303

Closed
jrevels opened this issue Aug 17, 2017 · 4 comments
Labels: compiler:codegen (Generation of LLVM IR and native code), performance (Must go faster)

Comments

@jrevels (Member) commented Aug 17, 2017

There might be a better MRE for this, but here's what I've got for now:

struct Wrapper{F}
    f::F
end

@inline unwrap(w::Wrapper) = w.f
@inline unwrap(x) = x
@inline unwrapcall(w, x, y) = unwrap(w)(x, y)
@inline (w::Wrapper)(x...) = unwrapcall(w, x...)

w = Wrapper(+)
julia> @code_warntype w(1, 1)
Variables:
  w::Wrapper{Base.#+}
  x::Tuple{Int64,Int64}

Body:
  begin
      return (Base.add_int)((Core.getfield)(x::Tuple{Int64,Int64}, 1)::Int64, (Core.getfield)(x::Tuple{Int64,Int64}, 2)::Int64)::Int64
  end::Int64

julia> @code_llvm w(1, 1)

; Function Wrapper
; Location: REPL[5]
define %jl_value_t addrspace(10)* @japi1_Wrapper_61798(%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32) #0 {
top:
  %3 = alloca %jl_value_t addrspace(10)**, align 8
  store volatile %jl_value_t addrspace(10)** %1, %jl_value_t addrspace(10)*** %3, align 8
; Location: REPL[5]:1
  switch i32 %2, label %pass2 [
    i32 0, label %fail
    i32 1, label %fail1
  ]

fail:                                             ; preds = %top
  call void @jl_bounds_error_tuple_int(%jl_value_t addrspace(10)** %1, i64 0, i64 1)
  unreachable

fail1:                                            ; preds = %top
  call void @jl_bounds_error_tuple_int(%jl_value_t addrspace(10)** %1, i64 1, i64 2)
  unreachable

pass2:                                            ; preds = %top
  %4 = bitcast %jl_value_t addrspace(10)** %1 to i64 addrspace(10)**
  %5 = load i64 addrspace(10)*, i64 addrspace(10)** %4, align 8
  %6 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %1, i64 1
  %7 = bitcast %jl_value_t addrspace(10)** %6 to i64 addrspace(10)**
  %8 = load i64 addrspace(10)*, i64 addrspace(10)** %7, align 8
  %9 = load i64, i64 addrspace(10)* %5, align 8
  %10 = load i64, i64 addrspace(10)* %8, align 8
  %11 = add i64 %10, %9
  %12 = call %jl_value_t addrspace(10)* @jl_box_int64(i64 signext %11)
  ret %jl_value_t addrspace(10)* %12
}

Keno pointed me to this code, so I'll start poking around there. Also kind of weird/surprising that inference doesn't inline the getfield call, but maybe my surprise is just due to my ignorance 😛
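As a quick way to verify a fix while poking around, here is a minimal checking sketch (not part of the original report), assuming the Wrapper/unwrap definitions above are loaded and a recent Julia where code_llvm is provided by InteractiveUtils and accepts an IO as its first argument:

using InteractiveUtils

# Render the IR for w(1, 1) to a string and look for the runtime bounds-check call.
ir = sprint(code_llvm, w, Tuple{Int,Int})
occursin("jl_bounds_error_tuple_int", ir)   # true while codegen still emits the checks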

@jrevels added the compiler:codegen and performance labels on Aug 17, 2017
@yuyichao (Contributor) commented:

Related to that we don't specsig on vararg? (The bounds check itself can probably be removed without this)
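That suggests a hedged workaround sketch (mine, not from the thread): give the wrapper a fixed-arity call method so there is no vararg splat for codegen to destructure. The names below are illustrative, not from the issue:

struct FixedWrapper{F}   # illustrative name, distinct from the MRE's Wrapper
    f::F
end

# Fixed arity instead of `x...`: this method should get a specialized native
# signature, so the integers arrive directly rather than through the generic
# boxed-argument convention that needs the argument-count checks.
@inline (w::FixedWrapper)(x, y) = w.f(x, y)

fw = FixedWrapper(+)
# @code_llvm fw(1, 1)   # expected: a bare `add i64`, no jl_bounds_error_tuple_int

The trade-off is that this gives up the general vararg interface, so it only helps when the arity is known at the definition site.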

@jrevels (Member, Author) commented Aug 18, 2017

Related to that we don't specsig on vararg?

Ref #5402?

@jrevels (Member, Author) commented Aug 18, 2017

Actually, the example in #5402's OP is a better MRE than the one I gave here:

julia> function f(a...)
           z = a[1]+a[2]
           z
       end
f (generic function with 1 method)

julia> @code_llvm f(1, 1)

define i8** @japi1_f_60877(i8**, i8***, i32) #0 !dbg !5 {
top:
  %3 = alloca i8***, align 8
  store volatile i8*** %1, i8**** %3, align 8
  switch i32 %2, label %pass2 [
    i32 0, label %fail
    i32 1, label %fail1
  ]

fail:                                             ; preds = %top
  call void @jl_bounds_error_tuple_int(i8*** %1, i64 0, i64 1)
  unreachable

fail1:                                            ; preds = %top
  call void @jl_bounds_error_tuple_int(i8*** %1, i64 1, i64 2)
  unreachable

pass2:                                            ; preds = %top
  %4 = bitcast i8*** %1 to i64**
  %5 = load i64*, i64** %4, align 8
  %6 = getelementptr i8**, i8*** %1, i64 1
  %7 = bitcast i8*** %6 to i64**
  %8 = load i64*, i64** %7, align 8
  %9 = load i64, i64* %5, align 16
  %10 = load i64, i64* %8, align 16
  %11 = add i64 %10, %9
  %12 = call i8** @jl_box_int64(i64 signext %11)
  ret i8** %12
}
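For contrast, a fixed-arity sketch (not from the thread) of the same addition should compile to a bare integer add with no bounds-check branches; the expected IR shape is annotated as a comment since the exact symbol names vary across Julia versions:

g(a, b) = a + b

# @code_llvm g(1, 1)
# Expected shape (names vary by version):
#   define i64 @julia_g(i64, i64) {
#   top:
#     %2 = add i64 %1, %0
#     ret i64 %2
#   }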

Should this be closed as a dup of #5402?

@yuyichao (Contributor) commented:

Should this be closed as a dup of #5402?

This can probably be fixed without fixing #5402, but since the bounds check is really the last thing you care about in these cases, I think this can be closed for now.
