Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve codegen of varargs functions #5402

Closed
JeffBezanson opened this issue Jan 15, 2014 · 6 comments
Closed

improve codegen of varargs functions #5402

JeffBezanson opened this issue Jan 15, 2014 · 6 comments
Assignees
Labels
compiler:codegen Generation of LLVM IR and native code performance Must go faster

Comments

@JeffBezanson
Copy link
Member

The code generator currently pessimizes varargs functions in various ways.
For example:

function f(a...)
    z = a[1]+a[2]
    z
end

function g()
    x = f(3,5)
end

We don't generate a specialized signature for f, and its return value is even boxed.

We should generate a specialized f(Int,Int) (if the function is specialized for 2 Int arguments), or f(Int...) (using llvm varargs).

For cases where we can't generate an especially nice signature (e.g. for f(Any...)), we could keep using the julia calling convention for arguments but specialize the return type.

@KristofferC
Copy link
Member

FWIW, this example currently generates:

julia> @code_llvm g()

define i64 @julia_g_53587() #0 {
top:
  ret i64 8
}

@andyferris
Copy link
Member

Just a checkup - is there any progress on this issue?

We should generate a specialized f(Int,Int) (if the function is specialized for 2 Int arguments)

Indeed! This would be really awesome. It can just specialize to invoking f(a::Tuple{Int,Int}) which has similar ABI (right? tuples are flattened and the same as C calling convention, at least for immutables, no?) and the correct syntax (a[1], a[2]).

I find I have to avoid varargs functions in a lot of performance critical code (e.g. map and broadcast for StaticArrays) which makes life painful. :(

@timholy
Copy link
Member

timholy commented Sep 5, 2016

I suspect #265 is ahead of this one in the queue. Seems appropriate, since most varargs performance problems can be circumvented by forced inlining.

@andyferris
Copy link
Member

Oh? I didn't know that - thanks! That helps a lot (but makes using @code_x a bit more arduous... maybe we need a macro for that... (actually that's not a bad idea!)).

Keno added a commit that referenced this issue Sep 13, 2017
This just handles the simple case where the varargs arguments are all
isbits_spec. More complicated additions are of course possible, but this
already gets a bunch of interesting cases.

Before:
```
julia> @noinline f(x::Vararg{Int, N}) where {N} = x[1]
f (generic function with 1 method)

julia> g(x) = f(x)
g (generic function with 1 method)

julia> @code_llvm f(1)

; Function f
; Location: REPL[1]
define %jl_value_t addrspace(10)* @japi1_f_62200(%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32) #0 {
top:
  %3 = alloca %jl_value_t addrspace(10)**, align 8
  store volatile %jl_value_t addrspace(10)** %1, %jl_value_t addrspace(10)*** %3, align 8
; Location: REPL[1]:1
  %4 = icmp eq i32 %2, 0
  br i1 %4, label %fail, label %pass

fail:                                             ; preds = %top
  call void @jl_bounds_error_tuple_int(%jl_value_t addrspace(10)** %1, i64 0, i64 1)
  unreachable

pass:                                             ; preds = %top
  %5 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %1, align 8
  ret %jl_value_t addrspace(10)* %5
}

julia> @code_llvm g(1)

; Function g
; Location: REPL[2]
define i64 @julia_g_62202(i64) #0 {
top:
  %1 = alloca %jl_value_t addrspace(10)*, align 8
  %gcframe1 = alloca [3 x %jl_value_t addrspace(10)*], align 8
  %gcframe1.sub = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 0
  %2 = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 1
  %3 = bitcast %jl_value_t addrspace(10)** %2 to i8*
  call void @llvm.memset.p0i8.i32(i8* %3, i8 0, i32 16, i32 8, i1 false)
  %4 = call %jl_value_t*** @jl_get_ptls_states() #4
  %5 = bitcast [3 x %jl_value_t addrspace(10)*]* %gcframe1 to i64*
  store i64 2, i64* %5, align 8
  %6 = bitcast %jl_value_t*** %4 to i64*
  %7 = load i64, i64* %6, align 8
  %8 = getelementptr [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 1
  %9 = bitcast %jl_value_t addrspace(10)** %8 to i64*
  store i64 %7, i64* %9, align 8
  %10 = bitcast %jl_value_t*** %4 to %jl_value_t addrspace(10)***
  store %jl_value_t addrspace(10)** %gcframe1.sub, %jl_value_t addrspace(10)*** %10, align 8
; Location: REPL[2]:1
  %11 = call %jl_value_t addrspace(10)* @jl_box_int64(i64 signext %0)
  %12 = getelementptr [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 2
  store %jl_value_t addrspace(10)* %11, %jl_value_t addrspace(10)** %12, align 8
  store %jl_value_t addrspace(10)* %11, %jl_value_t addrspace(10)** %1, align 8
  %13 = call %jl_value_t addrspace(10)* @japi1_f_62202(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 4511449272 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** nonnull %1, i32 1)
  %14 = bitcast %jl_value_t addrspace(10)* %13 to i64 addrspace(10)*
  %15 = load i64, i64 addrspace(10)* %14, align 8
  %16 = load i64, i64* %9, align 8
  store i64 %16, i64* %6, align 8
  ret i64 %15
}
```

After:
```
julia> @code_llvm f(1)

define i64 @julia_f_63017(i64) #0 {
top:
  ret i64 %0
}

julia> @code_llvm g(1)

define i64 @julia_g_63053(i64) #0 {
top:
  %1 = call i64 @julia_f_63054(i64 %0)
  ret i64 %1
}
```

Part of #5402
Keno added a commit that referenced this issue Sep 14, 2017
This just handles the simple case where the varargs arguments are all
isbits_spec. More complicated additions are of course possible, but this
already gets a bunch of interesting cases.

Before:
```
julia> @noinline f(x::Vararg{Int, N}) where {N} = x[1]
f (generic function with 1 method)

julia> g(x) = f(x)
g (generic function with 1 method)

julia> @code_llvm f(1)

; Function f
; Location: REPL[1]
define %jl_value_t addrspace(10)* @japi1_f_62200(%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32) #0 {
top:
  %3 = alloca %jl_value_t addrspace(10)**, align 8
  store volatile %jl_value_t addrspace(10)** %1, %jl_value_t addrspace(10)*** %3, align 8
; Location: REPL[1]:1
  %4 = icmp eq i32 %2, 0
  br i1 %4, label %fail, label %pass

fail:                                             ; preds = %top
  call void @jl_bounds_error_tuple_int(%jl_value_t addrspace(10)** %1, i64 0, i64 1)
  unreachable

pass:                                             ; preds = %top
  %5 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %1, align 8
  ret %jl_value_t addrspace(10)* %5
}

julia> @code_llvm g(1)

; Function g
; Location: REPL[2]
define i64 @julia_g_62202(i64) #0 {
top:
  %1 = alloca %jl_value_t addrspace(10)*, align 8
  %gcframe1 = alloca [3 x %jl_value_t addrspace(10)*], align 8
  %gcframe1.sub = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 0
  %2 = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 1
  %3 = bitcast %jl_value_t addrspace(10)** %2 to i8*
  call void @llvm.memset.p0i8.i32(i8* %3, i8 0, i32 16, i32 8, i1 false)
  %4 = call %jl_value_t*** @jl_get_ptls_states() #4
  %5 = bitcast [3 x %jl_value_t addrspace(10)*]* %gcframe1 to i64*
  store i64 2, i64* %5, align 8
  %6 = bitcast %jl_value_t*** %4 to i64*
  %7 = load i64, i64* %6, align 8
  %8 = getelementptr [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 1
  %9 = bitcast %jl_value_t addrspace(10)** %8 to i64*
  store i64 %7, i64* %9, align 8
  %10 = bitcast %jl_value_t*** %4 to %jl_value_t addrspace(10)***
  store %jl_value_t addrspace(10)** %gcframe1.sub, %jl_value_t addrspace(10)*** %10, align 8
; Location: REPL[2]:1
  %11 = call %jl_value_t addrspace(10)* @jl_box_int64(i64 signext %0)
  %12 = getelementptr [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 2
  store %jl_value_t addrspace(10)* %11, %jl_value_t addrspace(10)** %12, align 8
  store %jl_value_t addrspace(10)* %11, %jl_value_t addrspace(10)** %1, align 8
  %13 = call %jl_value_t addrspace(10)* @japi1_f_62202(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 4511449272 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** nonnull %1, i32 1)
  %14 = bitcast %jl_value_t addrspace(10)* %13 to i64 addrspace(10)*
  %15 = load i64, i64 addrspace(10)* %14, align 8
  %16 = load i64, i64* %9, align 8
  store i64 %16, i64* %6, align 8
  ret i64 %15
}
```

After:
```
julia> @code_llvm f(1)

define i64 @julia_f_63017(i64) #0 {
top:
  ret i64 %0
}

julia> @code_llvm g(1)

define i64 @julia_g_63053(i64) #0 {
top:
  %1 = call i64 @julia_f_63054(i64 %0)
  ret i64 %1
}
```

Part of #5402
@jrevels jrevels mentioned this issue Jan 10, 2018
5 tasks
@KristofferC
Copy link
Member

Is there anything left to do here?

@KristofferC
Copy link
Member

Closing since original example is fixed. Can always open more specific examples if they are encounted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code performance Must go faster
Projects
None yet
Development

No branches or pull requests

6 participants