improve codegen of varargs functions #5402

JeffBezanson · 2014-01-15T06:43:32Z

The code generator currently pessimizes varargs functions in various ways.
For example:

function f(a...)
    z = a[1]+a[2]
    z
end

function g()
    x = f(3,5)
end

We don't generate a specialized signature for f, and its return value is even boxed.

We should generate a specialized f(Int,Int) (if the function is specialized for 2 Int arguments), or f(Int...) (using llvm varargs).

For cases where we can't generate an especially nice signature (e.g. for f(Any...)), we could keep using the julia calling convention for arguments but specialize the return type.

The text was updated successfully, but these errors were encountered:

KristofferC · 2016-05-12T20:41:23Z

FWIW, this example currently generates:

julia> @code_llvm g()

define i64 @julia_g_53587() #0 {
top:
  ret i64 8
}

andyferris · 2016-09-05T00:00:16Z

Just a checkup - is there any progress on this issue?

We should generate a specialized f(Int,Int) (if the function is specialized for 2 Int arguments)

Indeed! This would be really awesome. It can just specialize to invoking f(a::Tuple{Int,Int}) which has similar ABI (right? tuples are flattened and the same as C calling convention, at least for immutables, no?) and the correct syntax (a[1], a[2]).

I find I have to avoid varargs functions in a lot of performance critical code (e.g. map and broadcast for StaticArrays) which makes life painful. :(

timholy · 2016-09-05T02:12:00Z

I suspect #265 is ahead of this one in the queue. Seems appropriate, since most varargs performance problems can be circumvented by forced inlining.

andyferris · 2016-09-05T03:19:22Z

Oh? I didn't know that - thanks! That helps a lot (but makes using @code_x a bit more arduous... maybe we need a macro for that... (actually that's not a bad idea!)).

@noinline

This just handles the simple case where the varargs arguments are all isbits_spec. More complicated additions are of course possible, but this already gets a bunch of interesting cases. Before: ``` julia> @noinline f(x::Vararg{Int, N}) where {N} = x[1] f (generic function with 1 method) julia> g(x) = f(x) g (generic function with 1 method) julia> @code_llvm f(1) ; Function f ; Location: REPL[1] define %jl_value_t addrspace(10)* @japi1_f_62200(%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32) #0 { top: %3 = alloca %jl_value_t addrspace(10)**, align 8 store volatile %jl_value_t addrspace(10)** %1, %jl_value_t addrspace(10)*** %3, align 8 ; Location: REPL[1]:1 %4 = icmp eq i32 %2, 0 br i1 %4, label %fail, label %pass fail: ; preds = %top call void @jl_bounds_error_tuple_int(%jl_value_t addrspace(10)** %1, i64 0, i64 1) unreachable pass: ; preds = %top %5 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %1, align 8 ret %jl_value_t addrspace(10)* %5 } julia> @code_llvm g(1) ; Function g ; Location: REPL[2] define i64 @julia_g_62202(i64) #0 { top: %1 = alloca %jl_value_t addrspace(10)*, align 8 %gcframe1 = alloca [3 x %jl_value_t addrspace(10)*], align 8 %gcframe1.sub = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 0 %2 = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 1 %3 = bitcast %jl_value_t addrspace(10)** %2 to i8* call void @llvm.memset.p0i8.i32(i8* %3, i8 0, i32 16, i32 8, i1 false) %4 = call %jl_value_t*** @jl_get_ptls_states() #4 %5 = bitcast [3 x %jl_value_t addrspace(10)*]* %gcframe1 to i64* store i64 2, i64* %5, align 8 %6 = bitcast %jl_value_t*** %4 to i64* %7 = load i64, i64* %6, align 8 %8 = getelementptr [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 1 %9 = bitcast %jl_value_t addrspace(10)** %8 to i64* store i64 %7, i64* %9, align 8 %10 = bitcast %jl_value_t*** %4 to %jl_value_t addrspace(10)*** store %jl_value_t addrspace(10)** %gcframe1.sub, %jl_value_t addrspace(10)*** %10, align 8 ; Location: REPL[2]:1 %11 = call %jl_value_t addrspace(10)* @jl_box_int64(i64 signext %0) %12 = getelementptr [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 2 store %jl_value_t addrspace(10)* %11, %jl_value_t addrspace(10)** %12, align 8 store %jl_value_t addrspace(10)* %11, %jl_value_t addrspace(10)** %1, align 8 %13 = call %jl_value_t addrspace(10)* @japi1_f_62202(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 4511449272 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** nonnull %1, i32 1) %14 = bitcast %jl_value_t addrspace(10)* %13 to i64 addrspace(10)* %15 = load i64, i64 addrspace(10)* %14, align 8 %16 = load i64, i64* %9, align 8 store i64 %16, i64* %6, align 8 ret i64 %15 } ``` After: ``` julia> @code_llvm f(1) define i64 @julia_f_63017(i64) #0 { top: ret i64 %0 } julia> @code_llvm g(1) define i64 @julia_g_63053(i64) #0 { top: %1 = call i64 @julia_f_63054(i64 %0) ret i64 %1 } ``` Part of #5402

@noinline

This just handles the simple case where the varargs arguments are all isbits_spec. More complicated additions are of course possible, but this already gets a bunch of interesting cases. Before: ``` julia> @noinline f(x::Vararg{Int, N}) where {N} = x[1] f (generic function with 1 method) julia> g(x) = f(x) g (generic function with 1 method) julia> @code_llvm f(1) ; Function f ; Location: REPL[1] define %jl_value_t addrspace(10)* @japi1_f_62200(%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32) #0 { top: %3 = alloca %jl_value_t addrspace(10)**, align 8 store volatile %jl_value_t addrspace(10)** %1, %jl_value_t addrspace(10)*** %3, align 8 ; Location: REPL[1]:1 %4 = icmp eq i32 %2, 0 br i1 %4, label %fail, label %pass fail: ; preds = %top call void @jl_bounds_error_tuple_int(%jl_value_t addrspace(10)** %1, i64 0, i64 1) unreachable pass: ; preds = %top %5 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %1, align 8 ret %jl_value_t addrspace(10)* %5 } julia> @code_llvm g(1) ; Function g ; Location: REPL[2] define i64 @julia_g_62202(i64) #0 { top: %1 = alloca %jl_value_t addrspace(10)*, align 8 %gcframe1 = alloca [3 x %jl_value_t addrspace(10)*], align 8 %gcframe1.sub = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 0 %2 = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 1 %3 = bitcast %jl_value_t addrspace(10)** %2 to i8* call void @llvm.memset.p0i8.i32(i8* %3, i8 0, i32 16, i32 8, i1 false) %4 = call %jl_value_t*** @jl_get_ptls_states() #4 %5 = bitcast [3 x %jl_value_t addrspace(10)*]* %gcframe1 to i64* store i64 2, i64* %5, align 8 %6 = bitcast %jl_value_t*** %4 to i64* %7 = load i64, i64* %6, align 8 %8 = getelementptr [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 1 %9 = bitcast %jl_value_t addrspace(10)** %8 to i64* store i64 %7, i64* %9, align 8 %10 = bitcast %jl_value_t*** %4 to %jl_value_t addrspace(10)*** store %jl_value_t addrspace(10)** %gcframe1.sub, %jl_value_t addrspace(10)*** %10, align 8 ; Location: REPL[2]:1 %11 = call %jl_value_t addrspace(10)* @jl_box_int64(i64 signext %0) %12 = getelementptr [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 2 store %jl_value_t addrspace(10)* %11, %jl_value_t addrspace(10)** %12, align 8 store %jl_value_t addrspace(10)* %11, %jl_value_t addrspace(10)** %1, align 8 %13 = call %jl_value_t addrspace(10)* @japi1_f_62202(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 4511449272 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** nonnull %1, i32 1) %14 = bitcast %jl_value_t addrspace(10)* %13 to i64 addrspace(10)* %15 = load i64, i64 addrspace(10)* %14, align 8 %16 = load i64, i64* %9, align 8 store i64 %16, i64* %6, align 8 ret i64 %15 } ``` After: ``` julia> @code_llvm f(1) define i64 @julia_f_63017(i64) #0 { top: ret i64 %0 } julia> @code_llvm g(1) define i64 @julia_g_63053(i64) #0 { top: %1 = call i64 @julia_f_63054(i64 %0) ret i64 %1 } ``` Part of #5402

KristofferC · 2018-08-15T17:51:20Z

Is there anything left to do here?

KristofferC · 2018-10-18T03:28:15Z

Closing since original example is fixed. Can always open more specific examples if they are encounted.

JeffBezanson mentioned this issue Jan 15, 2014

Length of varargs not inferred from type signature #5383

Closed

simonster mentioned this issue Feb 1, 2015

cat performance #3645

Closed

JeffBezanson mentioned this issue May 25, 2015

Make f(args...) as efficient as f(args) #11248

Closed

yuyichao mentioned this issue Jun 6, 2015

Indexing performance #11595

Closed

yuyichao added the compiler:codegen Generation of LLVM IR and native code label Aug 16, 2015

yuyichao mentioned this issue Aug 16, 2015

varargs generated functions generate inefficient code #12643

Closed

yuyichao mentioned this issue Oct 5, 2015

Call-site splatting optimization #13359

Open

yuyichao mentioned this issue Dec 15, 2015

Performance regression with threads #14417

Closed

vtjnash self-assigned this Apr 22, 2016

vtjnash mentioned this issue May 12, 2016

Strange allocation with unused Vararg argument in a function creating tuple: #16327

Closed

jrevels mentioned this issue Aug 18, 2017

codegen emits bounds-checking instructions when destructuring tuple even though size is statically known #23303

Closed

jrevels mentioned this issue Sep 5, 2017

optimization of "pass-through" varargs methods JuliaLabs/Cassette.jl#5

Closed

Keno mentioned this issue Sep 13, 2017

Improve codegen for certain varargs functions #23696

Merged

cstjean mentioned this issue Oct 13, 2017

Cassette.jl backend cstjean/TraceCalls.jl#57

Open

jrevels mentioned this issue Jan 10, 2018

Cassette Compatibility #25492

Closed

5 tasks

ChrisRackauckas mentioned this issue Jan 18, 2018

AutoDiff TuringLang/Turing.jl#331

Closed

KristofferC closed this as completed Oct 18, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improve codegen of varargs functions #5402

improve codegen of varargs functions #5402

JeffBezanson commented Jan 15, 2014

KristofferC commented May 12, 2016

andyferris commented Sep 5, 2016

timholy commented Sep 5, 2016

andyferris commented Sep 5, 2016

KristofferC commented Aug 15, 2018

KristofferC commented Oct 18, 2018

improve codegen of varargs functions #5402

improve codegen of varargs functions #5402

Comments

JeffBezanson commented Jan 15, 2014

KristofferC commented May 12, 2016

andyferris commented Sep 5, 2016

timholy commented Sep 5, 2016

andyferris commented Sep 5, 2016

KristofferC commented Aug 15, 2018

KristofferC commented Oct 18, 2018