-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve codegen of varargs functions #5402
Comments
FWIW, this example currently generates: julia> @code_llvm g()
define i64 @julia_g_53587() #0 {
top:
ret i64 8
} |
Just a checkup - is there any progress on this issue?
Indeed! This would be really awesome. It can just specialize to invoking I find I have to avoid varargs functions in a lot of performance critical code (e.g. |
I suspect #265 is ahead of this one in the queue. Seems appropriate, since most varargs performance problems can be circumvented by forced inlining. |
Oh? I didn't know that - thanks! That helps a lot (but makes using |
This just handles the simple case where the varargs arguments are all isbits_spec. More complicated additions are of course possible, but this already gets a bunch of interesting cases. Before: ``` julia> @noinline f(x::Vararg{Int, N}) where {N} = x[1] f (generic function with 1 method) julia> g(x) = f(x) g (generic function with 1 method) julia> @code_llvm f(1) ; Function f ; Location: REPL[1] define %jl_value_t addrspace(10)* @japi1_f_62200(%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32) #0 { top: %3 = alloca %jl_value_t addrspace(10)**, align 8 store volatile %jl_value_t addrspace(10)** %1, %jl_value_t addrspace(10)*** %3, align 8 ; Location: REPL[1]:1 %4 = icmp eq i32 %2, 0 br i1 %4, label %fail, label %pass fail: ; preds = %top call void @jl_bounds_error_tuple_int(%jl_value_t addrspace(10)** %1, i64 0, i64 1) unreachable pass: ; preds = %top %5 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %1, align 8 ret %jl_value_t addrspace(10)* %5 } julia> @code_llvm g(1) ; Function g ; Location: REPL[2] define i64 @julia_g_62202(i64) #0 { top: %1 = alloca %jl_value_t addrspace(10)*, align 8 %gcframe1 = alloca [3 x %jl_value_t addrspace(10)*], align 8 %gcframe1.sub = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 0 %2 = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 1 %3 = bitcast %jl_value_t addrspace(10)** %2 to i8* call void @llvm.memset.p0i8.i32(i8* %3, i8 0, i32 16, i32 8, i1 false) %4 = call %jl_value_t*** @jl_get_ptls_states() #4 %5 = bitcast [3 x %jl_value_t addrspace(10)*]* %gcframe1 to i64* store i64 2, i64* %5, align 8 %6 = bitcast %jl_value_t*** %4 to i64* %7 = load i64, i64* %6, align 8 %8 = getelementptr [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 1 %9 = bitcast %jl_value_t addrspace(10)** %8 to i64* store i64 %7, i64* %9, align 8 %10 = bitcast %jl_value_t*** %4 to %jl_value_t addrspace(10)*** store %jl_value_t addrspace(10)** %gcframe1.sub, %jl_value_t addrspace(10)*** %10, align 8 ; Location: REPL[2]:1 %11 = call %jl_value_t addrspace(10)* @jl_box_int64(i64 signext %0) %12 = getelementptr [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 2 store %jl_value_t addrspace(10)* %11, %jl_value_t addrspace(10)** %12, align 8 store %jl_value_t addrspace(10)* %11, %jl_value_t addrspace(10)** %1, align 8 %13 = call %jl_value_t addrspace(10)* @japi1_f_62202(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 4511449272 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** nonnull %1, i32 1) %14 = bitcast %jl_value_t addrspace(10)* %13 to i64 addrspace(10)* %15 = load i64, i64 addrspace(10)* %14, align 8 %16 = load i64, i64* %9, align 8 store i64 %16, i64* %6, align 8 ret i64 %15 } ``` After: ``` julia> @code_llvm f(1) define i64 @julia_f_63017(i64) #0 { top: ret i64 %0 } julia> @code_llvm g(1) define i64 @julia_g_63053(i64) #0 { top: %1 = call i64 @julia_f_63054(i64 %0) ret i64 %1 } ``` Part of #5402
This just handles the simple case where the varargs arguments are all isbits_spec. More complicated additions are of course possible, but this already gets a bunch of interesting cases. Before: ``` julia> @noinline f(x::Vararg{Int, N}) where {N} = x[1] f (generic function with 1 method) julia> g(x) = f(x) g (generic function with 1 method) julia> @code_llvm f(1) ; Function f ; Location: REPL[1] define %jl_value_t addrspace(10)* @japi1_f_62200(%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32) #0 { top: %3 = alloca %jl_value_t addrspace(10)**, align 8 store volatile %jl_value_t addrspace(10)** %1, %jl_value_t addrspace(10)*** %3, align 8 ; Location: REPL[1]:1 %4 = icmp eq i32 %2, 0 br i1 %4, label %fail, label %pass fail: ; preds = %top call void @jl_bounds_error_tuple_int(%jl_value_t addrspace(10)** %1, i64 0, i64 1) unreachable pass: ; preds = %top %5 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %1, align 8 ret %jl_value_t addrspace(10)* %5 } julia> @code_llvm g(1) ; Function g ; Location: REPL[2] define i64 @julia_g_62202(i64) #0 { top: %1 = alloca %jl_value_t addrspace(10)*, align 8 %gcframe1 = alloca [3 x %jl_value_t addrspace(10)*], align 8 %gcframe1.sub = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 0 %2 = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 1 %3 = bitcast %jl_value_t addrspace(10)** %2 to i8* call void @llvm.memset.p0i8.i32(i8* %3, i8 0, i32 16, i32 8, i1 false) %4 = call %jl_value_t*** @jl_get_ptls_states() #4 %5 = bitcast [3 x %jl_value_t addrspace(10)*]* %gcframe1 to i64* store i64 2, i64* %5, align 8 %6 = bitcast %jl_value_t*** %4 to i64* %7 = load i64, i64* %6, align 8 %8 = getelementptr [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 1 %9 = bitcast %jl_value_t addrspace(10)** %8 to i64* store i64 %7, i64* %9, align 8 %10 = bitcast %jl_value_t*** %4 to %jl_value_t addrspace(10)*** store %jl_value_t addrspace(10)** %gcframe1.sub, %jl_value_t addrspace(10)*** %10, align 8 ; Location: REPL[2]:1 %11 = call %jl_value_t addrspace(10)* @jl_box_int64(i64 signext %0) %12 = getelementptr [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe1, i64 0, i64 2 store %jl_value_t addrspace(10)* %11, %jl_value_t addrspace(10)** %12, align 8 store %jl_value_t addrspace(10)* %11, %jl_value_t addrspace(10)** %1, align 8 %13 = call %jl_value_t addrspace(10)* @japi1_f_62202(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 4511449272 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** nonnull %1, i32 1) %14 = bitcast %jl_value_t addrspace(10)* %13 to i64 addrspace(10)* %15 = load i64, i64 addrspace(10)* %14, align 8 %16 = load i64, i64* %9, align 8 store i64 %16, i64* %6, align 8 ret i64 %15 } ``` After: ``` julia> @code_llvm f(1) define i64 @julia_f_63017(i64) #0 { top: ret i64 %0 } julia> @code_llvm g(1) define i64 @julia_g_63053(i64) #0 { top: %1 = call i64 @julia_f_63054(i64 %0) ret i64 %1 } ``` Part of #5402
Is there anything left to do here? |
Closing since original example is fixed. Can always open more specific examples if they are encounted. |
The code generator currently pessimizes varargs functions in various ways.
For example:
We don't generate a specialized signature for
f
, and its return value is even boxed.We should generate a specialized
f(Int,Int)
(if the function is specialized for 2Int
arguments), orf(Int...)
(using llvm varargs).For cases where we can't generate an especially nice signature (e.g. for
f(Any...)
), we could keep using the julia calling convention for arguments but specialize the return type.The text was updated successfully, but these errors were encountered: