
Make f(args...) as efficient as f(args) #11248

Closed
yuyichao opened this issue May 12, 2015 · 11 comments

Labels: compiler:codegen (Generation of LLVM IR and native code), performance (Must go faster)

@yuyichao (Contributor)

It seems that for both f(args...) and f(args) (called with a tuple), the function is specialized to take a tuple argument. However, in the first case the arguments are always boxed into jl_value_t*'s before being passed in, whereas in the second case (at least after the tuple type change) no boxing or GC frame is necessary.

From the output of the example code below, it seems that the function is already specialized on the types of the arguments, although the specialization somehow still uses the generic jl_value_t* type for both input and output.

Example code

@noinline f1(args...) = args  # vararg method: receives args packed as a tuple
@noinline f2(args) = args     # single explicit tuple argument

g1() = f1(1, 2, 3)
g2() = f2((1, 2, 3))

function time_func(f::Function, args...)
    println(f)
    f(args...)  # warm up so compilation is not timed
    gc()
    @time for i in 1:1000000
        f(args...)
    end
    gc()
end

@code_llvm g1()
@code_llvm f1(1, 2, 3)

@code_llvm g2()
@code_llvm f2((1, 2, 3))

time_func(g1)
time_func(g2)

Output

define [3 x i64] @julia_g1_44267() {
top:
  %0 = alloca [5 x %jl_value_t*], align 8
  %.sub = getelementptr inbounds [5 x %jl_value_t*]* %0, i64 0, i64 0
  %1 = getelementptr [5 x %jl_value_t*]* %0, i64 0, i64 2
  %2 = bitcast [5 x %jl_value_t*]* %0 to i64*
  store i64 6, i64* %2, align 8
  %3 = getelementptr [5 x %jl_value_t*]* %0, i64 0, i64 1
  %4 = bitcast %jl_value_t** %3 to %jl_value_t***
  %5 = load %jl_value_t*** @jl_pgcstack, align 8
  store %jl_value_t** %5, %jl_value_t*** %4, align 8
  store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8
  %6 = getelementptr [5 x %jl_value_t*]* %0, i64 0, i64 3
  %7 = getelementptr [5 x %jl_value_t*]* %0, i64 0, i64 4
  store %jl_value_t* inttoptr (i64 139909651832960 to %jl_value_t*), %jl_value_t** %1, align 8
  store %jl_value_t* inttoptr (i64 139909651833008 to %jl_value_t*), %jl_value_t** %6, align 8
  store %jl_value_t* inttoptr (i64 139909651833056 to %jl_value_t*), %jl_value_t** %7, align 8
  %8 = call %jl_value_t* @julia_f1_44268(%jl_value_t* inttoptr (i64 139909684716976 to %jl_value_t*), %jl_value_t** %1, i32 3)
  %9 = bitcast %jl_value_t* %8 to [3 x i64]*
  %10 = load [3 x i64]* %9, align 8
  %11 = load %jl_value_t*** %4, align 8
  store %jl_value_t** %11, %jl_value_t*** @jl_pgcstack, align 8
  ret [3 x i64] %10
}

define %jl_value_t* @julia_f1_44268(%jl_value_t*, %jl_value_t**, i32) {
top:
  %3 = alloca [3 x %jl_value_t*], align 8
  %.sub = getelementptr inbounds [3 x %jl_value_t*]* %3, i64 0, i64 0
  %4 = getelementptr [3 x %jl_value_t*]* %3, i64 0, i64 2
  %5 = bitcast [3 x %jl_value_t*]* %3 to i64*
  store i64 2, i64* %5, align 8
  %6 = getelementptr [3 x %jl_value_t*]* %3, i64 0, i64 1
  %7 = bitcast %jl_value_t** %6 to %jl_value_t***
  %8 = load %jl_value_t*** @jl_pgcstack, align 8
  store %jl_value_t** %8, %jl_value_t*** %7, align 8
  store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8
  store %jl_value_t* null, %jl_value_t** %4, align 8
  %9 = call %jl_value_t* @jl_f_tuple(%jl_value_t* null, %jl_value_t** %1, i32 %2)
  store %jl_value_t* %9, %jl_value_t** %4, align 8
  %10 = load %jl_value_t*** %7, align 8
  store %jl_value_t** %10, %jl_value_t*** @jl_pgcstack, align 8
  ret %jl_value_t* %9
}

define [3 x i64] @julia_g2_44275() {
top:
  %0 = call [3 x i64] @julia_f2_44276([3 x i64] [i64 1, i64 2, i64 3])
  ret [3 x i64] %0
}

define [3 x i64] @julia_f2_44276([3 x i64]) {
top:
  ret [3 x i64] %0
}
g1
elapsed time: 0.193401883 seconds (61 MB allocated, 14.06% gc time in 2 pauses with 0 full sweep)
g2
elapsed time: 0.025389954 seconds (30 MB allocated, 1.64% gc time in 1 pauses with 0 full sweep)

(Related to #11244)

@yuyichao (Contributor, Author)

Also, if f1 is called with f1(1, 2, 3) or f1("1", "2", "3"), two identical functions are generated (I suppose this would not be the case if something non-trivial were done in the function...).

@vtjnash (Member) commented May 12, 2015

Jeff mentioned to me recently that, following the recent tuple change, it is now possible to enable this.

The general tradeoff is that you don't want to generate a new optimized function for every possible argument list you encounter, since the codegen time would end up dominating the runtime of that function. But there is probably a happy medium where args... gets specialized to have at least one direct argument and at least 3 function arguments.
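To make that tradeoff concrete, here is a minimal sketch (h is a hypothetical function, not from this thread) of why fully specializing every splatted call is costly: each distinct argument-type combination is its own compile.

# Hypothetical illustration of the compile-count blowup described above:
# Julia specializes on argument types, so every new type combination at a
# vararg call site triggers a fresh codegen pass.
h(args...) = length(args)

h(1)             # specialization for Tuple{Int64}
h(1, 2.0)        # specialization for Tuple{Int64,Float64}
h(:a, 1, 2.0)    # specialization for Tuple{Symbol,Int64,Float64}
# Emitting a fully optimized native function for each of these can cost
# more than the calls themselves ever save.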

@yuyichao (Contributor, Author)

@vtjnash I agree there's probably a tradeoff between the efficiency of the generated code and how much of it to generate.

As mentioned above, though, the point of this issue is that the methods are already specialized. If you call @code_llvm on f1 with different argument types, you will notice that it actually generates a different function each time (although, as I said, in this trivial case they are identical in content).

IMHO, what needs to change is probably the calling convention for vararg functions that can be specialized.
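In terms of the IR printed above, the suggested change would look roughly like this (a paraphrase of the signatures @code_llvm already showed; not real declaration syntax):

# Current convention for the specialized vararg method (from the IR above):
#     %jl_value_t* @julia_f1(%jl_value_t*, %jl_value_t**, i32)
# i.e. boxed arguments, a GC frame, and a jl_f_tuple call to build the tuple.
#
# Desired convention, matching what f2 already gets:
#     [3 x i64] @julia_f1([3 x i64])
# i.e. the inferred Tuple{Int64,Int64,Int64} passed unboxed.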

@yuyichao (Contributor, Author)

Also, I'm not talking about making f(args...) the same as defining f(arg1), f(arg1, arg2), f(arg1, arg2, arg3), ... (which would also be nice, BTW). I'm only talking about passing the packed argument tuple in a more specialized form to the method that is already specialized.
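For contrast, a quick sketch of the fixed-arity alternative being set aside here (f and its methods are illustrative only, not from the thread):

# The "also nice" alternative: hand-written fixed-arity methods, each with
# its own fully specialized signature, so nothing is boxed at the call.
f(a1)         = (a1,)
f(a1, a2)     = (a1, a2)
f(a1, a2, a3) = (a1, a2, a3)
# This issue asks for something narrower: keep the single vararg method
# f(args...) but pass its already-inferred argument tuple unboxed.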

@vtjnash (Member) commented May 12, 2015

ah, good call:

julia> @code_typed f1(1, 2, 3)
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[:(args::Any...)], Any[Any[],Any[Any[:args,Tuple{Int64,Int64,Int64},0]],Any[],Any[]], :(begin 
        $(Expr(:meta, :noinline)) # none, line 1:
        return args::Tuple{Int64,Int64,Int64}
    end::Tuple{Int64,Int64,Int64}))))

if you want to take a shot at this, take a look at to_function in codegen.cpp, specifically the va and specsig variables, and add code to handle the case where the vararg is a bitstype.

@yuyichao (Contributor, Author)

@vtjnash Thanks for the pointer. Probably not right now, but I'll keep it on my list and will probably try it sometime (e.g. after a thesis defence), unless someone else implements it (or something better) before I get to it.

@timholy (Member) commented May 13, 2015

This would certainly be a big deal. Currently a good number of @generated functions in Base are that way only to avoid the splatting penalty.

@yuyichao (Contributor, Author)

@timholy I didn't know about that. Do you have an example? Not that I think it should be the long-term solution, but I'm interested to see what the current workaround is (and I have thought about using a @generated function to work around it a little bit, but didn't figure out how).

@timholy (Member) commented May 13, 2015

#10337 has extensive discussion. See also

# The @generated function here is just to work around the performance hit
# of splatting
@generated function getindex(A::Array, I::Union(Real,AbstractVector)...)
    N = length(I)
    Isplat = Expr[:(I[$d]) for d = 1:N]
    quote
        checkbounds(A, $(Isplat...))
        _getindex(A, to_index($(Isplat...)))
    end
end
(as one of quite a few examples).
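To see why this sidesteps the splat, here is what the quoted body above would produce for a three-index call (hand-expanded for illustration):

# Generated body for N = 3: Isplat == [:(I[1]), :(I[2]), :(I[3])], so the
# splice $(Isplat...) yields explicit indexing expressions and no runtime
# splat of I remains in the method body.
quote
    checkbounds(A, I[1], I[2], I[3])
    _getindex(A, to_index(I[1], I[2], I[3]))
end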

@ihnorton added the performance (Must go faster) and compiler:codegen (Generation of LLVM IR and native code) labels on May 13, 2015
@JeffBezanson (Member)

dup of #5402

@yuyichao (Contributor, Author)

Closing as a dup of #5402.
