cat performance #3645
Things are better now:

[timings not captured]
If I'm reading it correctly there's still a big gap, e.g., with …
Well, I can easily bring down the timings of …
@Jutho I'm not sure we necessarily want to use a staged function here, at least not always. Someone might write …
The use of the `stagedfunction` is to write specialized code depending on the dimensionality of the output array, not to specialize on the number of arguments. However, since I have no clue how a …
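As a rough illustration of what "specializing on the dimensionality of the output array" can look like, here is a minimal sketch using `@generated`, the modern spelling of the `stagedfunction` discussed in this thread. The function name `firstelem` and its body are invented for illustration, not taken from the benchmark:

```julia
# Sketch only: the generator runs once per concrete signature, so the emitted
# body can depend on the array rank N at compile time.
@generated function firstelem(a::Array{T,N}) where {T,N}
    idx = ntuple(_ -> 1, N)   # N literal ones, e.g. (1, 1, 1) for N = 3
    return :(a[$(idx...)])    # generated body: a[1, 1, ..., 1]
end

firstelem(collect(1:4))                     # rank 1: generated body is a[1]
firstelem(reshape(collect(1:8), 2, 2, 2))   # rank 3: generated body is a[1, 1, 1]
```

The point is that the indexing expression is unrolled for each rank `N` before the function is ever called, which is the kind of specialization a concatenation routine can exploit.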
I believe stagedfunctions are always fully specialized on varargs, or at least it seemed that way when I tested this.
Well, certainly not beyond 8 arguments, but that might of course change with #8974. But is that truly different from the amount of specialization on varargs in normal functions? I really have no clue.
I don't think they get type inference for >8 arguments, but a different function seems to be compiled for any number of varargs, e.g.:

```julia
julia> stagedfunction f(x...)
           :($(length(x)))
       end
f (generic function with 1 method)

julia> code_llvm(f, NTuple{100,Int})

define %jl_value_t* @julia_anonymous_134264(%jl_value_t*, %jl_value_t**, i32) {
top:
  %3 = alloca [3 x %jl_value_t*], align 8
  %.sub = getelementptr inbounds [3 x %jl_value_t*]* %3, i64 0, i64 0
  %4 = getelementptr [3 x %jl_value_t*]* %3, i64 0, i64 2, !dbg !603
  store %jl_value_t* inttoptr (i64 2 to %jl_value_t*), %jl_value_t** %.sub, align 8
  %5 = getelementptr [3 x %jl_value_t*]* %3, i64 0, i64 1, !dbg !603
  %6 = load %jl_value_t*** @jl_pgcstack, align 8, !dbg !603
  %.c = bitcast %jl_value_t** %6 to %jl_value_t*, !dbg !603
  store %jl_value_t* %.c, %jl_value_t** %5, align 8, !dbg !603
  store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8, !dbg !603
  store %jl_value_t* null, %jl_value_t** %4, align 8, !dbg !603
  %7 = load %jl_value_t** %5, align 8, !dbg !603
  %8 = getelementptr inbounds %jl_value_t* %7, i64 0, i32 0, !dbg !603
  store %jl_value_t** %8, %jl_value_t*** @jl_pgcstack, align 8, !dbg !603
  ret %jl_value_t* inttoptr (i64 140626142645440 to %jl_value_t*), !dbg !603
}

julia> unsafe_pointer_to_objref(convert(Ptr{Void}, 140626142645440))
100
```

(Not sure that GC root is necessary, but the code is just returning 100.) Ordinarily I think varargs functions are not specialized at all. See #5402, although …
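For readers on current Julia, where `stagedfunction` no longer exists, the same experiment can be reproduced with `@generated`; this is a sketch of the equivalent (with an invented name `nargs`), not code from the thread:

```julia
@generated function nargs(x...)
    # Inside the generator, `x` is the tuple of the argument *types*, so its
    # length is the call-site arity, baked into the generated body.
    return :($(length(x)))
end

nargs(1, 2.0, "three")   # a separate body is generated for each arity
```

As in the `code_llvm` dump above, each distinct number of varargs triggers its own compiled specialization whose body is just the constant arity.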
Indeed.
There seems to be a general slowdown in 0.6 across the board compared to earlier versions. The times are minimum, maximum, mean, and median.
[timing table not captured]
Comparing with my own reports above from 2015, …
Things appear significantly improved here. I am not sure, but I am probably using a newer computer. I think we will need better targeted benchmarking if there's anything to do here.
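A more targeted comparison could look like the following sketch, which times built-in `hcat` against a hand-rolled `setindex!` loop on identical inputs. The helper name `manual_hcat` and the sizes are invented for illustration; in practice BenchmarkTools.jl's `@btime` would give more reliable numbers than Base's `@elapsed`, which is used here only to keep the sketch self-contained:

```julia
# Hand-rolled concatenation: fill the output column by column with plain
# setindex! calls, the baseline the `cat` perf test compares against.
function manual_hcat(cols::Vector{Vector{Float64}})
    n, m = length(cols[1]), length(cols)
    out = Matrix{Float64}(undef, n, m)
    for j in 1:m, i in 1:n
        out[i, j] = cols[j][i]
    end
    return out
end

cols = [rand(1000) for _ in 1:100]
t_builtin  = @elapsed hcat(cols...)       # built-in concatenation
t_setindex = @elapsed manual_hcat(cols)   # setindex!-based baseline
```

Running each variant over a range of array shapes and element types, rather than one aggregate number, would make it clearer where any remaining gap actually lives.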
The `cat` performance benchmark compares the performance of various concatenation functions with a simple `setindex`-based implementation. The benchmark suggests that something is not quite right, since the `setindex` versions are generally much faster:

[benchmark output not captured]