
cat performance #3645

Closed
ViralBShah opened this issue Jul 7, 2013 · 16 comments
Labels
performance Must go faster

Comments

@ViralBShah
Member

The cat performance benchmark compares the performance of various concatenation functions with a simple setindex-based implementation. The results suggest that something is not quite right, since the setindex versions are generally much faster:

small_hvcat          57.877
small_hvcat_setind   55.922
large_hvcat          11.106
large_hvcat_setind    4.726
small_hcat           21.654
small_hcat_setind    35.115
large_hcat            4.293
large_hcat_setind     4.306
small_vcat           64.364
small_vcat_setind    56.346
large_vcat            4.761
large_vcat_setind     4.710
small_catnd         581.283
small_catnd_setind  143.262
large_catnd           7.422
large_catnd_setind    4.897
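For context, the two styles being compared can be sketched as follows. This is a minimal illustration in modern Julia syntax with hypothetical 4x4 inputs, not the benchmark's actual code or sizes:

```julia
# Sketch of the two styles being benchmarked: the built-in vcat versus a
# "setind" version that preallocates the result and fills it with setindex!.
a = rand(4, 4)
b = rand(4, 4)

# Built-in concatenation
c1 = vcat(a, b)

# setindex-based version: allocate the result once, then copy blocks in place
c2 = Matrix{Float64}(undef, 8, 4)
c2[1:4, :] = a
c2[5:8, :] = b

c1 == c2  # true
```

The setindex version does no argument-shape analysis at runtime beyond the indexing itself, which is one reason it can come out ahead of the generic cat machinery.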
ViralBShah added this to the 0.4 milestone Apr 27, 2014
@ViralBShah
Member Author

Things are better now:

julia,hvcat_small,5.841653,28.443581,7.240933,1.619755
julia,hvcat_large,5.855422,12.409402,8.069710,1.251207
julia,hvcat_setind_small,4.436304,8.289098,5.479964,0.786027
julia,hvcat_setind_large,5.882567,11.567854,8.076142,1.444297
julia,hcat_small,3.399357,6.442293,4.299652,0.643592
julia,hcat_large,4.687004,10.086189,7.417788,1.377123
julia,hcat_setind_small,4.385627,7.691682,5.325854,0.777748
julia,hcat_setind_large,5.387160,11.101184,7.654127,1.433939
julia,vcat_small,18.599786,27.051015,22.315713,1.944881
julia,vcat_large,5.777338,11.221460,8.061492,1.234265
julia,vcat_setind_small,4.661357,8.321572,5.696195,0.790171
julia,vcat_setind_large,5.738162,12.823408,8.011272,1.470064
julia,catnd_small,182.341470,196.423850,188.391843,4.648532
julia,catnd_large,11.016256,18.077194,14.226727,1.661805
julia,catnd_setind_small,30.878567,40.430500,35.517463,2.441230
julia,catnd_setind_large,5.563092,10.719209,8.023162,1.131493

@ViralBShah
Member Author

Cc: @timholy @Jutho

@timholy
Member

timholy commented Feb 1, 2015

If I'm reading it correctly, there's still a big gap, e.g., with vcat_small. Have you profiled it?

@Jutho
Contributor

Jutho commented Feb 1, 2015

catnd also seems pretty bad. Where can I find this benchmark code?

@jiahao
Member

jiahao commented Feb 1, 2015

julia/test/perf/cat/perf.jl

@Jutho
Contributor

Jutho commented Feb 1, 2015

Well, I can easily bring down the timings of catnd_small by a factor of 3 and make those of catnd_large approximately equal to the setind version by replacing the internal cat_t function with a mutating cat! function and writing the latter as a stagedfunction.
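A rough sketch of the mutating-cat! idea described here (this is not Jutho's actual patch; it is a simplified illustration in modern syntax, with the hypothetical name cat_sketch!):

```julia
# Sketch of a mutating concatenation: fill a preallocated output along
# dimension `dim`, copying each input block into its slot with setindex!.
function cat_sketch!(out::AbstractArray, dim::Integer, arrays::AbstractArray...)
    offset = 0
    for A in arrays
        # Select offset+1:offset+size(A,dim) along `dim`, full axes elsewhere
        idx = ntuple(d -> d == dim ? (offset+1:offset+size(A, dim)) : axes(out, d),
                     ndims(out))
        out[idx...] = A
        offset += size(A, dim)
    end
    return out
end

a = ones(2, 3)
b = zeros(2, 3)
out = Matrix{Float64}(undef, 4, 3)
cat_sketch!(out, 1, a, b) == vcat(a, b)  # true
```

The point of writing this as a stagedfunction (today: a generated function) would be to unroll the index construction for a known output dimensionality rather than computing it generically at runtime.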

@simonster
Member

@Jutho I'm not sure we necessarily want to use a staged function here, at least not always. Someone might write cat(3, X...) where length(X) is very large (and possibly also varies across iterations of a loop, etc.). In that case taking a hit at runtime is probably better than generating specialized code.
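The scenario described here, written out in modern keyword syntax (cat(X...; dims=3) is today's spelling of the cat(3, X...) form used in this thread):

```julia
# Splatting a runtime-length collection into cat: length(X) is not known
# statically, so specializing generated code on the argument count would
# force a fresh compilation for every distinct length encountered.
X = [rand(2, 2) for _ in 1:100]
Y = cat(X...; dims=3)   # stack 100 matrices along a new third dimension
size(Y)                 # (2, 2, 100)
```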

@Jutho
Contributor

Jutho commented Feb 1, 2015

The use of the stagedfunction is to write specialized code depending on the dimensionality of the output array, not to specialize on the number of arguments. However, since I don't know how a stagedfunction handles a varargs argument, I can't say what the effect of this would be...
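The distinction being drawn here is that the dimensionality N is carried in the array's type, so index construction can be unrolled over N without touching the argument count. A minimal sketch of that idea in modern syntax (hypothetical helper name block_indices; not code from this thread):

```julia
# Build the index tuple for one block of a concatenation. N comes from the
# array type, so ntuple over Val(N) can be fully unrolled by the compiler,
# independent of how many arrays are being concatenated.
function block_indices(out::AbstractArray{T,N}, A, dim, offset) where {T,N}
    ntuple(d -> d == dim ? (offset+1:offset+size(A, dim)) : axes(out, d), Val(N))
end

out = zeros(4, 3)
block_indices(out, ones(2, 3), 1, 2)  # (3:4, Base.OneTo(3))
```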

@simonster
Member

I believe stagedfunctions are always fully specialized on varargs, or at least it seemed that way when I tested this.

@Jutho
Contributor

Jutho commented Feb 1, 2015

Well, certainly not beyond 8 arguments, but that might of course change with #8974. But is that truly different from the amount of specialization on varargs in normal functions? I really have no clue.

@simonster
Member

I don't think they get type inference for >8 arguments, but a different function seems to be compiled for any number of varargs, e.g.:

julia> stagedfunction f(x...)
       :($(length(x)))
       end
f (generic function with 1 method)

julia> code_llvm(f, NTuple{100,Int})

define %jl_value_t* @julia_anonymous_134264(%jl_value_t*, %jl_value_t**, i32) {                                                               
top:                                                                                                                                          
  %3 = alloca [3 x %jl_value_t*], align 8                                                                                                     
  %.sub = getelementptr inbounds [3 x %jl_value_t*]* %3, i64 0, i64 0                                                                         
  %4 = getelementptr [3 x %jl_value_t*]* %3, i64 0, i64 2, !dbg !603                                                                          
  store %jl_value_t* inttoptr (i64 2 to %jl_value_t*), %jl_value_t** %.sub, align 8                                                           
  %5 = getelementptr [3 x %jl_value_t*]* %3, i64 0, i64 1, !dbg !603                                                                          
  %6 = load %jl_value_t*** @jl_pgcstack, align 8, !dbg !603                                                                                   
  %.c = bitcast %jl_value_t** %6 to %jl_value_t*, !dbg !603                                                                                   
  store %jl_value_t* %.c, %jl_value_t** %5, align 8, !dbg !603                                                                                
  store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8, !dbg !603                                                                  
  store %jl_value_t* null, %jl_value_t** %4, align 8, !dbg !603                                                                               
  %7 = load %jl_value_t** %5, align 8, !dbg !603                                                                                              
  %8 = getelementptr inbounds %jl_value_t* %7, i64 0, i32 0, !dbg !603                                                                        
  store %jl_value_t** %8, %jl_value_t*** @jl_pgcstack, align 8, !dbg !603                                                                     
  ret %jl_value_t* inttoptr (i64 140626142645440 to %jl_value_t*), !dbg !603                                                                  
}

julia> unsafe_pointer_to_objref(convert(Ptr{Void}, 140626142645440))
100

(not sure that GC root is necessary, but the code is just returning 100)

Ordinarily I think varargs functions are not specialized at all. See #5402, although f there is now inlined, so you need to add a dummy loop to see the suboptimal calling convention in code_llvm.
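For reference, the stagedfunction example above in modern syntax: the stagedfunction keyword became @generated in later Julia versions, and the behavior demonstrated (a distinct body generated per argument count) is the same:

```julia
# Modern spelling of the example above. Inside a @generated function the
# argument names are bound to the tuple of argument *types*, so the argument
# count is available at code-generation time.
@generated function nargs(x...)
    return length(x)   # an integer literal, spliced in as the generated body
end

nargs(ntuple(identity, 100)...)  # returns 100
```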

@vtjnash
Member

vtjnash commented Aug 24, 2015

That PR eventually became #10338 before being subsumed by #7128. I'm going to assume that needs to be resolved first, and remove the target milestone from this in the meantime.

vtjnash removed this from the 0.4.x milestone Aug 24, 2015
@Jutho
Contributor

Jutho commented Aug 24, 2015

Indeed.

@ViralBShah
Member Author

ViralBShah commented Jul 17, 2017

There seems to be a general slowdown in 0.6 across the board compared to earlier. The times are minimum, maximum, mean, and median.

julia,hvcat_small,7.336272,83.110470,9.151902,8.535927
julia,hvcat_large,6.195210,75.650329,9.320567,4.827748
julia,hvcat_setind_small,6.520121,11.430050,7.255832,0.744990
julia,hvcat_setind_large,6.038093,78.406094,9.554000,5.253685
julia,hcat_small,10.091909,22.847853,12.177417,2.105768
julia,hcat_large,5.449313,82.616243,9.762316,5.675814
julia,hcat_setind_small,5.350112,14.789836,7.732790,1.713924
julia,hcat_setind_large,5.640484,102.515004,9.175704,6.632492
julia,vcat_small,18.460564,24.171685,21.647538,1.211606
julia,vcat_large,6.181153,72.471150,9.499165,4.639837
julia,vcat_setind_small,5.896819,9.408016,7.532382,0.676082
julia,vcat_setind_large,6.327230,73.536199,9.397235,4.706905
julia,catnd_small,306.599528,318.999051,314.978989,5.774449
julia,catnd_large,8.290890,93.080390,12.043643,6.528367
julia,catnd_setind_small,34.817327,41.600884,37.610669,1.792527
julia,catnd_setind_large,5.847496,73.403590,9.057344,4.637332

@ViralBShah
Member Author

Comparing with my own reports above from 2015, catnd_small is twice as slow as before.

@ViralBShah
Member Author

Things appear significantly improved here. I am not sure why, but I am probably using a newer computer. I think we will need better-targeted benchmarking if there's anything left to do here.

julia,hvcat_small,4.396699,16.297507,6.608667,2.066157
julia,hvcat_large,3.235825,45.255710,4.303054,2.729414
julia,hvcat_setind_small,3.508096,7.443855,3.734079,0.333990
julia,hvcat_setind_large,2.954424,45.062366,4.296051,2.754234
julia,hcat_small,1.585352,4.087416,1.865025,0.304821
julia,hcat_large,2.568260,44.529903,3.658572,2.554980
julia,hcat_setind_small,3.553062,5.410028,3.776157,0.243217
julia,hcat_setind_large,2.678956,48.617825,4.238269,2.884385
julia,vcat_small,3.770044,6.230312,4.076772,0.331431
julia,vcat_large,3.296056,45.546644,4.428436,2.800709
julia,vcat_setind_small,3.533616,5.267436,3.708259,0.175622
julia,vcat_setind_large,2.655591,44.732889,4.285145,2.764478
julia,catnd_small,169.010907,178.360217,170.545414,2.971016
julia,catnd_large,5.419363,48.499889,6.802585,3.402186
julia,catnd_setind_small,13.439711,16.501161,14.084918,0.389261
julia,catnd_setind_large,3.996800,46.777139,5.237512,2.980763

IanButterworth pushed a commit that referenced this issue Oct 11, 2023

Fix Pkg.precompile ext races (#3645)
Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
6 participants