
Concatenation is slower than it could be #21673

Closed
jebej opened this issue May 2, 2017 · 11 comments
Labels: arrays, performance (Must go faster)

Comments

@jebej
Contributor

jebej commented May 2, 2017

This is an issue on 0.5.1 and 0.6:

_Edit: this was not an inference issue; the example below was updated to reflect that._

function test1(N::Integer)
    return [1;1:N]
end
function test2(N::Integer)
    A = Vector{Int}(undef, N+1)
    A[1] = 1
    A[2:end] = 1:N
    return A
end
using BenchmarkTools
test1(20) == test2(20)
@benchmark test1(20)
@benchmark test2(20)
julia> test1(20) == test2(20)
true

julia> @benchmark test1(20)
BenchmarkTools.Trial:
  memory estimate:  2.25 KiB
  allocs estimate:  49
  --------------
  minimum time:     19.282 μs (0.00% GC)
  median time:      19.905 μs (0.00% GC)
  mean time:        20.430 μs (0.00% GC)
  maximum time:     98.278 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark test2(20)
BenchmarkTools.Trial:
  memory estimate:  256 bytes
  allocs estimate:  1
  --------------
  minimum time:     63.902 ns (0.00% GC)
  median time:      70.512 ns (0.00% GC)
  mean time:        76.032 ns (5.29% GC)
  maximum time:     571.020 ns (72.77% GC)
  --------------
  samples:          10000
  evals/sample:     988
@jebej
Contributor Author

jebej commented May 2, 2017

This bug is making speye very slow.

@ararslan ararslan added the arrays and compiler:inference labels May 2, 2017
@TotalVerb
Contributor

This doesn't look like an inference problem.

@jebej
Contributor Author

jebej commented May 6, 2017

Shouldn't the compiler know A is a vector of ints?

@TotalVerb
Contributor

There's no longer an A by the time inference runs; it's optimized out. All of the intermediate expressions have been inferred.

@StefanKarpinski
Sponsor Member

To clarify what @TotalVerb is saying: the fact that A shows type Any in the @code_warntype output is a red herring. A doesn't actually appear anywhere in the optimized code, which is the only reason it doesn't get a type annotation; what is the type of a variable that doesn't exist?

I would note that in either function, calling collect is quite unhelpful here; it would be far better to let the values be generated on demand. The second version seems to optimize away the creation of collect(1:N) entirely. Unfortunately, removing the collect from test1 doesn't fix the performance issue either.
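As an aside, the inference point above can be checked directly; this is a minimal sketch assuming the test1 definition from the OP, using the standard @code_warntype and Base.return_types tools:

```julia
using InteractiveUtils  # provides @code_warntype when running outside the REPL

function test1(N::Integer)
    return [1; 1:N]
end

# Show inferred types for test1(20); a variable that has been optimized
# out simply does not appear in this listing, so it carries no annotation.
@code_warntype test1(20)

# The overall return type is still inferred concretely:
Base.return_types(test1, (Int,))  # expected: [Vector{Int}]
```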

@KristofferC
Sponsor Member

Ref #21281, #20801

@jebej jebej changed the title Inference issue with array concatenation Concatenation issue May 8, 2017
@jebej
Contributor Author

jebej commented May 8, 2017

I see, thanks for the explanation, I've updated the OP.

@jebej
Contributor Author

jebej commented Sep 16, 2017

Any idea what the issue is here? This is still a problem on release 0.6.

@mbauman mbauman changed the title Concatenation issue Concatenation is slower than it could be Sep 20, 2019
@mbauman mbauman added performance Must go faster and removed compiler:inference Type inference labels Sep 20, 2019
@mbauman
Sponsor Member

mbauman commented Sep 20, 2019

Somewhat better on 1.4-dev:

julia> @benchmark test1(20)
BenchmarkTools.Trial:
  memory estimate:  928 bytes
  allocs estimate:  17
  --------------
  minimum time:     1.113 μs (0.00% GC)
  median time:      1.408 μs (0.00% GC)
  mean time:        1.464 μs (4.21% GC)
  maximum time:     317.081 μs (99.30% GC)
  --------------
  samples:          10000
  evals/sample:     10

julia> @benchmark test2(20)
BenchmarkTools.Trial:
  memory estimate:  256 bytes
  allocs estimate:  1
  --------------
  minimum time:     49.606 ns (0.00% GC)
  median time:      54.791 ns (0.00% GC)
  mean time:        59.367 ns (4.12% GC)
  maximum time:     676.165 ns (88.80% GC)
  --------------
  samples:          10000
  evals/sample:     987

So we're only ~20x slower than ideal, instead of ~300x. Progress! (I updated the OP with the undef syntax for 1.x).

@jebej
Contributor Author

jebej commented Oct 4, 2020

For the example in the original issue, where we are concatenating a range with a number of the same type, the issue could be fixed by changing the signature of the method below to Union{AbstractRange{T},T}.

function vcat(rs::AbstractRange{T}...) where T

I can make the PR if that's acceptable, though it would be nice to fix the more general case as well.
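For illustration, a minimal sketch of what such a widened method could look like. fast_vcat is a hypothetical name, not the actual Base method; the real change would widen the existing vcat signature in range.jl:

```julia
# Accept scalars of the element type alongside ranges, so that a call like
# vcat(1, 1:N) can hit a fast, loop-based path instead of the generic cat
# machinery. Hypothetical sketch, not the Base implementation.
function fast_vcat(rs::Union{AbstractRange{T},T}...) where T
    n = sum(r -> r isa T ? 1 : length(r), rs)
    a = Vector{T}(undef, n)
    i = 1
    for r in rs
        if r isa T          # a lone scalar of the element type
            a[i] = r
            i += 1
        else                # a range: copy elementwise, no materialization
            for x in r
                a[i] = x
                i += 1
            end
        end
    end
    return a
end

fast_vcat(1, 1:5)  # == [1, 1, 2, 3, 4, 5]
```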

The issue seems to be with the general __cat method in abstractarray.jl, which is slow because it needs to handle concatenation of any types in any dimension.

Note that there is a _typed_vcat method in abstractarray.jl that looks like it would work for this case, as long as the assignment a[pos:p1] = Vk were replaced by a for loop (as in the range case):

function _typed_vcat(::Type{T}, V::AbstractVecOrTuple{AbstractVector}) where T

We would also need to widen the signature of both vcat and typed_vcat in the call chain so that the _typed_vcat method would get called.

PS: this _typed_vcat method looks like it would work just as well for ranges, so why is there a specialization in range.jl? It seems like we could have a single vcat method for any known-length 1D container.
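A minimal sketch of the loop-based variant described above. typed_vcat_sketch is a hypothetical name (the real _typed_vcat in Base has a different signature); the point is that an elementwise copy replaces the sliced assignment and works for ranges without materializing them:

```julia
# Concatenate any known-length 1D containers into a Vector{T}, copying
# element by element instead of via a[pos:p1] = Vk. Hypothetical sketch.
function typed_vcat_sketch(::Type{T}, vs...) where T
    n = sum(length, vs)
    a = Vector{T}(undef, n)
    pos = 1
    for v in vs
        for x in v          # elementwise copy, as in the range specialization
            a[pos] = x
            pos += 1
        end
    end
    return a
end

typed_vcat_sketch(Int, 1:3, [10, 20])  # == [1, 2, 3, 10, 20]
```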

timholy added a commit that referenced this issue Jan 19, 2021
The `cat` pipeline has long had poor inferrability.
Together with #39292 and #39294, this should basically put an
end to that problem.

Together, at least in simple cases these make the performance
of `cat` essentially equivalent to the manual version.
In other words, the `test1` and `test2` of #21673 benchmark
very similarly.
@jebej
Contributor Author

jebej commented Jan 19, 2021

There is a regression on 1.6 compared to 1.5, so hopefully we can backport @timholy's PRs. On 1.5 the slowdown is ~26x, whereas on 1.6 beta1 I get a ~74x slowdown:

julia> @benchmark test1(20)
BenchmarkTools.Trial:
  memory estimate:  1.56 KiB
  allocs estimate:  34
  --------------
  minimum time:     2.544 μs (0.00% GC)
  median time:      2.633 μs (0.00% GC)
  mean time:        2.918 μs (1.40% GC)
  maximum time:     146.111 μs (96.37% GC)
  --------------
  samples:          10000
  evals/sample:     9

julia> @benchmark test2(20)
BenchmarkTools.Trial:
  memory estimate:  256 bytes
  allocs estimate:  1
  --------------
  minimum time:     33.133 ns (0.00% GC)
  median time:      35.743 ns (0.00% GC)
  mean time:        41.742 ns (3.74% GC)
  maximum time:     560.241 ns (60.00% GC)
  --------------
  samples:          10000
  evals/sample:     996

KristofferC pushed a commit that referenced this issue Jan 20, 2021
KristofferC pushed a commit that referenced this issue Feb 1, 2021
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this issue May 4, 2021
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this issue May 9, 2021
staticfloat pushed a commit that referenced this issue Dec 23, 2022