cat overflows on >= 10000 inputs #50473

Closed
GiggleLiu opened this issue Jul 8, 2023 · 16 comments
Comments

@GiggleLiu commented Jul 8, 2023

I want to concatenate multiple arrays. The hcat function works well when the number of input arguments is large; cat, however, errors on 10000 input arguments. An MWE and the error message are below:

julia> hcat([[1,2] for i=1:10000]...)
2×10000 Matrix{Int64}:
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  …  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  …  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2

julia> cat([[1,2] for i=1:10000]...; dims=2)
ERROR: StackOverflowError:
Stacktrace:
 [1] __cat_offset!(::Matrix{Int64}, ::Tuple{Int64, Int64}, ::Tuple{Bool, Bool}, ::Tuple{Int64, Int64}, ::Vector{Int64}, ::Vector{Int64}, ::Vararg{Vector{Int64}}) (repeats 9787 times)
   @ Base ./abstractarray.jl:1802
 [2] __cat(::Matrix{Int64}, ::Tuple{Int64, Int64}, ::Tuple{Bool, Bool}, ::Vector{Int64}, ::Vararg{Vector{Int64}})
   @ Base ./abstractarray.jl:1797
 [3] _cat_t(::Int64, ::Type{Int64}, ::Vector{Int64}, ::Vararg{Vector{Int64}})
   @ Base ./abstractarray.jl:1790
 [4] _cat(::Int64, ::Vector{Int64}, ::Vararg{Vector{Int64}})
   @ LinearAlgebra ~/.julia/juliaup/julia-1.9.2+0.x64.linux.gnu/share/julia/stdlib/v1.9/LinearAlgebra/src/special.jl:415
 [5] cat(::Vector{Int64}, ::Vararg{Vector{Int64}}; dims::Int64)
   @ Base ./abstractarray.jl:1979
 [6] top-level scope
   @ REPL[13]:1

julia> versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
  Threads: 1 on 32 virtual cores
Environment:
  JULIA_PKG_SERVER = http://cn-southeast.pkg.juliacn.com/
@Seelengrab

Splatting arbitrarily large collections is bound to overflow at some point.

@jakobnissen

Having it stack overflow is a pretty bad error message for something that's essentially a compiler/language limitation.
Could we instead manually throw an error when a function is being called with e.g. more than 200 arguments?

Is there an issue tracking this problem? It's a fairly common newbie problem in Julia, and it requires specialized knowledge of the language internals to understand why it shouldn't just work.
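
In the meantime, a user-level guard is easy to write. A sketch (hypothetical helper, threshold picked arbitrarily):

# Hypothetical guard: refuse to splat huge collections and point the user
# at the non-splatting alternatives instead.
function checked_cat(xs; dims, limit = 200)
    length(xs) > limit && throw(ArgumentError(
        "splatting $(length(xs)) arguments would likely overflow the stack; " *
        "use reduce(hcat, xs) or stack(xs) instead"))
    return cat(xs...; dims)
end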

@Seelengrab

It's pretty difficult to detect this kind of thing (semi-)accurately: it's not the splatting or slurping that fails here, but the recursion over the result, and detecting "this will throw before it finishes due to resource constraints" is a highly irregular and hard-to-estimate heuristic, because it depends on the machine, the resource use, AND the specific function.

If you want to capture the failure mode during dispatch of the method, you're going to run into much the same issue when estimating how much you're going to need, and it can't capture the allocation of that tuple either, because that needs to happen first.

@jakobnissen

Yes, but practically speaking, the way Julians handle this is to internalise the rule "never splat a large number of elements", so we might as well make that emit a warning.

@Seelengrab commented Jul 9, 2023

Where do we emit that warning though? Just attach it to any method taking more than N arguments..? I imagine that's not super simple to do, as most slurping functions end up taking Vararg (they don't generally know the number of splatted arguments, after all), and unconditionally attaching IO to any slurping method seems bad.
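
(To illustrate: the callee only ever sees a Vararg; the argument count is a property of the call site, not of the method.)

# A slurping method does not encode how many arguments were splatted:
f(xs...) = length(xs)   # one method, signature f(::Vararg{Any})
f(1, 2, 3)              # 3
f(rand(100)...)         # 100; same method, count known only at the call site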

@jakobnissen

I imagine it could be inserted at the language level, since it's a language-level limitation, and be compiled away for all function calls where the number of arguments is known at compile time.

@Seelengrab

That's the same as attaching IO effects to any Vararg method though. If you only want to attach them conditionally depending on the length, you'll have to know how many arguments are in the collection at compile time, which won't in general be possible for the array case shown in the OP. The array case relies on constant propagation.

@GiggleLiu commented Jul 10, 2023

For functions with an unknown number of arguments, recursion should be avoided. This type of error in cat is very unpredictable!

Checking the code, there is another possible way to improve it:

function __cat_offset!(A, shape, catdims, offsets, x, X...)
    # splitting the "work" on x from X... may reduce latency (fewer costly specializations)
    newoffsets = __cat_offset1!(A, shape, catdims, offsets, x)
    return __cat_offset!(A, shape, catdims, newoffsets, X...)
end

Is it possible for the Julia compiler to optimize tail recursion into a for loop?
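
(For reference, a manual loop-based rewrite sidesteps the recursion entirely. A sketch, not the actual Base code, assuming the existing __cat_offset1! helper:)

# Hypothetical loop-based variant of __cat_offset!: the call depth stays
# constant no matter how many arrays are concatenated.
function __cat_offset_loop!(A, shape, catdims, offsets, xs...)
    for x in xs
        offsets = __cat_offset1!(A, shape, catdims, offsets, x)
    end
    return A
end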

@KristofferC

Dup of #30796

@GiggleLiu

Shouldn't we at least improve the implementation of cat before closing both issues? Implementing cat with recursion does not make sense; we should at least try replacing it with for loops.

@KristofferC commented Jul 10, 2023

Why would you not use reduce(hcat, [[1,2] for i=1:10000]) in situations like this?

Cf. the docstring for hcat:

For a large vector of arrays, reduce(hcat, A) calls an efficient method when A isa AbstractVector{<:AbstractVecOrMat}. For a vector of vectors, this can also be written stack(A).
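
For the example in the OP, either non-splatting form works:

julia> A = [[1, 2] for _ in 1:10000];

julia> reduce(hcat, A) == stack(A)  # both hit the efficient specialization
true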

@GiggleLiu commented Jul 10, 2023

Why would you not use reduce(hcat, [[1,2] for i=1:10000]) in situations like this?

I have a program that plays with high-order tensors. Given a sample set with 10000 configurations, for each configuration I need to slice some tensors along certain dimensions (which are not known beforehand). With the obtained 10000 sliced tensors, I need to concatenate them into a large tensor to feed into the batched matrix multiplication routines.

Maybe the following code explains it better. This is the current tensor-slicing program, which would not be needed if cat worked for concatenating 10000 slices.

# `x` is the tensor,
# `slicedim` is the dimensions to be sliced
# `configs` is a matrix of size `(nsubindices, nsample)`, each column specifies a subindex for slicing.
function get_slice(x::AbstractArray{T}, slicedim, configs::AbstractMatrix) where T
    outdim = setdiff(1:ndims(x), slicedim)
    res = similar(x, [size(x, d) for d in outdim]..., size(configs, 2))
    return get_slice!(res, x, outdim, slicedim, configs)
end

function get_slice!(res, x::AbstractArray{T}, outdim, slicedim, configs::AbstractMatrix) where T
    xstrides = strides(x)
    @inbounds for ci in CartesianIndices(res)
        idx = 1
        # the output dimensions
        for (dim, k) in zip(outdim, ci.I)
            idx += (k-1) * xstrides[dim]
        end
        # the batch dimension
        batchidx = ci.I[end]
        # note: `k` is used without `k-1`, i.e. the entries of `configs`
        # are treated as 0-based offsets along the sliced dimensions
        for (dim, k) in zip(slicedim, view(configs, :, batchidx))
            idx += k * xstrides[dim]
        end
        res[ci] = x[idx]
    end
    return res
end
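
For concreteness, a hypothetical call could look like this (sizes invented; note that the entries of `configs` act as 0-based offsets, since the slicing loop omits the `k-1`):

x = rand(2, 3, 2)
configs = [0 1 0 1;   # 0-based subindex along dim 1
           0 0 1 1]   # 0-based subindex along dim 3
get_slice(x, (1, 3), configs)  # 3×4 result: remaining dim × nsamples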

@mikmoore commented Jul 11, 2023

For a bit now I've been advocating for a concat or concatenate or <bikeshed> that generalizes cat to collections, like maximum generalizes max. We recently got stack, but that can only be used when inserting a new dimension.
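
As a rough sketch (hypothetical name and signature, no error checking), such a function could be:

# Hypothetical concatenate(xs; dims): cat over a collection, no splatting.
# Assumes all arrays agree on the sizes of the non-`dims` dimensions.
function concatenate(xs::AbstractVector{<:AbstractArray}; dims::Int)
    proto = first(xs)
    total = sum(size(x, dims) for x in xs)  # length along the cat dimension
    sz = ntuple(d -> d == dims ? total : size(proto, d), max(ndims(proto), dims))
    out = similar(proto, sz)
    offset = 0
    for x in xs
        n = size(x, dims)
        sel = ntuple(d -> d == dims ? (offset+1:offset+n) : Colon(), ndims(out))
        out[sel...] = x
        offset += n
    end
    return out
end

concatenate([[1, 2] for _ in 1:10000]; dims = 2)  # 2×10000 Matrix, no overflow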

I really dislike the reduce(hcat, itr) specialization. First, it's not discoverable: intuitively, it's the worst possible way to perform concatenation. reduce does pairwise reduction for every predicate except hcat/vcat/merge on very specific input types (see methods(reduce)). The only reason to think it's anything other than a horrible idea is to know that somebody explicitly specialized that particular method. Second, it's fragile, in that it's extremely easy to miss the specialization. For example:

julia> using BenchmarkTools

julia> x = [zeros(2,2) for _ in 1:100];

julia> @btime reduce(hcat, $x);
  929.167 ns (1 allocation: 3.25 KiB)

julia> @btime mapreduce(vec, hcat, $x);
  20.900 μs (299 allocations: 177.42 KiB)

julia> @btime reduce(hcat, Iterators.map(vec,$x));
  20.400 μs (299 allocations: 177.42 KiB)


julia> t = ntuple(_->zeros(2,2),100);

julia> @btime reduce(hcat, $t);
  18.900 μs (100 allocations: 170.41 KiB)

julia> @btime hcat($t...); # now hcat is better
  4.300 μs (4 allocations: 5.64 KiB)

Predicate specializations only help for the cases we can imagine beforehand. They're performance minefields.

@GiggleLiu commented Jul 11, 2023

After checking the implementation of cat, I noticed that this function is designed to be generic and low-overhead, so I agree that @mikmoore's proposal of implementing a new function concatenate makes more sense. I also agree that reduce(hcat, x) is not intuitive. When @KristofferC proposed it, I was confused, because by definition it should allocate many times; but, very unexpectedly, it is not slow. If it were not for @KristofferC's comment, I would not even have tried it.

@StefanKarpinski

How about we just cap the number of function arguments that are allowed at 1k? Anyone calling a function with that many arguments is doing something bad and should be chastised. That would allow us to give a better error message instead of crashing.

@oscardssmith

This would cause problems.
