Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

using SparseArrays makes vcat'ing adjoint vectors memory hog, regression in Julia 1.10 beta #51011

Closed
jaakkor2 opened this issue Aug 23, 2023 · 8 comments · Fixed by #51135 or JuliaSparse/SparseArrays.jl#432
Assignees
Labels
bug Indicates an unexpected problem or unintended behavior performance Must go faster regression Regression in behavior compared to a previous version
Milestone

Comments

@jaakkor2
Copy link
Contributor

jaakkor2 commented Aug 23, 2023

using SparseArrays
a=[rand(2) for i=1:30000];
vcat(a'...)

EDIT: using SparseArrays that gets loaded by using Statistics

gives

ERROR: StackOverflowError:
Stacktrace:
 [1] _cat_size_shape(::Tuple{…}, ::Tuple{…}, ::LinearAlgebra.Adjoint{…}, ::LinearAlgebra.Adjoint{…}, ::Vararg{…}) (repeats 15353 times)
   @ Base .\abstractarray.jl:1748
 [2] cat_size_shape(::Tuple{…}, ::LinearAlgebra.Adjoint{…}, ::LinearAlgebra.Adjoint{…}, ::Vararg{…})
   @ Base .\abstractarray.jl:1746
 [3] _cat_t(::Val{1}, ::Type{Float64}, ::LinearAlgebra.Adjoint{Float64, Vector{Float64}}, ::Vararg{LinearAlgebra.Adjoint{Float64, Vector{…}}})
   @ Base .\abstractarray.jl:1787
 [4] _cat(::Val{1}, ::LinearAlgebra.Adjoint{Float64, Vector{Float64}}, ::Vararg{LinearAlgebra.Adjoint{Float64, Vector{Float64}}})
   @ SparseArrays C:\Users\jaakkor2\.julia\juliaup\julia-1.10.0-beta2+0.x64.w64.mingw32\share\julia\stdlib\v1.10\SparseArrays\src\sparsevector.jl:1221
 [5] cat(::LinearAlgebra.Adjoint{Float64, Vector{Float64}}, ::Vararg{LinearAlgebra.Adjoint{Float64, Vector{Float64}}}; dims::Val{1})
   @ Base .\abstractarray.jl:1981
 [6] vcat(::LinearAlgebra.Adjoint{…}, ::LinearAlgebra.Adjoint{…}, ::LinearAlgebra.Adjoint{…}, ::Vararg{…})
   @ SparseArrays C:\Users\jaakkor2\.julia\juliaup\julia-1.10.0-beta2+0.x64.w64.mingw32\share\julia\stdlib\v1.10\SparseArrays\src\sparsevector.jl:1233
Some type information was truncated. Use `show(err)` to see complete types.

The example works on Julia 1.9.

On Julia 1.10 both collect(hcat(a...)') and stack(a, dims=1) work ok, latter requires Julia ≥ 1.9.

julia> versioninfo()
Julia Version 1.10.0-beta2
Commit a468aa198d (2023-08-17 06:27 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 20 × 13th Gen Intel(R) Core(TM) i7-1370P
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, goldmont)
  Threads: 1 on 20 virtual cores

Julia from juliaup, started with julia +beta --startup-file=no

@brenhinkeller brenhinkeller added regression Regression in behavior compared to a previous version performance Must go faster labels Aug 23, 2023
@brenhinkeller
Copy link
Sponsor Contributor

brenhinkeller commented Aug 23, 2023

Can reproduce, and also that performance is fine on both versions without the using Statistics.

The stack overflow on splatting seems potentially serious enough to consider a bug as well as a performance regression. I'll add to the 1.10 milestone given that, but others feel free to undo..

@brenhinkeller brenhinkeller added this to the 1.10 milestone Aug 23, 2023
@brenhinkeller brenhinkeller added bug Indicates an unexpected problem or unintended behavior statistics The Statistics stdlib module labels Aug 23, 2023
@jishnub
Copy link
Contributor

jishnub commented Aug 23, 2023

The issue is perhaps SparseArrays and not Statistics. The latter loads SparseArrays and introduces the offending method

@jishnub jishnub removed the statistics The Statistics stdlib module label Aug 23, 2023
@jaakkor2 jaakkor2 changed the title using Statistics makes vcat'ing adjoint vectors memory hog, regression in Julia 1.10 beta using SparseArrays makes vcat'ing adjoint vectors memory hog, regression in Julia 1.10 beta Aug 23, 2023
@jakobnissen
Copy link
Contributor

jakobnissen commented Aug 23, 2023

This is a duplicate of a larger problem, namely that

  • Julia can't handle splatting lots of elements, and
  • There is no mechanism to warn users trying to do this or throw an error. Instead, you just run into various internal buggy behaviour like stack overflows in internal functions, or the compiler taking forever.

May be considered a duplicate of #50473, #30796 and several other issues

Honestly I don't think doing regression whack-a-mole on basically unsupported behaviour is a good long-term strategy. Something like what is suggested by @StefanKarpinski in #50473 (comment) is the only real fix (or, alternatively, combing through Julia internals to guarantee handling of large splats, but I think this will be too hard)

@jishnub
Copy link
Contributor

jishnub commented Aug 23, 2023

The large number of elements leading to a stack overflow is a red herring, as there's a performance regression in vcat that's evident even with a smaller number of terms:

julia> using SparseArrays

julia> a=[rand(2) for i=1:30];

julia> @btime vcat($a'...);
  5.382 μs (125 allocations: 4.25 KiB) # v1.9.2
  81.229 μs (1449 allocations: 35.75 KiB) # v1.10.0-beta2

@jaakkor2
Copy link
Contributor Author

Note that BenchmarkTools pulls in Statistics that pulls in SparseArrays.

@KristofferC
Copy link
Sponsor Member

This is a duplicate of a larger problem, namely that

The issue here seems to be more the type piracy in SparseArrays.

@JeffBezanson
Copy link
Sponsor Member

JeffBezanson commented Aug 24, 2023

Bisect points to 8fd5f27 (as far as the behavior change)

sparsevector.jl:1229 is the method called now. Could maybe be fixed by making anysparse iterative instead of recursive?

@JeffBezanson JeffBezanson self-assigned this Aug 24, 2023
@JeffBezanson
Copy link
Sponsor Member

This patch to SparseArrays mostly fixes it:

diff --git a/src/sparsevector.jl b/src/sparsevector.jl
index 7e66bc0..12c3fe6 100644
--- a/src/sparsevector.jl
+++ b/src/sparsevector.jl
@@ -1229,13 +1229,13 @@ function hcat(X1::_SparseConcatGroup, X::_SparseConcatGroup...)
     if anysparse(X1) || anysparse(X...)
         X1, X = _sparse(X1), map(_makesparse, X)
     end
-    return cat(X1, X..., dims=Val(2))
+    return Base.typed_hcat(Base.promote_eltype(X1, X...), X1, X...)
 end
 function vcat(X1::_SparseConcatGroup, X::_SparseConcatGroup...)
     if anysparse(X1) || anysparse(X...)
         X1, X = _sparse(X1), map(_makesparse, X)
     end
-    return cat(X1, X..., dims=Val(1))
+    return Base.typed_vcat(Base.promote_eltype(X1, X...), X1, X...)
 end

On the benchmark above, I get:
1.9: 3.5us
1.10: 49us
this patch: 7.3us

Not all the way there unfortunately.

vtjnash added a commit that referenced this issue Aug 31, 2023
Helps to short-circuit calls to large splat calls, since those have all
the same type elements.

Fixes #51011
vtjnash added a commit to JuliaSparse/SparseArrays.jl that referenced this issue Aug 31, 2023
The generic `cat` does more shape analysis, that typed_cat does not
always do (before either may then dispatch to _cat_t), so we can make
this faster by calling it instead.

Secondly, we can make `issparse` non-recursive once it hits a base case
where all trailing elements are the same. This makes it much better at
handling large splat, since we do not need to check each recursively
smaller type down to the base case using generic code, and just generate
one const method specialized on the full length instead.

Fix JuliaLang/julia#51011
KristofferC pushed a commit to JuliaSparse/SparseArrays.jl that referenced this issue Sep 1, 2023
The generic `cat` does more shape analysis, that typed_cat does not
always do (before either may then dispatch to _cat_t), so we can make
this faster by calling it instead.

Secondly, we can make `issparse` non-recursive once it hits a base case
where all trailing elements are the same. This makes it much better at
handling large splat, since we do not need to check each recursively
smaller type down to the base case using generic code, and just generate
one const method specialized on the full length instead.

Fix JuliaLang/julia#51011
vtjnash added a commit that referenced this issue Sep 1, 2023
Helps to short-circuit calls to large splat calls, since those have all
the same type elements.

Fixes #51011
matbesancon pushed a commit to matbesancon/SparseArrays.jl that referenced this issue Sep 3, 2023
The generic `cat` does more shape analysis, that typed_cat does not
always do (before either may then dispatch to _cat_t), so we can make
this faster by calling it instead.

Secondly, we can make `issparse` non-recursive once it hits a base case
where all trailing elements are the same. This makes it much better at
handling large splat, since we do not need to check each recursively
smaller type down to the base case using generic code, and just generate
one const method specialized on the full length instead.

Fix JuliaLang/julia#51011
vtjnash added a commit to JuliaSparse/SparseArrays.jl that referenced this issue Sep 5, 2023
The generic `cat` does more shape analysis, that typed_cat does not
always do (before either may then dispatch to _cat_t), so we can make
this faster by calling it instead.

Secondly, we can make `issparse` non-recursive once it hits a base case
where all trailing elements are the same. This makes it much better at
handling large splat, since we do not need to check each recursively
smaller type down to the base case using generic code, and just generate
one const method specialized on the full length instead.

Fix JuliaLang/julia#51011

(cherry picked from commit 4e6776a)
KristofferC pushed a commit that referenced this issue Nov 28, 2023
Helps to short-circuit calls to large splat calls, since those have all
the same type elements.

Fixes #51011

(cherry picked from commit 3527213)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
6 participants