Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rdiv! on LU object is much slower than ldiv! #1085

Closed
stevengj opened this issue Aug 8, 2024 · 7 comments · Fixed by JuliaLang/julia#55764
Closed

rdiv! on LU object is much slower than ldiv! #1085

stevengj opened this issue Aug 8, 2024 · 7 comments · Fixed by JuliaLang/julia#55764
Labels
performance Must go faster

Comments

@stevengj
Copy link
Member

stevengj commented Aug 8, 2024

From this post, I'm finding that rdiv! on LU objects (added in JuliaLang/julia#31285) is much slower than ldiv!.

For example:

using LinearAlgebra, BenchmarkTools
function inv2(A)
    LU = lu(A)
    return ldiv!(LU, Matrix{eltype(LU)}(I, size(A)...))
end
function inv3(A)
    LU = lu(A)
    return rdiv!(Matrix{eltype(LU)}(I, size(A)...), LU)
end

gives

julia> A = rand(1000,1000); @btime inv($A); @btime inv2($A); @btime inv3($A);
  10.791 ms (5 allocations: 8.13 MiB)
  10.796 ms (5 allocations: 15.27 MiB)
  912.368 ms (5 allocations: 15.27 MiB)

julia> inv(A) ≈ inv2(A) ≈ inv3(A)
true
@stevengj stevengj added the performance Must go faster label Aug 8, 2024
@stevengj
Copy link
Member Author

stevengj commented Aug 8, 2024

I benchmarked rdiv! and ldiv! on UpperTriangular and UnitLowerTriangular matrices (which should represent most of the time for an LU solve), and they are comparable (both taking a few seconds for 1000x1000 matrices), so it's not obvious to me what the problem could be. What's taking hundreds of ms here?

@KristofferC
Copy link
Member

All time seem to be spent in

https://github.com/JuliaLang/julia/blob/4954197196d657d14edd3e9c61ac101866e6fa25/stdlib/LinearAlgebra/src/triangular.jl#L1162

Since it is an adjoint it is iterating there in a cache unfriendly way (row by row). Not sure if that is the full answer.

@stevengj
Copy link
Member Author

stevengj commented Aug 8, 2024

Why would it be calling generic_trimatdiv! instead of a LAPACK call? A generic $O(n^3)$ routine will be awful here.

@aravindh-krishnamoorthy
Copy link
Contributor

aravindh-krishnamoorthy commented Sep 11, 2024

I benchmarked rdiv! and ldiv! on UpperTriangular and UnitLowerTriangular matrices (which should represent most of the time for an LU solve), and they are comparable (both taking a few seconds for 1000x1000 matrices), so it's not obvious to me what the problem could be. What's taking hundreds of ms here?

The problem is that LAPACK's getrs!(trans,A,ipiv,B) can only apply transposition (trans) to A, whereas https://github.com/JuliaLang/julia/blob/1eabe90fa054abc74f541798ae4dfcc9445bb564/stdlib/LinearAlgebra/src/lu.jl#L789 applies transpositions to both A and B.

With the following new functions

julia> LinearAlgebra.ldiv!(A::LinearAlgebra.AdjointFactorization{T,<:LU{T,<:StridedMatrix}}, B::Adjoint{T,<:StridedVecOrMat{T}}) where {T<:LinearAlgebra.BlasComplex} = LAPACK.getrs!('C', A.parent.factors, A.parent.ipiv, copy(B))
julia> LinearAlgebra.ldiv!(A::LinearAlgebra.TransposeFactorization{T,<:LU{T,<:StridedMatrix}}, B::Transpose{T,<:StridedVecOrMat{T}}) where {T<:LinearAlgebra.BlasFloat} = LAPACK.getrs!('T', A.parent.factors, A.parent.ipiv, copy(B))

which are in line with the implementations of \ in lu.jl: https://github.com/JuliaLang/julia/blob/1eabe90fa054abc74f541798ae4dfcc9445bb564/stdlib/LinearAlgebra/src/lu.jl#L524-L527
the performance is more reasonable:

julia> A = rand(1000,1000); @btime inv($A); @btime inv2($A); @btime inv3($A);
  20.489 ms (5 allocations: 8.13 MiB)
  30.846 ms (5 allocations: 15.27 MiB)
  32.751 ms (7 allocations: 22.90 MiB)

However, this requires a copy (mainly memory impact):

julia> @btime copy(A);
  242.988 μs (2 allocations: 7.63 MiB)

@aravindh-krishnamoorthy
Copy link
Contributor

Update: As indicated by @dkarrasch in the comments of the now-closed PR JuliaLang/julia#55760, this is not a missing functionality issue, but a dispatch issue. See JuliaLang/julia#55764 for a possible solution.

@dkarrasch
Copy link
Member

Indeed. Unfortunately, I have very little time at the moment, so if somebody could debug the "lu of tridiagonal" case in that PR (everything else is working just fine), that would be awesome!

@KristofferC KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants