Avoid materializing arrays in bidiag matmul (#55450)

Currently, small `Bidiagonal`/`Tridiagonal` matrices are materialized in matrix multiplications, but this is wasteful and unnecessary. This PR changes this to use a naive matrix multiplication for small matrices, and fall back to the banded multiplication for larger ones. Multiplication by a `Bidiagonal` falls back to a banded matrix multiplication for all sizes in the current implementation, and iterates in a cache-friendly manner for the non-`Bidiagonal` matrix. In certain cases, the matrices were being materialized if the non-structured matrix was small, even if the structured matrix was large. This is changed as well in this PR. Some improvements in performance: ```julia julia> B = Bidiagonal(rand(3), rand(2), :U); A = rand(size(B)...); C = similar(A); julia> @Btime mul!($C, $A, $B); 193.152 ns (6 allocations: 352 bytes) # nightly v"1.12.0-DEV.1034" 18.826 ns (0 allocations: 0 bytes) # This PR julia> T = Tridiagonal(rand(99), rand(100), rand(99)); A = rand(2, size(T,2)); C = similar(A); julia> @Btime mul!($C, $A, $T); 9.398 μs (8 allocations: 79.94 KiB) # nightly 416.407 ns (0 allocations: 0 bytes) # This PR julia> B = Bidiagonal(rand(300), rand(299), :U); A = rand(20000, size(B,2)); C = similar(A); julia> @Btime mul!($C, $A, $B); 33.395 ms (0 allocations: 0 bytes) # nightly 6.695 ms (0 allocations: 0 bytes) # This PR (cache-friendly) ``` Closes #55414 --------- Co-authored-by: Daniel Karrasch <daniel.karrasch@posteo.de>
JuliaLang · Sep 12, 2024 · 4dbdc80 · 4dbdc80
1 parent c71471b
commit 4dbdc80
Show file tree

Hide file tree

Showing 4 changed files with 422 additions and 68 deletions.
diff --git a/stdlib/LinearAlgebra/src/LinearAlgebra.jl b/stdlib/LinearAlgebra/src/LinearAlgebra.jl
@@ -673,7 +673,9 @@ matprod_dest(A::Diagonal, B::Diagonal, TS) = _matprod_dest_diag(B, TS)
 _matprod_dest_diag(A, TS) = similar(A, TS)
 function _matprod_dest_diag(A::SymTridiagonal, TS)
     n = size(A, 1)
-    Tridiagonal(similar(A, TS, n-1), similar(A, TS, n), similar(A, TS, n-1))
+    ev = similar(A, TS, max(0, n-1))
+    dv = similar(A, TS, n)
+    Tridiagonal(ev, dv, similar(ev))
 end
 
 # Special handling for adj/trans vec