
A*A' significantly slower than A*A for ArbMatrices #119

Open
nanleij opened this issue Apr 12, 2021 · 3 comments

Comments

@nanleij
Contributor

nanleij commented Apr 12, 2021

Hi,

I was trying to see if I could use this package instead of BigFloats for some of my computations. Among other things (such as Cholesky decompositions and computing the minimum eigenvalue), I need to compute A*A' for matrices of size up to about 300x300, which is currently the bottleneck.

I came across the following remarkable difference:

julia> A = ArbMatrix(rand(BigFloat,100,100));

julia> @benchmark $A*$A
BenchmarkTools.Trial:
  memory estimate:  48 bytes
  allocs estimate:  1
  --------------
  minimum time:     128.750 ms (0.00% GC)
  median time:      132.684 ms (0.00% GC)
  mean time:        133.845 ms (0.00% GC)
  maximum time:     148.832 ms (0.00% GC)
  --------------
  samples:          38
  evals/sample:     1

julia> @benchmark $A*$A'
BenchmarkTools.Trial:
  memory estimate:  249.02 MiB
  allocs estimate:  4080001
  --------------
  minimum time:     3.125 s (22.31% GC)
  median time:      3.331 s (26.56% GC)
  mean time:        3.331 s (26.56% GC)
  maximum time:     3.537 s (30.31% GC)
  --------------
  samples:          2
  evals/sample:     1

So multiplying by a transpose is significantly slower than a plain multiplication, possibly because of all the allocations that happen in the transposed case.
Comparing it to BigFloats:

julia> B = rand(BigFloat,100,100);

julia> @benchmark $B*$B
BenchmarkTools.Trial:
  memory estimate:  218.35 MiB
  allocs estimate:  4100002
  --------------
  minimum time:     911.017 ms (11.87% GC)
  median time:      930.252 ms (11.95% GC)
  mean time:        933.218 ms (11.67% GC)
  maximum time:     962.344 ms (12.72% GC)
  --------------
  samples:          6
  evals/sample:     1

julia> @benchmark $B*$B'
BenchmarkTools.Trial:
  memory estimate:  218.35 MiB
  allocs estimate:  4100002
  --------------
  minimum time:     1.228 s (8.92% GC)
  median time:      1.237 s (9.04% GC)
  mean time:        1.238 s (8.64% GC)
  maximum time:     1.255 s (9.08% GC)
  --------------
  samples:          5
  evals/sample:     1

So for BigFloats, multiplying by the transpose is also a bit slower, but only by about 30%, compared to about 2500% for ArbMatrix. Another option is to use plain Julia matrices with Arb entries:

julia> C = rand(Arb,100,100);

julia> @benchmark $C*$C
BenchmarkTools.Trial:
  memory estimate:  124.66 MiB
  allocs estimate:  2040002
  --------------
  minimum time:     1.572 s (14.81% GC)
  median time:      1.807 s (20.96% GC)
  mean time:        1.872 s (25.84% GC)
  maximum time:     2.237 s (37.54% GC)
  --------------
  samples:          3
  evals/sample:     1

julia> @benchmark $C*$C'
BenchmarkTools.Trial:
  memory estimate:  124.66 MiB
  allocs estimate:  2040002
  --------------
  minimum time:     2.014 s (11.50% GC)
  median time:      2.029 s (19.55% GC)
  mean time:        2.166 s (22.57% GC)
  maximum time:     2.456 s (34.14% GC)
  --------------
  samples:          3
  evals/sample:     1

So this is more comparable to the BigFloats: within about a factor of 2 in both cases. (Interestingly, it uses only about half the memory/allocations.)

To me it seems like the ArbMatrices fall back to the generic matrix multiplication in the case with the transpose. Is there an easy way around this?

PS: I also tried Nemo (which has similar speeds for A*A and A*A'), but I also need the minimum eigenvalue and the Cholesky decomposition, which are currently not available in Nemo. So I would need to convert back and forth to BigFloats, or compute those myself; with Nemo I'm not sure how to do this, but with Arblib the conversion is easy. Hence I would very much like A*A' to be as fast as A*A.

@saschatimme
Collaborator

To me it seems like the ArbMatrices fall back to the generic matrix multiplication in the case with the transpose. Is there an easy way around this?

Arb only implements the matrix multiplication A*B, with no specialized versions where A or B is transposed. The easiest fix is probably to realize A' as an ArbMatrix and then perform the multiplication. This is also how I would probably implement it here in Arblib. A pull request for this is very much welcome 🙂
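A minimal sketch of that workaround (assuming Arblib and the benchmark setup from the original post; the key step is materializing the lazy adjoint as a concrete ArbMatrix before multiplying):

```julia
using Arblib, LinearAlgebra

A = ArbMatrix(rand(BigFloat, 100, 100))

# Slow: A' is a lazy Adjoint wrapper, so A * A' dispatches to
# LinearAlgebra's generic matmul, allocating per scalar operation.
# A * A'

# Fast: realize the adjoint as a concrete ArbMatrix first, so the
# multiplication hits Arb's native matrix-multiplication routine.
B = A * ArbMatrix(A')

# In-place variant that reuses preallocated output matrices:
C = similar(A)
D = similar(A)
Arblib.mul!(D, A, Arblib.transpose!(C, A))
```

The extra ArbMatrix conversion allocates one temporary matrix, which is negligible next to the cost of the generic fallback.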

@kalmarek
Owner

kalmarek commented Apr 12, 2021

julia> @btime A*A;
  11.988 ms (10089 allocations: 4.11 MiB)

julia> @btime A*A';
  786.593 ms (8160005 allocations: 436.25 MiB)

julia> C = similar(A); D = similar(A); @btime Arblib.mul!($D, $A, Arblib.transpose!($C, $A));
  12.785 ms (85 allocations: 3.20 MiB)

we're hitting the generic matmul here:

julia> @which A*A
*(A::T, B::T) where T<:Union{AcbMatrix, AcbRefMatrix, ArbMatrix, ArbRefMatrix} in Arblib at /home/kalmar/.julia/dev/Arblib/src/matrix.jl:166

julia> @which A*A'
*(A::AbstractMatrix{T} where T, B::AbstractMatrix{T} where T) in LinearAlgebra at /home/kalmar/packages/julias/julia-1.6/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:151
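The missing specializations could be added along these lines. This is a hypothetical sketch, not Arblib's actual code: it simply applies the materialize-then-multiply strategy from the comment above to the Adjoint/Transpose wrapper types that currently fall through to the generic method.

```julia
using Arblib, LinearAlgebra

# Hypothetical method definitions: convert the lazy Adjoint/Transpose
# wrapper into a concrete ArbMatrix so Arb's fast matmul is dispatched to.
Base.:*(A::ArbMatrix, B::Adjoint{<:Any,<:ArbMatrix})   = A * ArbMatrix(B)
Base.:*(A::Adjoint{<:Any,<:ArbMatrix}, B::ArbMatrix)   = ArbMatrix(A) * B
Base.:*(A::ArbMatrix, B::Transpose{<:Any,<:ArbMatrix}) = A * ArbMatrix(B)
Base.:*(A::Transpose{<:Any,<:ArbMatrix}, B::ArbMatrix) = ArbMatrix(A) * B
```

With these in place, @which A*A' would resolve to one of the methods above instead of LinearAlgebra's generic matmul. (For real ArbMatrix the adjoint and transpose coincide; AcbMatrix would additionally need conjugation.)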

@nanleij
Contributor Author

nanleij commented Apr 12, 2021

A*ArbMatrix(A') works like a charm, thanks!
I might look into a pull request later.
