Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CUSPARSE] Update the interface for triangular solves #2164

Merged
merged 1 commit into from
Nov 13, 2023

Conversation

amontoison
Copy link
Member

@amontoison amontoison commented Nov 10, 2023

@dkarrasch
It seems that NVIDIA reintroduced in-place triangular solves in CUSPARSE with recent CUDA toolkits.
I also found a few typos that you added in #1946. Can you review the PR?

Can you also confirm that ldiv!(C, A, B), ldiv!(A, B) and A \ B and automatically defined for the four triangular wrappers if we define generic_trimatdiv!?

@amontoison
Copy link
Member Author

amontoison commented Nov 11, 2023

@dkarrasch
Is possible to modify the function

generic_trimatdiv!(C, uploc, isunitc, tfun, A, B)

into

generic_trimatdiv!(C, uploc, isunitc, tfunA, A, tfunB, B)

?

If the right-hand side B is transposed, the current version of generic_trimatdiv! is not working on Julia 1.10.
Julia doesn't dispatch to it on some cases and we have scalar indexing.

@dkarrasch
Copy link
Contributor

No, at least that it is not designed like that. The reason is that triangular solves are typically 2-arg only. But you could create your own.

generic_trimatdiv!(C, uploc, isunitc, tfun, A, B::SomeCuMatrix) = # as usual
generic_trimatdiv!(C, uploc, isunitc, tfun, A, B::Transpose{<:Any,<:SomeCuMatrix}) =
    _generic_trimatdiv!(C, uploc, isunitc, tfunA, A, tfunB, B)

(with the other slots type-annotated as appropriate), which then calls whatever CUSPARSE offers.

@amontoison
Copy link
Member Author

amontoison commented Nov 11, 2023

I already did that but in the case that the triangular solves are in-place, C is also in a Transpose / Adjoint and ldiv! doesn't dispatch to generic_trimatdiv!.

We can use 2-arg methods again with CUDA v12.x so it could simplify many things.
I changed everything last year to use the 3-args ldiv! because the 2-args ldiv! was not anymore supported but it's fixed now.

@dkarrasch
Copy link
Contributor

The dispatch in LinearAlgebra is as follows: if you call ldiv!(T, B), then this gets expanded to the "three-arg" generic_trimatdiv! with the character for potential transposition. This is where packages like this one dispatch one the underlying storage. All the BLAS methods then check if C === B, and in that case ignore B completely and solve in-place of C. So it is a 2-arg solve eventually. For (Open)BLAS, I don't think you can call ldiv! on a transpose rhs directly (I mean you can, but you will end up in a generic solve routine). You can only call T\ transpose(B), and then transpose(B) gets materialized and then mutated. The same happens here: sm2! and sv2! are 2-arg solves.

@amontoison
Copy link
Member Author

Thanks @dkarrasch! I understand how the dispatch with generic_trimatdiv! works now.
I solved the issue by adding the following three three methods:

function LinearAlgebra.generic_trimatdiv!(C::DenseCuMatrix{T}, uploc, isunitc, tfun, A, B::AdjOrTrans{T,<:DenseCuMatrix{T}}) where {T<:BlasFloat}
  ...
end

function LinearAlgebra.generic_trimatdiv!(C::Transpose{T,<:DenseCuMatrix{T}}, uploc, isunitc, tfun, A, B::Transpose{T,<:DenseCuMatrix{T}}) where {T<:BlasFloat}
  ...
end

function LinearAlgebra.generic_trimatdiv!(C::Adjoint{T,<:DenseCuMatrix{T}}, uploc, isunitc, tfun, A, B::Adjoint{T,<:DenseCuMatrix{T}}) where {T<:BlasFloat}
  ...
end

@maleadt maleadt added enhancement New feature or request cuda libraries Stuff about CUDA library wrappers. labels Nov 13, 2023
@maleadt maleadt merged commit 3e59794 into JuliaGPU:master Nov 13, 2023
1 check passed
@amontoison amontoison deleted the triangular_solves branch November 13, 2023 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda libraries Stuff about CUDA library wrappers. enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants