
1.10 enablement #1946
Merged: 5 commits merged into JuliaGPU:master from dk/triangular on Jul 29, 2023

Conversation

@dkarrasch (Contributor) opened this pull request:

This is to make use of JuliaLang/julia#50058. While that PR has not been merged yet, I'm putting this up to see whether this works properly on older Julia versions. Once that PR is merged, CI should obviously also pass on nightly. The amount of code that can eventually be deleted is HUGE, with no loss of functionality!

LinearAlgebra.generic_mattrimul!(C::DenseCuMatrix{T}, uploc, isunitc, tfun::Function, A::DenseCuMatrix{T}, B::DenseCuMatrix{T}) where {T<:CublasFloat} =
trmm!('R', uploc, tfun === identity ? 'N' : tfun === transpose ? 'T' : 'C', isunitc, one(T), B, A, C)
# tri-tri-mul!
function LinearAlgebra.generic_trimatmul!(C::DenseCuMatrix{T}, uplocA, isunitcA, tfunA::Function, A::DenseCuMatrix{T}, B::LinearAlgebra.UpperOrLowerTriangular{T,<:DenseCuMatrix}) where {T<:CublasFloat}
@dkarrasch (Contributor, Author) commented on this diff:

This function is meant to handle the upper-lower triangular (and vice versa) products. It may be a bit more general in that not both factors need to be non-"unitary", only the one that is tri[u/l]ed. We need to think about what should happen if none of the conditions is met, i.e., whether to throw or to fall back to some generic matmatmul.
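
For illustration only (not the PR's actual body): a minimal sketch of the branch structure described in the comment above, written against the same context as the snippet quoted higher up (CUDA.jl's CUBLAS linalg code, with trmm!, DenseCuMatrix and CublasFloat in scope); trimatmul_sketch! is a hypothetical name.

# Pick the CUBLAS uplo/diag flags from B's wrapper type; the final branch is the
# open question from the comment: throw, or fall back to some generic matmatmul.
function trimatmul_sketch!(C::DenseCuMatrix{T}, A::DenseCuMatrix{T},
                           B::LinearAlgebra.UpperOrLowerTriangular{T,<:DenseCuMatrix}) where {T<:CublasFloat}
    if B isa UpperTriangular
        trmm!('R', 'U', 'N', 'N', one(T), parent(B), A, C)   # C = A * B, B upper triangular
    elseif B isa UnitUpperTriangular
        trmm!('R', 'U', 'N', 'U', one(T), parent(B), A, C)   # unit diagonal
    elseif B isa LowerTriangular
        trmm!('R', 'L', 'N', 'N', one(T), parent(B), A, C)
    elseif B isa UnitLowerTriangular
        trmm!('R', 'L', 'N', 'U', one(T), parent(B), A, C)
    else
        # none of the conditions met: throw, or fall back to a generic mul!
        throw(ArgumentError("unsupported triangular wrapper $(typeof(B))"))
    end
    return C
end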

@maleadt (Member) commented Jun 26, 2023:

I take it we better wait until JuliaLang/julia#50058 is merged so that we can put an accurate version bound in here?
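
For illustration, one way such a bound might look, reusing the generic_mattrimul! overload quoted in the diff above; the version value below is a placeholder, not necessarily what this PR ended up using (it would have to match the first Julia build that actually contains JuliaLang/julia#50058).

# Gate the new-style overload behind a Julia version check; the bound is a placeholder.
@static if VERSION >= v"1.10.0-beta1"
    LinearAlgebra.generic_mattrimul!(C::DenseCuMatrix{T}, uploc, isunitc, tfun::Function,
                                     A::DenseCuMatrix{T}, B::DenseCuMatrix{T}) where {T<:CublasFloat} =
        trmm!('R', uploc, tfun === identity ? 'N' : tfun === transpose ? 'T' : 'C',
              isunitc, one(T), B, A, C)
end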

@dkarrasch (Contributor, Author) commented:

Yes. The extended functions currently don't exist. Unfortunately, that PR requires some back-and-forth with SparseArrays.jl, and requires bumping the stdlib. But I marked it as a milestone, so it must be in v1.10 eventually. I'd love to see profiling results compared to some prior reasonable state, like v1.9 and CUDA.jl without my previous PR, or something like that. 😉

@dkarrasch (Contributor, Author) commented:

Let's see if it also works on nightly now that the Base PR is merged. If so, then this should perhaps get another quick look, but otherwise should be ready to go. For reference, the Base commit is already included in the next backport round: JuliaLang/julia#50508.

@dkarrasch (Contributor, Author) commented:

Hm, nightly checks have been removed?

@maleadt (Member) commented Jul 18, 2023:

> Hm, nightly checks have been removed?

It was hanging. I'll try to add it back in another PR.

@maleadt (Member) commented Jul 18, 2023:

> This is to make use of JuliaLang/julia#50058.

FYI, CUDA.jl tests failed on 1.10 without this PR, so I guess that means the change was slightly breaking?

@maleadt changed the title from "use unwrapping mechanism for triangular matrices" to "1.10 enablement" on Jul 18, 2023
@maleadt added the "enhancement" (New feature or request) label on Jul 18, 2023
@dkarrasch (Contributor, Author) commented:

That Base PR is not included in v1.10.0-alpha1; it is going to be included in v1.10.0-alpha2. However, this PR redirects the triangular methods starting from v1.10-, i.e., a bit "too early". You can wait until alpha2 is released, run the whole stack, and if it passes, merge and release. An indicator of whether it should work is current nightly, because that already has the commit included. If that fails, then we know something needs to be fixed in any case.

@maleadt (Member) commented Jul 18, 2023:

> That Base PR is not included in v1.10.0-alpha1. It is going to be included in v1.10.0-alpha2.

I know; I was testing alpha2 against CUDA.jl#master. Some of the overloads here weren't sticking anymore:

libraries/cusolver/dense: Error During Test at /home/tim/Julia/pkg/CUDA/test/libraries/cusolver/dense.jl:413
  Test threw exception
  Expression: collect(d_M \ d_B) ≈ M \ B
  ArgumentError: cannot take the CPU address of a CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}
  Stacktrace:
    [1] unsafe_convert(::Type{Ptr{Float32}}, x::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
      @ CUDA ~/Julia/pkg/CUDA/src/array.jl:386
    [2] trtrs!(uplo::Char, trans::Char, diag::Char, A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, B::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
      @ LinearAlgebra.LAPACK ~/Julia/src/julia/build/dev/usr/share/julia/stdlib/v1.10/LinearAlgebra/src/lapack.jl:3557
    [3] generic_trimatdiv!(C::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, uploc::Char, isunitc::Char, tfun::Function, A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, B::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
      @ LinearAlgebra ~/Julia/src/julia/build/dev/usr/share/julia/stdlib/v1.10/LinearAlgebra/src/triangular.jl:836
    [4] _ldiv!(C::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, A::UpperTriangular{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, B::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
      @ LinearAlgebra ~/Julia/src/julia/build/dev/usr/share/julia/stdlib/v1.10/LinearAlgebra/src/triangular.jl:758
    [5] ldiv!(C::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, A::UpperTriangular{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, B::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
      @ LinearAlgebra ~/Julia/src/julia/build/dev/usr/share/julia/stdlib/v1.10/LinearAlgebra/src/triangular.jl:751
    [6] \(A::UpperTriangular{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, B::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
      @ LinearAlgebra ~/Julia/src/julia/build/dev/usr/share/julia/stdlib/v1.10/LinearAlgebra/src/triangular.jl:1489
    [7] ldiv!(_qr::QR{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, b::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
      @ CUDA.CUSOLVER ~/Julia/pkg/CUDA/lib/cusolver/linalg.jl:146
    [8] \(F::QR{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, B::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
      @ CUDA.CUSOLVER ~/Julia/pkg/CUDA/lib/cusolver/linalg.jl:95
    [9] macro expansion
      @ ~/Julia/src/julia/build/dev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:669 [inlined]
   [10] macro expansion
      @ ~/Julia/pkg/CUDA/test/libraries/cusolver/dense.jl:413 [inlined]
   [11] macro expansion
      @ ~/Julia/src/julia/build/dev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
   [12] macro expansion
      @ ~/Julia/pkg/CUDA/test/libraries/cusolver/dense.jl:338 [inlined]
   [13] macro expansion
      @ ~/Julia/src/julia/build/dev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1669 [inlined]
   [14] top-level scope
      @ ~/Julia/pkg/CUDA/test/libraries/cusolver/dense.jl:1690

Now, that isn't too bad for CUDA.jl; we generally don't promise forward compatibility (because of our tight integration with the Julia compiler). I just wanted to mention it for your information.

@dkarrasch (Contributor, Author) commented:

That means there is something wrong with this PR then. generic_trimatdiv! from this package should be called at step [3]; it should be defined around line 223 in linalg.jl. But the call goes the wrong way, into LinearAlgebra, and ends up in LAPACK. It must be an issue with the method signature.
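
For reference, a sketch of what such an overload looks like, based on the signature visible in the stacktrace above (this is not necessarily the exact code at linalg.jl:223). It routes triangular left-division to CUBLAS.trsm! instead of letting it fall through to LAPACK.trtrs!; a vector right-hand side, as in the failing test, would need an analogous trsv!-based method.

# Copy B into C, then solve in place on the GPU: C = tfun(A) \ B.
LinearAlgebra.generic_trimatdiv!(C::DenseCuMatrix{T}, uploc, isunitc, tfun::Function,
                                 A::DenseCuMatrix{T}, B::DenseCuMatrix{T}) where {T<:CublasFloat} =
    CUBLAS.trsm!('L', uploc,
                 tfun === identity ? 'N' : tfun === transpose ? 'T' : 'C',
                 isunitc, one(T), A, C === B ? C : copyto!(C, B))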

@dkarrasch (Contributor, Author) commented:

Wait, I guess I'm mixing up branches and versions. Let me think about it.

@maleadt (Member) commented Jul 18, 2023:

> That means there is something wrong with this PR then.

Sorry, I'm not being clear. I wasn't using this PR. I was using 1.10-alpha2 with CUDA.jl#master. Theoretically, that shouldn't have broken anything, but AFAICT it did seem to break CUDA.jl. This PR will fix that, so users won't notice a thing.

@dkarrasch (Contributor, Author) commented:

Got it. Yeah, the "breakage" is because in triangular \ (step [6] above) I moved to using 3-arg ldiv!, where we used to have a branch that, for BLAS eltypes, called 2-arg ldiv!.^1 For 2-arg ldiv! we have overloads here. But when the generic code calls 3-arg ldiv!, packages typically don't have overloads for it, so their overloads are missed.

^1 For 3-arg ldiv! we allocate just a similar matrix, and in the BLAS cases it copies B into the target and applies BLAS methods, just like what we used to get from calling 2-arg ldiv!: there the filled copy (copy_similar) was created in \ right away. So the work is the same, but the call chains are different now. That didn't pop up in PkgEval runs, so at least in the CPU world there don't seem to be any issues with it.
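
For illustration, a heavily simplified sketch of the call-chain change described in this comment and its footnote; the helper names are hypothetical and this is not the literal stdlib code.

using LinearAlgebra

# <= v1.9 (simplified): `\` made a filled copy of B and called 2-arg ldiv!,
# which is the method CUDA.jl overloads.
backslash_old(A::UpperTriangular, B::AbstractVecOrMat) = ldiv!(A, copy(B))

# v1.10 (simplified): `\` allocates the result and calls 3-arg ldiv!, which
# funnels into generic_trimatdiv!, the function packages now need to overload.
backslash_new(A::UpperTriangular, B::AbstractVecOrMat) = ldiv!(similar(B, eltype(B)), A, B)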

@dkarrasch (Contributor, Author) commented:

Ok, this PR is not completely correct. I'll fix it soon.

@dkarrasch (Contributor, Author) commented:

Hm, is CI down, or did I crash it?

@maleadt (Member) commented Jul 22, 2023:

Buildkite is down (like, all of Buildkite).

@dkarrasch closed this on Jul 29, 2023
@dkarrasch reopened this on Jul 29, 2023
@maleadt (Member) commented Jul 29, 2023:

Thanks for the help, @dkarrasch!

@maleadt merged commit a912bea into JuliaGPU:master on Jul 29, 2023
@dkarrasch deleted the dk/triangular branch on July 31, 2023 07:51
@maleadt added a commit that referenced this pull request on Aug 25, 2023:
"Use unwrapping mechanism for triangular matrices."
Co-authored-by: Tim Besard <tim.besard@gmail.com>