
Failure of Eigenvalue Decomposition for Large Matrices. #2413

Closed
hkimman opened this issue Jun 12, 2024 · 3 comments · Fixed by #2437
Labels: bug (Something isn't working), cuda libraries (Stuff about CUDA library wrappers), good first issue (Good for newcomers)

Comments


hkimman commented Jun 12, 2024

Describe the bug

Before updating, I was able to decompose a large (100,000 × 100,000) dense matrix with CUDA.CUSOLVER.syevd! on CUDA.jl v5.2.0. However, after updating to v5.4.2, the eigenvalue decomposition (CUDA.CUSOLVER.syevd!) fails even for a 15,000 × 15,000 dense matrix, with the following error:

ERROR: InexactError: trunc(Int32, 2707806276)
Stacktrace:
  [1] throw_inexacterror(f::Symbol, ::Type{Int32}, val::Int64)
    @ Core .\boot.jl:634
  [2] checked_trunc_sint
    @ .\boot.jl:656 [inlined]
  [3] toInt32
    @ .\boot.jl:693 [inlined]
  [4] Int32
    @ .\boot.jl:783 [inlined]
  [5] convert
    @ .\number.jl:7 [inlined]
  [6] cconvert
    @ .\essentials.jl:543 [inlined]
  [7] macro expansion
    @ C:\Users\kimma\.julia\packages\CUDA\75aiI\lib\utils\call.jl:226 [inlined]
  [8] macro expansion
    @ C:\Users\kimma\.julia\packages\CUDA\75aiI\lib\cusolver\libcusolver.jl:3043 [inlined]
  [9] #506
    @ C:\Users\kimma\.julia\packages\CUDA\75aiI\lib\utils\call.jl:35 [inlined]
 [10] retry_reclaim
    @ C:\Users\kimma\.julia\packages\CUDA\75aiI\src\memory.jl:434 [inlined]
 [11] check
    @ C:\Users\kimma\.julia\packages\CUDA\75aiI\lib\cusolver\libcusolver.jl:24 [inlined]
 [12] cusolverDnSsyevd
    @ C:\Users\kimma\.julia\packages\CUDA\75aiI\lib\utils\call.jl:34 [inlined]
 [13] (::CUDA.CUSOLVER.var"#1362#1364"{…})(buffer::CuArray{…})
    @ CUDA.CUSOLVER C:\Users\kimma\.julia\packages\CUDA\75aiI\lib\cusolver\dense.jl:640
 [14] with_workspaces(f::CUDA.CUSOLVER.var"#1362#1364"{…}, cache_gpu::Nothing, cache_cpu::Nothing, size_gpu::CUDA.CUSOLVER.var"#bufferSize#1363"{…}, size_cpu::Int64)
    @ CUDA.APIUtils C:\Users\kimma\.julia\packages\CUDA\75aiI\lib\utils\call.jl:131
 [15] with_workspace
    @ C:\Users\kimma\.julia\packages\CUDA\75aiI\lib\utils\call.jl:67 [inlined]
 [16] syevd!(jobz::Char, uplo::Char, A::CuArray{Float32, 2, CUDA.DeviceMemory})
    @ CUDA.CUSOLVER C:\Users\kimma\.julia\packages\CUDA\75aiI\lib\cusolver\dense.jl:639
 [17] top-level scope
    @ c:\Users\kimma\SynologyDrive\Code\Julia\cuda_bug.jl:5
Some type information was truncated. Use `show(err)` to see complete types.

To reproduce

The Minimal Working Example (MWE) for this bug:

using LinearAlgebra
using CUDA

A = Matrix(Symmetric(rand(15_000,15_000)))
result = CUDA.CUSOLVER.syevd!('V','U',cu(A))
Manifest.toml

# This file is machine-generated - editing it directly is not advised

julia_version = "1.10.4"
manifest_format = "2.0"
project_hash = "31a65ef0d76c2eb9948f2f097d929af9853d3bbc"

[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "Crayons", "DataFrames", "ExprTools", "GPUArrays", "GPUCompiler", "KernelAbstractions", "LLVM", "LLVMLoopInfo", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "NVTX", "Preferences", "PrettyTables", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "StaticArrays", "Statistics"]
git-tree-sha1 = "6e945e876652f2003e6ca74e19a3c45017d3e9f6"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "5.4.2"

    [deps.CUDA.extensions]
    ChainRulesCoreExt = "ChainRulesCore"
    EnzymeCoreExt = "EnzymeCore"
    SpecialFunctionsExt = "SpecialFunctions"

    [deps.CUDA.weakdeps]
    ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
    EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"
    SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"

[[deps.CUDA_Driver_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg"]
git-tree-sha1 = "c48f9da18efd43b6b7adb7ee1f93fe5f2926c339"
uuid = "4ee394cb-3365-5eb0-8335-949819d2adfc"
version = "0.9.0+0"

[[deps.CUDA_Runtime_Discovery]]
deps = ["Libdl"]
git-tree-sha1 = "5db9da5fdeaa708c22ba86b82c49528f402497f2"
uuid = "1af6417a-86b4-443c-805f-a4643ffb695f"
version = "0.3.3"

[[deps.CUDA_Runtime_jll]]
deps = ["Artifacts", "CUDA_Driver_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "bcba305388e16aa5c879e896726db9e71b4942c6"
uuid = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
version = "0.14.0+1"

[[deps.GPUArrays]]
deps = ["Adapt", "GPUArraysCore", "LLVM", "LinearAlgebra", "Printf", "Random", "Reexport", "Serialization", "Statistics"]
git-tree-sha1 = "38cb19b8a3e600e509dc36a6396ac74266d108c1"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "10.1.1"

[[deps.GPUArraysCore]]
deps = ["Adapt"]
git-tree-sha1 = "ec632f177c0d990e64d955ccc1b8c04c485a0950"
uuid = "46192b85-c4d5-4398-a991-12ede77f4527"
version = "0.1.6"

[[deps.GPUCompiler]]
deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "Scratch", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "518ebd058c9895de468a8c255797b0c53fdb44dd"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.26.5"

[[deps.LLVM]]
deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Preferences", "Printf", "Requires", "Unicode"]
git-tree-sha1 = "389aea28d882a40b5e1747069af71bdbd47a1cae"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "7.2.1"
weakdeps = ["BFloat16s"]

    [deps.LLVM.extensions]
    BFloat16sExt = "BFloat16s"

Version info

Details on Julia:

Julia Version 1.10.4
Commit 48d4fd4843 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 32 × AMD Ryzen 9 7945HX with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS =

Details on CUDA:

CUDA runtime 12.5, artifact installation
CUDA driver 12.5
NVIDIA driver 555.99.0

CUDA libraries:
- CUBLAS: 12.5.2
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.2
- CUSPARSE: 12.4.1
- CUPTI: 23.0.0
- NVML: 12.0.0+555.99

Julia packages:
- CUDA: 5.4.2
- CUDA_Driver_jll: 0.9.0+0
- CUDA_Runtime_jll: 0.14.0+1

Toolchain:
- Julia: 1.10.4
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce RTX 4090 Laptop GPU (sm_89, 12.148 GiB / 15.992 GiB available)
maleadt (Member) commented Jun 12, 2024

I don't immediately see how this could have worked before, as these cuSOLVER APIs were always 32-bit. I guess we should be using the new generic APIs here, i.e. Xsyevd instead of Ssyevd, but that's a larger change. The 64-bit APIs were introduced in CUDA 11.1 (https://docs.nvidia.com/cuda/archive/11.1.1/cuda-toolkit-release-notes/index.html#cusolver-new-features), and the old ones are now officially deprecated (https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/#cusolver-release-12-4-update-1).

cc @amontoison
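
The failing conversion can be reproduced without a GPU: the legacy cuSOLVER entry points declare the workspace length as a 32-bit integer, so the lwork value from the stacktrace above (2,707,806,276) overflows when the ccall wrapper converts it. A minimal sketch, using only the value reported in the error:

```julia
# The legacy cuSOLVER API (cusolverDnSsyevd) takes lwork as a 32-bit int.
# The workspace size reported in the stacktrace no longer fits:
lwork = 2_707_806_276              # value from the InexactError above

println(lwork > typemax(Int32))    # true: exceeds 2147483647

# This is the conversion that fails inside the generated ccall wrapper:
try
    Int32(lwork)
catch err
    println(err isa InexactError)  # true
end
```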

amontoison (Member) commented

@maleadt
I planned to wait for CUDA 13.0 to change the dispatch in CUDA.jl.
If it's like CUSPARSE, the generic routines were buggy until the legacy API was removed (CUDA v12.0).
However, what we can do for now is check the dimension of the matrix and dispatch to Xsyevd (and the other generic routines) only when the matrix is too large for the 32-bit API.

if CUSOLVER.version() >= v"11.1" && (n >= 2^31 - 1)
  Xsyevd(...)
else
  syevd(...)
end

For CUDA 13.0, we can just replace the condition by

CUSOLVER.version() >= v"13.0" || (CUSOLVER.version() >= v"11.1" && (n >= 2^31 - 1))
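
The version/size dispatch above could be factored into a small predicate. A minimal sketch, following the thresholds in the comment (the function name is hypothetical, not actual CUDA.jl code):

```julia
# Hypothetical helper implementing the dispatch condition sketched above:
# use the 64-bit generic Xsyevd API when the library is new enough and the
# problem is too large for the legacy 32-bit API, or once the legacy API
# is gone (CUDA 13.0).
function use_generic_api(n::Integer, cusolver_version::VersionNumber)
    cusolver_version >= v"13.0" ||
        (cusolver_version >= v"11.1" && n >= 2^31 - 1)
end

println(use_generic_api(15_000, v"11.6"))  # false: legacy path
println(use_generic_api(2^31, v"11.6"))    # true: needs 64-bit API
println(use_generic_api(100, v"13.0"))     # true: legacy API removed
```

Note that in the report above it is the workspace size (lwork), not n, that overflows Int32 (n = 15,000 is well below 2^31), so the cutoff actually used in a fix may need to consider the buffer size rather than the matrix dimension.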


maleadt commented Jun 28, 2024

Related MWE from #2427:

using LinearAlgebra, CUDA
a = CUDA.rand(Float64, 10000, 10000)
b = a + a'
eigen(b)
