[cuTENSOR] Issue when contracting views of CuArrays with cuTENSOR #2407

Open
kmp5VT opened this issue Jun 5, 2024 · 1 comment
Labels
bug Something isn't working cuda libraries Stuff about CUDA library wrappers.

Comments


kmp5VT commented Jun 5, 2024

Describe the bug

Contracting sub-matrices taken from views of CuArrays intermittently fails when the cuTENSOR backend is used. The same contractions work fine with cuBLAS, but with cuTENSOR they fail with:

ERROR: CUTENSORError: an invalid value was used as an argument (code 7, CUTENSOR_STATUS_INVALID_VALUE)
Stacktrace:
 [1] throw_api_error(res::cuTENSOR.cutensorStatus_t)
   @ cuTENSOR ~/.julia/packages/cuTENSOR/uwns2/src/libcutensor.jl:14
 [2] check
   @ ~/.julia/packages/cuTENSOR/uwns2/src/libcutensor.jl:27 [inlined]
 [3] cutensorContract
   @ ~/.julia/packages/CUDA/75aiI/lib/utils/call.jl:34 [inlined]
 [4] 
   @ cuTENSOR ~/.julia/packages/cuTENSOR/uwns2/src/operations.jl:294
 [5] #contract!#83
   @ ~/.julia/packages/cuTENSOR/uwns2/src/operations.jl:278 [inlined]
 [6] contract!
   @ ~/.julia/packages/cuTENSOR/uwns2/src/operations.jl:259 [inlined]
 [7] mul!
   @ ~/.julia/packages/cuTENSOR/uwns2/src/interfaces.jl:57 [inlined]
 [8] mul!(C::CuTensor{Float32, 2}, A::CuTensor{Float32, 2}, B::CuTensor{Float32, 2})
   @ LinearAlgebra ~/.julia/juliaup/julia-1.10.3+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:237
 [9] top-level scope
   @ ~/.julia/dev/testing.jl:321
Some type information was truncated. Use `show(err)` to see complete types.

To reproduce

The Minimal Working Example (MWE) for this bug:

using CUDA, cuTENSOR, LinearAlgebra
A = cu(randn(5))
B = cu(randn(1))
C = cu(randn(5))
vA = @view A[2:5]
vB = @view B[1:1]
vC = @view C[2:5]

tA = CuTensor(reshape(vA, (4,1)), [1,2])
tB = CuTensor(reshape(vB, (1,1)), [2,3])
tC = CuTensor(reshape(vC, (4,1)), [1,3])
mul!(reshape(vC, (4,1)), reshape(vA, (4,1)), reshape(vB, (1,1))) ## works fine
mul!(tC, tA, tB) ## Fails
Manifest.toml

(jl_HrbN51) pkg> st
Status `/tmp/jl_HrbN51/Project.toml`
  [052768ef] CUDA v5.4.2

Version info

Details on Julia:

# please post the output of:
versioninfo()
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × Intel(R) Xeon(R) Gold 6244 CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)
Environment:
  LD_LIBRARY_PATH = /mnt/sw/nix/store/pmwk60bp5k4qr8vsg411p7vzhr502d83-openblas-0.3.23/lib:/cm/shared/apps/slurm/current/lib64

Details on CUDA:

# please post the output of:
CUDA.versioninfo()
CUDA runtime 12.5, artifact installation
CUDA driver 12.5
NVIDIA driver 550.76.0, originally for CUDA 12.4

CUDA libraries: 
- CUBLAS: 12.5.2
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.2
- CUSPARSE: 12.4.1
- CUPTI: 23.0.0
- NVML: 12.0.0+550.76

Julia packages: 
- CUDA: 5.4.2
- CUDA_Driver_jll: 0.9.0+0
- CUDA_Runtime_jll: 0.14.0+1

Toolchain:
- Julia: 1.10.3
- LLVM: 15.0.7

1 device:
  0: NVIDIA RTX A6000 (sm_86, 45.830 GiB / 47.988 GiB available)

kmp5VT commented Jun 5, 2024

As a follow-up, the same code runs successfully with an element type of ComplexF64:

using CUDA, cuTENSOR, LinearAlgebra
elt = ComplexF64
A = CuArray(randn(elt, 5))
B = CuArray(randn(elt, 1))
C = CuArray(randn(elt, 5))
vA = @view A[2:5]
vB = @view B[1:1]
vC = @view C[2:5]

tA = CuTensor(reshape(vA, (4,1)), [1,2])
tB = CuTensor(reshape(vB, (1,1)), [2,3])
tC = CuTensor(reshape(vC, (4,1)), [1,3])
mul!(reshape(vC, (4,1)), reshape(vA, (4,1)), reshape(vB, (1,1)))
mul!(tC, tA, tB) 
vC ≈ tC.data # true
