Incorrect colVal array when using CuSparseMatrixCSR command on sparse matrix #1888

Closed
HamMoh94 opened this issue Apr 28, 2023 · 8 comments
Labels
bug Something isn't working upstream Somebody else's problem.


@HamMoh94

The constructor CuSparseMatrixCSR generates an incorrect colVal array when it is applied to a sparse matrix in CSC format of certain sizes. This is easy to see by generating a sparse identity matrix A of size (N,N) and loading it onto the GPU with CuSparseMatrixCSR(A). We expect the colVal array to be 1:1:N. Instead, the first entry is shifted to the last column. More interestingly, the problem shows up only for matrices of size 16, 32, 64, 128, ..., and not for other sizes.

The Minimal Working Example (MWE) for this bug:

using CUDA 
using CUDA.CUSPARSE 
using SparseArrays


A = spdiagm(ones(8))       # csc structure on cpu
B = spdiagm(ones(16))       # csc structure on cpu
C = spdiagm(ones(32))       # csc structure on cpu
D = spdiagm(ones(64))       # csc structure on cpu
E = spdiagm(ones(128))       # csc structure on cpu

Acsr = CuSparseMatrixCSR(A)   # csr on GPU
Bcsr = CuSparseMatrixCSR(B)   # csr on GPU
Ccsr = CuSparseMatrixCSR(C)   # csr on GPU
Dcsr = CuSparseMatrixCSR(D)   # csr on GPU
Ecsr = CuSparseMatrixCSR(E)   # csr on GPU


Acsr.colVal
Bcsr.colVal
Ccsr.colVal
Dcsr.colVal
Ecsr.colVal

Expected behavior

We expect Acsr.colVal = 1:1:8, but I am getting Acsr.colVal = [8; 1:1:7]. The same applies to the other matrices.
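
For anyone who wants a CPU-side reference to compare against, here is a minimal sketch. It relies on the fact that the CSR arrays of A have the same layout as the CSC arrays of transpose(A), so only SparseArrays is needed. The helper name `csr_reference` is hypothetical and not part of CUDA.jl. The GPU comparison is left as a comment because it needs a working CUDA device.

```julia
using SparseArrays

# Hypothetical helper (not part of CUDA.jl): CSR arrays of A computed on the CPU.
# The CSR layout of A is the CSC layout of transpose(A).
function csr_reference(A::SparseMatrixCSC)
    At = copy(transpose(A))                 # materialize the transpose as CSC
    rowPtr = At.colptr                      # row pointers of A in CSR
    colVal = rowvals(At)                    # column indices of A in CSR
    nzVal  = nonzeros(At)                   # stored values in row-major order
    return rowPtr, colVal, nzVal
end

for N in (8, 16, 32, 64, 128)
    A = spdiagm(ones(N))                    # sparse identity, CSC on the CPU
    rowPtr, colVal, nzVal = csr_reference(A)
    # For an identity matrix, row i holds a single entry in column i.
    @assert colVal == collect(1:N)
    println("N = ", N, ": reference colVal is 1:", N)
    # On a machine with a GPU, the device result could be compared like this:
    #   using CUDA, CUDA.CUSPARSE
    #   @assert Array(CuSparseMatrixCSR(A).colVal) == colVal
end
```

With the bug described above, the commented-out GPU assertion would be expected to fail for the affected sizes.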

Version info

Details on Julia:
Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 128 × AMD EPYC 7542 32-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
Threads: 1 on 128 virtual cores
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS =

Details on CUDA:
CUDA runtime 12.1, artifact installation
CUDA driver 12.1
NVIDIA driver 470.161.3, originally for CUDA 11.4

Libraries:

  • CUBLAS: 12.1.0
  • CURAND: 10.3.2
  • CUFFT: 11.0.2
  • CUSOLVER: 11.4.4
  • CUSPARSE: 12.0.2
  • CUPTI: 18.0.0
  • NVML: 11.0.0+470.161.3

Toolchain:

  • Julia: 1.8.3
  • LLVM: 13.0.1
  • PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
  • Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
0: NVIDIA A100 80GB PCIe (sm_80, 59.855 GiB / 79.096 GiB available)

@HamMoh94 HamMoh94 added the bug Something isn't working label Apr 28, 2023
@maleadt
Member

maleadt commented Apr 28, 2023

Which version of CUDA.jl?

cc @amontoison

@HamMoh94
Author

CUDA.versioninfo()
CUDA runtime 12.1, artifact installation
CUDA driver 12.1

@amontoison
Member

amontoison commented Apr 28, 2023

Tim, is CUDA toolkit 12.1.1 used by default now?
They fixed a few bugs in CUDA 12.1.1. CUDA 12.1.0 has the bugs.

@maleadt
Member

maleadt commented Apr 28, 2023

After #1883 yes, but that's not part of a release yet, so @HamMoh94 please test the master branch.

> CUDA.versioninfo()
> CUDA runtime 12.1, artifact installation
> CUDA driver 12.1

That's the CUDA version, not the version of the CUDA.jl Julia package.

@HamMoh94
Author

The command Pkg.status("CUDA") gives the following:
⌃ [052768ef] CUDA v4.1.2
Info Packages marked with ⌃ have new versions available and may be upgradable.

@amontoison
Member

@HamMoh94
Could you add the master branch and test again?

pkg> add CUDA#master
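
The same step can also be done from a script instead of the Pkg REPL. The sketch below uses the functional Pkg API. The wrapper name `track_cuda_master` is hypothetical, and calling it needs network access and modifies the active environment, so it is only defined here and not run.

```julia
using Pkg

# Hypothetical wrapper: switch the active environment to the master branch
# of CUDA.jl, then print the resolved package version.
function track_cuda_master()
    Pkg.add(name="CUDA", rev="master")   # equivalent to `pkg> add CUDA#master`
    Pkg.status("CUDA")                   # shows the CUDA.jl version, not the toolkit version
end
```

Once a release containing the fix is available, `pkg> free CUDA` should return the package to tracking registered versions.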

@HamMoh94
Author

HamMoh94 commented May 2, 2023

@amontoison The issue is resolved after adding the master branch.
Thank you

@amontoison
Member

amontoison commented May 2, 2023

Great! I'm using CUDA 11.8 to avoid the latest problems.
They broke a lot of things with CUDA Toolkit 12.x.
@maleadt I suggest making a new release if you can.

@maleadt maleadt closed this as completed May 2, 2023
@maleadt maleadt added the upstream Somebody else's problem. label May 2, 2023