cuBLASXt's xt_gemm! incompatible with stream-ordered allocated memory #2320
Works for me.
What version of the CUDA toolkit is that using? Since you're using a CUDA 12.4 driver, I assume that's the CUDA 12.4 toolkit. If so, try CUDA.jl#master so that the Julia test also uses CUDA 12.4.
On CUDA runtime 12.4, artifact installation
CUDA driver 12.4
NVIDIA driver 550.54.15
CUDA libraries:
- CUBLAS: 12.4.2
- CURAND: 10.3.5
- CUFFT: 11.2.0
- CUSOLVER: 11.6.0
- CUSPARSE: 12.3.0
- CUPTI: 22.0.0
- NVML: 12.0.0+550.54.15
Julia packages:
- CUDA: 5.3.0
- CUDA_Driver_jll: 0.8.0+0
- CUDA_Runtime_jll: 0.12.0+1
Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7
4 devices:
0: NVIDIA H100 (sm_90, 92.999 GiB / 93.584 GiB available)
1: NVIDIA H100 (sm_90, 92.999 GiB / 93.584 GiB available)
2: NVIDIA H100 (sm_90, 92.999 GiB / 93.584 GiB available)
3: NVIDIA H100 (sm_90, 92.999 GiB / 93.584 GiB available)
I also tried on a different machine, with both 5.2 and master (versioninfo below). The error still happens.
5.2:
CUDA runtime 12.3, artifact installation
CUDA driver 12.3
NVIDIA driver 545.23.6
CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+545.23.6
Julia packages:
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0
Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7
2 devices:
0: NVIDIA RTX A6000 (sm_86, 44.548 GiB / 44.988 GiB available)
1: NVIDIA RTX A6000 (sm_86, 44.548 GiB / 44.988 GiB available)
master:
CUDA runtime 12.4, artifact installation
CUDA driver 12.3
NVIDIA driver 545.23.6
CUDA libraries:
- CUBLAS: 12.4.2
- CURAND: 10.3.5
- CUFFT: 11.2.0
- CUSOLVER: 11.6.0
- CUSPARSE: 12.3.0
- CUPTI: 22.0.0
- NVML: 12.0.0+545.23.6
Julia packages:
- CUDA: 5.3.0
- CUDA_Driver_jll: 0.8.0+0
- CUDA_Runtime_jll: 0.12.0+1
Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7
2 devices:
0: NVIDIA RTX A6000 (sm_86, 44.548 GiB / 44.988 GiB available)
1: NVIDIA RTX A6000 (sm_86, 44.548 GiB / 44.988 GiB available)
That's surprising, because I also have an RTX A6000. I can reproduce on an H100 though. Can you share your C++ reproducer? Also, can you try running with
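On the H100 machine, on master:
JULIA_DEBUG=CUBLAS julia --project -t 128 mwe.jl
First test passed

It then fails with the following error (stderr is interleaved with the debug log below):
ERROR: LoadError: CUBLASError: an access to GPU memory space failed (code 11, CUBLAS_STATUS_MAPPING_ERROR)
Stacktrace:
 [1] throw_api_error(res::CUDA.CUBLAS.cublasStatus_t)
   @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/libcublas.jl:14
 [2] check
   @ ~/.julia/packages/CUDA/fGE8R/lib/cublas/libcublas.jl:27 [inlined]
 [3] cublasXtSgemm

Full debug output: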
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasXtCreate(cublasXtContext**) called:
│ handle: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f9324ac5dc0)
│ Time: 2024-04-10T21:32:37 elapsed from start 0.100000 minutes or 6.000000 seconds
│ Process=414562; Thread=140270086828608; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
ERROR: ┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasXtDeviceSelect(cublasXtHandle_t, int, int*) called:
│ handle: type=SOME TYPE; val=POINTER (IN HEX:0x0x618e950)
│ nbDevices: type=int; val=4
│ deviceId: type=int; val=POINTER (IN HEX:0x0x7f9321955260)
│ Time: 2024-04-10T21:32:37 elapsed from start 0.100000 minutes or 6.000000 seconds
│ Process=414562; Thread=140270086828608; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
LoadError: ┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasCreate_v2(cublasContext**) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x61cbcd0)
│ Time: 2024-04-10T21:32:37 elapsed from start 0.100000 minutes or 6.000000 seconds
│ Process=414562; Thread=140270086828608; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
CUBLASError: ┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasCreate_v2(cublasContext**) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x61cbcd8)
│ Time: 2024-04-10T21:32:37 elapsed from start 0.100000 minutes or 6.000000 seconds
│ Process=414562; Thread=140270086828608; GPU=1; Handle=POINTER (IN HEX:0x(nil))
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
an access to GPU memory space failed┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasCreate_v2(cublasContext**) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x61cbce0)
│ Time: 2024-04-10T21:32:37 elapsed from start 0.100000 minutes or 6.000000 seconds
│ Process=414562; Thread=140270086828608; GPU=2; Handle=POINTER (IN HEX:0x(nil))
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
(code 11, CUBLAS_STATUS_MAPPING_ERROR)┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasCreate_v2(cublasContext**) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x61cbce8)
│ Time: 2024-04-10T21:32:37 elapsed from start 0.100000 minutes or 6.000000 seconds
│ Process=414562; Thread=140270086828608; GPU=3; Handle=POINTER (IN HEX:0x(nil))
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
Stacktrace:┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasXtSgemm(cublasXtHandle_t, cublasOperation_t, cublasOperation_t, size_t, size_t, size_t, const float*, const float*, size_t, const float*, size_t, const float*, float*, size_t) called:
│ handle: type=SOME TYPE; val=POINTER (IN HEX:0x0x618e950)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=SOME TYPE; val=1000
│ n: type=SOME TYPE; val=100
│ k: type=SOME TYPE; val=1000
│ alpha: type=float; val=POINTER (IN HEX:0x0x7ffd86ce35f8)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8c6c999040)
│ lda: type=SOME TYPE; val=1000
│ B: type=float; val=POINTER (IN HEX:0x0x620000000)
│ ldb: type=SOME TYPE; val=1000
│ beta: type=float; val=POINTER (IN HEX:0x0x7ffd86ce35f0)
│ C: type=float; val=POINTER (IN HEX:0x0x620061c00)
│ ldc: type=SOME TYPE; val=1000
│ Time: 2024-04-10T21:32:37 elapsed from start 0.100000 minutes or 6.000000 seconds
│ Process=414562; Thread=140270086828608; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c0050f0)
│ Time: 2024-04-10T21:32:37 elapsed from start 0.100000 minutes or 6.000000 seconds
│ Process=414562; Thread=140230656906816; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x(nil)) (defaultStream); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1000
│ n: type=int; val=100
│ k: type=int; val=1000
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fe7dca38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f89d1200000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x620000000)
│ ldb: type=int; val=1000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fe7dca34)
│ C: type=float; val=POINTER (IN HEX:0x0x620061c00)
│ ldc: type=int; val=1000
│ Time: 2024-04-10T21:32:37 elapsed from start 0.100000 minutes or 6.000000 seconds
│ Process=414562; Thread=140230656906816; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c0050f0); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
│
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
[1]┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasXtSgemm(cublasXtHandle_t, cublasOperation_t, cublasOperation_t, size_t, size_t, size_t, const float*, const float*, size_t, const float*, size_t, const float*, float*, size_t) called:
│ handle: type=SOME TYPE; val=POINTER (IN HEX:0x0x618e950)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=SOME TYPE; val=100000
│ n: type=SOME TYPE; val=10000
│ k: type=SOME TYPE; val=100000
│ alpha: type=float; val=POINTER (IN HEX:0x0x7ffd86ce35f8)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8030f62040)
│ lda: type=SOME TYPE; val=100000
│ B: type=float; val=POINTER (IN HEX:0x0x6200c3800)
│ ldb: type=SOME TYPE; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7ffd86ce35f0)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=SOME TYPE; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140270086828608; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c0050f0); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
throw_api_error(res::CUDA.CUBLAS.cublasStatus_t)┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200c3800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a34)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c3acd00)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
@┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8993a00000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200c4800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
CUDA.CUBLAS┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200c5800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c3acd00)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
~/.julia/packages/CUDA/fGE8R/lib/cublas/┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8993a00000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200c6800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
libcublas.jl:14┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200c7800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c3acd00)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8993a00000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200c8800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
[2]┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200c9800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
check┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c3acd00)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8993a00000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200ca800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
@┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200cb800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
~/.julia/packages/CUDA/fGE8R/lib/cublas/┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c3acd00)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
libcublas.jl:27┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8993a00000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200cc800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
[inlined]┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200cd800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c3acd00)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
[3]┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8993a00000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200ce800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
cublasXtSgemm
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200cf800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c3acd00)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
@
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8993a00000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200d0800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
~/.julia/packages/CUDA/fGE8R/lib/utils/
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200d1800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
call.jl:30
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c3acd00)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
[inlined]
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8993a00000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200d2800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200d3800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
[4]
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c3acd00)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8993a00000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200d4800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
xt_gemm!(transA::Char, transB::Char, alpha::Int64, A::Matrix{Float32}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, beta::Int64, C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200d5800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
@
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c3acd00)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8993a00000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200d6800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
CUDA.CUBLAS
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200d7800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
~/.julia/packages/CUDA/fGE8R/lib/cublas/
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c3acd00)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
wrappers.jl:2145
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8993a00000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200d8800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200d9800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c3acd00)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
[5]
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8993a00000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200da800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
top-level scope
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200db800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c3acd00)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
@
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8993a00000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200dc800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
~/mwe_xt_gemm/
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200dd800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
mwe.jl:16
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899c3acd00)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f8993a00000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200de800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
in expression starting at /home/lpawela/mwe_xt_gemm/mwe.jl:16
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x7f899f21b480)
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899c3acd00); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.3) function cublasStatus_t cublasSgemm_v2(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const float*, const float*, int, const float*, int, const float*, float*, int) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x6250fb0)
│ transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│ m: type=int; val=1024
│ n: type=int; val=1024
│ k: type=int; val=1024
│ alpha: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a38)
│ A: type=float; val=POINTER (IN HEX:0x0x7f898a400000)
│ lda: type=int; val=1024
│ B: type=float; val=POINTER (IN HEX:0x0x6200df800)
│ ldb: type=int; val=100000
│ beta: type=float; val=POINTER (IN HEX:0x0x7f89fcfd9a3c)
│ C: type=float; val=POINTER (IN HEX:0x0x70e776000)
│ ldc: type=int; val=100000
│ Time: 2024-04-10T21:32:50 elapsed from start 0.316667 minutes or 19.000000 seconds
│ Process=414562; Thread=140230631728704; GPU=0; Handle=POINTER (IN HEX:0x0x6250fb0); StreamId=POINTER (IN HEX:0x0x7f899f21b480); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/fGE8R/lib/cublas/CUBLAS.jl:224
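The log appears to show how cublasXt splits the call into tiles: every cublasSgemm_v2 call is 1024×1024×1024, C always points at the same tile (0x70e776000), A alternates between two device buffers (seemingly staging copies of the host-resident A), cublasSetStream_v2 alternates the handle between two streams, and B advances by 0x1000 bytes per call, i.e. 4096 / 4 = 1024 floats. With column-major storage and ldb = 100000, a 1024-float step moves 1024 rows down B, which is consistent with tiling along the shared K dimension and accumulating into a single C tile. Below is a minimal CPU sketch of that decomposition under those assumptions. It is not cublasXt's implementation, the names `gemm_ref` and `gemm_k_tiled` are hypothetical, the beta handling is an assumption, and the host-to-device staging of A and the stream alternation are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Consecutive B pointers in the log (0x6200ce800, 0x6200cf800, ...) differ by
// 0x1000 bytes, i.e. 1024 floats: one 1024-row step down a column-major B.
constexpr std::size_t kLoggedBStrideFloats =
    (0x6200cf800ULL - 0x6200ce800ULL) / sizeof(float);
static_assert(kLoggedBStrideFloats == 1024, "0x1000 bytes == 1024 floats");

// Column-major reference GEMM: C = alpha * A * B + beta * C,
// with A (M x K, leading dim lda), B (K x N, ldb), C (M x N, ldc).
void gemm_ref(std::size_t M, std::size_t N, std::size_t K, float alpha,
              const float *A, std::size_t lda, const float *B, std::size_t ldb,
              float beta, float *C, std::size_t ldc) {
  for (std::size_t j = 0; j < N; ++j) {
    for (std::size_t i = 0; i < M; ++i) {
      float acc = 0.0f;
      for (std::size_t p = 0; p < K; ++p) acc += A[i + p * lda] * B[p + j * ldb];
      C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
    }
  }
}

// Same product, computed one K-block at a time into the same C.
// Assumption (hypothetical): the first block applies the caller's beta and
// later blocks accumulate with beta = 1.
// Returns the B offset (in floats) used for each block, analogous to the
// B pointers printed in the cuBLAS log.
std::vector<std::size_t> gemm_k_tiled(std::size_t M, std::size_t N, std::size_t K,
                                      std::size_t tile, float alpha,
                                      const float *A, std::size_t lda,
                                      const float *B, std::size_t ldb,
                                      float beta, float *C, std::size_t ldc) {
  assert(tile > 0);
  std::vector<std::size_t> b_offsets;
  for (std::size_t p0 = 0; p0 < K; p0 += tile) {
    const std::size_t kb = std::min(tile, K - p0);  // last block may be shorter
    const float *A_blk = A + p0 * lda;  // columns p0 .. p0+kb-1 of A
    const float *B_blk = B + p0;        // rows p0 .. p0+kb-1 of B
    b_offsets.push_back(p0);
    gemm_ref(M, N, kb, alpha, A_blk, lda, B_blk, ldb,
             p0 == 0 ? beta : 1.0f, C, ldc);
  }
  return b_offsets;
}
```

With `tile = 1024` and `K = 100000`, the returned offsets would step 0, 1024, 2048, ... floats, matching the 0x1000-byte increments of B in the log, while C stays fixed.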
As for the C++, the full code was a bit larger, but this is the main part:
template<class T=float>
void test(
cublasHandle_t &handle, cublasXtHandle_t &xtHandle,
curandGenerator_t &prng,
size_t n, size_t m, size_t k,
bool ah = false, bool bh = false, bool ch = false) {
T *A;
T *B;
T *C;
static T zero = 0.0;
static T one = 1.0;
static T mone = -1.0;
static T is2 = pow(T(0.5), T(0.5));
/*
CUDA_CALL(cudaMalloc((void **)(&A), n * m * sizeof(T)));
CUDA_CALL(cudaDeviceSynchronize());
if (n == m) {
T *HA;
CUDA_CALL(cudaMalloc((void **)(&HA), n * n * sizeof(T)));
CUDA_CALL(cudaDeviceSynchronize());
CURAND_CALL(curandGenerateNormalAny(prng, HA, n * m, zero, is2));
CUDA_CALL(cudaDeviceSynchronize());
CUBLAS_CALL(cublasgeam<T>(
handle, CUBLAS_OP_N, CUBLAS_OP_T,
n, n, &one, HA, n, &mone, HA, n, A, n));
CUDA_CALL(cudaDeviceSynchronize());
CUDA_CALL(cudaFree(HA)); HA = nullptr;
} else {
CURAND_CALL(curandGenerateNormalAny(prng, A, n * m, zero, one));
}
CUDA_CALL(cudaDeviceSynchronize());
*/
CUDA_CALL(cudaMalloc((void **)(&B), m * k * sizeof(T)));
CUDA_CALL(cudaDeviceSynchronize());
CURAND_CALL(curandGenerateNormalAny(prng, B, m * k, zero, one));
CUDA_CALL(cudaDeviceSynchronize());
CUDA_CALL(cudaMalloc((void **)(&C), n * k * sizeof(T)));
CUDA_CALL(cudaDeviceSynchronize());
cudaEvent_t start, stop;
/*if (ah) {
T *oM, *hM = (T *)(malloc(n * m * sizeof(T)));
CUDA_CALL(cudaMemcpy(
hM, A, n * m * sizeof(T), cudaMemcpyDeviceToHost));
oM = A; A = hM;
CUDA_CALL(cudaFree(oM)); oM = nullptr;
}*/
A = (T *)(malloc(n * m * sizeof(T)));
if (bh) {
T *oM, *hM = (T *)(malloc(m * k * sizeof(T)));
CUDA_CALL(cudaMemcpy(
hM, B, m * k * sizeof(T), cudaMemcpyDeviceToHost));
oM = B; B = hM;
CUDA_CALL(cudaFree(oM)); oM = nullptr;
}
if (ch) {
T *oM, *hM = (T *)(malloc(n * k * sizeof(T)));
CUDA_CALL(cudaMemcpy(
hM, C, n * k * sizeof(T), cudaMemcpyDeviceToHost));
oM = C; C = hM;
CUDA_CALL(cudaFree(oM)); oM = nullptr;
}
CUDA_CALL(cudaEventCreate(&start));
CUDA_CALL(cudaEventCreate(&stop));
CUDA_CALL(cudaEventRecord(start));
CUDA_CALL(cudaEventSynchronize(start));
CUBLAS_CALL(cublasXtgemm(
xtHandle, CUBLAS_OP_N, CUBLAS_OP_N,
n, k, m, &one, A, n, B, m, &zero, C, n));
CUDA_CALL(cudaDeviceSynchronize());
CUDA_CALL(cudaEventRecord(stop));
CUDA_CALL(cudaEventSynchronize(stop));
float milliseconds = 0;
cudaEventElapsedTime(&milliseconds, start, stop);
printf(
"%-8s %-6s %7lu %7lu %7lu %-6s %-6s %-6s %10.3f\n",
"cublasXt",
theTypename<T>(),
n, m, k,
ah ? "HOST" : "DEVICE",
bh ? "HOST" : "DEVICE",
ch ? "HOST" : "DEVICE",
milliseconds);
CUDA_CALL(cudaEventDestroy(stop));
CUDA_CALL(cudaEventDestroy(start));
CUDA_CALL(cudaDeviceSynchronize());
/*
T *dA = A, *dB = B, *dC = C;
if (ah) CUDA_CALL(cudaMalloc((void **)(&dA), n * m * sizeof(T)));
if (bh) CUDA_CALL(cudaMalloc((void **)(&dB), m * k * sizeof(T)));
if (ch) CUDA_CALL(cudaMalloc((void **)(&dC), n * k * sizeof(T)));
CUDA_CALL(cudaEventCreate(&start));
CUDA_CALL(cudaEventCreate(&stop));
CUDA_CALL(cudaEventRecord(start));
CUDA_CALL(cudaEventSynchronize(start));
if (ah)
CUDA_CALL(cudaMemcpy(
dA, A, n * m * sizeof(T), cudaMemcpyHostToDevice));
if (bh)
CUDA_CALL(cudaMemcpy(
dB, B, m * k * sizeof(T), cudaMemcpyHostToDevice));
if (ch)
CUDA_CALL(cudaMemcpy(
dC, C, n * k * sizeof(T), cudaMemcpyHostToDevice));
CUBLAS_CALL(cublasgemm(
handle, CUBLAS_OP_N, CUBLAS_OP_N,
n, k, m, &one, dA, n, dB, m, &zero, dC, n));
CUDA_CALL(cudaDeviceSynchronize());
CUDA_CALL(cudaEventRecord(stop));
CUDA_CALL(cudaEventSynchronize(stop));
if (ch) CUDA_CALL(cudaFree(dC));
dC = nullptr;
if (bh) CUDA_CALL(cudaFree(dB));
dB = nullptr;
if (ah) CUDA_CALL(cudaFree(dA));
dA = nullptr;
milliseconds = 0;
cudaEventElapsedTime(&milliseconds, start, stop);
printf(
"%-8s %-6s %7lu %7lu %7lu %-6s %-6s %-6s %10.3f\n",
"CUBLAS",
theTypename<T>(),
n, m, k,
ah ? "HOST" : "DEVICE",
bh ? "HOST" : "DEVICE",
ch ? "HOST" : "DEVICE",
milliseconds);
CUDA_CALL(cudaEventDestroy(stop));
CUDA_CALL(cudaEventDestroy(start));
CUDA_CALL(cudaDeviceSynchronize());
*/
ch ? free(C) : CUDA_CALL(cudaFree(C)); C = nullptr;
bh ? free(B) : CUDA_CALL(cudaFree(B)); B = nullptr;
ah ? free(A) : CUDA_CALL(cudaFree(A)); A = nullptr;
CUDA_CALL(cudaDeviceSynchronize());
}
int main() {
    cublasHandle_t handle;
    CUBLAS_CALL(cublasCreate(&handle));
    cublasXtHandle_t xtHandle;
    CUBLAS_CALL(cublasXtCreate(&xtHandle));
    int device_count = 1;
    CUDA_CALL(cudaGetDeviceCount(&device_count));
    printf("device_count = %d\n", device_count);
    int *device_ids = (int *)(malloc(device_count * sizeof(int)));
    for (int idx = 0; idx < device_count; ++idx) {
        device_ids[idx] = idx;
    }
    CUBLAS_CALL(cublasXtDeviceSelect(xtHandle, device_count, device_ids));
    free(device_ids); device_ids = nullptr;
    curandGenerator_t prng;
    CURAND_CALL(curandCreateGenerator(&prng, CURAND_RNG_PSEUDO_XORWOW));
    CURAND_CALL(curandSetPseudoRandomGeneratorSeed(prng, 0xDEADBEEF));
    int p = 35000;
    printf(
        "%-8s %-6s %7s %7s %7s %-6s %-6s %-6s %10s\n",
        "library", "type", "n", "m", "k",
        "A_mem", "B_mem", "C_mem", "time [ms]");
    //for (size_t idx = 0; idx < 8; ++idx) {
    size_t idx = 1;
    bool ah = idx & 1;
    bool bh = idx & 2;
    bool ch = idx & 4;
    test(handle, xtHandle, prng, 10 * p, 10 * p, 1000, ah, bh, ch);
    test<double>(handle, xtHandle, prng, 10 * p, 10 * p, 1000, ah, bh, ch);
    //}
    CURAND_CALL(curandDestroyGenerator(prng));
    CUBLAS_CALL(cublasXtDestroy(xtHandle));
    CUBLAS_CALL(cublasDestroy(handle));
    CUDA_CALL(cudaDeviceSynchronize());
    return 0;
}
C++ MWE that does reproduce the error:
#include <cstdlib>
#include <iostream>
#include <vector>
#include <cuda.h>
#include <cublasXt.h>
// Error checking for CUDA Driver API
#define CUDA_CHECK(call) { gpuAssert((call), __FILE__, __LINE__); }
inline void gpuAssert(CUresult code, const char *file, int line, bool abort=true) {
    if (code != CUDA_SUCCESS) {
        const char *error_string;
        cuGetErrorString(code, &error_string);
        std::cerr << "CUDA Driver API error: " << error_string << " at " << file << ":" << line << std::endl;
        if (abort) exit(code);
    }
}
// Error checking for CUBLAS API
#define CUBLAS_CHECK(status) { cublasAssert((status), __FILE__, __LINE__); }
inline void cublasAssert(cublasStatus_t status, const char *file, int line, bool abort=true) {
    if (status != CUBLAS_STATUS_SUCCESS) {
        std::cerr << "CUBLAS API error: " << status << " at " << file << ":" << line << std::endl;
        if (abort) exit(status);
    }
}
int main() {
    // Initialize CUDA
    CUDA_CHECK(cuInit(0));
    // Set up primary contexts for both devices
    std::vector<int> device_ids = {0, 1};
    std::vector<CUcontext> contexts(device_ids.size());
    for (auto id : device_ids) {
        CUdevice cuDevice;
        CUDA_CHECK(cuDeviceGet(&cuDevice, id));
        CUDA_CHECK(cuDevicePrimaryCtxRetain(&contexts[id], cuDevice));
    }
    // Activate the first device's context
    CUDA_CHECK(cuCtxSetCurrent(contexts[0]));
    // Create a stream for operations
    CUstream stream;
    CUDA_CHECK(cuStreamCreate(&stream, CU_STREAM_DEFAULT));
    // Allocate memory from the pool
    CUdeviceptr d_A, d_B, d_C;
    size_t m = 100000, n = 10000, k = 100000;
    size_t bytes_A = m * k * sizeof(float);
    size_t bytes_B = k * n * sizeof(float);
    size_t bytes_C = m * n * sizeof(float);
    CUDA_CHECK(cuMemAllocAsync(&d_A, bytes_A, stream));
    CUDA_CHECK(cuMemAllocAsync(&d_B, bytes_B, stream));
    CUDA_CHECK(cuMemAllocAsync(&d_C, bytes_C, stream));
    // Set up CUBLAS Xt
    cublasXtHandle_t xtHandle;
    CUBLAS_CHECK(cublasXtCreate(&xtHandle));
    // Configure CUBLAS Xt to use both devices
    CUBLAS_CHECK(cublasXtDeviceSelect(xtHandle, device_ids.size(), device_ids.data()));
    // Perform the matrix multiplication
    float alpha = 1.0f, beta = 0.0f;
    CUBLAS_CHECK(cublasXtSgemm(xtHandle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                               &alpha, (float*)d_A, m, (float*)d_B, k,
                               &beta, (float*)d_C, m));
    return 0;
}
It seems related to the
Filed a bug with NVIDIA. Workaround: run with
Title changed from "xt_gemm! causes code 11, CUBLAS_STATUS_MAPPING_ERROR" to "xt_gemm! incompatible with stream-ordered allocated memory".
I can confirm the workaround makes the error go away.
This should be fixed with #2398, without having to use the legacy allocator. Do note, however, that you still have to synchronize manually after allocating data: as the name implies, the asynchronous (stream-ordered) allocator does not guarantee that an allocation is ready to use on other streams (or devices) as soon as the allocation call returns.
Describe the bug
After launching the second part of the MWE, I get the following error:
To reproduce
The Minimal Working Example (MWE) for this bug:
Manifest.toml
Expected behavior
The second multiplication also passes.
Version info
Details on Julia:
Details on CUDA:
Additional context
Similar and larger examples (up to 500k) work in C++ on the same setup.
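For a sense of scale, here is a small host-only C++ sketch of the operand sizes involved. It assumes a column-major, non-transposed GEMM C (m x n) = A (m x k) * B (k x n) with tightly packed operands (lda = m, ldb = k, ldc = m, as in the MWE's cublasXtSgemm call) and, for the "500k" case, square dimensions; gemm_footprint and to_gib are hypothetical helpers.

```cpp
#include <cstdint>

// Byte sizes of the three GEMM operands, assuming tightly packed
// column-major storage (lda = m, ldb = k, ldc = m).
struct GemmFootprint {
    std::uint64_t a_bytes, b_bytes, c_bytes;
    std::uint64_t total() const { return a_bytes + b_bytes + c_bytes; }
};

GemmFootprint gemm_footprint(std::uint64_t m, std::uint64_t n, std::uint64_t k,
                             std::uint64_t elem_size) {
    return {m * k * elem_size,   // A is m x k
            k * n * elem_size,   // B is k x n
            m * n * elem_size};  // C is m x n
}

// Bytes to GiB (2^30 bytes), the unit CUDA.versioninfo() reports.
double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

With the MWE's m = 100000, n = 10000, k = 100000 in single precision, A alone is 4.0e10 bytes (about 37.3 GiB) and B and C are 4.0e9 bytes (about 3.7 GiB) each, about 44.7 GiB in total. A single 500000 x 500000 float operand is 1.0e12 bytes (about 931 GiB), far more than any one device listed in the versioninfo output, so operands of that size have to stay in host memory, as in the test function's HOST mode.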