NeuralODE training failed on GPU with Enzyme #2478
@maleadt I'd assign myself, but I don't have permissions. For ease, would it be possible to have permissions added?
@yunan-l can you paste the versions of CUDA/Enzyme you're using, as well as system information?
Sure. CUDA.versioninfo():
CUDA runtime 12.5, artifact installation
CUDA driver 12.6
NVIDIA driver 535.154.5, originally for CUDA 12.2
CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.3
- CUSPARSE: 12.5.1
- CUPTI: 2024.2.1 (API 23.0.0)
- NVML: 12.0.0+535.154.5
Julia packages:
- CUDA: 5.4.3
- CUDA_Driver_jll: 0.9.2+0
- CUDA_Runtime_jll: 0.14.1+0
Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7
1 device:
0: NVIDIA H100 80GB HBM3 (sm_90, 75.982 GiB / 79.647 GiB available)
The Enzyme version: Enzyme v0.12.32
The system information (Platform Info):
OS: Linux (x86_64-linux-gnu)
CPU: 128 × AMD EPYC 9554 64-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 2 on 128 virtual cores
Hi @wsmoses, there is an updated error, which seems to have appeared after I updated CUDA.jl. The CUDA version is now as follows (CUDA.versioninfo()):
CUDA runtime 12.5, artifact installation
CUDA driver 12.6
NVIDIA driver 535.154.5, originally for CUDA 12.2
CUDA libraries:
- CUBLAS: 12.5.3
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.3
- CUSPARSE: 12.5.1
- CUPTI: 2024.2.1 (API 23.0.0)
- NVML: 12.0.0+535.154.5
Julia packages:
- CUDA: 5.4.3
- CUDA_Driver_jll: 0.9.2+0
- CUDA_Runtime_jll: 0.14.1+0
Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7
1 device:
0: NVIDIA H100 80GB HBM3 (sm_90, 78.214 GiB / 79.647 GiB available)
The only difference from the versions above is CUBLAS (12.5.3 vs. 12.3.4). Error & Stacktrace:
No augmented forward pass found for cublasLtMatmulDescCreate
at context: %173 = call i32 @cublasLtMatmulDescCreate(i64 %bitcast_coercion, i32 %unbox32, i32 0) #469 [ "jl_roots"({} addrspace(10)* %166) ], !dbg !535
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/CUDA/Tl08O/lib/utils/call.jl:218
[2] macro expansion
@ ~/.julia/packages/CUDA/Tl08O/lib/cublas/libcublasLt.jl:400
[3] #1158
@ ~/.julia/packages/CUDA/Tl08O/lib/utils/call.jl:35
[4] retry_reclaim
@ ~/.julia/packages/CUDA/Tl08O/src/memory.jl:434
[5] check
@ ~/.julia/packages/CUDA/Tl08O/lib/cublas/libcublas.jl:24
[6] cublasLtMatmulDescCreate
@ ~/.julia/packages/CUDA/Tl08O/lib/utils/call.jl:34
[7] cublaslt_matmul_fused!
@ ~/.julia/packages/LuxLib/mR6WV/ext/LuxLibCUDAExt/cublaslt.jl:63
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/CUDA/Tl08O/lib/utils/call.jl:218 [inlined]
[2] macro expansion
@ ~/.julia/packages/CUDA/Tl08O/lib/cublas/libcublasLt.jl:400 [inlined]
[3] #1158
@ ~/.julia/packages/CUDA/Tl08O/lib/utils/call.jl:35 [inlined]
[4] retry_reclaim
@ ~/.julia/packages/CUDA/Tl08O/src/memory.jl:434 [inlined]
[5] check
@ ~/.julia/packages/CUDA/Tl08O/lib/cublas/libcublas.jl:24 [inlined]
[6] cublasLtMatmulDescCreate
@ ~/.julia/packages/CUDA/Tl08O/lib/utils/call.jl:34 [inlined]
[7] cublaslt_matmul_fused!
@ ~/.julia/packages/LuxLib/mR6WV/ext/LuxLibCUDAExt/cublaslt.jl:63
[8] cublaslt_matmul_fused!
@ ~/.julia/packages/LuxLib/mR6WV/ext/LuxLibCUDAExt/cublaslt.jl:13 [inlined]
[9] cublasLt_fused_dense!
@ ~/.julia/packages/LuxLib/mR6WV/ext/LuxLibCUDAExt/cublaslt.jl:196
[10] cublasLt_fused_dense!
@ ~/.julia/packages/LuxLib/mR6WV/ext/LuxLibCUDAExt/cublaslt.jl:194 [inlined]
[11] fused_dense!
@ ~/.julia/packages/LuxLib/mR6WV/src/impl/dense.jl:38 [inlined]
[12] fused_dense
@ ~/.julia/packages/LuxLib/mR6WV/src/impl/dense.jl:24 [inlined]
[13] fused_dense
@ ~/.julia/packages/LuxLib/mR6WV/src/impl/dense.jl:11 [inlined]
[14] fused_dense_bias_activation
@ ~/.julia/packages/LuxLib/mR6WV/src/api/dense.jl:31 [inlined]
[15] Dense
@ ~/.julia/packages/Lux/PsW4M/src/layers/basic.jl:366
[16] Dense
@ ~/.julia/packages/Lux/PsW4M/src/layers/basic.jl:356
[17] apply
@ ~/.julia/packages/LuxCore/yzx6E/src/LuxCore.jl:171 [inlined]
[18] macro expansion
@ ~/.julia/packages/Lux/PsW4M/src/layers/containers.jl:0 [inlined]
[19] applychain
@ ~/.julia/packages/Lux/PsW4M/src/layers/containers.jl:520
[20] Chain
@ ~/.julia/packages/Lux/PsW4M/src/layers/containers.jl:518 [inlined]
[21] apply
@ ~/.julia/packages/LuxCore/yzx6E/src/LuxCore.jl:171 [inlined]
[22] dudt
@ ./In[5]:21 [inlined]
[23] dudt
@ ./In[5]:18 [inlined]
[24] ODEFunction
@ ~/.julia/packages/SciMLBase/HReyK/src/scimlfunctions.jl:2335 [inlined]
[25] #138
@ ~/.julia/packages/SciMLSensitivity/se3y4/src/adjoint_common.jl:490 [inlined]
[26] diffejulia__138_34195_inner_1wrap
@ ~/.julia/packages/SciMLSensitivity/se3y4/src/adjoint_common.jl:0
[27] macro expansion
@ ~/.julia/packages/Enzyme/YWQiS/src/compiler.jl:7099 [inlined]
[28] enzyme_call
@ ~/.julia/packages/Enzyme/YWQiS/src/compiler.jl:6708 [inlined]
[29] CombinedAdjointThunk
@ ~/.julia/packages/Enzyme/YWQiS/src/compiler.jl:6585 [inlined]
[30] autodiff
@ ~/.julia/packages/Enzyme/YWQiS/src/Enzyme.jl:320 [inlined]
....
For the whole log, please see the attachment.
Oh yeah, so the issue here is that we added support for some cuBLAS routines, but I haven't seen this one before.
@avik-pal it looks like this might be best implemented as a custom rule for …
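The custom-rule route mentioned here would go through Enzyme's EnzymeRules interface. As a hedged illustration of the shape such a rule takes, here is a toy scalar function, not the actual cuBLASLt rule; exact config types and signatures vary across Enzyme versions, so treat this as a sketch:

```julia
using Enzyme
import Enzyme.EnzymeRules: augmented_primal, reverse, AugmentedReturn, needs_primal

square(x) = x * x   # toy stand-in for the call that needs a rule

function augmented_primal(config, ::Const{typeof(square)}, ::Type{<:Active}, x::Active)
    # Save x as the tape so the reverse pass can form the derivative.
    primal = needs_primal(config) ? square(x.val) : nothing
    return AugmentedReturn(primal, nothing, x.val)
end

function reverse(config, ::Const{typeof(square)}, dret::Active, tape, x::Active)
    # d/dx (x^2) = 2x, scaled by the incoming adjoint.
    return (2 * tape * dret.val,)
end

# Gradient w.r.t. x at 3.0 should come out as 2 * 3.0 = 6.0 via the custom rule.
Enzyme.autodiff(Reverse, square, Active, Active(3.0))
```

For the cuBLASLt case the augmented primal would have to record the matmul inputs and the reverse pass would dispatch the corresponding adjoint GEMMs, which is why a library-level rule (as suggested here) is the natural place for it.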
Closing in favor of LuxDL/LuxLib.jl#148.
This particular error for cuBLASLt is now fixed upstream. (Note that you need to install Lux v1 and LuxLib v1.1 for the patch.)
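To pick up that patch, the versions can be requested explicitly through Pkg; a minimal sketch (the exact version bounds passed here are an assumption based on the note above):

```julia
using Pkg
# Install the patched releases mentioned above (Lux v1, LuxLib v1.1).
Pkg.add(name = "Lux", version = "1")
Pkg.add(name = "LuxLib", version = "1.1")
Pkg.status()  # confirm the resolved versions
```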
Hi, when I try to train a NeuralODE with a DiscreteCallback using
sensealg = InterpolatingAdjoint(autojacvec = EnzymeVJP(), checkpointing = true)
I get the error above. For the whole log, please see the attachment below:
EnzymeVJP.failed.txt
Here is the main Julia code, which works on CPU but not on GPU.
https://discourse.julialang.org/t/neuralode-training-failed-on-gpu-with-enzyme/118537/5 @wsmoses
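For context, a minimal sketch of the kind of setup that hits this code path. The layer sizes, names, and loss below are assumptions, not the code from the link; the point is that a Lux Dense chain on GPU routes through LuxLib's fused cuBLASLt kernel, and differentiating the ODE solve with EnzymeVJP then hits the unsupported `cublasLtMatmulDescCreate` call:

```julia
using Lux, LuxCUDA, ComponentArrays, Random
using OrdinaryDiffEq, SciMLSensitivity, Zygote

rng = Random.default_rng()
dev = gpu_device()

# Hypothetical two-layer network standing in for the reporter's model.
model = Chain(Dense(2 => 16, tanh), Dense(16 => 2))
ps, st = Lux.setup(rng, model)
ps = ComponentArray(ps) |> dev

u0 = dev(Float32[2.0, 0.0])
tspan = (0.0f0, 1.0f0)

# RHS closes over the frozen layer states, as in the stacktrace's `dudt`.
dudt(u, p, t) = first(model(u, p, st))
prob = ODEProblem(dudt, u0, tspan, ps)

# The sensealg from the report; the adjoint pass is where Enzyme
# encounters the cuBLASLt call.
sensealg = InterpolatingAdjoint(autojacvec = EnzymeVJP(), checkpointing = true)
loss(p) = sum(abs2, Array(solve(prob, Tsit5(); p = p, sensealg = sensealg)[end]))
grad = Zygote.gradient(loss, ps)
```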