ConvTranspose can cause Julia crash on GPU #2193
Your input has 4 channels, but the conv layer only expects 2.
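For concreteness, a minimal sketch of a matching call (the shapes here are assumed; the layer is built with 4 => 2 so its input channels match the data):
using Flux
x  = rand(Float32, 32, 32, 4, 1)    # 4 channels in dimension 3
nn = ConvTranspose((4, 4), 4 => 2)  # first number of the pair must equal size(x, 3)
size(nn(x))                         # (35, 35, 2, 1) on CPU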
I'm aware of the channel mismatch. But even after I change to use the right channels, the crash still happens.
That seems like a bigger error and not one I've seen before. If I had to guess, something about your CUDA setup is broken. Can you post the version of CUDA and the output of CUDA.versioninfo()?
Here's the CUDA version info:
…
On the code that works, the CUDA version is …
Cannot reproduce, at least if I understood correctly and the claim is that the following crashes Julia:
julia> using Flux, CUDA
julia> x = rand(Float32, 32, 32, 4, 1) |> gpu;
julia> nn = ConvTranspose((4,4), 2 => 2) |> gpu;
julia> nn(x) |> summary # correctly fails
ERROR: DimensionMismatch: layer ConvTranspose((4, 4), 2 => 2) expects size(input, 3) == 2, but got 32×32×4×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}
julia> x2 = rand(Float32, 32, 32, 2, 1) |> gpu;
julia> nn(x2) |> summary
"35×35×2×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}"
(@v1.10) pkg> st Flux CUDA
Status `~/.julia/environments/v1.10/Project.toml`
[052768ef] CUDA v4.0.1
[587475ba] Flux v0.13.13
julia> CUDA.device()
CuDevice(0): Tesla V100-PCIE-16GB
julia> CUDA.versioninfo()
CUDA runtime 11.8, artifact installation
CUDA driver 11.6
NVIDIA driver 510.47.3
Libraries:
- CUBLAS: 11.9.2
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+510.47.3
Toolchain:
- Julia: 1.10.0-DEV.220
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
6 devices:
0: Tesla V100-PCIE-16GB (sm_70, 14.361 GiB / 16.000 GiB available)
Not sure of the cause; maybe it depends on hardware. I'm able to reproduce on my desktop as well, with CUDA.versioninfo():
…
The problem seems to be related to CUDA …
It's odd because our CI should test this and it's been green as well. Can you share the output of (non-CUDA.jl) versioninfo()?
Here's the versioninfo():
…
It looks like the …
If you can reproduce this with just the cuDNN Julia package loaded, do you mind filing an issue over on https://github.com/JuliaGPU/CUDA.jl?
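For reference, a rough sketch of what a Flux-free reproduction could look like: ConvTranspose's forward pass lowers to NNlib's ∇conv_data, and the package set and dims construction below are my assumptions for the Flux 0.13 / CUDA 4 era (NNlibCUDA provides the cuDNN dispatch there), not code from this thread.
using CUDA, cuDNN, NNlib, NNlibCUDA          # NNlibCUDA routes NNlib conv ops to cuDNN
# ConvTranspose((4, 4), 2 => 2) applied to a 32×32×2×1 input is ∇conv_data of a
# 35×35 forward convolution, so build those dims explicitly.
w     = CUDA.rand(Float32, 4, 4, 2, 2)       # kernel, layout (k, k, ch_in, ch_out)
xfwd  = CUDA.zeros(Float32, 35, 35, 2, 1)    # shape of the would-be forward-conv input
cdims = DenseConvDims(xfwd, w)               # stride 1, no padding
dy    = CUDA.rand(Float32, 32, 32, 2, 1)     # plays the role of ConvTranspose's input
NNlib.∇conv_data(dy, w, cdims)               # this is the call that ends up in CUDNN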
Yeah, I'll file an issue there, and I think we can close this one then.
Initially, I thought it only caused an error when the channels were mismatched, but on 0.13.12 it crashes Julia even when the channels match.
ConvTranspose seems to work regardless of the channels on CPU, but when run on GPU it returns an opaque CUDA error (CUDNN_STATUS_BAD_PARAM (code 3)) on older versions (0.13.4) and crashes Julia on 0.13.12. A minimum example:
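Something along these lines (a sketch assuming the same shapes as the repro shown earlier in the thread):
using Flux, CUDA
x  = rand(Float32, 32, 32, 4, 1) |> gpu    # 4-channel input
nn = ConvTranspose((4, 4), 2 => 2) |> gpu  # layer declared with 2 input channels
nn(x)                                      # CUDNN_STATUS_BAD_PARAM on 0.13.4, crashes Julia on 0.13.12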
This returns the CUDA error mentioned above or just crashes, but it runs normally on CPU. On 0.13.12, even after I changed the ConvTranspose parameter from 2 => 2 to 4 => 2, it still crashes Julia; it may need more parameter validity checks.