Spurious RNN failure with CUDNN #923

Closed
maleadt opened this issue Nov 6, 2019 · 29 comments
@maleadt
Collaborator

maleadt commented Nov 6, 2019

On 8a0745f, which uses CuArrays 1.4.2:

R = Flux.LSTM: Error During Test at /builds/JuliaGPU/Flux.jl/test/cuda/curnn.jl:4
  Got exception outside of a @test
  CUDNNError(code CUDNN_STATUS_BAD_PARAM, CUDNN_STATUS_BAD_PARAM)
  Stacktrace:
   [1] cudnnRNNForwardTraining(::Ptr{Nothing}, ::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArray{UInt8,1,Nothing}, ::Int64, ::CuArray{UInt8,1,Nothing}, ::Int64) at /root/.julia/packages/CuArrays/YbsYr/src/dnn/error.jl:19
   [2] cudnnRNNForward(::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArray{UInt8,1,Nothing}, ::CuArray{UInt8,1,Nothing}) at /root/.julia/packages/CuArrays/YbsYr/src/dnn/rnn.jl:86
   [3] forward(::CuArrays.CUDNN.RNNDesc{Float32}, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}, ::Type) at /root/.julia/packages/CuArrays/YbsYr/src/dnn/rnn.jl:121
   [4] adjoint at /root/.julia/packages/CuArrays/YbsYr/src/dnn/rnn.jl:135 [inlined]
   [5] _pullback at /root/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47 [inlined]
   [6] adjoint(::Zygote.Context, ::typeof(Core._apply), ::Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}, ::Tuple{Tuple{CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing}}}, ::Tuple{CuArray{Float32,1,Nothing}}) at /root/.julia/packages/Zygote/mMAdj/src/lib/lib.jl:139
   [7] _pullback(::Zygote.Context, ::Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}, ::CuArray{Float32,1,Nothing}) at /root/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47
   [8] _pullback(::Zygote.Context, ::getfield(Main, Symbol("##97#99")){CuArray{Float32,1,Nothing}}, ::Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}) at /builds/JuliaGPU/Flux.jl/test/cuda/curnn.jl:7
   [9] _pullback(::Function, ::Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}) at /root/.julia/packages/Zygote/mMAdj/src/compiler/interface.jl:31
   [10] pullback(::Function, ::Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}) at /root/.julia/packages/Zygote/mMAdj/src/compiler/interface.jl:37
   [11] gradient(::Function, ::Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}) at /root/.julia/packages/Zygote/mMAdj/src/compiler/interface.jl:46
   [12] top-level scope at /builds/JuliaGPU/Flux.jl/test/cuda/curnn.jl:7
   [13] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1156
   [14] include at ./boot.jl:326 [inlined]
   [15] include_relative(::Module, ::String) at ./loading.jl:1038
   [16] include(::Module, ::String) at ./sysimg.jl:29
   [17] include(::String) at ./client.jl:403
   [18] top-level scope at /builds/JuliaGPU/Flux.jl/test/cuda/cuda.jl:59
   [19] include at ./boot.jl:326 [inlined]
   [20] include_relative(::Module, ::String) at ./loading.jl:1038
   [21] include(::Module, ::String) at ./sysimg.jl:29
   [22] include(::String) at ./client.jl:403
   [23] top-level scope at /builds/JuliaGPU/Flux.jl/test/runtests.jl:23
   [24] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
   [25] top-level scope at /builds/JuliaGPU/Flux.jl/test/runtests.jl:8
   [26] include at ./boot.jl:326 [inlined]
   [27] include_relative(::Module, ::String) at ./loading.jl:1038
   [28] include(::Module, ::String) at ./sysimg.jl:29
   [29] include(::String) at ./client.jl:403
   [30] top-level scope at none:0
   [31] eval(::Module, ::Any) at ./boot.jl:328
   [32] exec_options(::Base.JLOptions) at ./client.jl:243
   [33] _start() at ./client.jl:436

Guess we haven't fixed all of them. Happens very rarely though, so less of an issue than #267.

@YolCruz

YolCruz commented Nov 7, 2019

I'm getting this error consistently, although I found something odd:
When I first run my model I get this error, but if I run it again immediately I get no error. I thought it was just my computer. I even did a fresh install of Windows to see if it was something with the installation, but it wasn't; I'm still getting the error.

@MikeInnes
Member

This is showing up in CI too. Any idea why it might be happening? Seems like it should be possible to bisect the CuArrays change that led to this.

@maleadt
Collaborator Author

maleadt commented Nov 15, 2019

There were a bunch of changes to the memory allocator, and I also limited the amount of memory a process can use (which increases memory pressure, and can thus cause a memory reuse problem to surface).

@appleparan

appleparan commented Nov 18, 2019

For me, it happens in cudnnRNNBackwardData, not the forward pass, so it may not be only a workspace issue.

I'm on 7eb6a0

In my case, I copied https://github.com/FluxML/Flux.jl/blob/master/test/cuda/curnn.jl to test_curnn.jl and limited memory to 40 MB as follows, then tried to debug this code.

ENV["CUARRAYS_MEMORY_LIMIT"] = "40000000"
  1. It only happens on the first and second runs.
  2. On the first run:
ERROR: CUDNNError(code CUDNN_STATUS_BAD_PARAM, CUDNN_STATUS_BAD_PARAM)
Stacktrace:
 [1] cudnnRNNBackwardData(::Ptr{Nothing}, ::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArray{UInt8,1,Nothing}, ::Int64, ::CuArray{UInt8,1,Nothing}, ::Int64) at /home/appleparan/.julia/packages/CuArrays/7z7MV/src/dnn/error.jl:19
 [2] backwardData(::CuArrays.CUDNN.RNNDesc{Float32}, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}, ::Nothing, ::Nothing, ::CuArray{Float32,1,Nothing}, ::Nothing, ::CuArray{UInt8,1,Nothing}) at /home/appleparan/.julia/packages/CuArrays/7z7MV/src/dnn/rnn.jl:146
 [3] backwardData(::CuArrays.CUDNN.RNNDesc{Float32}, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}, ::Nothing, ::CuArray{Float32,1,Nothing}, ::CuArray{UInt8,1,Nothing}) at /home/appleparan/.julia/packages/CuArrays/7z7MV/src/dnn/rnn.jl:155
 [4] (::getfield(CuArrays.CUDNN, Symbol("##358#359")){CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,1,Nothing}})(::CuArray{Float32,1,Nothing}, ::Nothing) at /home/appleparan/.julia/packages/CuArrays/7z7MV/src/dnn/rnn.jl:174
 [5] (::getfield(Flux.CUDA, Symbol("##9#10")){Zygote.Context,Flux.GRUCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}},CuArray{Float32,1,Nothing},getfield(CuArrays.CUDNN, Symbol("##358#359")){CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,1,Nothing}}})(::Tuple{Nothing,CuArray{Float32,1,Nothing}}) at /home/appleparan/.julia/packages/Flux/jXyco/src/cuda/curnn.jl:75
 [6] (::getfield(Flux.CUDA, Symbol("##69#back#11")){getfield(Flux.CUDA, Symbol("##9#10")){Zygote.Context,Flux.GRUCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}},CuArray{Float32,1,Nothing},getfield(CuArrays.CUDNN, Symbol("##358#359")){CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,1,Nothing}}}})(::Tuple{Nothing,CuArray{Float32,1,Nothing}}) at /home/appleparan/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
 [7] (::getfield(Zygote, Symbol("##153#154")){getfield(Flux.CUDA, Symbol("##69#back#11")){getfield(Flux.CUDA, Symbol("##9#10")){Zygote.Context,Flux.GRUCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}},CuArray{Float32,1,Nothing},getfield(CuArrays.CUDNN, Symbol("##358#359")){CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,1,Nothing}}}},Tuple{Tuple{Nothing},Tuple{Nothing}}})(::Tuple{Nothing,CuArray{Float32,1,Nothing}}) at /home/appleparan/.julia/packages/Zygote/8dVxG/src/lib/lib.jl:142
 [8] (::getfield(Zygote, Symbol("##283#back#155")){getfield(Zygote, Symbol("##153#154")){getfield(Flux.CUDA, Symbol("##69#back#11")){getfield(Flux.CUDA, Symbol("##9#10")){Zygote.Context,Flux.GRUCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}},CuArray{Float32,1,Nothing},getfield(CuArrays.CUDNN, Symbol("##358#359")){CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,1,Nothing}}}},Tuple{Tuple{Nothing},Tuple{Nothing}}}})(::Tuple{Nothing,CuArray{Float32,1,Nothing}}) at /home/appleparan/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
 [9] (::typeof(∂(λ)))(::CuArray{Float32,1,Nothing}) at /home/appleparan/.julia/packages/Flux/jXyco/src/layers/recurrent.jl:36
 [10] (::typeof(∂(λ)))(::Float32) at /home/appleparan/src/DebugFlux/test_curnn.jl:8
 [11] (::getfield(Zygote, Symbol("##28#29")){typeof(∂(λ))})(::Float32) at /home/appleparan/.julia/packages/Zygote/8dVxG/src/compiler/interface.jl:38
 [12] gradient(::getfield(Main, Symbol("##3#7")){CuArray{Float32,1,Nothing}}, ::Tuple{Flux.Recur{Flux.GRUCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}}) at /home/appleparan/.julia/packages/Zygote/8dVxG/src/compiler/interface.jl:47
 [13] run_test() at /home/appleparan/src/DebugFlux/test_curnn.jl:8
  3. On the second run:
ERROR: CUDNNError(code CUDNN_STATUS_BAD_PARAM, CUDNN_STATUS_BAD_PARAM)
Stacktrace:
 [1] adjoint(::Zygote.Context, ::typeof(Core._apply), ::Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}, ::Tuple{Tuple{Tuple{CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing}}},Tuple{CuArray{Float32,1,Nothing}}}) at /home/appleparan/.julia/packages/Zygote/8dVxG/src/lib/lib.jl:139
 [2] _pullback(::Zygote.Context, ::typeof(Core._apply), ::Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}, ::Tuple{Tuple{Tuple{CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing}}},Tuple{CuArray{Float32,1,Nothing}}}) at /home/appleparan/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47
 [3] _pullback(::Zygote.Context, ::Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}, ::Tuple{CuArray{Float32,1,Nothing}}) at /home/appleparan/.julia/packages/Flux/jXyco/src/layers/recurrent.jl:36
 [4] _pullback(::Zygote.Context, ::getfield(Main, Symbol("##3#7")){CuArray{Float32,1,Nothing}}, ::Tuple{Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}}) at /home/appleparan/src/DebugFlux/test_curnn.jl:8
 [5] _pullback(::getfield(Main, Symbol("##3#7")){CuArray{Float32,1,Nothing}}, ::Tuple{Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}}) at /home/appleparan/.julia/packages/Zygote/8dVxG/src/compiler/interface.jl:31
 [6] pullback(::getfield(Main, Symbol("##3#7")){CuArray{Float32,1,Nothing}}, ::Tuple{Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}}) at /home/appleparan/.julia/packages/Zygote/8dVxG/src/compiler/interface.jl:37
 [7] gradient(::getfield(Main, Symbol("##3#7")){CuArray{Float32,1,Nothing}}, ::Tuple{Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}}) at /home/appleparan/.julia/packages/Zygote/8dVxG/src/compiler/interface.jl:46
 [8] run_test() at /home/appleparan/src/DebugFlux/test_curnn.jl:8

This error points to the first test case code:

  for R in [RNN, GRU, LSTM]
    m = R(10, 5) |> gpu
    x = gpu(rand(10))
    (m̄,) = gradient(m -> sum(m(x)), m)
    Flux.reset!(m)
    θ = gradient(() -> sum(m(x)), Flux.params(m))
    @test collect(m̄[].cell[].Wi) == collect(θ[m.cell.Wi])
  end

  4. Then it works from the third run onward.

Could it be a Zygote issue?

@MikeInnes
Member

MikeInnes commented Nov 19, 2019

Can you try running with GC disabled and see if you can still reproduce it?

We never actually added all the GC.@preserves that we should have in the RNN wrappers, so it might just be as simple as fixing that.
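
For reference, a minimal sketch of that experiment (assuming the reproduction is saved as test_curnn.jl, as in the comments below):

# Sketch: disable the GC around the reproduction to rule out an early free.
GC.enable(false)               # returns the previous setting
try
    include("test_curnn.jl")   # hypothetical reproduction script
finally
    GC.enable(true)
end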

@maleadt
Collaborator Author

maleadt commented Nov 19, 2019

We never actually added all the GC.@preserves that we should have in the RNN wrappers, so it might just be as simple as fixing that.

We don't need any GC.@preserves, as we're only dealing with CuArrays and the unsafe conversion to a pointer happens "within" the ccall.

I'll have a closer look at this, but couldn't reproduce last I looked.

@MikeInnes
Member

IIUC, we could still get an early free if the ccall gets inlined, and any of the ccall-invoked code (i.e. user-defined conversions) allocates memory and triggers GC. I imagine our CuArray->Ptr conversions don't allocate, but since that's dependent on compiler optimisations it seems a little risky.

I have no idea if early frees would even cause this error – you'd expect a segfault-like issue instead – but hopefully the GC experiment will tell us. Of course, it was misleading last time, so ¯\_(ツ)_/¯

FWIW, this seems to be showing up in about 1/3-1/2 of bors runs.

@maleadt
Collaborator Author

maleadt commented Nov 19, 2019

IIUC, we could still get an early free if the ccall gets inlined, and any of the ccall-invoked code (i.e. user-defined conversions) allocates memory and triggers GC. I imagine our CuArray->Ptr conversions don't allocate, but since that's dependent on compiler optimisations it seems a little risky.

No, that's not correct. If the unsafe conversions (i.e. the ones that get an untracked reference to data) only happen by Base.unsafe_convert, ccall guarantees that the object (or more correctly, the object as returned by Base.cconvert) is kept alive until after the ccall returns.

julia> a = [1]
1-element Array{Int64,1}:
 1

julia> Meta.@lower ccall(:whatever, Nothing, (Ptr{Int},), a)
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope'
1 ─ %1 = Core.apply_type(Ptr, Int)
│   %2 = Base.cconvert(%1, a)
│   %3 = Core.apply_type(Ptr, Int)
│   %4 = Base.unsafe_convert(%3, %2)
│   %5 = $(Expr(:foreigncall, :(:whatever), :Nothing, :(Core.svec(Core.apply_type(Ptr, Int))), :(:ccall), 1, :(%4), :(%2)))
└──      return %5
))))

%2 being passed into the foreigncall is what keeps the array alive, regardless of any user-defined conversions that happen in unsafe_convert. See https://docs.julialang.org/en/v1/devdocs/llvm/#Supporting-ccall-1

@MikeInnes
Member

Ah, I wasn't aware that ccall had that safety net built in. Yes, that does seem like it should be OK then.

@maleadt
Collaborator Author

maleadt commented Nov 19, 2019

I still cannot reproduce. @appleparan could you send a Manifest? Which versions of Julia, CUDA and CUDNN are you using?

@appleparan

appleparan commented Nov 19, 2019

@maleadt Here it is. https://gist.github.com/appleparan/69289887e446b3ec57f1f42c6a375588

I was trying to reproduce it all day. If I don't set a breakpoint it can be reproduced; however, if I set a breakpoint just before cudnnRNNForward and cudnnRNNBackwardData to check the workspace size, I couldn't reproduce it.

The following are the commands I used. I didn't use any GC-related options.

$ CUARRAYS_MEMORY_LIMIT="40000000" julia
] activate .
include("test_curnn.jl")
using Debugger

@enter run_test()

EDIT
Julia ver.

Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_PATH = /usr/local/julia
  JULIA_BINDIR = /usr/local/julia/bin

CUDA Info

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

I made a Singularity image and ran Julia inside it.

@appleparan

appleparan commented Nov 19, 2019

I added Zygote.@code_adjoint to gradient(m -> sum(m(x)), m) and got this:

https://gist.github.com/appleparan/676d03e9de15092b0e7d3d6501c52517

I thought the pullback from these lines should be called; however, there is no pullback(rnn::RNNDesc{T}, x::CuArray{T}, h::CuArray{T}, c::CuArray{T}) in the stack trace.

Can anyone explain this? It makes no sense to hit a CUDNNError without pullback(rnn::RNNDesc{T}, x::CuArray{T}, h::CuArray{T}, c::CuArray{T}) ever being called.

Moreover, this error only happens on the first run. If I want to reproduce it, I need to restart Julia. That's why I think this relates to code generation or Zygote.

@MikeInnes
Member

You're taking the gradient of m -> sum(m(x)) (an anonymous function), and then taking the adjoint of that (i.e. it's a second order derivative). In any case, the call to CuArrays.pullback is inside a custom adjoint, not derived by Zygote, so it won't show up in there.

@maleadt
Collaborator Author

maleadt commented Nov 26, 2019

R = LSTM: Error During Test at /home/tbesard/Julia/pkg/Flux/test/cuda/curnn.jl:5
  Got exception outside of a @test
  AssertionError: length(workspace) >= cudnnGetRNNWorkspaceSize(rnn, seqlen, xd)
  Stacktrace:
   [1] cudnnRNNForward(::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArray{UInt8,1,Nothing}, ::CuArray{UInt8,1,Nothing}) at /home/tbesard/Julia/pkg/CuArrays/src/dnn/rnn.jl:83

Which doesn't make any sense since the workspace is allocated right before the call to cudnnRNNForward. So either cudnnGetRNNWorkspaceSize returns a different value, or the workspace is getting destroyed behind the scenes by a CuArrays memory pool bug.

@MikeInnes
Member

Is it possible that the alloc causes memory to get freed by the pool, affecting CUDNN's heuristic?

If so, it seems pretty hard to work around this. Perhaps we can just allocate-and-check until CUDNN is happy, at least as a temporary fix.

@maleadt
Collaborator Author

maleadt commented Nov 26, 2019

ding, ding, ding

# outer method that allocates
workspace_len = cudnnGetRNNWorkspaceSize(rnn, seqLength, xdesc) = 1216
# inner one
length(workspace) = 1216
workspace_len = cudnnGetRNNWorkspaceSize(rnn, seqlen, xd) = 1248
R = GRU, batch_size = 1: Error During Test at /home/tim/Julia/pkg/Flux/test/cuda/curnn.jl:14
  Got exception outside of a @test
  AssertionError: length(workspace) >= workspace_len

@MikeInnes
Member

Dammit CUDNN, we talked about this

bors bot added a commit that referenced this issue Nov 26, 2019
944: RNN failure hackaround r=MikeInnes a=MikeInnes

See #923.

bors try

Co-authored-by: Mike Innes <mike.j.innes@gmail.com>
@appleparan

Finally! The workspace problem again.

I had given up on this issue because after updating to Flux#master I couldn't reproduce it, and in my case the error doesn't appear when I inspect variables.

I saw the draft PR from @maleadt. It's a great solution. However, the real problem is that the workspace is smaller than CUDNN expects. I was curious why only Julia has this problem. The CUDNN sample code just calls cudnnGetRNNWorkspaceSize once and uses it for the whole sample. (It's not from the official repo, but it seems okay.)

How about keeping the workspace size at the start of the pullback and passing that size along, instead of calling cudnnGetRNNWorkspaceSize every time?

@maleadt
Collaborator Author

maleadt commented Nov 26, 2019

How about keeping the workspace size at the start of the pullback and passing that size along, instead of calling cudnnGetRNNWorkspaceSize every time?

That wouldn't work, because we may happen to free data in between, in which case CUDNN would expect a larger workspace due to how its heuristics appear to work. In the sample code there are no frees in between those calls; they only happen at the very end of the sample.

@appleparan

I see. You are right.

@DhairyaLGandhi
Member

So the heuristic is to keep checking until CUDNN is happy with the amount of memory we allocate?

@maleadt
Collaborator Author

maleadt commented Nov 26, 2019

We can't really check if CUDNN is happy because it returns BAD_PARAM and not INSUFFICIENT_WORKSPACE or similar. So we allocate until it doesn't ask us to allocate more, and hope that we don't suddenly free memory before calling into the library (which would change the heuristic).
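
Roughly, that hackaround looks like the sketch below (a hypothetical helper; the real change lives in the CuArrays RNN wrappers, and cudnnGetRNNWorkspaceSize here stands for the size query those wrappers already use):

# Sketch: grow the workspace until the size CUDNN asks for stops increasing.
# rnn, seqlen and xdesc stand in for the descriptors the real wrappers pass around.
function stable_workspace(rnn, seqlen, xdesc)
    workspace = CuArrays.CuArray{UInt8}(undef, cudnnGetRNNWorkspaceSize(rnn, seqlen, xdesc))
    # Allocating can itself free pooled memory and change CUDNN's heuristic,
    # so re-query and reallocate until the requested size no longer grows.
    while (sz = cudnnGetRNNWorkspaceSize(rnn, seqlen, xdesc)) > length(workspace)
        workspace = CuArrays.CuArray{UInt8}(undef, sz)
    end
    return workspace
end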

bors bot added a commit that referenced this issue Nov 27, 2019
944: RNN failure hackaround r=MikeInnes a=MikeInnes

See #923.

bors try

Co-authored-by: Mike Innes <mike.j.innes@gmail.com>
@maleadt
Collaborator Author

maleadt commented Nov 27, 2019

Should be fixed

@maleadt maleadt closed this as completed Nov 27, 2019
@ararslan
Contributor

I've been getting this, which seems related to what's been discussed here:

ERROR: LoadError: CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
Stacktrace:
 [1] throw_api_error(::CuArrays.CUDNN.cudnnStatus_t) at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/dnn/error.jl:27
 [2] macro expansion at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/dnn/error.jl:40 [inlined]
 [3] cudnnRNNBackwardData(::Ptr{Nothing}, ::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,2,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,2,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,2,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing}, ::CuArray{UInt8,1,Nothing}, ::Int64, ::CuArray{UInt8,1,Nothing}, ::Int64) at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/dnn/libcudnn.jl:1362
 [4] macro expansion at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/dnn/rnn.jl:143 [inlined]
 [5] macro expansion at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/utils.jl:198 [inlined]
 [6] cudnnRNNBackwardData(::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,2,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,2,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing},::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,2,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing}, ::CuArray{UInt8,1,Nothing}) at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/dnn/rnn.jl:139
 [7] backwardData(::CuArrays.CUDNN.RNNDesc{Float32}, ::CuArray{Float32,2,Nothing}, ::CuArray{Float32,2,Nothing}, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}, ::CuArray{Float32,2,Nothing}, ::CuArray{Float32,2,Nothing}, ::CuArray{UInt8,1,Nothing}) at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/dnn/rnn.jl:157
 [8] (::CuArrays.CUDNN.var"#357#358"{CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,2,Nothing}})(::CuArray{Float32,2,Nothing}, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}) at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/dnn/rnn.jl:200
 [9] (::Flux.CUDA.var"#13#14"{Zygote.Context,Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArrays.CUDNN.var"#357#358"{CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,2,Nothing}}})(::Tuple{Tuple{CuArray{Float32,1,CuArray{Float32,2,Nothing}},CuArray{Float32,1,CuArray{Float32,2,Nothing}}},CuArray{Float32,2,Nothing}}) at /home/ubuntu/.julia/packages/Flux/NpkMm/src/cuda/curnn.jl:86
 [10] #306#back at /home/ubuntu/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49 [inlined]
 [11] #175 at /home/ubuntu/.julia/packages/Zygote/KNUTW/src/lib/lib.jl:170 [inlined]
 [12] (::Zygote.var"#344#back#177"{Zygote.var"#175#176"{Flux.CUDA.var"#306#back#15"{Flux.CUDA.var"#13#14"{Zygote.Context,Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArrays.CUDNN.var"#357#358"{CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,2,Nothing}}}},Tuple{Tuple{Nothing},Tuple{Nothing}}}})(::Tuple{Tuple{CuArray{Float32,1,CuArray{Float32,2,Nothing}},CuArray{Float32,1,CuArray{Float32,2,Nothing}}},CuArray{Float32,2,Nothing}}) at /home/ubuntu/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
 [13] Recur at /home/ubuntu/.julia/packages/Flux/NpkMm/src/layers/recurrent.jl:36 [inlined]
 [14] (::typeof(∂(λ)))(::CuArray{Float32,2,Nothing}) at /home/ubuntu/.julia/packages/Zygote/KNUTW/src/compiler/interface2.jl:0
 [15] applychain at /home/ubuntu/.julia/packages/Flux/NpkMm/src/layers/basic.jl:30 [inlined]
 ... (the last 2 lines are repeated 12 more times)
 [40] (::typeof(∂(applychain)))(::CuArray{Float32,2,CuArray{Float32,3,Nothing}}) at /home/ubuntu/.julia/packages/Zygote/KNUTW/src/compiler/interface2.jl:0

If this isn't the same, I can open a new issue.

@ararslan
Contributor

Apologies for the noise, it seems it is not the same issue.

@maleadt
Collaborator Author

maleadt commented Mar 30, 2020

Apologies for the noise, it seems it is not the same issue.

What was the issue?

@ararslan
Contributor

I still have no idea, but the workspace and reserve sizes appear to be correct, so it seems not to be the same issue as this one.

@jeremiedb
Contributor

jeremiedb commented Apr 8, 2020

I also encountered CUDNN_STATUS_BAD_PARAM (code 3) with RNN (latest Flux/Zygote/CuArrays):
The origin seems to be Flux.reset! and RNN state size initialization, which results in an improper state dimension:

using Flux, Statistics   # Statistics provides mean; Flux provides GRU, Dense, gpu, train!

rnn = Chain(GRU(16, 8),
  Dense(8,1, σ),
  x -> reshape(x,:))

X = [rand(16,10) for i in 1:20]
Y = rand(10,20) ./ 10

rnn = rnn |> gpu
X = gpu(X)
Y = gpu(Y)

θ = Flux.params(rnn)
loss(x,y) = mean((Flux.stack(rnn.(X),2) .- y) .^ 2f0)
opt = ADAM(1e-3)
size(rnn[1].state)
Flux.reset!(rnn)
Flux.train!(loss, θ, [(X,Y)], opt)
size(rnn[1].state)

It can be observed that both before and after reset!, the rnn state has size (8,), while after a call to train! on the GPU the state becomes the expected size (8, 10). After each call to reset!, the CUDNN_STATUS_BAD_PARAM error pops up on the first call to train!, but subsequent calls are fine since the state size stays (8, 10). I can't confirm whether that state size is the root cause, but it appears closely tied to the bug.

@maleadt
Collaborator Author

maleadt commented Apr 8, 2020

Is that reproducible? If so, could you put it in a new issue with some details on package and CUDA versions?
