Spurious RNN failure with CUDNN #923

Closed
maleadt opened this issue Nov 6, 2019 · 29 comments
@maleadt
Collaborator

maleadt commented Nov 6, 2019

On 8a0745f, which uses CuArrays 1.4.2:

R = Flux.LSTM: Error During Test at /builds/JuliaGPU/Flux.jl/test/cuda/curnn.jl:4
  Got exception outside of a @test
  CUDNNError(code CUDNN_STATUS_BAD_PARAM, CUDNN_STATUS_BAD_PARAM)
  Stacktrace:
   [1] cudnnRNNForwardTraining(::Ptr{Nothing}, ::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArray{UInt8,1,Nothing}, ::Int64, ::CuArray{UInt8,1,Nothing}, ::Int64) at /root/.julia/packages/CuArrays/YbsYr/src/dnn/error.jl:19
   [2] cudnnRNNForward(::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArray{UInt8,1,Nothing}, ::CuArray{UInt8,1,Nothing}) at /root/.julia/packages/CuArrays/YbsYr/src/dnn/rnn.jl:86
   [3] forward(::CuArrays.CUDNN.RNNDesc{Float32}, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}, ::Type) at /root/.julia/packages/CuArrays/YbsYr/src/dnn/rnn.jl:121
   [4] adjoint at /root/.julia/packages/CuArrays/YbsYr/src/dnn/rnn.jl:135 [inlined]
   [5] _pullback at /root/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47 [inlined]
   [6] adjoint(::Zygote.Context, ::typeof(Core._apply), ::Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}, ::Tuple{Tuple{CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing}}}, ::Tuple{CuArray{Float32,1,Nothing}}) at /root/.julia/packages/Zygote/mMAdj/src/lib/lib.jl:139
   [7] _pullback(::Zygote.Context, ::Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}, ::CuArray{Float32,1,Nothing}) at /root/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47
   [8] _pullback(::Zygote.Context, ::getfield(Main, Symbol("##97#99")){CuArray{Float32,1,Nothing}}, ::Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}) at /builds/JuliaGPU/Flux.jl/test/cuda/curnn.jl:7
   [9] _pullback(::Function, ::Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}) at /root/.julia/packages/Zygote/mMAdj/src/compiler/interface.jl:31
   [10] pullback(::Function, ::Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}) at /root/.julia/packages/Zygote/mMAdj/src/compiler/interface.jl:37
   [11] gradient(::Function, ::Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}) at /root/.julia/packages/Zygote/mMAdj/src/compiler/interface.jl:46
   [12] top-level scope at /builds/JuliaGPU/Flux.jl/test/cuda/curnn.jl:7
   [13] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1156
   [14] include at ./boot.jl:326 [inlined]
   [15] include_relative(::Module, ::String) at ./loading.jl:1038
   [16] include(::Module, ::String) at ./sysimg.jl:29
   [17] include(::String) at ./client.jl:403
   [18] top-level scope at /builds/JuliaGPU/Flux.jl/test/cuda/cuda.jl:59
   [19] include at ./boot.jl:326 [inlined]
   [20] include_relative(::Module, ::String) at ./loading.jl:1038
   [21] include(::Module, ::String) at ./sysimg.jl:29
   [22] include(::String) at ./client.jl:403
   [23] top-level scope at /builds/JuliaGPU/Flux.jl/test/runtests.jl:23
   [24] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
   [25] top-level scope at /builds/JuliaGPU/Flux.jl/test/runtests.jl:8
   [26] include at ./boot.jl:326 [inlined]
   [27] include_relative(::Module, ::String) at ./loading.jl:1038
   [28] include(::Module, ::String) at ./sysimg.jl:29
   [29] include(::String) at ./client.jl:403
   [30] top-level scope at none:0
   [31] eval(::Module, ::Any) at ./boot.jl:328
   [32] exec_options(::Base.JLOptions) at ./client.jl:243
   [33] _start() at ./client.jl:436

Guess we haven't fixed all of them. Happens very rarely though, so less of an issue than #267.

@YolCruz

YolCruz commented Nov 7, 2019

I'm getting this error consistently, although I found something odd:
When I first run my model I get this error, but if I run it again immediately I get no error. I thought it was just my computer. I even did a fresh install of Windows to see if it was something with the installation, but it wasn't; I'm still getting the error.

@MikeInnes
Member

This is showing up in CI too. Any idea why it might be happening? Seems like it should be possible to bisect the CuArrays change that led to this.

@maleadt
Collaborator Author

maleadt commented Nov 15, 2019

There were a bunch of changes to the memory allocator, and I also limited the amount of memory a process can use (which increases memory pressure, and can thus cause a memory reuse problem to surface).

@appleparan

appleparan commented Nov 18, 2019

For me, it happens in cudnnRNNBackwardData, not the forward pass, so it may not be only a workspace issue.

I'm on 7eb6a0

In my case, I copied https://github.com/FluxML/Flux.jl/blob/master/test/cuda/curnn.jl to test_curnn.jl and limited memory to 40 MB as follows, then tried to debug this code.

ENV["CUARRAYS_MEMORY_LIMIT"] = "40000000"
  1. It only happens on the first and second runs.
  2. On the first run:
ERROR: CUDNNError(code CUDNN_STATUS_BAD_PARAM, CUDNN_STATUS_BAD_PARAM)
Stacktrace:
 [1] cudnnRNNBackwardData(::Ptr{Nothing}, ::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArray{UInt8,1,Nothing}, ::Int64, ::CuArray{UInt8,1,Nothing}, ::Int64) at /home/appleparan/.julia/packages/CuArrays/7z7MV/src/dnn/error.jl:19
 [2] backwardData(::CuArrays.CUDNN.RNNDesc{Float32}, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}, ::Nothing, ::Nothing, ::CuArray{Float32,1,Nothing}, ::Nothing, ::CuArray{UInt8,1,Nothing}) at /home/appleparan/.julia/packages/CuArrays/7z7MV/src/dnn/rnn.jl:146
 [3] backwardData(::CuArrays.CUDNN.RNNDesc{Float32}, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}, ::Nothing, ::CuArray{Float32,1,Nothing}, ::CuArray{UInt8,1,Nothing}) at /home/appleparan/.julia/packages/CuArrays/7z7MV/src/dnn/rnn.jl:155
 [4] (::getfield(CuArrays.CUDNN, Symbol("##358#359")){CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,1,Nothing}})(::CuArray{Float32,1,Nothing}, ::Nothing) at /home/appleparan/.julia/packages/CuArrays/7z7MV/src/dnn/rnn.jl:174
 [5] (::getfield(Flux.CUDA, Symbol("##9#10")){Zygote.Context,Flux.GRUCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}},CuArray{Float32,1,Nothing},getfield(CuArrays.CUDNN, Symbol("##358#359")){CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,1,Nothing}}})(::Tuple{Nothing,CuArray{Float32,1,Nothing}}) at /home/appleparan/.julia/packages/Flux/jXyco/src/cuda/curnn.jl:75
 [6] (::getfield(Flux.CUDA, Symbol("##69#back#11")){getfield(Flux.CUDA, Symbol("##9#10")){Zygote.Context,Flux.GRUCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}},CuArray{Float32,1,Nothing},getfield(CuArrays.CUDNN, Symbol("##358#359")){CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,1,Nothing}}}})(::Tuple{Nothing,CuArray{Float32,1,Nothing}}) at /home/appleparan/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
 [7] (::getfield(Zygote, Symbol("##153#154")){getfield(Flux.CUDA, Symbol("##69#back#11")){getfield(Flux.CUDA, Symbol("##9#10")){Zygote.Context,Flux.GRUCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}},CuArray{Float32,1,Nothing},getfield(CuArrays.CUDNN, Symbol("##358#359")){CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,1,Nothing}}}},Tuple{Tuple{Nothing},Tuple{Nothing}}})(::Tuple{Nothing,CuArray{Float32,1,Nothing}}) at /home/appleparan/.julia/packages/Zygote/8dVxG/src/lib/lib.jl:142
 [8] (::getfield(Zygote, Symbol("##283#back#155")){getfield(Zygote, Symbol("##153#154")){getfield(Flux.CUDA, Symbol("##69#back#11")){getfield(Flux.CUDA, Symbol("##9#10")){Zygote.Context,Flux.GRUCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}},CuArray{Float32,1,Nothing},getfield(CuArrays.CUDNN, Symbol("##358#359")){CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,1,Nothing}}}},Tuple{Tuple{Nothing},Tuple{Nothing}}}})(::Tuple{Nothing,CuArray{Float32,1,Nothing}}) at /home/appleparan/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
 [9] (::typeof(∂(λ)))(::CuArray{Float32,1,Nothing}) at /home/appleparan/.julia/packages/Flux/jXyco/src/layers/recurrent.jl:36
 [10] (::typeof(∂(λ)))(::Float32) at /home/appleparan/src/DebugFlux/test_curnn.jl:8
 [11] (::getfield(Zygote, Symbol("##28#29")){typeof(∂(λ))})(::Float32) at /home/appleparan/.julia/packages/Zygote/8dVxG/src/compiler/interface.jl:38
 [12] gradient(::getfield(Main, Symbol("##3#7")){CuArray{Float32,1,Nothing}}, ::Tuple{Flux.Recur{Flux.GRUCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}}) at /home/appleparan/.julia/packages/Zygote/8dVxG/src/compiler/interface.jl:47
 [13] run_test() at /home/appleparan/src/DebugFlux/test_curnn.jl:8
  3. On the second run:
ERROR: CUDNNError(code CUDNN_STATUS_BAD_PARAM, CUDNN_STATUS_BAD_PARAM)
Stacktrace:
 [1] adjoint(::Zygote.Context, ::typeof(Core._apply), ::Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}, ::Tuple{Tuple{Tuple{CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing}}},Tuple{CuArray{Float32,1,Nothing}}}) at /home/appleparan/.julia/packages/Zygote/8dVxG/src/lib/lib.jl:139
 [2] _pullback(::Zygote.Context, ::typeof(Core._apply), ::Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}, ::Tuple{Tuple{Tuple{CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing}}},Tuple{CuArray{Float32,1,Nothing}}}) at /home/appleparan/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47
 [3] _pullback(::Zygote.Context, ::Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}, ::Tuple{CuArray{Float32,1,Nothing}}) at /home/appleparan/.julia/packages/Flux/jXyco/src/layers/recurrent.jl:36
 [4] _pullback(::Zygote.Context, ::getfield(Main, Symbol("##3#7")){CuArray{Float32,1,Nothing}}, ::Tuple{Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}}) at /home/appleparan/src/DebugFlux/test_curnn.jl:8
 [5] _pullback(::getfield(Main, Symbol("##3#7")){CuArray{Float32,1,Nothing}}, ::Tuple{Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}}) at /home/appleparan/.julia/packages/Zygote/8dVxG/src/compiler/interface.jl:31
 [6] pullback(::getfield(Main, Symbol("##3#7")){CuArray{Float32,1,Nothing}}, ::Tuple{Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}}) at /home/appleparan/.julia/packages/Zygote/8dVxG/src/compiler/interface.jl:37
 [7] gradient(::getfield(Main, Symbol("##3#7")){CuArray{Float32,1,Nothing}}, ::Tuple{Flux.Recur{Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}}) at /home/appleparan/.julia/packages/Zygote/8dVxG/src/compiler/interface.jl:46
 [8] run_test() at /home/appleparan/src/DebugFlux/test_curnn.jl:8

This error points to the first test case code:

  for R in [RNN, GRU, LSTM]
    m = R(10, 5) |> gpu
    x = gpu(rand(10))
    (m̄,) = gradient(m -> sum(m(x)), m)
    Flux.reset!(m)
    θ = gradient(() -> sum(m(x)), Flux.params(m))
    @test collect(m̄[].cell[].Wi) == collect(θ[m.cell.Wi])
  end

  4. Then it works from the third run onward.

Could it be a Zygote issue?

@MikeInnes
Member

MikeInnes commented Nov 19, 2019

Can you try running with GC disabled and see if you can still reproduce it?

We never actually added all the GC.@preserves that we should have in the RNN wrappers, so it might just be as simple as fixing that.
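
For reference, a minimal sketch of that experiment (assuming the reproduction is saved as test_curnn.jl, as in the comments below):

# Sketch: disable the GC around the reproduction to rule out an early free.
GC.enable(false)               # returns the previous setting
try
    include("test_curnn.jl")   # hypothetical reproduction script
finally
    GC.enable(true)
end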

@maleadt
Collaborator Author

maleadt commented Nov 19, 2019

We never actually added all the GC.@preserves that we should have in the RNN wrappers, so it might just be as simple as fixing that.

We don't need any GC.@preserves, as we're only dealing with CuArrays and the unsafe conversion to a pointer happens "within" the ccall.

I'll have a closer look at this, but couldn't reproduce last I looked.

@MikeInnes
Member

IIUC, we could still get an early free if the ccall gets inlined, and any of the ccall-invoked code (i.e. user-defined conversions) allocates memory and triggers GC. I imagine our CuArray->Ptr conversions don't allocate, but since that's dependent on compiler optimisations it seems a little risky.

I have no idea if early frees would even cause this error – you'd expect a segfault-like issue instead – but hopefully the GC experiment will tell us. Of course, it was misleading last time, so ¯\_(ツ)_/¯

FWIW, this seems to be showing up in about 1/3-1/2 of bors runs.

@maleadt
Collaborator Author

maleadt commented Nov 19, 2019

IIUC, we could still get an early free if the ccall gets inlined, and any of the ccall-invoked code (i.e. user-defined conversions) allocates memory and triggers GC. I imagine our CuArray->Ptr conversions don't allocate, but since that's dependent on compiler optimisations it seems a little risky.

No, that's not correct. If the unsafe conversions (i.e. the ones that get an untracked reference to data) only happen by Base.unsafe_convert, ccall guarantees that the object (or more correctly, the object as returned by Base.cconvert) is kept alive until after the ccall returns.

julia> a = [1]
1-element Array{Int64,1}:
 1

julia> Meta.@lower ccall(:whatever, Nothing, (Ptr{Int},), a)
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope'
1 ─ %1 = Core.apply_type(Ptr, Int)
│   %2 = Base.cconvert(%1, a)
│   %3 = Core.apply_type(Ptr, Int)
│   %4 = Base.unsafe_convert(%3, %2)
│   %5 = $(Expr(:foreigncall, :(:whatever), :Nothing, :(Core.svec(Core.apply_type(Ptr, Int))), :(:ccall), 1, :(%4), :(%2)))
└──      return %5
))))

%2 being passed into the foreigncall is what keeps the array alive, regardless of any user-defined conversions that happen in unsafe_convert. See https://docs.julialang.org/en/v1/devdocs/llvm/#Supporting-ccall-1

@MikeInnes
Member

Ah, I wasn't aware that ccall had that safety net built in. Yes, that does seem like it should be OK then.

@maleadt
Collaborator Author

maleadt commented Nov 19, 2019

I still cannot reproduce. @appleparan could you send a Manifest? Which versions of Julia, CUDA and CUDNN are you using?

@appleparan

appleparan commented Nov 19, 2019

@maleadt Here it is. https://gist.github.com/appleparan/69289887e446b3ec57f1f42c6a375588

I was trying to reproduce it all day. If I don't set a breakpoint it can be reproduced; however, if I set a breakpoint just before cudnnRNNForward and cudnnRNNBackwardData to check the workspace size, I couldn't reproduce it.

The following are the commands I used. I didn't use any GC-related options.

$ CUARRAYS_MEMORY_LIMIT="40000000" julia
] activate .
include("test_curnn.jl")
using Debugger

@enter run_test()

EDIT
Julia ver.

Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_PATH = /usr/local/julia
  JULIA_BINDIR = /usr/local/julia/bin

CUDA Info

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

I made a Singularity image and ran Julia inside it.

@appleparan

appleparan commented Nov 19, 2019

I added Zygote.@code_adjoint to gradient(m -> sum(m(x)), m) and got this:

https://gist.github.com/appleparan/676d03e9de15092b0e7d3d6501c52517

I thought the pullback from these lines should be called; however, there is no pullback(rnn::RNNDesc{T}, x::CuArray{T}, h::CuArray{T}, c::CuArray{T}) in the stack trace.

Can anyone explain this? It makes no sense to hit a CUDNNError without pullback(rnn::RNNDesc{T}, x::CuArray{T}, h::CuArray{T}, c::CuArray{T}) ever being called.

Moreover, this error only happens on the first run. If I want to reproduce it, I need to restart Julia. That's why I think this relates to code generation or Zygote.

@MikeInnes
Member

You're taking the gradient of m -> sum(m(x)) (an anonymous function), and then taking the adjoint of that (i.e. it's a second order derivative). In any case, the call to CuArrays.pullback is inside a custom adjoint, not derived by Zygote, so it won't show up in there.

@maleadt
Collaborator Author

maleadt commented Nov 26, 2019

R = LSTM: Error During Test at /home/tbesard/Julia/pkg/Flux/test/cuda/curnn.jl:5
  Got exception outside of a @test
  AssertionError: length(workspace) >= cudnnGetRNNWorkspaceSize(rnn, seqlen, xd)
  Stacktrace:
   [1] cudnnRNNForward(::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,Nothing}, ::CuArray{UInt8,1,Nothing}, ::CuArray{UInt8,1,Nothing}) at /home/tbesard/Julia/pkg/CuArrays/src/dnn/rnn.jl:83

Which doesn't make any sense since the workspace is allocated right before the call to cudnnRNNForward. So either cudnnGetRNNWorkspaceSize returns a different value, or the workspace is getting destroyed behind the scenes by a CuArrays memory pool bug.

@MikeInnes
Member

Is it possible that the alloc causes memory to get freed by the pool, affecting CUDNN's heuristic?

If so, it seems pretty hard to work around this. Perhaps we can just allocate-and-check until CUDNN is happy, at least as a temporary fix.

@maleadt
Collaborator Author

maleadt commented Nov 26, 2019

ding, ding, ding

# outer method that allocates
workspace_len = cudnnGetRNNWorkspaceSize(rnn, seqLength, xdesc) = 1216
# inner one
length(workspace) = 1216
workspace_len = cudnnGetRNNWorkspaceSize(rnn, seqlen, xd) = 1248
R = GRU, batch_size = 1: Error During Test at /home/tim/Julia/pkg/Flux/test/cuda/curnn.jl:14
  Got exception outside of a @test
  AssertionError: length(workspace) >= workspace_len

@MikeInnes
Member

Dammit CUDNN, we talked about this

bors bot added a commit that referenced this issue Nov 26, 2019
944: RNN failure hackaround r=MikeInnes a=MikeInnes

See #923.

bors try

Co-authored-by: Mike Innes <mike.j.innes@gmail.com>
@appleparan

Finally! The workspace problem again.

I had given up on this issue because after updating to Flux#master I couldn't reproduce it, and in my case the error doesn't appear when I inspect variables.

I saw the draft PR from @maleadt. It's a great solution. However, the real problem is that the workspace is smaller than CUDNN expects. I was curious why only Julia has this problem. The CUDNN sample code just calls cudnnGetRNNWorkspaceSize once and uses it for the whole sample. (It's not from the official repo, but it seems okay.)

How about keeping the workspace size at the start of the pullback and passing that size along, instead of calling cudnnGetRNNWorkspaceSize every time?

@maleadt
Collaborator Author

maleadt commented Nov 26, 2019

How about keeping the workspace size at the start of the pullback and passing that size along, instead of calling cudnnGetRNNWorkspaceSize every time?

That wouldn't work, because we may happen to free data in between, in which case CUDNN would expect a larger workspace due to how its heuristics appear to work. In the sample code there are no frees in between those calls; they only happen at the very end of the sample.

@appleparan

I see. You are right.

@DhairyaLGandhi
Member

So the heuristic is to keep checking until CUDNN is happy with the amount of memory we allocate?

@maleadt
Collaborator Author

maleadt commented Nov 26, 2019

We can't really check if CUDNN is happy because it returns BAD_PARAM and not INSUFFICIENT_WORKSPACE or similar. So we allocate until it doesn't ask us to allocate more, and hope that we don't suddenly free memory before calling into the library (which would change the heuristic).
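
Roughly, that hackaround looks like the sketch below (a hypothetical helper; the real change lives in the CuArrays RNN wrappers, and cudnnGetRNNWorkspaceSize here stands for the size query those wrappers already use):

# Sketch: grow the workspace until the size CUDNN asks for stops increasing.
# rnn, seqlen and xdesc stand in for the descriptors the real wrappers pass around.
function stable_workspace(rnn, seqlen, xdesc)
    workspace = CuArrays.CuArray{UInt8}(undef, cudnnGetRNNWorkspaceSize(rnn, seqlen, xdesc))
    # Allocating can itself free pooled memory and change CUDNN's heuristic,
    # so re-query and reallocate until the requested size no longer grows.
    while (sz = cudnnGetRNNWorkspaceSize(rnn, seqlen, xdesc)) > length(workspace)
        workspace = CuArrays.CuArray{UInt8}(undef, sz)
    end
    return workspace
end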

bors bot added a commit that referenced this issue Nov 27, 2019
944: RNN failure hackaround r=MikeInnes a=MikeInnes

See #923.

bors try

Co-authored-by: Mike Innes <mike.j.innes@gmail.com>
@maleadt
Collaborator Author

maleadt commented Nov 27, 2019

Should be fixed

@maleadt maleadt closed this as completed Nov 27, 2019
@ararslan
Contributor

I've been getting this, which seems related to what's been discussed here:

ERROR: LoadError: CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
Stacktrace:
 [1] throw_api_error(::CuArrays.CUDNN.cudnnStatus_t) at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/dnn/error.jl:27
 [2] macro expansion at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/dnn/error.jl:40 [inlined]
 [3] cudnnRNNBackwardData(::Ptr{Nothing}, ::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,2,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,2,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,2,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing}, ::CuArray{UInt8,1,Nothing}, ::Int64, ::CuArray{UInt8,1,Nothing}, ::Int64) at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/dnn/libcudnn.jl:1362
 [4] macro expansion at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/dnn/rnn.jl:143 [inlined]
 [5] macro expansion at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/utils.jl:198 [inlined]
 [6] cudnnRNNBackwardData(::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,2,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,2,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing},::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,2,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,2,Nothing}, ::CuArray{UInt8,1,Nothing}) at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/dnn/rnn.jl:139
 [7] backwardData(::CuArrays.CUDNN.RNNDesc{Float32}, ::CuArray{Float32,2,Nothing}, ::CuArray{Float32,2,Nothing}, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}, ::CuArray{Float32,2,Nothing}, ::CuArray{Float32,2,Nothing}, ::CuArray{UInt8,1,Nothing}) at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/dnn/rnn.jl:157
 [8] (::CuArrays.CUDNN.var"#357#358"{CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,2,Nothing}})(::CuArray{Float32,2,Nothing}, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}, ::CuArray{Float32,1,CuArray{Float32,2,Nothing}}) at /home/ubuntu/.julia/packages/CuArrays/A6GUx/src/dnn/rnn.jl:200
 [9] (::Flux.CUDA.var"#13#14"{Zygote.Context,Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArrays.CUDNN.var"#357#358"{CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,2,Nothing}}})(::Tuple{Tuple{CuArray{Float32,1,CuArray{Float32,2,Nothing}},CuArray{Float32,1,CuArray{Float32,2,Nothing}}},CuArray{Float32,2,Nothing}}) at /home/ubuntu/.julia/packages/Flux/NpkMm/src/cuda/curnn.jl:86
 [10] #306#back at /home/ubuntu/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49 [inlined]
 [11] #175 at /home/ubuntu/.julia/packages/Zygote/KNUTW/src/lib/lib.jl:170 [inlined]
 [12] (::Zygote.var"#344#back#177"{Zygote.var"#175#176"{Flux.CUDA.var"#306#back#15"{Flux.CUDA.var"#13#14"{Zygote.Context,Flux.LSTMCell{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArrays.CUDNN.var"#357#358"{CuArrays.CUDNN.RNNDesc{Float32},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArray{Float32,2,Nothing},CuArray{UInt8,1,Nothing},CuArray{Float32,2,Nothing}}}},Tuple{Tuple{Nothing},Tuple{Nothing}}}})(::Tuple{Tuple{CuArray{Float32,1,CuArray{Float32,2,Nothing}},CuArray{Float32,1,CuArray{Float32,2,Nothing}}},CuArray{Float32,2,Nothing}}) at /home/ubuntu/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
 [13] Recur at /home/ubuntu/.julia/packages/Flux/NpkMm/src/layers/recurrent.jl:36 [inlined]
 [14] (::typeof(∂(λ)))(::CuArray{Float32,2,Nothing}) at /home/ubuntu/.julia/packages/Zygote/KNUTW/src/compiler/interface2.jl:0
 [15] applychain at /home/ubuntu/.julia/packages/Flux/NpkMm/src/layers/basic.jl:30 [inlined]
 ... (the last 2 lines are repeated 12 more times)
 [40] (::typeof(∂(applychain)))(::CuArray{Float32,2,CuArray{Float32,3,Nothing}}) at /home/ubuntu/.julia/packages/Zygote/KNUTW/src/compiler/interface2.jl:0

If this isn't the same, I can open a new issue.

@ararslan
Contributor

Apologies for the noise, it seems it is not the same issue.

@maleadt
Collaborator Author

maleadt commented Mar 30, 2020

Apologies for the noise, it seems it is not the same issue.

What was the issue?

@ararslan
Contributor

I still have no idea, but the workspace and reserve sizes appear to be correct, so it seems not to be the same issue as this one.

@jeremiedb
Contributor

jeremiedb commented Apr 8, 2020

I also encountered CUDNN_STATUS_BAD_PARAM (code 3) with RNN (latest Flux/Zygote/CuArrays):
The origin seems to be Flux.reset! and RNN state size initialization, which results in an improper state dimension:

using Flux, Statistics   # Statistics provides mean; Flux provides GRU, Dense, gpu, train!

rnn = Chain(GRU(16, 8),
  Dense(8,1, σ),
  x -> reshape(x,:))

X = [rand(16,10) for i in 1:20]
Y = rand(10,20) ./ 10

rnn = rnn |> gpu
X = gpu(X)
Y = gpu(Y)

θ = Flux.params(rnn)
loss(x,y) = mean((Flux.stack(rnn.(X),2) .- y) .^ 2f0)
opt = ADAM(1e-3)
size(rnn[1].state)
Flux.reset!(rnn)
Flux.train!(loss, θ, [(X,Y)], opt)
size(rnn[1].state)

It can be observed that both before and after reset!, the rnn state has size (8,), while after a call to train! on the GPU the state becomes the expected size (8, 10). After each call to reset!, the CUDNN_STATUS_BAD_PARAM error pops up on the first call to train!, but subsequent calls are fine since the state size stays (8, 10). I can't confirm whether that state size is the root cause, but it appears closely tied to the bug.

@maleadt
Collaborator Author

maleadt commented Apr 8, 2020

Is that reproducible? If so, could you put it in a new issue with some details on package and CUDA versions?
