
Remove Recurrent cell eltype restriction #1755

Closed
rmsrosa wants to merge 6 commits

Conversation

@rmsrosa commented Oct 25, 2021

Remove the eltype restriction in the methods associated with the recurrent cells RNN, LSTM, GRU and GRUv3.

Change the conversion of Nil to other number types, so that Nil() converts to NaN16, NaN32 and NaN for the corresponding float types, and likewise for complex types with floating-point real and imaginary parts. Conversion to Integer and Rational types throws an error, in line with the errors thrown when converting non-integer floats.

This allows outputsize to work with recurrent cells as well.
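
For illustration, something along these lines should now work (a rough sketch with made-up sizes, not copied from the added tests):

using Flux

m = RNN(3, 5)                          # recurrent layer: 3 features in, 5 out
Flux.outputsize(m, (3, 16))            # expected: (5, 16) for a batch of 16
Flux.outputsize(Chain(RNN(3, 5), Dense(5, 2)), (3, 16))   # expected: (2, 16)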

Suitable tests were added.

This should fix #1565.

PR Checklist

  • [x] Tests are added
  • Entry in NEWS.md
  • Documentation, if applicable
  • API changes require approval from a committer (different from the author, if applicable)

make Nil() convert to float as NaN
and error when converting to other Number subtypes

Allows Nil[] to be passed to recurrent layers (when type not enforced)
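
Roughly, the intended conversion behaviour (a sketch, assuming the internal Flux.nil singleton from src/outputsize.jl):

using Flux

Float16(Flux.nil)          # expected: NaN16
Float32(Flux.nil)          # expected: NaN32
Float64(Flux.nil)          # expected: NaN
ComplexF64(Flux.nil)       # expected: a NaN-valued ComplexF64
# Int(Flux.nil)            # expected to error, like Int(NaN)
# Rational{Int}(Flux.nil)  # likewise expected to error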
@DhairyaLGandhi (Member):

Looks quite good to me, thanks!

@DhairyaLGandhi (Member):

Could we add a few quick tests to make sure that mismatched eltypes work as expected in the RNN layers on the GPU?

DhairyaLGandhi previously approved these changes Oct 25, 2021
@rmsrosa (Author) commented Oct 25, 2021

Yeah, I didn't check any CUDA-related stuff; I have no knowledge of that.

How bad are the failing tests above?

@DhairyaLGandhi
Copy link
Member

For the RNN tests, first I'd create a regular rnn layer, and then pass it to Flux.f64 and run it with Float32 data.

And rinse and repeat for CUDA.
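
Something like this, I'd assume (hypothetical sizes; Flux.f64 promotes the parameters to Float64 while the data stays Float32):

using Flux

rnn = RNN(3, 5)              # Float32 parameters by default
rnn64 = Flux.f64(rnn)        # promote the parameters to Float64
x32 = rand(Float32, 3, 7)    # Float32 input, batch of 7

y = rnn64(x32)               # should run once the eltype restriction is gone

# rinse and repeat on the GPU (keeping the eltypes mismatched on purpose):
# using CUDA
# rnn_gpu = rnn |> gpu                             # Float32 parameters on the GPU
# y_gpu = rnn_gpu(CuArray(rand(Float64, 3, 7)))    # Float64 input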

Review thread on src/outputsize.jl (outdated, resolved)
@darsnack (Member) commented Oct 26, 2021

This goes against #1483, so I don't think we should merge without a discussion. The implementation is good, though, other than @CarloLucibello's suggestion.

Are the errors from #1483 no longer an issue? What made them go away? Did something change on the Zygote side of things, @mcabbott?

@mcabbott (Member):

No idea, have never looked at #1483. When I try, I get different errors:

julia> Flux.train!(loss, Flux.params(m), x, ADAM())
ERROR: MethodError: no method matching (::Flux.RNNCell{typeof(tanh), Matrix{Float32}, Vector{Float32}, Matrix{Float32}})(::Matrix{Float32}, ::Float64)

julia> x32 = [rand(Float32, 5) for i in 1:2]
2-element Vector{Vector{Float32}}:
 [0.19781059, 0.095588446, 0.7885516, 0.20238525, 0.21201706]
 [0.85013026, 0.44129986, 0.82300264, 0.8385034, 0.3073842]

julia> Flux.train!(loss, Flux.params(m), x32, ADAM())
ERROR: MethodError: no method matching (::Flux.RNNCell{typeof(tanh), Matrix{Float32}, Vector{Float32}, Matrix{Float32}})(::Matrix{Float32}, ::Float32)
Closest candidates are:
  (::Flux.RNNCell{F, A, V, <:AbstractMatrix{T}})(::Any, ::Union{AbstractVector{T}, AbstractMatrix{T}, Flux.OneHotArray}) where {F, A, V, T} at ~/.julia/packages/Flux/ZnXxS/src/layers/recurrent.jl:103
Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/Zygote/rv6db/src/compiler/interface2.jl:0 [inlined]
  [2] _pullback(::Zygote.Context, ::Flux.RNNCell{typeof(tanh), Matrix{Float32}, Vector{Float32}, Matrix{Float32}}, ::Matrix{Float32}, ::Float32)
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface2.jl:9
  [3] _pullback
    @ ~/.julia/packages/Flux/ZnXxS/src/layers/recurrent.jl:47 [inlined]

I see that Flux.Recur is mutable, and Zygote currently has issues with some mutable structs, and almost no tests for them... does Flux have decent tests for these?

@ToucheSir (Member) commented Oct 26, 2021

I don't think Flux.train! does the right thing at all when given batched sequence data, so the MWEs from #1483 definitely need to be tweaked.
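
For reference, the kind of pattern I'd expect for recurrent training (a sketch with made-up sizes, using the implicit-params API of the time, not the MWE from #1483):

using Flux

m = RNN(5, 3)
seq = [rand(Float32, 5, 7) for _ in 1:4]   # 4 timesteps, batch of 7

function loss(seq)
    Flux.reset!(m)                  # reset the hidden state for each sequence
    sum(sum(m(x)) for x in seq)     # run the timesteps in order
end

gs = gradient(() -> loss(seq), Flux.params(m))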

Also changes float(::Type{Nil}) for consistency
@@ -16,10 +16,9 @@ const nil = Nil()
Nil(::T) where T<:Number = nil
Nil(::Nil) = nil
(::Type{T})(::Nil) where T<:Number = T(NaN)
Base.convert(::Type{Nil}, ::Type{T}) where T<:Number = T(NaN)
Base.convert(::Type{T}, ::Nil) where T<:Number = T(NaN)
@darsnack (Member) commented Oct 26, 2021

Let's also keep the old method.

Suggested change:
- Base.convert(::Type{T}, ::Nil) where T<:Number = T(NaN)
+ Base.convert(::Nil, ::Type{T}) where T<:Number = nil
+ Base.convert(::Type{T}, ::Nil) where T<:Number = T(NaN)



- Base.float(::Type{Nil}) = Nil
+ Base.float(::Type{Nil}) = Float64
@darsnack (Member) commented Oct 26, 2021

Suggested change:
- Base.float(::Type{Nil}) = Float64
+ Base.float(::Nil) = NaN

@rmsrosa (Author):

Hmm, since NaN isa Float64, shouldn't it actually be both:

Base.float(::Type{Nil}) = Float64
Base.float(::Nil) = NaN
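
For what it's worth, that would mirror how Base treats other number types (quick REPL check, not part of the diff):

julia> float(Int)    # on a type: returns the corresponding float type
Float64

julia> float(1)      # on a value: returns the converted value
1.0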

@mcabbott added the RNN label Oct 30, 2021
@jeremiedb (Contributor) commented Nov 5, 2021

I'm afraid the proposed change results in hiding an error that arises from an inconsistent eltype between the input (X) and the cell's initial state (state0) when taking gradients on the GPU. For example, when feeding an RNN with Float64 data while the model's state0 is initialized with the default Float32:

using Flux
using Statistics: mean
using Random: seed!
using BenchmarkTools
using CUDA

#####################################
# RNN vanilla
#####################################
seed!(123)
feat = 8
h_size = 3
seq_len = 5
batch_size = 15

rnn = RNN(feat, h_size)

X = [rand(Float64, feat, batch_size) for i in 1:seq_len]
Y = rand(Float64, batch_size, seq_len) ./ 10

#### transfer to gpu ####
rnn_gpu = rnn |> gpu
X_gpu = CuArray{Float64}.(X)
Y_gpu = Float64.(Y)

θ = Flux.params(rnn)
θ_gpu = Flux.params(rnn_gpu)

function loss_cpu(x, y)
    # Flux.reset!(rnn)
    l = sum(Flux.stack(map(rnn, x), dims=2))  # stack the per-timestep outputs along dim 2
    return l
end

function loss_gpu(x, y)
    l = sum(Flux.stack(map(rnn_gpu, x), dims=2))
    return l
end

loss_cpu(X, Y)
loss_gpu(X_gpu, Y_gpu)

opt = ADAM(1e-3)
Flux.train!(loss_cpu, θ, [(X, Y)], opt)

opt_gpu = ADAM(1e-3)
Flux.train!(loss_gpu, θ_gpu, [(X_gpu, Y_gpu)], opt_gpu)

The CPU train! works fine, while the GPU one results in:

ERROR: LoadError: this intrinsic must be compiled to be called
Stacktrace:
  [1] macro expansion
    @ C:\Users\jerem\.julia\packages\Zygote\rv6db\src\compiler\interface2.jl:0 [inlined]
  [2] _pullback(::Zygote.Context, ::Core.IntrinsicFunction, ::String, ::Type{Int64}, ::Type{Tuple{Ptr{Int64}}}, ::Ptr{Int64})
...

If using Float32 instead (X_gpu = CuArray{Float32}.(X)), there's no issue.

Also, under the current forced parametrization in the RNNCell forward-pass definition, the issue would be caught in advance through the mismatched types:

ERROR: LoadError: MethodError: no method matching (::Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}})(::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer})
Closest candidates are:
  (::Flux.RNNCell{F, A, V, var"#s526"} where var"#s526"<:AbstractMatrix{T})(::Any, ::Union{AbstractMatrix{T}, AbstractVector{T}, Flux.OneHotArray}) where {F, A, V, T} at 
c:\github\Flux.jl\src\layers\recurrent.jl:104

A way to properly handle Float32 params with Float64 input could be to specify the init_state type:

rnn = RNN(feat, h_size, init_state=zeros)

which results in a valid model on the CPU. However, when moving to the GPU, the |> gpu approach converts all model parameters to Float32, which keeps the problem in place.

I guess the ideal scenario is for the GPU gradient calculation to handle the case where the state0 type differs from the input/hidden state type, but I'm missing the underlying CUDA implications :\

@DhairyaLGandhi (Member):

Eltype promotion is how Julia handles multiple code paths and is a necessary component in, say, working with complex data, duals, etc. We also have tools to manage eltypes consistently (f64, f32), and type stability is a pretty common pattern to follow too. In conv we warn on mismatched eltypes but don't error, so this can be made consistent.

@darsnack (Member) commented Nov 5, 2021

I think having at least a warning here is important. It's a more complex issue than the kind of promotion that occurs for convs, because of the type of the storage field.

I do think the better long term solution is an API. Per Julia's promotion system, the expected state type should be the promotion of the weights and the input. Right now we "guess" the state type based on the weights alone.
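
A rough sketch of such a rule (hypothetical helper, not part of this PR):

# Hypothetical: choose the state eltype by promoting the weight and input eltypes,
# rather than guessing it from the weights alone.
state_eltype(Wi::AbstractMatrix, x::AbstractArray) = promote_type(eltype(Wi), eltype(x))

# e.g. Float32 weights with Float64 input would give a Float64 state:
state_eltype(rand(Float32, 3, 3), rand(Float64, 3)) === Float64   # true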

@jeremiedb (Contributor):

Maybe there's a necessary compromise here, since if Recur's state isn't parametrized (in the struct definition), then the promotion will happen successfully on the state, even on GPU. However, Recur's output type then cannot be inferred, which has resulted in reported performance issues.
But if the "this intrinsic must be compiled" error on GPU can be resolved, it would result in the desired behavior, right? Would it be worth opening an issue in CUDA.jl?

As a side note, if one is aware of the current gotchas and wants to initialize a model with Float32 weights and Float64 input and state, an issue is that the gpu conversion moves everything to Float32, thus losing the required Float64 for the state:

rnn_cpu = RNN(feat, h_size, init=Flux.zeros32, initb=Flux.zeros32, init_state=zeros)
rnn_gpu = rnn_cpu |> gpu
julia> rnn_cpu.state
3×1 Matrix{Float64}:
 0.0
 0.0
 0.0

julia> rnn_gpu.state
3×1 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.0
 0.0
 0.0

Wouldn't it be desirable for gpu to preserve the types? If that were the case, then handling different types for weights vs input/state would at least be more convenient, as the above would result in valid code for the GPU.

@DhairyaLGandhi (Member):

> Wouldn't it be desirable for gpu to preserve the types?

I'm not sure about that last bit, since GPUs can have very different performance characteristics depending on precision. Either way, if we know ahead of time that gpu produces f32, we can much more easily predict the eltypes we will get. Note that the error raised successfully prevents the user from mixing eltypes on the GPU, which can have vastly larger performance differences. Is it common for users to need different eltypes for different parameters? Most cases I have seen don't seem to mind f32 everywhere, and don't actually intend to use the full precision of f64. Relaxing the eltypes makes it much easier to handle mixed-precision cases without needing to hold on to too many specific kernels for different eltypes.

@rmsrosa (Author) commented May 5, 2023

superseded by #2234

@rmsrosa closed this May 5, 2023
Successfully merging this pull request may close these issues:

  • Recurrent cell eltype restriction breaks outputsize