Remove Recurrent cell eltype restriction #1755
Conversation
Make Nil() convert to floats as NaN, and error when converting to other Number subtypes. This allows a Nil[] array to be passed to recurrent layers (when the eltype is not enforced).
Looks quite good to me, thanks!
Could we do a few quick tests to make sure that mismatched eltypes work in the RNN layers as expected on the GPU?
Yeah, I didn't check any CUDA-related stuff, no knowledge of that. How bad are the failing tests above?
For the RNN tests, first I'd create a regular RNN layer, and then pass it to … And rinse and repeat for CUDA.
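Something along these lines, perhaps (a rough, hypothetical sketch of the kind of test being requested, not the tests added in this PR; it assumes the eltype restriction is removed):

```julia
using Flux, Test

m = RNN(3, 5)                    # Float32 weights by default
x = rand(Float64, 3, 7)          # deliberately mismatched eltype
@test size(m(x)) == (5, 7)       # should work once the restriction is removed

# Rinse and repeat on GPU when CUDA is functional
# (CuArray{Float64} keeps the mismatch; `gpu` would demote the data to Float32):
# m_gpu, x_gpu = gpu(m), CuArray{Float64}(x)
# @test size(m_gpu(x_gpu)) == (5, 7)
```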
This goes against #1483, so I don't think we should merge without a discussion. The implementation is good other than @CarloLucibello's suggestion, though. Are the errors from #1483 no longer an issue? What made them go away? Did something change on the Zygote side of things, @mcabbott?
No idea, have never looked at #1483. When I try, I get different errors:
I see that …
I don't think …
Also changes float(::Type{Nil}) for consistency
@@ -16,10 +16,9 @@ const nil = Nil()
Nil(::T) where T<:Number = nil
Nil(::Nil) = nil
(::Type{T})(::Nil) where T<:Number = T(NaN)
Base.convert(::Type{Nil}, ::Type{T}) where T<:Number = T(NaN)
Base.convert(::Type{T}, ::Nil) where T<:Number = T(NaN)
Let's also keep the old method.
Suggested change:
-Base.convert(::Type{T}, ::Nil) where T<:Number = T(NaN)
+Base.convert(::Nil, ::Type{T}) where T<:Number = nil
+Base.convert(::Type{T}, ::Nil) where T<:Number = T(NaN)
-Base.float(::Type{Nil}) = Nil
+Base.float(::Type{Nil}) = Float64
Suggested change:
-Base.float(::Type{Nil}) = Float64
+Base.float(::Nil) = NaN
Hmm, since NaN isa Float64, shouldn't it actually be both:

Base.float(::Type{Nil}) = Float64
Base.float(::Nil) = NaN
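For illustration, a minimal sketch of the difference between the two methods (assuming the Nil definitions from the diff above, plus both float methods, are in scope):

```julia
# Type-level: which float eltype a Nil should map to.
float(Nil)   # Float64

# Value-level: what a nil instance becomes, so float(x) still returns an AbstractFloat.
float(nil)   # NaN
```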
I'm afraid the proposed change results in hiding an error that arises from inconsistent eltypes of the input (X) and the cell's initial state:

using Flux
using Statistics: mean
using Random: seed!
using BenchmarkTools
using CUDA
#####################################
# RNN vanilla
#####################################
seed!(123)
feat = 8
h_size = 3
seq_len = 5
batch_size = 15
rnn = RNN(feat, h_size)
X = [rand(Float64, feat, batch_size) for i in 1:seq_len]
Y = rand(Float64, batch_size, seq_len) ./ 10
#### transfer to gpu ####
rnn_gpu = rnn |> gpu
X_gpu = CuArray{Float64}.(X)
Y_gpu = Float64.(Y)
θ = Flux.params(rnn)
θ_gpu = Flux.params(rnn_gpu)
function loss_cpu(x, y)
# Flux.reset!(rnn)
l = sum(Flux.stack(map(rnn, x), dims=2)) # edit: dims keyword
return l
end
function loss_gpu(x, y)
l = sum(Flux.stack(map(rnn_gpu, x), dims=2)) # edit
return l
end
loss_cpu(X, Y)
loss_gpu(X_gpu, Y_gpu)
opt = ADAM(1e-3)
Flux.train!(loss_cpu, θ, [(X, Y)], opt)
opt_gpu = ADAM(1e-3)
Flux.train!(loss_gpu, θ_gpu, [(X_gpu, Y_gpu)], opt_gpu)
CPU's error:

ERROR: LoadError: this intrinsic must be compiled to be called
Stacktrace:
[1] macro expansion
@ C:\Users\jerem\.julia\packages\Zygote\rv6db\src\compiler\interface2.jl:0 [inlined]
[2] _pullback(::Zygote.Context, ::Core.IntrinsicFunction, ::String, ::Type{Int64}, ::Type{Tuple{Ptr{Int64}}}, ::Ptr{Int64})
...

If using Float32 instead: …

Also, under the current forced parametrization in the RNNCell forward-pass definition, the issue would be caught in advance through the mismatched types:

ERROR: LoadError: MethodError: no method matching (::Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}})(::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer})
Closest candidates are:
  (::Flux.RNNCell{F, A, V, var"#s526"} where var"#s526"<:AbstractMatrix{T})(::Any, ::Union{AbstractMatrix{T}, AbstractVector{T}, Flux.OneHotArray}) where {F, A, V, T} at c:\github\Flux.jl\src\layers\recurrent.jl:104

A way to properly handle Float32 params with Float64 input could be to specify the init_state:

rnn = RNN(feat, h_size, init_state=zeros)

which results in a valid model on CPU. However, moving to GPU, the issue reappears. I guess the ideal scenario is for the GPU gradient calculation to handle the case where the param and input eltypes differ.
Eltype promotion is how Julia handles multiple code paths and is a necessary component in (say) working with complex data, or duals, etc. We also have tools to manage eltypes consistently.
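For example, Flux's exported f64/f32 parameter converters are one such tool (a minimal sketch; whether these are the tools the comment had in mind is an assumption):

```julia
using Flux

m   = RNN(8, 3)        # Float32 parameters by default
m64 = Flux.f64(m)      # convert all parameters (including the state) to Float64

x = rand(Float64, 8, 15)
m64(x)                 # weight, state and input eltypes now agree
```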
I think having at least a warning here is important. It's a more complex issue than the kind of promotion that occurs for convs, because of the type of the storage field. I do think the better long-term solution is an API. Per Julia's promotion system, the expected state type should be the promotion of the weights and the input. Right now we "guess" the state type based on the weights alone.
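A sketch of that promotion rule (illustrative only; state_eltype is a made-up helper name, not Flux API):

```julia
# The recurrent state eltype should follow Julia's promotion of the hidden
# weights and the incoming data, not the weights alone.
state_eltype(Wh::AbstractMatrix, x::AbstractVecOrMat) =
    promote_type(eltype(Wh), eltype(x))

state_eltype(rand(Float32, 3, 3), rand(Float64, 3, 5))  # Float64
state_eltype(rand(Float32, 3, 3), rand(Float32, 3, 5))  # Float32
```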
Maybe there's a necessary compromise here, since if the Recur's state isn't parametrized (in the struct definition), then the promotion will happen successfully on the state, even on GPU. However, Recur's output type then cannot be inferred, which has resulted in reported performance issues.

As a side note, if one is aware of the current gotchas and wants to initialize a model with Float32 weights and Float64 input and state, an issue is that the gpu conversion moves everything into Float32, thus losing the required Float64 for the state:

rnn_cpu = RNN(feat, h_size, init=Flux.zeros32, initb=Flux.zeros32, init_state=zeros)
rnn_gpu = rnn_cpu |> gpu
julia> rnn_cpu.state
3×1 Matrix{Float64}:
0.0
0.0
0.0
julia> rnn_gpu.state
3×1 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
0.0
0.0
0.0

Wouldn't it be desirable for the gpu conversion to preserve the original eltypes?
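If keeping Float64 on the GPU is really needed today, one possible workaround (a sketch, not an endorsed API) is to map the arrays to CuArray directly instead of going through gpu/cu, which demote Float64 to Float32 by default:

```julia
using CUDA, Functors

# CuArray(::Array{Float64}) keeps Float64, unlike `cu`.
rnn_gpu64 = fmap(x -> x isa AbstractArray ? CuArray(x) : x, rnn_cpu)

eltype(rnn_gpu64.state)   # Float64
```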
I'm not sure on that last bit, since GPUs can have very different performance properties depending on precision. Either way, if we know ahead of time that …
Superseded by #2234.
Remove the type restriction in the methods associated with the recurrent cells RNN, LSTM, GRU and GRUv3.

Change the conversion of Nil to other types, so Nil() gets converted to NaN16, NaN32 and NaN for the corresponding float types, and also to complex types with float real and imaginary parts. Conversion errors for Integer and Rational types, in accordance with the errors when converting non-integer floats.

This allows outputsize to work with recurrent cells as well. Suitable tests were added.

This should fix #1565.
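For illustration, a standalone sketch of the conversion semantics described above (a mirror of the idea, not Flux's actual Nil implementation; the complex and Rational cases are left out here):

```julia
# Minimal stand-in for Flux's internal Nil number type.
struct Nil <: Real end
const nil = Nil()

Nil(::T) where T<:Number = nil
Nil(::Nil) = nil
(::Type{T})(::Nil) where T<:Number = T(NaN)
Base.convert(::Type{T}, ::Nil) where T<:Number = T(NaN)
Base.float(::Type{Nil}) = Float64

Float16(nil)            # NaN16
convert(Float32, nil)   # NaN32
float(Nil)              # Float64
# Int(nil)              # InexactError, just like Int(NaN)
```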
PR Checklist