
Tied Weights #488

Closed
jtmatamalas opened this issue Nov 12, 2018 · 2 comments


jtmatamalas commented Nov 12, 2018

Hi guys,

I'm trying to build an autoencoder with tied weights between the encoder and the decoder. The model is easy to write: the decoder layer just uses the transpose of the encoder layer's weights. Here's an example with one layer each for the encoder and the decoder:

using Flux
using Flux: mse, throttle

data = [rand(10, 2), rand(10, 2)]

encoder = Dense(10, 5, relu)
decoder = Dense(transpose(encoder.W), zeros(Float64, 10), relu)

m = Chain(encoder, decoder)

loss(x) = mse(m(x), x)
opt = ADAM(params(m))

evalcb = throttle(() -> @show(loss(data[1])), 5)

Flux.train!(loss, zip(data), opt, cb = evalcb)

This model works quite well when I run it on the CPU. However, when I try to run it on the GPU:

using CuArrays  # enables GPU support in Flux

data = gpu.([rand(10, 2), rand(10, 2)])

encoder = Dense(10, 5, relu) |> gpu
decoder = Dense(transpose(encoder.W), gpu(zeros(Float64, 10)), relu)

m = Chain(encoder, decoder)

loss(x) = mse(m(x), x)
opt = ADAM(params(m))

evalcb = throttle(() -> @show(loss(data[1])), 5)

Flux.train!(loss, zip(data), opt, cb = evalcb)

I get a pretty ugly error message, see below. I'm fairly sure something goes wrong when broadcasting over the transposed array on the GPU.

Do you have any clue about what's happening?

> ERROR: GPU compilation of #25(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Base.ReshapedArray{Float32,2,LinearAlgebra.Adjoint{Float32,CuArray{Float32,2}},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}) failed
> KernelError: passing and using non-bitstype argument
MikeInnes (Member) commented

I can reproduce this with:

julia> x = param(cu(rand(10,10)));

julia> sum((x')*x) |> Flux.back!

So that's something we should fix.

OTOH it's probably not best to write your autoencoder this way. I would write it more like:

w = param(rand(5, 10))
function m(x)
  encoding = w*x
  decoding = w'*encoding
end

This way the transpose of w happens inside the model, where we're taking gradients, rather than outside.
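The point about keeping the transpose inside the differentiated code is the crux of tied weights: because `w` appears twice in the forward pass, its gradient accumulates one term from the encoder use and one from the decoder use. A minimal sketch of that math in NumPy (not Flux — a linear toy model with illustrative names, checked against finite differences):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 10))  # the single, tied weight matrix
x = rng.standard_normal((10, 2))  # toy batch, as in the issue

def loss(W):
    h = W @ x        # encoder: uses W
    xhat = W.T @ h   # decoder: uses the transpose of the SAME W
    return np.mean((xhat - x) ** 2)

def grad(W):
    n = x.size
    h = W @ x
    xhat = W.T @ h
    g_xhat = 2.0 * (xhat - x) / n  # dL/dxhat
    g_dec = h @ g_xhat.T           # contribution from the decoder use of W
    g_enc = (W @ g_xhat) @ x.T     # contribution from the encoder use of W
    return g_enc + g_dec           # tied weight: the two contributions sum

# Check the analytic gradient against central finite differences.
num = np.zeros_like(W)
eps = 1e-6
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.allclose(grad(W), num, atol=1e-6))  # True
```

If the transpose is instead materialized outside the model (as in the original `Dense(transpose(encoder.W), ...)`), the AD system sees two independent parameters and the gradients no longer combine.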

CarloLucibello (Member) commented

This is fixed now:

x = rand(10,2) |> gpu
encoder = Dense(10, 5, relu)
decoder = Dense(transpose(encoder.W), zeros(Float32, 10), relu) 
m = Chain(encoder, decoder) |> gpu
gradient(() -> Flux.Losses.mse(m(x), x), Flux.params(m))

works fine
