
Tied Weights #488

Closed
jtmatamalas opened this issue Nov 12, 2018 · 2 comments


jtmatamalas commented Nov 12, 2018

Hi guys,

I'm trying to build an autoencoder with tied weights between the encoder and the decoder. The model is easy to write: the decoder layer just uses the transpose of the encoder layer's weights. Here's an example with one layer each for the encoder and the decoder:

using Flux
using Flux: mse, throttle

data = [rand(10, 2), rand(10, 2)]

encoder = Dense(10, 5, relu)
decoder = Dense(transpose(encoder.W), zeros(Float64, 10), relu)

m = Chain(encoder, decoder)

loss(x) = mse(m(x), x)
opt = ADAM(params(m))

evalcb = throttle(() -> @show(loss(data[1])), 5)

Flux.train!(loss, zip(data), opt, cb = evalcb)

This model works quite well when I run it on the CPU. However, when I try to run it on the GPU:

using CuArrays  # enables GPU support in Flux

data = gpu.([rand(10, 2), rand(10, 2)])

encoder = Dense(10, 5, relu) |> gpu
decoder = Dense(transpose(encoder.W), gpu(zeros(Float64, 10)), relu)

m = Chain(encoder, decoder)

loss(x) = mse(m(x), x)
opt = ADAM(params(m))

evalcb = throttle(() -> @show(loss(data[1])), 5)

Flux.train!(loss, zip(data), opt, cb = evalcb)

I get a pretty ugly error message, see below. I'm fairly sure something goes wrong when broadcasting over the transposed array on the GPU.

Do you have any clue about what's happening?

> ERROR: GPU compilation of #25(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Base.ReshapedArray{Float32,2,LinearAlgebra.Adjoint{Float32,CuArray{Float32,2}},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}) failed
> KernelError: passing and using non-bitstype argument
MikeInnes (Member) commented

I can reproduce this with:

julia> x = param(cu(rand(10,10)));

julia> sum((x')*x) |> Flux.back!

So that's something we should fix.

OTOH it's probably not best to write your autoencoder this way. I would write it more like:

w = param(rand(5, 10))
function m(x)
  encoding = w*x
  decoding = w'*encoding
end

This way the transpose of w happens inside the model, where we're taking gradients, rather than outside.
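The point about keeping the transpose inside the differentiated code is the crux of tied weights: because `w` appears twice in the forward pass, its gradient accumulates one term from the encoder use and one from the decoder use. A minimal sketch of that math in NumPy (not Flux — a linear toy model with illustrative names, checked against finite differences):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 10))  # the single, tied weight matrix
x = rng.standard_normal((10, 2))  # toy batch, as in the issue

def loss(W):
    h = W @ x        # encoder: uses W
    xhat = W.T @ h   # decoder: uses the transpose of the SAME W
    return np.mean((xhat - x) ** 2)

def grad(W):
    n = x.size
    h = W @ x
    xhat = W.T @ h
    g_xhat = 2.0 * (xhat - x) / n  # dL/dxhat
    g_dec = h @ g_xhat.T           # contribution from the decoder use of W
    g_enc = (W @ g_xhat) @ x.T     # contribution from the encoder use of W
    return g_enc + g_dec           # tied weight: the two contributions sum

# Check the analytic gradient against central finite differences.
num = np.zeros_like(W)
eps = 1e-6
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.allclose(grad(W), num, atol=1e-6))  # True
```

If the transpose is instead materialized outside the model (as in the original `Dense(transpose(encoder.W), ...)`), the AD system sees two independent parameters and the gradients no longer combine.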

CarloLucibello (Member) commented

This is fixed now:

x = rand(10,2) |> gpu
encoder = Dense(10, 5, relu)
decoder = Dense(transpose(encoder.W), zeros(Float32, 10), relu) 
m = Chain(encoder, decoder) |> gpu
gradient(() -> Flux.Losses.mse(m(x), x), Flux.params(m))

works fine
