Flux.Zeros conflicts with Flux.loadparams! #1277

Closed
pxl-th opened this issue Jul 11, 2020 · 11 comments
@pxl-th
Member

pxl-th commented Jul 11, 2020

Consider a model where all convolutions have their biases disabled:

m = Chain(
    Conv((3, 3), 1 => 2, bias=Flux.Zeros()),
    Conv((3, 3), 2 => 3, bias=Flux.Zeros()),
    Conv((3, 3), 3 => 4, bias=Flux.Zeros()),
)

If I collect its parameters (e.g. for saving them later) and print:

mp = m |> params .|> cpu
for p in mp
    println("$(typeof(p)) $(size(p))")
end

the output is:

Array{Float32,4} (3, 3, 1, 2)
Array{Bool,0} () <--- disabled bias from 1st conv
Array{Float32,4} (3, 3, 2, 3)
Array{Float32,4} (3, 3, 3, 4)

So the model's parameters contain the disabled bias, but only from the 1st layer (since parameters are stored in a Zygote.IdSet, the identical Zeros from the other layers are deduplicated).
Moreover, if I want to load these parameters into the model:

Flux.loadparams!(m, mp)

I get the error ERROR: setindex! not defined for Flux.Zeros{Bool,0}.

An easy fix would be to define params! for Flux.Zeros so that it does not get added to the parameters:

params!(p::Flux.Params, x::Flux.Zeros{<:Number}, seen = Flux.IdSet()) = nothing

I wanted to create a PR with this fix, but this solution conflicts with this test, which expects Flux.Zeros to be in the parameters.
Maybe change that test to instead check that the bias is not in the gradients?
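
For illustration, the kind of check I have in mind looks roughly like this (a sketch only, assuming the params! method above is in place so Zeros is never collected; sizes are arbitrary):

using Flux, Test

# sketch of the adjusted test: a disabled bias is simply never collected,
# and taking gradients over the remaining parameters still works
m  = Conv((3, 3), 1 => 2, bias=Flux.Zeros())
ps = Flux.params(m)
@test length(ps) == 1                 # only the weight; no Zeros entry

gs = Flux.gradient(() -> sum(m(rand(Float32, 5, 5, 1, 1))), ps)
@test gs[m.weight] !== nothing        # the weight still gets a gradient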

@CarloLucibello
Member

I don't see any reason why Zeros should be in parameters. @DhairyaLGandhi what is that test for?

@DhairyaLGandhi
Member

It's to make sure we don't train the bias by accident.

Functor automatically collects all the params currently; a better solution might be to teach loadparams! how to handle Zeros.
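
Roughly what I mean, as a sketch rather than the actual implementation (loadparams_skipping_zeros! is a made-up name, and it assumes the saved params were collected from the same structure so the placeholders line up):

# hypothetical sketch: like loadparams!, but skip any entry whose destination
# is a Zeros marker, since there is nothing to load into a disabled bias
function loadparams_skipping_zeros!(m, xs)
  for (p, x) in zip(Flux.params(m), xs)
    p isa Flux.Zeros && continue
    size(p) == size(x) ||
      error("Expected param size $(size(p)), got $(size(x))")
    copyto!(p, x)
  end
end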

@pxl-th
Member Author

pxl-th commented Jul 12, 2020

Also, I've found out that if you move the model to any device, Flux.Zeros gets converted to an actual Array representation and the gradient w.r.t. the bias is no longer nothing.

MWE to reproduce

device = Flux.cpu
model = Conv((3, 3), 1 => 2, bias=Flux.Zeros()) |> device
xin = randn(Float32, 4, 4, 1, 1) |> device
you = randn(Float32, 2, 2, 1, 1) |> device

model_params = Flux.params(model)
println(">>> Model parameters:")
for p in model_params
    println("$(typeof(p)) $(size(p))")
end

loss(x, y) = Flux.mse(model(x), y)
grads = Flux.gradient(model_params) do
    loss(xin, you)
end

println(">>> Gradients:")
for (k, v) in grads.grads
    println("$(typeof(k)) $(size(k)) => $(typeof(v))")
end

MWE output without moving model to any device

>>> Model parameters:
Array{Float32,4} (3, 3, 1, 2)
Flux.Zeros{Bool,0} ()
>>> Gradients:
Array{Float32,4} (3, 3, 1, 2) => Array{Float32,4}
Flux.Zeros{Bool,0} () => Nothing

MWE output when `cpu` is device

>>> Model parameters:
Array{Float32,4} (3, 3, 1, 2)
Array{Bool,0} ()
>>> Gradients:
Array{Bool,0} () => Array{Float32,0}
Array{Float32,4} (3, 3, 1, 2) => Array{Float32,4}

MWE output when `gpu` is device

>>> Model parameters:
CUDA.CuArray{Float32,4,Nothing} (3, 3, 1, 2)
CUDA.CuArray{Bool,0,Nothing} ()
>>> Gradients:
CUDA.CuArray{Float32,4,Nothing} (3, 3, 1, 2) => CUDA.CuArray{Float32,4,Nothing}
CUDA.CuArray{Bool,0,Nothing} () => CUDA.CuArray{Float32,0,CUDA.CuArray{Float32,4,CUDA.CuArray{Float32,4,Nothing}}}

@MacKenzieHnC

Would solving #1294 and #1284 solve this? It seems like not separating the different "trainable" parameters is becoming a problem.

@DhairyaLGandhi
Member

I think having a solid notion of how to

  1. Load Zeros
  2. Not convert it when it's moved to a device (a rough sketch of this is below)

would be valuable. #1284 would be great to have too.
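
For point 2, one possible direction, assuming device movement keeps going through Adapt (a sketch, untested):

import Adapt

# hypothetical sketch: let the Zeros marker pass through cpu/gpu movement
# untouched instead of being materialised as a plain array
Adapt.adapt_structure(to, z::Flux.Zeros) = z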

@MacKenzieHnC

If you give someone this Julia file:

using Flux
using BSON

model = BSON.load("arbitrary_saved_model.bson")

do they have enough information to reverse-engineer that model in a different language?

Maybe I'm way off-base here, but it feels like using Zeros to turn the bias off is covering up for an insufficient definition somewhere.

@DhairyaLGandhi
Member

DhairyaLGandhi commented Sep 8, 2020

I agree that there is missing information in the parameters. By the way, if we make Zeros a mutable struct, I think we get a lot of the same benefits while keeping the flexibility of not defining a singleton Flux.Zeros, so it would show up as distinct parameters in the params object. The fact that the bias is turned off is valuable information for the reproducibility of the model; it can't be ignored and should therefore be reflected in the params.
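
For concreteness, the kind of change meant here is roughly the following (a sketch only, not the exact Flux definition, which carries more structure):

# a *mutable* struct gives every Zeros() its own identity (objectid),
# so an IdSet/IdDict keeps one entry per layer instead of collapsing them
mutable struct Zeros{T,N} <: AbstractArray{T,N}
  size::NTuple{N,Int}
end
Zeros(::Type{T}, sz::Integer...) where T = Zeros{T,length(sz)}(sz)
Zeros(sz::Integer...) = Zeros(Bool, sz...)
Base.size(z::Zeros) = z.size
Base.getindex(z::Zeros{T}, i::Integer...) where T = zero(T)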

For the record, making Zeros mutable would give you:

julia> m = Chain(
           Conv((3, 3), 1 => 2, bias=Flux.Zeros()),
           Conv((3, 3), 2 => 3, bias=Flux.Zeros()),
           Conv((3, 3), 3 => 4, bias=Flux.Zeros()),
       )
Chain(Conv((3, 3), 1=>2), Conv((3, 3), 2=>3), Conv((3, 3), 3=>4))

julia> length(Flux.params(m))
6

julia> Flux.params(m)[2]
0-dimensional Flux.Zeros{Bool,0}:
0

julia> gs = gradient(() -> sum(m(rand(Float32, 28,28,1,1))), Flux.params(m))
Grads(...)

julia> gs.grads
IdDict{Any,Any} with 6 entries:
  Float32[0.0169228 - => Float32[387.49 390.719 392.534; 385.96
  fill(false)          => nothing
  fill(false)          => nothing
  fill(false)          => nothing
  Float32[0.092979 -0 => Float32[-432.106 -433.08 -435.981; -43
  Float32[-0.166749 0 => Float32[-0.000347137 -0.42526 0.50606;

@DhairyaLGandhi
Member

An easy fix would be to define params! for Flux.Zeros so that it does not get added to the parameters:

If we don't add it to the parameters, how do we encode that the bias is turned off when pulling out the weights from the model? I think it would work if the entire model was saved, but loadparams!(model, weights_with_zeros) would fail.

@MacKenzieHnC

MacKenzieHnC commented Sep 9, 2020

If we don't add it to the parameters, how do we encode that the bias is turned off when pulling out the weights from the model?

I am kind of new to Julia. Are there no polymorphism tricks we can use? Is there any way to just define a Conv struct that doesn't have a bias property in the first place?

edit: Reading further into the documentation, it looks like that's already kind of what Zeros is doing. But what problem is solved by making bias Union{Zeros, AbstractVector{T}} instead of Union{Nothing, AbstractVector{T}}?

Per the documentation:

A particularly useful case of a Union type is Union{T, Nothing}, where T can be any type and Nothing is the singleton type whose only instance is the object nothing. This pattern is the Julia equivalent of Nullable, Option or Maybe types in other languages. Declaring a function argument or a field as Union{T, Nothing} allows setting it either to a value of type T, or to nothing to indicate that there is no value.

edit: I got Union{Nothing, AbstractVector{T}} to work, but it required putting an if statement in the function that evaluates Conv:

function (c::Conv)(x::AbstractArray)
  # TODO: breaks gpu broadcast :(
  # ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
  
  ### My changes
  σ = c.σ
  if c.bias !== nothing
      b = reshape(c.bias, ntuple(_->1, length(c.stride))..., :, 1)
  else
      b = 0
  end
  ###
  cdims = DenseConvDims(x, c.weight; stride=c.stride, padding=c.pad, dilation=c.dilation)
  σ.(conv(x, c.weight, cdims) .+ b)
end

Is that what Zeros is supposed to solve? Is it a performance optimization? Here are the results of the example after my changes:

julia> m = Chain(
           Conv((3, 3), 1 => 2, bias=nothing),
           Conv((3, 3), 2 => 3, bias=nothing),
           Conv((3, 3), 3 => 4, bias=nothing),
       )
Chain(Conv((3, 3), 1=>2), Conv((3, 3), 2=>3), Conv((3, 3), 3=>4))

julia> length(Flux.params(m))
3

julia> Flux.params(m)[2]
3×3×2×3 Array{Float32,4}:
[:, :, 1, 1] =
  0.089666   0.28373   -0.300706
 -0.272113   0.252896  -0.100829
  0.171586  -0.324309   0.00790166
...

julia> gs = gradient(() -> sum(m(rand(Float32, 28,28,1,1))), Flux.params(m))
Grads(...)

julia> gs.grads
IdDict{Any,Any} with 3 entries:
  Float32[0.089666 0.28373 -0.300706; -0.272113 0.252896 -0.100829; 0.1715 => Float32[-189.062 -189.535 -189.128; -188.437 -188.879 -189.078; -187.731 -189.049 -189.331]
  Float32[-0.10269 -0.0312617 -0.184054; -0.0267541 0.172398 0.170193; 0.1 => Float32[-38.5559 -38.4327 -38.9365; -38.2541 -38.1558 -39.0549; -38.5386 -37.9995 -38.0654]
  Float32[0.177636 0.394305 -0.407948; 0.34289 -0.426031 0.319731; 0.00487 => Float32[245.675 244.423 244.569; 244.205 243.673 244.096; 243.72 243.263 243.177]

julia> Flux.loadparams!(m, Flux.params(m))

julia> length(params(m))
3

The fact that the bias is turned off is valuable information for the reproducibility of the model; it can't be ignored and should therefore be reflected in the params.

For a given i it seems like we don't know what Flux.params(model)[i] is anyway, unless we already know what kinds of params we are expecting.

This might be way more ambitious than I realize, but I was hoping there was some way to have Flux.params return a dict, kind of like tf.compat.v1.trainable_variables does in Python. We already have names for all the parameters (since we get them from the struct fields), but we throw that information away at some step.
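
Just to illustrate the idea, something like the sketch below is what I mean (namedparams is not a Flux function; it walks Flux.trainable recursively and assumes the field names survive that traversal):

using Flux

# hypothetical sketch of a dict-returning params that keeps field names;
# numeric arrays are leaves, everything else is recursed via Flux.trainable
function namedparams!(dict, prefix, x)
  if x isa AbstractArray{<:Number}
    dict[prefix] = x
  else
    for (name, child) in pairs(Flux.trainable(x))
      namedparams!(dict, string(prefix, ".", name), child)
    end
  end
  return dict
end

namedparams(m) = namedparams!(Dict{String,Any}(), "model", m)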

edit: I promise I'm doing my homework. I read the thread where Flux.Zeros() got added; I just can't see what it is solving that the original nothing solution didn't.

@MacKenzieHnC

Another way of looking at it is, if you see an array where you try loading it with nothing (instead of zeros) you break the assumption that it might be populated with legitimate values if you train it further.

@DhairyaLGandhi what did you mean by this comment in #1281?

bors bot added a commit that referenced this issue Nov 9, 2020
1379: Fix some issues with Zeros option 2 r=CarloLucibello a=DrChainsaw

Fixes #1332, #1277

This is take two on fixing the above issues and is intended to be mutually exclusive to #1374.

In this version Zeros is no longer an AbstractArray and is thus more like a there-is-no-such-parameter marker instead of this-parameter-is-an-immutable-array-of-zeros type. 

I made the change so that the advertised way of disabling bias is through `bias=false` rather than `bias=Zeros()`. 

`Zeros` itself is a lot less capable here compared to the previous attempt, but that also keeps the number of moving parts down, i.e. no need to test that both the 0-dim version and the full-size version work the same. All in all there are fewer specializations compared to #1374.

I think that a rename could definitely be in order. Better names I can think of are `bias=Off`, `bias=None` or, as suggested by @mcabbott, `bias=False()`.

The best argument against renaming I can think of is to be a little bit less breaking.

### PR Checklist

- [X] Tests are added
- [X] Entry in NEWS.md
- [ ] Documentation, if applicable
- [ ] Final review from `@dhairyagandhi96` (for API changes).


Co-authored-by: DrChainsaw <Christian.kyril.skarby@gmail.com>
@DrChainsaw
Contributor

This should have been closed by #1379; I must have messed up the 'fixes' syntax.
