Flux.Zeros conflicts with Flux.loadparams! #1277
Comments
I don't see any reason why `Zeros` should be in parameters. @DhairyaLGandhi what is that test for?
It's to make sure we don't train the bias by accident. Functor would automatically collect all the params currently; a better solution might be to teach `loadparams!` how to handle `Zeros`.
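For illustration, a minimal sketch of what teaching `loadparams!` to handle `Zeros` could look like (`loadparams_skipping_zeros!` is a hypothetical helper, not Flux's actual implementation):

```julia
using Flux

# Hypothetical variant of loadparams! that skips Zeros entries instead of
# writing into them (which is what currently throws setindex! errors).
function loadparams_skipping_zeros!(m, xs)
    for (p, x) in zip(Flux.params(m), xs)
        p isa Flux.Zeros && continue  # disabled bias: nothing to load
        size(p) == size(x) || error("expected size $(size(p)), got $(size(x))")
        copyto!(p, x)
    end
end
```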
Also I've found that the behavior differs depending on whether you move the model to a device. MWE to reproduce:
```julia
using Flux

device = Flux.cpu
model = Conv((3, 3), 1 => 2, bias=Flux.Zeros()) |> device
xin = randn(Float32, 4, 4, 1, 1) |> device
you = randn(Float32, 2, 2, 1, 1) |> device

model_params = Flux.params(model)
println(">>> Model parameters:")
for p in model_params
    println("$(typeof(p)) $(size(p))")
end

loss(x, y) = Flux.mse(model(x), y)
grads = Flux.gradient(model_params) do
    loss(xin, you)
end

println(">>> Gradients:")
for (k, v) in grads.grads
    println("$(typeof(k)) $(size(k)) => $(typeof(v))")
end
```

- MWE output without moving model to any device
- MWE output when `cpu` is device
- MWE output when `gpu` is device
I think having a solid notion of how to handle this would be valuable. #1284 would be great to have too.
If you give someone this Julia file:

```julia
using Flux
using BSON

model = BSON.load("arbitrary_saved_model.bson")
```

do they have enough information to reverse-engineer that model in a different language? Maybe I'm way off-base here, but it feels like using …
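As a sketch of the distinction being raised here (assumed behavior, standard BSON.jl calls): saving the whole struct keeps the layer structure, while saving only the parameter arrays does not.

```julia
using Flux, BSON

m = Conv((3, 3), 1 => 2, bias=Flux.Zeros())

# The full struct records layer types, sizes, and that the bias is disabled.
BSON.@save "model.bson" m

# A bare list of parameter arrays carries none of that structure; a reader
# cannot tell which layers had their bias turned off.
ps = collect(Flux.params(m))
BSON.@save "params.bson" ps
```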
I agree that there is missing information in the parameters. Btw, if we make `Zeros` a mutable struct, then I think we get a lot of the same benefits while keeping the flexibility of not defining a singleton. For the record, making `Zeros` mutable would give you:

```julia
julia> m = Chain(
           Conv((3, 3), 1 => 2, bias=Flux.Zeros()),
           Conv((3, 3), 2 => 3, bias=Flux.Zeros()),
           Conv((3, 3), 3 => 4, bias=Flux.Zeros()),
       )
Chain(Conv((3, 3), 1=>2), Conv((3, 3), 2=>3), Conv((3, 3), 3=>4))

julia> length(Flux.params(m))
6

julia> Flux.params(m)[2]
0-dimensional Flux.Zeros{Bool,0}:
0

julia> gs = gradient(() -> sum(m(rand(Float32, 28,28,1,1))), Flux.params(m))
Grads(...)

julia> gs.grads
IdDict{Any,Any} with 6 entries:
  Float32[0.0169228 -… => Float32[387.49 390.719 392.534; 385.96…
  fill(false)          => nothing
  fill(false)          => nothing
  fill(false)          => nothing
  Float32[0.092979 -0… => Float32[-432.106 -433.08 -435.981; -43…
  Float32[-0.166749 0… => Float32[-0.000347137 -0.42526 0.50606;…
```
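The reason mutability changes the parameter count: instances of an immutable struct with equal contents are `===` to each other, so the `IdDict`-backed parameter set collapses them into a single entry, whereas every mutable instance has its own object identity. A standalone illustration with stand-in structs (not Flux code):

```julia
# Stand-ins for an immutable vs. a mutable Zeros type.
struct ImmutableMarker end
mutable struct MutableMarker end

d = IdDict{Any,Nothing}()
for _ in 1:3
    d[ImmutableMarker()] = nothing
end
length(d)  # 1 — every ImmutableMarker() is === to the others

d = IdDict{Any,Nothing}()
for _ in 1:3
    d[MutableMarker()] = nothing
end
length(d)  # 3 — each MutableMarker() is a distinct object
```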
If we don't add it to the parameters, how do we encode that the bias is turned off when pulling out the weights from the model? I think it would work if the entire model was saved, but …
I am kind of new to Julia. Are there no polymorphism tricks we can use? Is there any way to just define a …

edit: Reading more into the documentation, it looks like that's already kind of what `Zeros` is for. Per the documentation: …
edit: I got it working by modifying the `Conv` forward pass:

```julia
function (c::Conv)(x::AbstractArray)
  # TODO: breaks gpu broadcast :(
  # ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
  ### My changes
  σ = c.σ
  if c.bias !== nothing
    b = reshape(c.bias, ntuple(_->1, length(c.stride))..., :, 1)
  else
    b = 0
  end
  ###
  cdims = DenseConvDims(x, c.weight; stride=c.stride, padding=c.pad, dilation=c.dilation)
  σ.(conv(x, c.weight, cdims) .+ b)
end
```
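An aside on the polymorphism question above: the `if`/`else` can also be written with multiple dispatch, the usual Julia trick for this kind of branching (a sketch with hypothetical helper names, not Flux's implementation):

```julia
# Hypothetical helper: one method per bias type, no branch in the forward pass.
_reshape_bias(b::AbstractArray, stride) = reshape(b, ntuple(_ -> 1, length(stride))..., :, 1)
_reshape_bias(::Nothing, stride) = 0  # `.+ 0` broadcasts away to a no-op
```

The forward pass then reads `σ.(conv(x, c.weight, cdims) .+ _reshape_bias(c.bias, c.stride))` in both cases.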
Is that what `Zeros` is for? With `bias=nothing` I get:

```julia
julia> m = Chain(
           Conv((3, 3), 1 => 2, bias=nothing),
           Conv((3, 3), 2 => 3, bias=nothing),
           Conv((3, 3), 3 => 4, bias=nothing),
       )
Chain(Conv((3, 3), 1=>2), Conv((3, 3), 2=>3), Conv((3, 3), 3=>4))

julia> length(Flux.params(m))
3

julia> Flux.params(m)[2]
3×3×2×3 Array{Float32,4}:
[:, :, 1, 1] =
  0.089666   0.28373   -0.300706
 -0.272113   0.252896  -0.100829
  0.171586  -0.324309   0.00790166
...

julia> gs = gradient(() -> sum(m(rand(Float32, 28,28,1,1))), Flux.params(m))
Grads(...)

julia> gs.grads
IdDict{Any,Any} with 3 entries:
  Float32[0.089666 0.28373 -0.300706; -0.272113 0.252896 -0.100829; 0.1715… => Float32[-189.062 -189.535 -189.128; -188.437 -188.879 -189.078; -187.731 -189.049 -189.331]…
  Float32[-0.10269 -0.0312617 -0.184054; -0.0267541 0.172398 0.170193; 0.1… => Float32[-38.5559 -38.4327 -38.9365; -38.2541 -38.1558 -39.0549; -38.5386 -37.9995 -38.0654]…
  Float32[0.177636 0.394305 -0.407948; 0.34289 -0.426031 0.319731; 0.00487… => Float32[245.675 244.423 244.569; 244.205 243.673 244.096; 243.72 243.263 243.177]…

julia> Flux.loadparams!(m, Flux.params(m))

julia> length(params(m))
3
```
For a given …, this might be way more ambitious than I realize, but I was hoping there was some way to have …

edit: I promise I'm doing my homework. I read the thread where …
@DhairyaLGandhi what did you mean by this comment in #1281?
1379: Fix some issues with Zeros, option 2. r=CarloLucibello a=DrChainsaw

Fixes #1332, #1277.

This is take two on fixing the above issues and is intended to be mutually exclusive to #1374. In this version `Zeros` is no longer an `AbstractArray` and is thus more like a there-is-no-such-parameter marker instead of a this-parameter-is-an-immutable-array-of-zeros type.

I made the change so that the advertised way of disabling bias is through `bias=false` rather than `bias=Zeros()`. `Zeros` itself is a lot less capable here compared to the previous attempt, but that also keeps the number of moving parts down, i.e. no need to test that both the 0-dim version and the full-size version work the same. All in all there are fewer specializations compared to #1374.

I think that a rename could definitely be in order. Better names I can think of are `bias=Off`, `bias=None`, or, as suggested by @mcabbott, `bias=False()`. The best argument against renaming I can think of is to be a little bit less breaking.

### PR Checklist

- [X] Tests are added
- [X] Entry in NEWS.md
- [ ] Documentation, if applicable
- [ ] Final review from `@dhairyagandhi96` (for API changes).

Co-authored-by: DrChainsaw <Christian.kyril.skarby@gmail.com>
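For reference, with the change described in this PR, disabling the bias would look like this (assuming a Flux version that includes #1379):

```julia
using Flux

m = Conv((3, 3), 1 => 2; bias=false)  # bias disabled, not a trainable array
length(Flux.params(m))                 # 1 — only the weight is collected
```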
This should have been closed by #1379; I must have messed up the 'fixes' syntax.
Consider a model where all convolutions have their biases disabled:
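```julia
# Example model: three Conv layers, all with bias disabled
# (presumably the same Chain quoted in the comments above).
using Flux

m = Chain(
    Conv((3, 3), 1 => 2, bias=Flux.Zeros()),
    Conv((3, 3), 2 => 3, bias=Flux.Zeros()),
    Conv((3, 3), 3 => 4, bias=Flux.Zeros()),
)
```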
If I collect its parameters (e.g. for saving them later) and print them, in the output I will have:
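```julia
# Reconstruction based on the MWE earlier in the thread; the exact snippet
# and printed values are assumptions.
for p in Flux.params(m)
    println("$(typeof(p)) $(size(p))")
end

# e.g.:
# Array{Float32,4} (3, 3, 1, 2)
# Flux.Zeros{Bool,0} ()
# Array{Float32,4} (3, 3, 2, 3)
# Array{Float32,4} (3, 3, 3, 4)
```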
So the parameters of the model contain the disabled bias, but only from the 1st layer (parameters are stored in a `Zygote.IdSet`, and since the `Zeros` instances are all identical, the others are ignored). Moreover, if I want to load these parameters back into the model:
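```julia
# Presumably via the same call shown in the comments above:
Flux.loadparams!(m, Flux.params(m))
```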
I get the error `ERROR: setindex! not defined for Flux.Zeros{Bool,0}`. An easy fix would be to define `params!` for `Flux.Zeros` so that it does not add it to the parameters (see the sketch below). I wanted to create a PR with this fix, but this solution conflicts with this test, which expects `Flux.Zeros` to be in the parameters. Maybe change it to instead check that `bias` is not in the gradients?
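A sketch of that fix, assuming Flux's internal `params!(p, x, seen)` traversal (the exact signature is an assumption):

```julia
using Flux, Zygote

# Skip Zeros during parameter collection instead of adding it to the set.
Flux.params!(p::Zygote.Params, x::Flux.Zeros, seen = Zygote.IdSet()) = nothing
```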