inconsistency between params and destructure #1733

Closed
CarloLucibello opened this issue Oct 3, 2021 · 7 comments · Fixed by #1901

Comments

@CarloLucibello
Member

CarloLucibello commented Oct 3, 2021

In my understanding, destructure is supposed to behave similarly to params and collect the trainable parameters of a struct into a vector. Instead, it collects all arrays, as one can see in this example using BatchNorm:

julia> b = BatchNorm(2)
BatchNorm(2)        # 4 parameters, plus 4 non-trainable

julia> ps = params(b)
Params([Float32[0.0, 0.0], Float32[1.0, 1.0]])

julia> p, re = Flux.destructure(b)
(Float32[0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0], Flux.var"#60#62"{BatchNorm{typeof(identity), Vector{Float32}, Float32, Vector{Float32}}}(BatchNorm(2)))

julia> length(p)
8

Should we modify destructure to act like params, i.e. recurse over trainable(m) instead of just applying fmap as it currently does?

Also, maybe we should add to both params and destructure a keyword argument with possible values :trainable, :buffer, and :all, defaulting to :trainable.

Related to #1727
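
A rough sketch of what a trainable-aware destructure with such a keyword could look like. The function name and the which keyword are hypothetical, not actual Flux API, and the reconstruction half is left out:

using Flux, Functors

function destructure_sketch(m; which = :trainable)
    parts(x) = which === :trainable ? Flux.trainable(x) : Functors.functor(x)[1]
    ps = Float32[]   # assumes Float32 parameters, as in the BatchNorm example above
    flatten!(x) = x isa AbstractArray{<:Number} ? append!(ps, vec(x)) :
                  foreach(flatten!, parts(x))
    flatten!(m)
    return ps   # the re reconstructor is omitted from this sketch
end

length(destructure_sketch(BatchNorm(2)))                 # 4, matching params
length(destructure_sketch(BatchNorm(2); which = :all))   # 8, the current destructure behaviour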

@DhairyaLGandhi
Member

They're pretty different functions, meant for different purposes. destructured outputs can be used by external libraries, which conservatively need all the parameters from a training routine to be available downstream.

@CarloLucibello
Member Author

It seems like exactly the same use to me (training), which means that both functions should output what is meant to be trained. I think we need some input from the SciML people, since they are prominent consumers of destructure. @ChrisRackauckas

@DhairyaLGandhi
Member

Not just training; some libraries use it for initial guesses as well. I'd prefer not to make trainable a "special" requirement for defining layers.

@ToucheSir
Member

Differing use cases aside, making destructure call trainable won't make trainable any more of a requirement for defining layers than it already is (i.e. not at all), because it forwards to functor by default.
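
For reference, a minimal illustration of that fallback (MyLayer is a made-up struct, not part of Flux):

using Flux

struct MyLayer
    w
    σ
end
Flux.@functor MyLayer   # no trainable method defined for MyLayer

layer = MyLayer(rand(Float32, 2, 2), tanh)
Flux.trainable(layer)   # (w = ..., σ = tanh): the same children functor exposes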

@ChrisRackauckas
Member

I thought it only grabbed the trainable params 😅. The use case is, for example, to get the vector and use BFGS to train the parameters of a Chain. If it's not giving the same vector out, then training with Optim wouldn't be the same as training with Flux's optimizers, which is not what we intend. So it should probably exclude the values that are meant to be constant.
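
For context, a sketch of that workflow; the model, data, and loss below are made up for illustration:

using Flux, Optim

m = Chain(Dense(2, 8, tanh), Dense(8, 1))
p0, re = Flux.destructure(m)

x, y = rand(Float32, 2, 32), rand(Float32, 1, 32)
loss(p) = sum(abs2, re(p)(x) .- y)

res = Optim.optimize(loss, p0, BFGS())   # BFGS updates every entry of p0,
m_opt = re(Optim.minimizer(res))         # so any non-trainable entries would move too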

@darsnack
Member

darsnack commented Oct 3, 2021

We can switch to using the new walk keyword for fmap to walk only the trainable parameters, with an option to pass an alternate walk into destructure.
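
Something like the following, assuming the walk(f, x) calling convention of the fmap keyword at the time; the name trainable_walk is illustrative:

using Flux, Functors

function trainable_walk(f, x)
    func, re = Functors.functor(x)
    ts = Flux.trainable(x)
    istrainable(c) = any(t -> t === c, ts)   # identity check, since e.g. a fresh β and μ are ==
    re(map(c -> istrainable(c) ? f(c) : c, func))
end

# Example: zero only the trainable arrays of a BatchNorm; μ and σ² are left alone.
fmap(x -> x isa AbstractArray ? zero(x) : x, BatchNorm(2); walk = trainable_walk)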

@DhairyaLGandhi
Member

It currently excludes some values that we would want included in higher-order cases, and on our end it's actually easier to ignore the objects that don't have gradients (which can differ depending on the layer configuration) by grabbing the gradients from the NamedTuple directly. That is a nice and generic approach.
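
For context, a small sketch of the NamedTuple-of-gradients approach described here; the layer and loss are arbitrary:

using Flux

d = Dense(3, 2)
x = rand(Float32, 3, 5)
gs = Flux.gradient(m -> sum(m(x)), d)[1]   # NamedTuple mirroring the struct's fields

# Fields that receive no gradient (e.g. the activation function) come back as nothing,
# so downstream code can simply skip them:
for k in keys(gs)
    gs[k] === nothing || println(k, " => gradient of size ", size(gs[k]))
end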
