Per-leaf freezing #49
Conversation
This is the mental model I had in my mind, but I forgot that we have […]. One difference between this approach and separate trees for auxiliary information is that the latter is extensible. For example, if we didn't have freezing built-in, a separate package could define a […]. I'm just using freezing as a hypothetical here to illustrate that a Functors.jl solution with multiple trees could serve both Optimisers.jl use cases and external use cases. Of course, it's trickier and more complex than what we have here.
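To make the separate-tree idea concrete, here is a minimal sketch: a Bool-valued mask tree mirroring a flat model, with frozen leaves getting their gradients dropped before an ordinary `Optimisers.update`. Every name and convention here is hypothetical (nothing in it is this PR's API), and it assumes `update` treats a `nothing` gradient as "leave this leaf unchanged":

```julia
using Optimisers

# Hypothetical sketch: freezing kept in a separate mask tree, the kind of
# thing an external package could define without built-in support.
model = (weight = [1.0 2.0; 3.0 4.0], bias = [5.0, 6.0])
mask  = (weight = true, bias = false)   # true = frozen (assumed convention)
grad  = (weight = [3.0 2.0; 0.0 0.0], bias = [1.0, 0.0])

state = Optimisers.setup(Optimisers.Descent(0.1), model)

# Drop gradients wherever the mask says "frozen"; `update` is assumed to
# treat a `nothing` gradient as "no change" for that leaf:
maskedgrad = map((g, frozen) -> frozen ? nothing : g, grad, mask)
state, model = Optimisers.update(state, model, maskedgrad)
# model.weight is untouched; model.bias has taken a Descent step.
```

This flat version hides the hard part: for nested models the mask tree has to be walked in lockstep with the model and gradient, which is where a multi-tree Functors.jl walk would come in.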
This is true. I'd worry a little bit that understanding the API for an extensible multi-tree walk might be harder than writing it yourself.

One extensibility thought: instead of building this into Leaf, it could be a separate Freeze which wraps it, and provides an […]. Would such a mechanism work for other things one may want to hook on? What are some examples?

One more question to think about: unlike #42, this exposes the "address" tuple as something the user is supposed to provide. Do we like or hate it?
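Since the recursion of `update` is internal, one hedged way to approximate the wrapper idea through the documented extension points (`AbstractRule`, `init`, `apply!`) is a rule wrapper that returns a zero step. This is a sketch of a related technique, not the Freeze-around-Leaf design discussed above, and `FrozenRule` is a hypothetical name:

```julia
using Optimisers

# Sketch: freezing expressed at the *rule* level via init/apply!,
# an approximation of the Freeze idea, not this PR's design.
struct FrozenRule{R<:Optimisers.AbstractRule} <: Optimisers.AbstractRule
    rule::R
end

# Delegate state initialisation to the wrapped rule:
Optimisers.init(f::FrozenRule, x) = Optimisers.init(f.rule, x)

# Returning a zero step means `update` subtracts nothing from x:
Optimisers.apply!(f::FrozenRule, state, x, dx) = state, zero(dx)
```

Used as `Optimisers.setup(FrozenRule(Descent(0.1)), model)` this freezes everything; placing the wrapper at chosen leaves only is exactly the addressing question above.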
Related, Flux at present has this distinction:

```julia
julia> Flux.functor(Chain(x=sin, y=cos))
((x = sin, y = cos), Flux.var"#154#155"())

julia> Flux.functor(Parallel(vcat, (x=sin, y=cos)))
((connection = vcat, layers = (x = sin, y = cos)), Flux.var"#178#179"())
```

So to freeze a named branch of Parallel, you'd have to say […].
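That is, the same named branch sits at a different depth in the two containers, so its address tuple differs. A hypothetical spelling, assuming a `freeze(state, address)` helper along this PR's lines (both the helper and the tuples are illustration only):

```julia
# Hypothetical freeze helper and address tuples, for illustration only:
freeze(state, (:x,))           # branch :x of Chain(x = sin, y = cos)
freeze(state, (:layers, :x))   # same branch inside Parallel(vcat, (x = sin, y = cos))
```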
Given that […]:

```julia
julia> func, re = Flux.functor(Chain(x=sin, y=cos))
((x = sin, y = cos), Flux.var"#154#155"())

julia> re(func)
Chain(sin, cos)

julia> re(func).layers
(sin, cos)
```
Oh, that's bad. We should either not hide this […]
There's a splat which loses the names. But the […]:

```julia
julia> m = Chain(Dense([1 2; 3 4.0], [5,6], relu), identity);

julia> g = gradient(m -> m([3,2])[1], m)[1]
(layers = ((weight = [3.0 2.0; 0.0 0.0], bias = [1.0, 0.0], σ = nothing), nothing),)

julia> s = Optimisers.setup(Optimisers.Descent(pi/10), m)
((weight = Leaf(Descent{Float64}(0.314159), nothing), bias = Leaf(Descent{Float64}(0.314159), nothing), σ = nothing), nothing)

julia> s2, m2 = Optimisers.update(s, m, g);

julia> m2.layers[1].weight
2×2 Matrix{Float64}:
 0.0575222  1.37168
 3.0        4.0
```

Notice, as an aside, this way to screw up, which runs without error: […]
Thoughts on reviving this without the addressing functionality, so we can defer that decision? Users can always use Accessors.jl in the interim (and possibly in the long term, if we want to support that) for fine-grained manipulation.
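For reference, the Accessors.jl route might look like this: reaching into the state tree that `setup` returns and zeroing one leaf's learning rate. The field path (`rule`, `eta`) follows the `Leaf(Descent(...), ...)` display above, but treat the exact spelling as an assumption:

```julia
using Optimisers, Accessors

model = (weight = [1.0 2.0; 3.0 4.0], bias = [5.0, 6.0])
state = Optimisers.setup(Optimisers.Descent(0.1), model)

# Freeze `weight` in effect by setting its leaf's learning rate to zero;
# @set rebuilds the immutable tree with just that field changed:
state = @set state.weight.rule.eta = 0.0
```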
This is one way we could handle freezing of certain nodes, by altering the state tree.
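A minimal sketch of that state-tree-altering approach, assuming (as the printed `setup` above suggests for the non-trainable `σ`) that `nothing` in the state tree marks a node that `update` leaves alone; `freezeleaves` and its predicate are hypothetical names:

```julia
using Functors, Optimisers

# Hypothetical helper: walk the state tree from `setup` and replace the
# Leaf nodes selected by `pred` with `nothing`, assuming `nothing` state
# means "leave this parameter alone", as for the non-trainable σ above.
function freezeleaves(pred, statetree)
    fmap(statetree; exclude = leaf -> leaf isa Optimisers.Leaf) do leaf
        pred(leaf) ? nothing : leaf
    end
end

# e.g. freeze every leaf whose rule is plain Descent:
# s2 = freezeleaves(leaf -> leaf.rule isa Optimisers.Descent, s)
```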