Per-leaf freezing #49

Closed
wants to merge 2 commits into from

mcabbott
Member

@mcabbott mcabbott commented Feb 1, 2022

This is one way we could handle freezing of certain nodes, by altering the state tree.

julia> model = (x=[1,2], y=([3,4], sin));

julia> state = Optimisers.setup(Descent(), model);

julia> state = Optimisers.freeze(state, :y)
(x = Leaf(Descent{Float32}(0.1), nothing, false), y = (Leaf(Descent{Float32}(0.1), nothing, true), nothing))

julia> state, model = Optimisers.update(state, model, (x=[1,10], y=([100,1000], nothing)));

julia> model
(x = [0.8999999985098839, 0.9999999850988388], y = ([3, 4], sin))

julia> state = Optimisers.thaw(state)
(x = Leaf(Descent{Float32}(0.1), nothing, false), y = (Leaf(Descent{Float32}(0.1), nothing, false), nothing))

julia> state, model = Optimisers.update(state, model, (x=[1,10], y=([100,1000], nothing)));

julia> model
(x = [0.7999999970197678, -2.9802322387695312e-8], y = ([-7.000000149011612, -96.00000149011612], sin))
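
The behaviour shown above can be approximated in plain Julia without Optimisers.jl. The following is only a sketch under stated assumptions: `SketchLeaf`, `sketch_setup`, `sketch_freeze`, `sketch_thaw` and `sketch_update` are hypothetical names invented for illustration, the rule is plain gradient descent, and none of this is the PR's actual implementation.

```julia
# Hypothetical stand-in for a state leaf: a learning rate plus a frozen flag.
struct SketchLeaf
    eta::Float64
    frozen::Bool
end

# Build a state tree mirroring the model: arrays get a leaf, other values get `nothing`.
sketch_setup(eta, x::AbstractArray) = SketchLeaf(eta, false)
sketch_setup(eta, x::Union{Tuple,NamedTuple}) = map(y -> sketch_setup(eta, y), x)
sketch_setup(eta, x) = nothing

# Set the flag on every leaf of a subtree.
set_frozen(l::SketchLeaf, flag::Bool) = SketchLeaf(l.eta, flag)
set_frozen(t::Union{Tuple,NamedTuple}, flag::Bool) = map(s -> set_frozen(s, flag), t)
set_frozen(::Nothing, flag::Bool) = nothing

# Freeze one named top-level branch; thaw everything.
sketch_freeze(state::NamedTuple, key::Symbol) =
    merge(state, NamedTuple{(key,)}((set_frozen(state[key], true),)))
sketch_thaw(state) = set_frozen(state, false)

# Gradient-descent step that skips frozen leaves.
sketch_update(l::SketchLeaf, x::AbstractArray, dx) = l.frozen ? x : x .- l.eta .* dx
sketch_update(::Nothing, x, dx) = x
sketch_update(s::Union{Tuple,NamedTuple}, x::Union{Tuple,NamedTuple}, dx) =
    map(sketch_update, s, x, dx)

model = (x = [1, 2], y = ([3, 4], sin))
grad  = (x = [1, 10], y = ([100, 1000], nothing))
state = sketch_freeze(sketch_setup(0.1, model), :y)

frozen_step = sketch_update(state, model, grad)               # only x moves
thawed_step = sketch_update(sketch_thaw(state), model, grad)  # x and y both move
```

Because the flag lives on the leaf, the update walk needs no extra argument: it simply checks `frozen` at each leaf it visits.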

@darsnack
Member

darsnack commented Feb 1, 2022

This is the mental model I had in mind, but I forgot that we have Leaf now. Between Leaf and Ties (or whatever it is called), we can store node-local and tree-global hints.

One difference between this approach and separate trees for auxiliary information is that the latter is extensible. For example, if we didn't have freezing built in, a separate package could define a freeze utility that walks the state and produces a tree of true/false values indicating which nodes are frozen. Then, with a custom walk, that package could implement a freezing-compatible update.

I'm just using freezing as a hypothetical here to illustrate that a Functors.jl solution with multiple trees could serve both Optimisers.jl use-cases and external use-cases. Of course, it's trickier and more complex than what we have here.
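
To make the multi-tree alternative concrete, here is a rough sketch in plain Julia. It assumes a Bool mask tree kept next to the model and a hand-written three-tree walk; `frozen_mask` and `masked_descent` are hypothetical names, and no Functors.jl or Optimisers.jl API is used.

```julia
# Hypothetical helper: build a mask with one Bool per array and `nothing` elsewhere.
frozen_mask(x::AbstractArray, frozen::Bool) = frozen
frozen_mask(x::Union{Tuple,NamedTuple}, frozen::Bool) = map(y -> frozen_mask(y, frozen), x)
frozen_mask(x, frozen::Bool) = nothing

# Walk mask, model and gradient together; frozen arrays are returned unchanged.
masked_descent(eta, frozen::Bool, x::AbstractArray, dx) = frozen ? x : x .- eta .* dx
masked_descent(eta, ::Nothing, x, dx) = x
masked_descent(eta, mask::Union{Tuple,NamedTuple}, x, dx) =
    map((m, xi, dxi) -> masked_descent(eta, m, xi, dxi), mask, x, dx)

model = (x = [1, 2], y = ([3, 4], sin))
grad  = (x = [1, 10], y = ([100, 1000], nothing))

# Freeze the whole `y` branch, leave `x` trainable.
mask = (x = frozen_mask(model.x, false), y = frozen_mask(model.y, true))
new_model = masked_descent(0.1, mask, model, grad)
```

The optimiser state is untouched here; the cost is that every walk has to carry one more tree with the same shape as the model.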

@mcabbott
Member Author

mcabbott commented Feb 1, 2022

This is true. I'd worry a little bit that understanding the API for an extensible multi-tree walk might be harder than writing it yourself.

One extensibility thought is: Instead of building this into Leaf, it could be a separate Freeze which wraps it, and provides an update! specialisation which stops it. That's perhaps a more extensible model to provide. It would make a state for which tree.x.y.z no longer matches model.x.y.z, but this won't affect the tie-addressing pick/place story, as that only acts on the gradient and the model, never on the state tree.
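
A rough sketch of that wrapper idea, in plain Julia. `DescentLeaf`, `FreezeWrap` and `wrapped_update` are hypothetical names chosen for illustration, and the update is simplified to return only the new model.

```julia
# Hypothetical leaf holding a learning rate for plain gradient descent.
struct DescentLeaf
    eta::Float64
end

# Hypothetical wrapper: marks everything inside it as frozen.
struct FreezeWrap{T}
    inner::T
end

wrapped_update(l::DescentLeaf, x::AbstractArray, dx) = x .- l.eta .* dx
wrapped_update(::Nothing, x, dx) = x
wrapped_update(t::Union{Tuple,NamedTuple}, x, dx) = map(wrapped_update, t, x, dx)
# The wrapper's method stops the walk, so the whole subtree is returned as-is.
wrapped_update(f::FreezeWrap, x, dx) = x

model = (x = [1, 2], y = ([3, 4], sin))
grad  = (x = [1, 10], y = ([100, 1000], nothing))

# The state for `y` now sits one level deeper than the model: state.y.inner[1] vs model.y[1].
state = (x = DescentLeaf(0.1), y = FreezeWrap((DescentLeaf(0.1), nothing)))
new_model = wrapped_update(state, model, grad)
```

Thawing in this sketch would amount to replacing a `FreezeWrap` with its `inner` field.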

Would such a mechanism work for other things one may want to hook on? What are some examples?

One more question to think about: Unlike #42 this exposes the "address" tuple as something the user is supposed to provide. Do we like or hate it? freeze actually seems very quick, like a few ns on small models, so perhaps there is no need to think about making it faster.

@mcabbott
Member Author

mcabbott commented Feb 2, 2022

Related, Flux at present has this distinction:

julia> Flux.functor(Chain(x=sin, y=cos))
((x = sin, y = cos), Flux.var"#154#155"())

julia> Flux.functor(Parallel(vcat, (x=sin, y=cos)))
((connection = vcat, layers = (x = sin, y = cos)), Flux.var"#178#179"())

So to freeze a named branch of Parallel, you'd have to say (:layers, :x). Maybe that should change?
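
As an illustration of what such an address tuple means, the sketch below walks nested NamedTuples by a tuple of keys. `modify_at` is a hypothetical helper, and the two NamedTuples merely copy the shapes printed above rather than calling Flux.

```julia
# Hypothetical helper: apply `f` to the node at `path`, rebuilding the NamedTuples around it.
function modify_at(f, tree, path::Tuple)
    isempty(path) && return f(tree)
    key = first(path)
    child = modify_at(f, tree[key], Base.tail(path))
    return merge(tree, NamedTuple{(key,)}((child,)))
end

# Shapes copied from the functor output shown above.
chain_like    = (x = sin, y = cos)
parallel_like = (connection = vcat, layers = (x = sin, y = cos))

mark(_) = :frozen   # placeholder for "freeze whatever is stored here"

a = modify_at(mark, chain_like, (:x,))              # one key is enough
b = modify_at(mark, parallel_like, (:layers, :x))   # needs the extra :layers step
```

The same named branch therefore has a different address depending on whether the container hides its `layers` field.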

@ToucheSir
Member

Given that Chain will receive a structural tangent of shape (layers = (...)), I feel like it's the custom functor overload which is incorrect. Relatedly, it also has the nasty side effect of not preserving layer names:

julia> func, re = Flux.functor(Chain(x=sin, y=cos))
((x = sin, y = cos), Flux.var"#154#155"())

julia> re(func)
Chain(sin, cos)

julia> re(func).layers
(sin, cos)
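
One way names can get lost like this, sketched in plain Julia: splatting a NamedTuple into a varargs function passes only its values. `ToyChain`, `collect_layers`, `rebuild_lossy` and `rebuild_keeping` are hypothetical names, and this is an assumption about the mechanism rather than Flux's actual code.

```julia
# Hypothetical container, not Flux.Chain.
struct ToyChain{T}
    layers::T
end

# Varargs always arrive as a plain Tuple, so any names are dropped here.
collect_layers(layers...) = layers

rebuild_lossy(children)   = ToyChain(collect_layers(children...))  # splat: names gone
rebuild_keeping(children) = ToyChain(children)                     # pass the NamedTuple whole

children = (x = sin, y = cos)
lossy = rebuild_lossy(children)
kept  = rebuild_keeping(children)
```

Passing the children through unsplatted keeps `kept.layers` a NamedTuple with keys `:x` and `:y`, while `lossy.layers` is the bare tuple of functions.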

@mcabbott
Member Author

mcabbott commented Feb 2, 2022

Oh that's bad. We should either not hide this layers tuple (besides in printing), or else we should do it consistently.

@mcabbott
Member Author

mcabbott commented Feb 2, 2022

There's a splat which loses the names. But the functor method does in fact deal with the structural gradient:

julia> m = Chain(Dense([1 2; 3 4.0], [5,6], relu), identity);

julia> g = gradient(m -> m([3,2])[1], m)[1]
(layers = ((weight = [3.0 2.0; 0.0 0.0], bias = [1.0, 0.0], σ = nothing), nothing),)

julia> s = Optimisers.setup(Optimisers.Descent(pi/10), m)
((weight = Leaf(Descent{Float64}(0.314159), nothing), bias = Leaf(Descent{Float64}(0.314159), nothing), σ = nothing), nothing)

julia> s2, m2 = Optimisers.update(s, m, g);

julia> m2.layers[1].weight
2×2 Matrix{Float64}:
 0.0575222  1.37168
 3.0        4.0

As an aside, notice this way to screw up, which runs without error because the state argument is omitted:

julia> s2, m2 = Optimisers.update(m, g);

julia> m2
(layers = ((weight = [3.0 2.0; 0.0 0.0], bias = [1.0, 0.0], σ = nothing), nothing),)

@ToucheSir
Member

Thoughts on reviving this without the addressing functionality so we can defer that decision? Users can always use Accessors.jl in the interim (and possibly in the long term, if we want to support that) for fine-grained manipulation.

@mcabbott mcabbott mentioned this pull request Aug 28, 2022
@mcabbott mcabbott mentioned this pull request Oct 13, 2022
@mcabbott mcabbott closed this Nov 25, 2022
@mcabbott mcabbott deleted the freeze branch November 25, 2022 06:22
Labels
enhancement New feature or request