
Use Optimisers.jl #1481

Open · wants to merge 25 commits into master
Conversation

@DhairyaLGandhi (Member) commented on Jan 27, 2021

Fixes #637, fixes #823.

@DhairyaLGandhi (Member, Author)

@ModelZookeeper commands

(1 similar comment)

function train!(m, loss, data, opt; cb = (x...) -> (),
                prehook = (x...) -> (),
                posthook = (x...) -> ())
  st = [Optimisers.init(opt, p) for p in Flux.params(m)]
Member:

I think this can be reduced to st = Optimisers.state(opt, m).

Member:

I think these should be defined as functions in the optimiser.jl file: one for state(opt, p::Params) and another for update(opt, x::Params, dx, state).
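For concreteness, a minimal sketch of what those two Params methods could look like, reusing the per-array init/apply that appears elsewhere in this PR. The names, signatures, and the Grads-style indexing are illustrative assumptions, not the final API:

import Optimisers
using Flux: Params

# One optimiser state per parameter array in the Params collection.
state(opt, ps::Params) = [Optimisers.init(opt, p) for p in ps]

# Step every parameter with its own state; `dxs` is assumed to be a Zygote
# Grads (or any container indexable by the parameter array itself).
function update(opt, xs::Params, dxs, states)
  new_states = map(enumerate(xs)) do (i, x)
    dx = dxs[x]
    dx === nothing && return states[i]              # parameter untouched by this gradient
    δ, st = Optimisers.apply(opt, x, dx, states[i]) # out-of-place step, as used in this PR
    x .-= δ                                         # mutate the parameter in place
    return st
  end
  return xs, new_states
end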

@darsnack (Member) commented on Feb 2, 2021

I would prefer we remove the hooks from Flux.train! in this PR and save those for a later commit. This PR can simply remove the optimizers in favor of Optimisers.jl.

@DhairyaLGandhi (Member, Author)

Done a bunch of cleanup, including removing the train loop changes, but I feel like we will circle back to it soon.

@@ -97,12 +102,13 @@ Multiple optimisers and callbacks can be passed to `opt` and `cb` as arrays.
function train!(loss, ps, data, opt; cb = () -> ())
  ps = Params(ps)
  cb = runall(cb)
  st = [Optimisers.init(opt, p) for p in ps]
Member:

Could we define init for Params like we do update?

Member (Author):

I'm wondering whether this should be wholesale replaced with Optimisers.state instead.

Member (Author):

I'm only worried about corner cases that this might hit.

@darsnack (Member) commented on Feb 24, 2021:

I think state makes sense. For every p in ps, we should be able to independently optimize each one by calling the user facing state and update. I think it's safe to do that here.
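As a self-contained sketch of that per-parameter route (rule name and per-array init/apply follow the usage elsewhere in this thread; everything here is illustrative, not the final API):

import Optimisers
using Flux

m   = Dense(3, 3)
ps  = Flux.params(m)
opt = Optimisers.ADAM()                             # assuming the ADAM rule from Optimisers.jl
sts = [Optimisers.init(opt, p) for p in ps]         # independent state per parameter array

x, y = rand(Float32, 3, 8), rand(Float32, 3, 8)     # toy data, purely for illustration
gs = gradient(() -> Flux.mse(m(x), y), ps)
for (i, p) in enumerate(ps)
  gs[p] === nothing && continue
  δ, sts[i] = Optimisers.apply(opt, p, gs[p], sts[i])
  p .-= δ                                           # each parameter optimised on its own
end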

Member (Author):

Right, so that works in cases where we have arrays as params, but not arbitrary structs, which is what we want.

The user-facing API is something like this (pseudocode, although it might work as an MWE):

m = Dense(3, 3)
opt = ADAM()
st = Optimisers.state(opt, m) # `m` could contain arbitrary structs which shouldn't be functor'd
loss(m, x, y) = Flux.mse(m(x), y)
w, w′ = rand(Float32, 3, 10), rand(Float32, 3, 10)  # example input/target pair
for i = 1:1000
  gs, = gradient(m) do m
    @show loss(m, w, w′)
  end
  m, st = opt(m, gs, st)
end

@darsnack (Member) commented on Feb 24, 2021:

That's exactly what I'm suggesting. To be more specific, change the code to:

Suggested change:
- st = [Optimisers.init(opt, p) for p in ps]
+ st = state(opt, ps)

And elsewhere (still in Flux.jl), define:

Optimisers.init(o, ps::Params) = [init(o, p) for p in ps]

This should allow for the future case where m is not a Params, but also the current case where we need to support Params.

@DhairyaLGandhi marked this pull request as ready for review on March 8, 2021.
@darsnack (Member) left a comment:

Looking pretty good. Two ML calls ago, we talked about having apply! also defined in Optimisers.jl as a temporary measure to avoid potential performance regressions. Is that still the plan?
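If that temporary apply! is still the plan, one rough way to sketch it on top of the out-of-place apply from this PR could be the following; the name, the buffer reuse, and the return convention are assumptions, not a confirmed Optimisers.jl API:

import Optimisers

# Mutating shim over the out-of-place rule: reuse the gradient buffer so the
# hot path allocates roughly like the old in-place Flux optimisers did.
function apply!(opt, x, x̄, st)
  δ, st = Optimisers.apply(opt, x, x̄, st)
  x̄ .= δ                      # overwrite the gradient with the computed step
  return x̄, st
end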

docs/src/training/optimisers.md (outdated review thread, resolved)
x .-= apply!(opt, x, x̄)
function update!(opt, x, x̄, st)
  x̄, st = apply(opt, x, x̄, st)
  x .-= x̄
Member:

Should we use Optimisers.patch here for consistency?
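For reference, a self-contained version of the update! shown in the hunk above might read as follows; the hunk cuts off before the function ends, so the trailing return value is an assumption:

import Optimisers

function update!(opt, x::AbstractArray, x̄, st)
  x̄, st = Optimisers.apply(opt, x, x̄, st)   # out-of-place step from Optimisers.jl
  x .-= x̄                                    # apply the step to the parameter in place
  return x, st                               # hand the updated state back to the caller
end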

src/optimise/train.jl (outdated review thread, resolved)
@@ -97,12 +104,13 @@ Multiple optimisers and callbacks can be passed to `opt` and `cb` as arrays.
function train!(loss, ps, data, opt; cb = () -> ())
  ps = Params(ps)
  cb = runall(cb)
  st = Optimisers.init(opt, ps)
Member:

Suggested change:
- st = Optimisers.init(opt, ps)
+ st = Optimisers.state(opt, ps)

@DhairyaLGandhi (Member, Author) commented on Mar 11, 2021

See FluxML/Optimisers.jl#13

@darsnack (Member)

Cool, looks good. Should we be calling Optimisers.update here then? Calling apply directly bypasses the mutability check.
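Loosely, the mutability check being referred to might look something like this; the ismutable branch is purely illustrative and is not the actual Optimisers.update implementation:

import Optimisers

function update(opt, x, x̄, st)
  x̄, st = Optimisers.apply(opt, x, x̄, st)
  if x isa AbstractArray && ismutable(x)
    x .-= x̄          # mutate in place when the parameter supports it
  else
    x = x .- x̄       # fall back to out-of-place for immutable leaves
  end
  return x, st
end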

@darsnack (Member)

Ref #1613 (comment)

In my mind, this PR is basically good to go. What I would suggest is establishing an optimisers.jl branch off master (similar to how we had a zygote branch for that transition). Then we can safely merge this into the optimisers.jl branch without much concern, and everyone can start the process of benchmarking and validating the transition.

@DhairyaLGandhi (Member, Author)

That branch exists already - it's this PR! I think we're kind of tied to merging the Zeros changes first, then this, and boom.

@darsnack (Member)

Why does this depend on Zeros?

If this is the branch, then I guess let's rebase it?

@DhairyaLGandhi (Member, Author)

We can rebase. Without the Zeros changes, state initialisation and optimisation don't work; the necessary init methods would have to be defined manually.
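To illustrate what "manually" means here: without the Zeros changes, every non-trainable leaf (such as a fixed bias) would need hand-written methods along these lines. Both definitions below are hypothetical and only meant to show the shape of the problem:

import Optimisers
using Flux

Optimisers.init(o, ::Flux.Zeros) = nothing                   # a fixed bias carries no optimiser state
Optimisers.apply(o, ::Flux.Zeros, dx, st) = (zero(dx), st)   # and its update step is a no-op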

@darsnack (Member) commented on Jul 11, 2021

I'm not sure what you mean. Why is this not enough?

Optimisers.init(o, ps::Params) = [Optimisers.init(o, p) for p in ps]

@DhairyaLGandhi (Member, Author)

We also have a route that does not require the use of Params.

@darsnack (Member)

You mean where we allow passing in [W, b]? Wouldn't that route be handled by Optimisers.jl already?
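For concreteness, the [W, b] route sketched with plain arrays; the rule name and per-array init follow the usage earlier in this thread and are illustrative only:

import Optimisers

W, b = rand(Float32, 3, 3), rand(Float32, 3)
opt = Optimisers.ADAM()                               # assuming the ADAM rule from Optimisers.jl
sts = [Optimisers.init(opt, p) for p in (W, b)]       # no Params wrapper needed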

Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>
@mcabbott mentioned this pull request on Feb 5, 2022.
Development

Successfully merging this pull request may close these issues:
- Flux Optimizers should define equality
- New Optimisers

3 participants