add trainstep!
#666
Comments
I'm on board. It will make sense to have this as the default interface for gradients if we go full force on #628, since the gradient object will be a little more complex than usual.

I think this should be written

Initially it will have to be

Is @oxinabox up for sketching it out?
Do you really think the
Doesn't have to be an extra round – we can do
I think this actually has to be

```julia
step!(opt) do
    mse(W*X .+ b, Y)
end
```

or similar; otherwise you're going to be surprised when Flux improves your loss by optimising the training data.

The other option is to write

You can also view this as being like the

I will sketch this out, along with #379, in #669, and also write up some simple usage examples to give a feel for it.
One thing I have been thinking about is: should multiple return values be allowed? It is often more convenient to calculate certain other things while you are calculating the loss, e.g. some metrics, perhaps used in early stopping. Right now, for those, I have just been modifying a vector in the parent scope.

I think the implicit parameter thing makes sense. This is distinct from
In effect

As far as calculating other values goes, I think it's closures FTW here:

```julia
total_loss = 0
for ...
    step!(opt) do
        total_loss += loss(...)
    end
end
```

Of course if we return the loss from
Closures like that is what I have been doing so far. Hmm, when writing it up it might be good to highlight that

can be written
Just an example I started to write for other reasons:

One would write:
My 2 cents: both the current
Following up from #607.

We should expose functionality that lets the user write a training loop while thinking only about the loss, rather than thinking about `gradient` and `update!`. `loss` is a higher-level concept than gradients.

Custom training loops are important since many things do not comfortably fit into the abstraction of `train!(args->loss(args...), data, params, opt, callbacks)`. The `train!` function is good for things that comfortably fit supervised training, and while it can do anything, it becomes increasingly awkward the further you are from that.
At the other end is writing a custom training loop, invoking `gradients` and `update!`. This is fully general, and you can do all kinds of things, like messing with the gradients during the training loop.
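The fully general version might be sketched as follows, assuming the explicit `gradient`/`update!` style (the model and data are placeholders, not from the issue):

```julia
using Flux

# Hypothetical model, data, and optimiser for illustration.
model = Dense(10, 1)
ps = Flux.params(model)
opt = Descent(0.01)
data = [(rand(10, 8), rand(1, 8)) for _ in 1:10]

for (x, y) in data
    # Explicitly take gradients of the loss w.r.t. the parameters...
    gs = Flux.gradient(() -> Flux.mse(model(x), y), ps)
    # ...and here one could freely mess with `gs` (clipping, noise, etc.)
    # before applying the update.
    Flux.update!(opt, ps, gs)
end
```

The cost of this generality is that every user re-spells the gradient/update plumbing by hand.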
But there is a middle ground, where you can define the loss but have nothing to say about the gradients. For this I think we should have `train_step!(getloss, ps, opt)`, where `getloss` is a 0-arg closure returning the loss (and using the model). This would have pleasing symmetry in name and arguments to `train!(loss, ps, data, opt)`, where `loss` is a closure taking args as provided by iterating `data`. This would be useful because you are not being required to use the abstraction of having `data`, but you have the rest.

The implementation would be very simple, but I feel the abstraction away from `gradients` is worth it. This would go into the core of `train!`.

Replacing
Flux.jl/src/optimise/train.jl
Lines 71 to 74 in 3a4c627
with
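The replacement snippet is not preserved above. Purely as a hedged sketch of the proposal (not code from this issue), `train_step!` could look something like:

```julia
# Hypothetical sketch: `getloss` is a zero-argument closure that computes
# the loss using the model's implicit parameters `ps`.
function train_step!(getloss, ps, opt)
    gs = Flux.gradient(getloss, ps)  # gradients of the loss w.r.t. ps
    Flux.update!(opt, ps, gs)        # apply the optimiser update
end

# Usage: no `data` abstraction required; the closure captures what it needs.
# train_step!(() -> Flux.mse(model(x), y), Flux.params(model), opt)
```

With the do-block syntax this reads as `train_step!(ps, opt) do ... end` once the argument order is settled, which is the middle-ground interface described above.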