add trainstep!
#666
Comments
I'm on board. It will make sense to have this as the default interface for gradients if we go full force on #628, since the gradient object will be a little more complex than usual.

I think this should be written

Initially it will have to be

Is @oxinabox up for sketching it out?
Do you really think the
Doesn't have to be an extra round – we can do
I think this actually has to be

```julia
step!(opt) do
    mse(W*X .+ b, Y)
end
```

or similar; otherwise you're going to be surprised when Flux improves your loss by optimising the training data.

The other option is to write

You can also view this as being like the

I will sketch this out, along with #379, in #669, and also write up some simple usage examples to give a feel for it.
One thing I have been thinking about is: should multiple return values be allowed? It is often more convenient to calculate certain other things while you are calculating the loss, e.g. some metrics, perhaps used in early stopping. Right now, for those, I have just been modifying a vector in the parent scope.

I think the implicit parameter thing makes sense. This is distinct from
In effect

As far as calculating other values goes, I think it's closures FTW here:

```julia
total_loss = 0
for ...
    step!(opt) do
        total_loss += loss(...)
    end
end
```

Of course if we return the loss from
Closures like that is what I have been doing so far. Hmm, when writing it up it might be good to highlight that

can be written
Just an example I started to write for other reasons:

One would write:
My 2 cents: both the current
Following up from #607.

We should expose functionality that lets the user write a training loop while thinking only about the loss, rather than thinking about `gradient` and `update!`. `loss` is a higher-level concept than gradients.

Custom training loops are important since many things do not comfortably fit into the abstraction of `train!(args->loss(args...), data, params, opt, callbacks)`. The `train!` function is good for things that comfortably fit supervised training, and while it can do anything, it becomes increasingly awkward the further you are from that.
At the other end is writing a custom training loop, invoking `gradients` and `update!`. This is fully general, and you can do all kinds of things, like messing with the gradients during the training loop.
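The fully general version might be sketched as follows, assuming the explicit `gradient`/`update!` style (the model and data are placeholders, not from the issue):

```julia
using Flux

# Hypothetical model, data, and optimiser for illustration.
model = Dense(10, 1)
ps = Flux.params(model)
opt = Descent(0.01)
data = [(rand(10, 8), rand(1, 8)) for _ in 1:10]

for (x, y) in data
    # Explicitly take gradients of the loss w.r.t. the parameters...
    gs = Flux.gradient(() -> Flux.mse(model(x), y), ps)
    # ...and here one could freely mess with `gs` (clipping, noise, etc.)
    # before applying the update.
    Flux.update!(opt, ps, gs)
end
```

The cost of this generality is that every user re-spells the gradient/update plumbing by hand.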
But there is a middle ground, where you can define the loss but have nothing to say about the gradients. For this I think we should have `train_step!(getloss, ps, opt)`, where `getloss` is a 0-arg closure returning the loss (and using the model). This would have pleasing symmetry in name and arguments to `train!(loss, ps, data, opt)`, where `loss` is a closure taking args as provided by iterating `data`. This would be useful because you are not being required to use the abstraction of having `data`, but you have the rest.

The implementation would be very simple, but I feel the abstraction away from `gradients` is worth it. This would go into the core of `train!`.

Replacing
Flux.jl/src/optimise/train.jl
Lines 71 to 74 in 3a4c627
with
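The replacement snippet is not preserved above. Purely as a hedged sketch of the proposal (not code from this issue), `train_step!` could look something like:

```julia
# Hypothetical sketch: `getloss` is a zero-argument closure that computes
# the loss using the model's implicit parameters `ps`.
function train_step!(getloss, ps, opt)
    gs = Flux.gradient(getloss, ps)  # gradients of the loss w.r.t. ps
    Flux.update!(opt, ps, gs)        # apply the optimiser update
end

# Usage: no `data` abstraction required; the closure captures what it needs.
# train_step!(() -> Flux.mse(model(x), y), Flux.params(model), opt)
```

With the do-block syntax this reads as `train_step!(ps, opt) do ... end` once the argument order is settled, which is the middle-ground interface described above.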