Small upgrades to training docs (FluxML#2331)
mcabbott authored and isentropic committed Mar 13, 2024
1 parent 97fdcd1 commit 5618059
Showing 2 changed files with 14 additions and 6 deletions.
12 changes: 7 additions & 5 deletions docs/src/training/reference.md
@@ -10,10 +10,6 @@ Because of this:
* Flux defines its own version of `setup` which checks this assumption.
  (Using instead `Optimisers.setup` will also work; they return the same thing.) A short sketch follows below.

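A minimal sketch of calling `setup`, with an arbitrary stand-in model and rule (not taken from the docstrings below):

```julia
using Flux

model = Chain(Dense(2 => 3, relu), Dense(3 => 1))  # any ordinary Flux model
opt_state = Flux.setup(Adam(0.01), model)          # tree of optimiser state, with the check described above

# Optimisers.setup would return the same tree:
# import Optimisers
# opt_state = Optimisers.setup(Optimisers.Adam(0.01), model)
```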
The new implementation of rules such as Adam in Optimisers.jl is quite different from the old one in `Flux.Optimise`. In Flux 0.14, `Flux.Adam()` returns the old one, with supertype `Flux.Optimise.AbstractOptimiser`, but `setup` will silently translate it to its new counterpart.
The available rules are listed on the [optimisation rules](@ref man-optimisers) page;
see the [Optimisers documentation](https://fluxml.ai/Optimisers.jl/dev/) for details on how the new rules work.

```@docs
Flux.Train.setup
Flux.Train.train!(loss, model, data, state; cb)
@@ -47,10 +43,16 @@ Flux 0.13 and 0.14 are the transitional versions which support both; Flux 0.15 w
The blue-green boxes in the [training section](@ref man-training) describe
the changes needed to upgrade old code.

The available rules are listed on the [optimisation rules](@ref man-optimisers) page.

!!! compat "Old & new rules"
The new implementation of rules such as Adam in Optimisers.jl is quite different from the old one in `Flux.Optimise`. In Flux 0.14, `Flux.Adam()` still returns the old one, with supertype `Flux.Optimise.AbstractOptimiser`, but `setup` will silently translate it to its new counterpart.

For full details on the interface for implicit-style optimisers, see the [Flux 0.13.6 manual](https://fluxml.ai/Flux.jl/v0.13.6/training/training/).
See the [Optimisers documentation](https://fluxml.ai/Optimisers.jl/dev/) for details on how the new rules work.
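A hedged sketch of the translation described in the box above (the model here is an arbitrary stand-in):

```julia
using Flux

model = Dense(3 => 2)

old_rule = Flux.Adam()                   # instance of the old struct, with supertype Flux.Optimise.AbstractOptimiser
opt_state = Flux.setup(old_rule, model)  # setup translates it to the corresponding Optimisers.jl rule

# Passing the new rule directly would give the same state:
# import Optimisers
# opt_state = Flux.setup(Optimisers.Adam(), model)
```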

!!! compat "Flux ≤ 0.12"
Earlier versions of Flux exported `params`, thus allowing unqualified `params(model)`
Much earlier versions of Flux exported `params`, thus allowing unqualified `params(model)`
after `using Flux`. This conflicted with too many other packages, and was removed in Flux 0.13.
If you get an error `UndefVarError: params not defined`, this probably means that you are
following code for Flux 0.12 or earlier on a more recent version.
8 changes: 7 additions & 1 deletion docs/src/training/training.md
@@ -225,6 +225,9 @@ callback API. Here is an example, in which it may be helpful to note:
returns the value of the function, for logging or diagnostic use.
* Logging or printing is best done outside of the `gradient` call,
as there is no need to differentiate these commands.
* To use `result` for logging purposes, you could change the `do` block to end with
`return my_loss(result, label), result`, i.e. make the function passed to `withgradient`
return a tuple. The first element is always the loss; a short sketch of this change follows the list.
* Julia's `break` and `continue` keywords let you exit from parts of the loop.
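A hedged sketch of that tuple-returning pattern, shown in isolation from the full example below (the names `m`, `my_loss`, `x`, `label` and `opt_state` here are stand-ins):

```julia
using Flux

m = Dense(4 => 2)                               # stand-in model
my_loss(ŷ, y) = sum(abs2, ŷ .- y) / length(y)   # stand-in loss
x, label = rand(Float32, 4, 8), rand(Float32, 2, 8)
opt_state = Flux.setup(Adam(), m)

out, grads = Flux.withgradient(m) do m
    result = m(x)
    my_loss(result, label), result              # return a tuple; only the first element (the loss) is differentiated
end
loss, result = out                              # `out` is the whole tuple the `do` block returned
Flux.update!(opt_state, m, grads[1])
@info "after one step" loss
```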

```julia
@@ -319,9 +322,12 @@ The first, [`WeightDecay`](@ref Flux.WeightDecay) adds `0.42` times original par
matching the gradient of the penalty above (with the same, unrealistically large, constant).
After that, in either case, [`Adam`](@ref Flux.Adam) computes the final update.

The same trick works for *L₁ regularisation* (also called Lasso), where the penalty is
`pen_l1(x::AbstractArray) = sum(abs, x)` instead. This is implemented by `SignDecay(0.42)`.

The same `OptimiserChain` mechanism can be used for other purposes, such as gradient clipping with [`ClipGrad`](@ref Flux.Optimise.ClipValue) or [`ClipNorm`](@ref Flux.Optimise.ClipNorm).
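A brief sketch putting these pieces together (the model is a stand-in; `WeightDecay`, `SignDecay` and `OptimiserChain` are the rule names from Optimisers.jl used above):

```julia
using Flux

model = Dense(4 => 2)

# L2-style penalty: weight decay first, then Adam computes the final update.
opt_state = Flux.setup(OptimiserChain(WeightDecay(0.42), Adam(0.1)), model)

# L1-style penalty (Lasso) instead:
# opt_state = Flux.setup(OptimiserChain(SignDecay(0.42), Adam(0.1)), model)

# Gradient clipping slots in the same way, with ClipGrad or ClipNorm in place of the decay rule.
```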

Besides L2 / weight decay, another common and quite different kind of regularisation is
Besides L1 / L2 / weight decay, another common and quite different kind of regularisation is
provided by the [`Dropout`](@ref Flux.Dropout) layer. This turns off some outputs of the
previous layer during training.
It should switch automatically, but see [`trainmode!`](@ref Flux.trainmode!) / [`testmode!`](@ref Flux.testmode!) to manually enable or disable this layer.
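A small sketch of forcing the mode by hand (hypothetical model; `train!` normally switches it for you):

```julia
using Flux

model = Chain(Dense(4 => 4, relu), Dropout(0.5), Dense(4 => 2))

Flux.testmode!(model)   # turn Dropout off, e.g. while evaluating the model
Flux.trainmode!(model)  # turn it back on for training
```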
