Flux v0.15.0
Highlights
This release includes two breaking changes:
- The recurrent layers have been thoroughly revised. See below and read the documentation for details.
- Flux now defines and exports its own `gradient` function. Consequently, using `gradient` in an unqualified manner (e.g., after `using Flux, Zygote`) could result in an ambiguity error.
The most significant updates and deprecations are as follows:
- Recurrent layers have undergone a complete redesign in PR 2500 (a usage sketch follows this list).
  - `RNNCell`, `LSTMCell`, and `GRUCell` are now exported and provide functionality for single time-step processing: `rnncell(x_t, h_t) -> h_{t+1}`.
  - `RNN`, `LSTM`, and `GRU` no longer store the hidden state internally; it has to be explicitly passed to the layer. Moreover, they now process entire sequences at once, rather than one element at a time: `rnn(x, h) -> h′`.
  - The `Recur` wrapper has been deprecated and removed.
  - The `reset!` function has also been removed; state management is now entirely up to the user.
- The `Flux.Optimise` module has been deprecated in favor of the Optimisers.jl package. Flux now re-exports the optimisers from Optimisers.jl. Most users will be unaffected by this change. The module is still available for now, but will be removed in a future release.
- Most Flux layers will re-use memory via `NNlib.bias_act!`, when possible.
- Further support for Enzyme.jl, via methods of `Flux.gradient(loss, Duplicated(model))`. Flux now owns & exports `gradient` and `withgradient`, but without `Duplicated` this still defaults to calling Zygote.jl.
- `Flux.params` has been deprecated. Use Zygote's explicit differentiation instead, `gradient(m -> loss(m, x, y), model)`, or use `Flux.trainables(model)` to get the trainable parameters (see the gradient sketch after this list).
- Flux now requires Functors.jl v0.5. This new release of Functors assumes all types to be functors by default. Therefore, applying `Flux.@layer` or `Functors.@functor` to a type is no longer strictly necessary for Flux's models. However, it is still recommended to use `@layer Model` for additional functionality like pretty printing.
- `@layer Model` now behaves the same as `@layer :expand Model`, which means that the model is expanded into its sublayers (if there are any) when printed. To force compact printing, use `@layer :noexpand Model`.
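A minimal sketch of the redesigned recurrence API described above. Only the call signatures `rnncell(x_t, h_t) -> h_{t+1}` and `rnn(x, h) -> h′` come from these notes; the sizes, dummy data, and the feature × time × batch layout are illustrative assumptions.

```julia
using Flux

# Illustrative sizes: input features, hidden size, sequence length, batch size.
d_in, d_out, len, batch = 2, 4, 6, 3

# Exported cells process a single time step: rnncell(x_t, h_t) -> h_{t+1}.
cell = RNNCell(d_in => d_out)
x_t = rand(Float32, d_in, batch)      # one time step of (dummy) input
h0  = zeros(Float32, d_out, batch)    # the caller now owns the hidden state
h1  = cell(x_t, h0)
h2  = cell(rand(Float32, d_in, batch), h1)

# RNN/LSTM/GRU process a whole sequence at once, with no internal state
# and no reset!: rnn(x, h) -> h′.
rnn = RNN(d_in => d_out)
x   = rand(Float32, d_in, len, batch)  # feature × time × batch layout (assumed)
h′  = rnn(x, h0)
```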
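A sketch of the explicit-gradient style that replaces `Flux.params`. The model, data, and loss are illustrative; only the `Flux.gradient`, `Flux.trainables`, and `Duplicated` usage mirrors the notes.

```julia
using Flux

# Illustrative model and data.
model = Dense(3 => 2)
x = rand(Float32, 3, 8)
y = rand(Float32, 2, 8)
loss(m, x, y) = Flux.mse(m(x), y)

# Explicit differentiation in place of Flux.params; defaults to Zygote.jl.
grads = Flux.gradient(m -> loss(m, x, y), model)

# A flat list of the trainable parameter arrays, if one is needed.
ps = Flux.trainables(model)

# With Enzyme.jl loaded, wrapping the model selects Enzyme instead of Zygote:
#   using Enzyme
#   grads = Flux.gradient(m -> loss(m, x, y), Duplicated(model))
```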
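And a small sketch of the new `@layer` printing default; the struct and field names here are made up for illustration.

```julia
using Flux

# A tiny container layer (illustrative).
struct Sandwich{A,B}
    inner::A
    outer::B
end
(s::Sandwich)(x) = s.outer(s.inner(x))

Flux.@layer Sandwich              # now equivalent to `Flux.@layer :expand Sandwich`
# Flux.@layer :noexpand Sandwich  # use this instead to force compact printing

Sandwich(Dense(2 => 3, relu), Dense(3 => 1))  # prints expanded, showing the sublayers
```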
Merged pull requests:
- Use `NNlib.bias_act!` (#2327) (@mcabbott)
- Allow `Parallel(+, f)(x, y, z)` to work like broadcasting, and enable `Chain(identity, Parallel(+, f))(x, y, z)` (#2393) (@mcabbott)
- Epsilon change in normalise for stability (#2421) (@billera)
- Add more `Duplicated` methods for Enzyme.jl support (#2471) (@mcabbott)
- Export Optimisers and remove params and Optimise from tests (#2495) (@CarloLucibello)
- RNNs redesign (#2500) (@CarloLucibello)
- Adjust docs & `Flux.@functor` for Functors.jl v0.5, plus misc. depwarns (#2509) (@mcabbott)
- GPU docs (#2510) (@mcabbott)
- CompatHelper: bump compat for Optimisers to 0.4, (keep existing compat) (#2520) (@github-actions[bot])
- Distinct init for kernel and recurrent (#2522) (@MartinuzziFrancesco)
- Functors v0.5 + tighter version bounds (#2525) (@CarloLucibello)
- deprecation of params and Optimise (continued) (#2526) (@CarloLucibello)
- Bump codecov/codecov-action from 4 to 5 (#2527) (@dependabot[bot])
- updates for Functors v0.5 (#2528) (@CarloLucibello)
- fix comment (#2529) (@oscardssmith)
- set expand option as default for `@layer` (#2532) (@CarloLucibello)
- misc stuff for v0.15 release (#2534) (@CarloLucibello)
- Tweak quickstart.md (#2536) (@mcabbott)
- Remove usage of global variables in linear and logistic regression tutorial training functions (#2537) (@christiangnrd)
- Fix linear regression example (#2538) (@christiangnrd)
- Update gpu.md (#2539) (@AdamWysokinski)
Closed issues:
- RNN layer to skip certain time steps (like `Masking` layer in keras) (#644)
- Backprop through time (#648)
- Initial state in RNNs should not be learnable by default (#807)
- Bad recurrent layers training performance (#980)
- flip function assumes the input sequence is a Vector or List, it can be Matrix as well. (#1042)
- Regression in package load time (#1155)
- Recurrent layers can't use Zeros() as bias (#1279)
- Flux.destructure doesn't preserve RNN state (#1329)
- RNN design for efficient CUDNN usage (#1365)
- Strange result with gradient (#1547)
- Call of Flux.stack results in StackOverfloxError for approx. 6000 sequence elements of a model output of a LSTM (#1585)
- Gradient dimension mismatch error when training rnns (#1891)
- Deprecate Flux.Optimisers and implicit parameters in favour of Optimisers.jl and explicit parameters (#1986)
- Pull request #2007 causes Flux.params() calls to not get cached (#2040)
- gradient of `Flux.normalise` return NaN when `std` is zero (#2096)
- explicit differentiation for RNN gives wrong results (#2185)
- Make RNNs blocked (and maybe fixing gradients along the way) (#2258)
- Should everything be a functor by default? (#2269)
- Flux new explicit API does not work but old implicit API works for a simple RNN (#2341)
- Adding Simple Recurrent Unit as a recurrent layer (#2408)
- deprecate Flux.params (#2413)
- Implementation of `AdamW` differs from PyTorch (#2433)
- `gpu` should warn if cuDNN is not installed (#2440)
- device movement behavior inconsistent (#2513)
- mark as public any non-exported but documented interface (#2518)
- broken image in the quickstart (#2530)
- Consider making the `:expand` option the default in `@layer` (#2531)
- `Flux.params` is broken (#2533)