MadamOpt.jl is a testing ground for extensions to Adam (Adaptive Moment Estimation), written in Julia. It was born out of a need for gradient-free online optimization.
Note that while this library could be used to train deep models, that is not its chief design goal. Nevertheless, an example of using MadamOpt with FluxML is included in the examples directory (the library supports GPU acceleration / CUDA when a gradient is provided).
The extensions currently implemented by the library are:
- L1 regularization via ISTA (Iterative Shrinkage-Thresholding Algorithm).
- Gradient-free optimization via a discrete approximation of the gradient, computed from a subset of model parameters at each iteration (suitable for small to medium-sized models); a sketch of the idea follows this list.
- A technique loosely based on simulated annealing for optimizing non-convex functions without using a gradient.
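
The following is a minimal, illustrative sketch of the coordinate-subset finite-difference idea behind the gradient-free mode. The function name, keyword arguments, and constants are hypothetical and do not reflect MadamOpt's actual API:

```julia
# Hypothetical illustration of a coordinate-subset finite-difference gradient
# estimate; names and defaults here are not part of MadamOpt's API.
using Random

# Estimate the gradient of `f` at `theta` by perturbing a random subset of
# `k` coordinates with step `h`; the remaining entries stay at zero.
function subset_grad_estimate(f, theta::AbstractVector; k::Int=8, h::Float64=1e-6)
    g = zeros(length(theta))
    idx = randperm(length(theta))[1:min(k, length(theta))]
    f0 = f(theta)
    for i in idx
        perturbed = copy(theta)
        perturbed[i] += h
        g[i] = (f(perturbed) - f0) / h   # forward-difference approximation
    end
    return g
end

# Example: estimate the gradient of a quadratic bowl at a random point.
f(x) = sum(abs2, x)
theta = randn(20)
g = subset_grad_estimate(f, theta; k=5)
```

Each iteration only costs `k + 1` objective evaluations, which is why the approach stays practical for small to medium-sized models but not for very large ones.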
In standard Adam, the scaling of the gradient prevents the thresholding from affecting only relatively insignificant features: dividing the mean gradient by the square root of the uncentered variance yields a term that multiplies Adam's alpha by a value between -1.0 and 1.0 (modulo differences in their decay rates), so every parameter takes a step of roughly similar magnitude regardless of how large its gradient is. Therefore, the step size is further scaled by log(1 + abs(gradient)).
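
To make the interplay concrete, here is a minimal sketch of a single update that combines an Adam-style step, the log(1 + |g|) scaling described above, and an ISTA soft-thresholding pass. The hyperparameter names follow standard Adam conventions, but the code is illustrative and is not MadamOpt's implementation:

```julia
# Illustrative single-parameter update: Adam-style step with a log-scaled
# step size, followed by ISTA soft-thresholding. Not MadamOpt's actual code.
soft_threshold(x, t) = sign(x) * max(abs(x) - t, 0.0)

function adam_ista_step(theta, g, m, v, t;
                        alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8, lambda=1e-3)
    m = beta1 * m + (1 - beta1) * g            # first-moment estimate
    v = beta2 * v + (1 - beta2) * g^2          # second (uncentered) moment
    m_hat = m / (1 - beta1^t)                  # bias correction
    v_hat = v / (1 - beta2^t)
    # log(1 + |g|) restores some dependence on the gradient's magnitude,
    # so the thresholding below mostly zeroes relatively insignificant features.
    step = alpha * log(1 + abs(g)) * m_hat / (sqrt(v_hat) + eps)
    theta = soft_threshold(theta - step, alpha * lambda)  # ISTA / L1 shrinkage
    return theta, m, v
end
```

Without the log term, the shrinkage threshold would be comparable to every parameter's step size, and the L1 penalty would pull all coordinates toward zero at roughly the same rate.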
See the unit tests for examples of fitting a 100-dimensional non-convex Ackley function, a sparse 500x250 matrix, and the Rosenbrock function.
For an API overview, see the docs, unit tests, and examples.