
Optimizer module overhaul #396

Merged
merged 43 commits into from
Jan 31, 2018

Conversation

iblislin
Member

will sort out summary later...

* Before

```julia
NadamScheduler(; mu0 = 0.99, delta = 0.004, gamma = 0.5, alpha = 0.96)
```

* After

```julia
NadamScheduler(; μ₀ = 0.99, δ = 0.004, γ = 0.5, α = 0.96)
```
Blocker: #394
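
The rename works because Julia accepts Unicode (including subscripted) identifiers as keyword names. A minimal, self-contained sketch of the renamed constructor style (illustrative only, not the actual MXNet.jl definition):

```julia
# Sketch of a scheduler type whose keyword arguments use the Greek symbols
# from the Nadam paper directly, as in the "After" example above.
struct NadamScheduler
    μ₀::Float64  # initial momentum
    δ::Float64   # decay term
    γ::Float64   # decay exponent
    α::Float64   # momentum schedule base
end

# Keyword constructor with the defaults shown in the PR description.
NadamScheduler(; μ₀ = 0.99, δ = 0.004, γ = 0.5, α = 0.96) =
    NadamScheduler(μ₀, δ, γ, α)

s = NadamScheduler(γ = 0.6)  # override one keyword, keep the other defaults
```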

renames:

* `get_momentum` -> `getmomentum`
* `get_momentum_scheduler` -> `getmomsched`
* `Momentum.Fixed.momentum` -> `Momentum.Fixed.μ`

Decouple the learning-rate update from OptimizationState, giving the user more control: updates are triggered via `update!`.

Let the user control it, and provide the default value `1/batch_size` only in the high-level API `fit!`.
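
The decoupling described in these commit messages can be sketched as follows (names and fields here are illustrative assumptions, not MXNet.jl's actual types): the schedule owns its state, and the caller decides when to advance it.

```julia
# Hypothetical schedule: the learning rate lives in the schedule itself
# rather than being tied to an OptimizationState.
mutable struct ExpSchedule
    η::Float64   # current learning rate
    γ::Float64   # multiplicative decay factor
end

# Caller-triggered update: advance the schedule by one step and
# return the new learning rate.
update!(s::ExpSchedule) = (s.η *= s.γ; s.η)

s = ExpSchedule(0.1, 0.5)
update!(s)   # the user chooses when the decay happens

# The high-level API alone would supply a rescale default such as:
batch_size = 32
rescale = 1 / batch_size
```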
@iblislin iblislin changed the title Optimizer module overhaul WIP: Optimizer module overhaul Dec 31, 2017
@iblislin iblislin changed the title WIP: Optimizer module overhaul Optimizer module overhaul Jan 1, 2018
@iblislin
Member Author

iblislin commented Jan 1, 2018

Ready for review.
I listed the changes in NEWS.md:
https://github.com/dmlc/MXNet.jl/pull/396/files#diff-8312ad0561ef661716b48d09478362f3R263

The motivation of this PR is to build more elegant APIs than Python's (thanks to the good Unicode support in Julia's REPL and editor plugins),
and to pave the way for porting Python's Gluon.

@iblislin iblislin requested a review from pluskid January 1, 2018 15:35
@iblislin iblislin added this to the 0.4.0 milestone Jan 7, 2018
@pluskid
Member

pluskid commented Jan 7, 2018

Thanks for the efforts.

  • I'm OK with most of the renamings to Unicode.
  • I'm a bit against using \eta for learning rate, \mu for momentum, and \lambda for weight decay. Although they seem to be commonly used, some papers use different symbols, which could be a bit confusing. But those are still OK if you insist.
  • I'm strictly against using ∇c for gradient clipping and ∇r for gradient rescaling. Those are just super confusing, because ∇ typically means taking the gradient with respect to something.

@iblislin
Member Author

iblislin commented Jan 8, 2018

> • I'm a bit against using \eta for learning rate, \mu for momentum, and \lambda for weight decay. Although they seem to be commonly used, some papers use different symbols, which could be a bit confusing. But those are still OK if you insist.
> • I'm strictly against using ∇c for gradient clipping and ∇r for gradient rescaling. Those are just super confusing, because ∇ typically means taking the gradient with respect to something.

I agree with your point of view on gradient clipping and rescaling.

I want to hear which naming you prefer. We can list all the candidates here, then vote.

@pluskid
Member

pluskid commented Jan 8, 2018

Unfortunately, I don't have a better naming suggestion apart from the more verbose `grad_clip` and `grad_scale`.

@iblislin
Member Author

iblislin commented Jan 8, 2018

Adding some other options:

  • clip
  • rescale or scale
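
For concreteness, a sketch of what the two operations under discussion do, written as plain functions named after the proposed options (illustrative only, not MXNet.jl's actual implementation):

```julia
# `rescale` multiplies each gradient entry by a scalar (e.g. 1/batch_size);
# `clip` bounds each entry to the interval [-c, c].
rescale(grad, r) = r .* grad
clip(grad, c) = clamp.(grad, -c, c)

g = [10.0, -8.0, 0.5]
g = rescale(g, 0.5)   # [5.0, -4.0, 0.25]
g = clip(g, 1.0)      # [1.0, -1.0, 0.25]
```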

@iblislin
Member Author

Is it okay to omit the `grad_` prefix, given that an optimizer deals with gradients anyway?

@iblislin
Member Author

I did the renaming, please check it out.

@iblislin
Member Author

good to go?

@pluskid
Member

pluskid commented Jan 22, 2018

Sorry for the late reply. I still prefer the `grad_` prefix, but I have no strong objection to the current naming, since these are options for optimizers as you mentioned (and we probably are not going to have scales for momentum or other quantities). Please feel free to merge.

@iblislin
Member Author

Thanks a lot.

About the `grad_` prefix: Julia 0.7 provides a way to add aliases (JuliaLang/julia#24960). Once we drop support for Julia 0.6, I can add the prefixed names as aliases.
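
One common way an alias is spelled in Julia is a `const` binding to the same function; a minimal sketch under that assumption (the exact mechanism discussed in JuliaLang/julia#24960 may differ):

```julia
# The short option name, as merged in this PR (illustrative definition).
clip(grad, c) = clamp.(grad, -c, c)

# A `const` binding gives the same function a second, prefixed name.
const grad_clip = clip

grad_clip([3.0, -3.0], 1.0)   # dispatches to the same method as `clip`
```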

@pluskid
Member

pluskid commented Jan 23, 2018

Alias sounds good!

@iblislin iblislin merged commit 9f4f533 into master Jan 31, 2018
@iblislin iblislin deleted the ib/opt-rework branch January 31, 2018 03:33