Passing scale to Optimizer changes gradient clipping threshold #641

Closed
alexandres opened this issue Jun 29, 2017 · 2 comments
Labels
moderate bug Issues that should be fixed but only affect less common environments or functionality

Comments

@alexandres

Trainers accept an initial learning rate and a decay parameter. To control the learning rate dynamically without using the built-in decay, one can pass a scale parameter to trainer.update(scale). Unfortunately, this directly affects gradient clipping:

From https://github.com/clab/dynet/blob/master/dynet/training.cc#L68:

if (scale * gg > clip_threshold) {
  ++clips;
  ++clips_since_status;
  gscale = clip_threshold / (scale * gg);
}

When trainer.update(scale) is used to manage the learning rate, clip_threshold is effectively divided by scale. Though there might be a reason for this, I was surprised by it. To get the same results as I would with TensorFlow, I have to compensate by calling trainer.set_clip_threshold(clip_threshold * scale) before calling update(scale).
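
To make the arithmetic concrete, here is a minimal standalone sketch of the clipping rule quoted above. It is not DyNet code: the function names clip_factor_scaled and clip_factor_compensated are made up for illustration, and it assumes gg is the gradient norm and that no clipping corresponds to a factor of 1.

```cpp
// Hypothetical stand-in for the quoted clipping rule; not DyNet's actual code.
// gg:             gradient norm (assumed)
// scale:          value passed to update(scale)
// clip_threshold: the configured clipping threshold
// Returns the factor (gscale) by which the gradient would be multiplied.
double clip_factor_scaled(double gg, double scale, double clip_threshold) {
  if (scale * gg > clip_threshold) {
    return clip_threshold / (scale * gg);
  }
  return 1.0;  // assumption: no clipping leaves the gradient unchanged
}

// Workaround described above: multiply the threshold by scale first,
// so that scale cancels out of both the comparison and the quotient.
double clip_factor_compensated(double gg, double scale, double clip_threshold) {
  return clip_factor_scaled(gg, scale, clip_threshold * scale);
}
```

With gg = 4, scale = 2 and clip_threshold = 5, the first function clips (2 * 4 > 5) even though the norm itself is below the threshold, while the compensated version does not. With gg = 10 the compensated version returns 5 / 10, which is what clipping on the norm alone would give.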

@neubig neubig added the moderate bug Issues that should be fixed but only affect less common environments or functionality label Jun 30, 2017
@neubig
Contributor

neubig commented Jun 30, 2017

Thanks for pointing this out, we'll take a look.

@neubig
Contributor

neubig commented Jul 14, 2017

This was indeed unintuitive, so we revised the training interface, simplifying it and removing the "scaling" parameter in update, which was not intended to be used to scale the learning rate. Now you need to manage the learning rate in the way that it was intended, by setting the learning_rate member of the trainer class: #695
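
As a rough illustration of the idea (not the actual interface from #695), the sketch below uses a hypothetical ToyTrainer struct with a public learning_rate member. The struct, its update signature, and the plain SGD step are all assumptions made for the example. The point it shows is that changing learning_rate between updates leaves the clipping decision untouched, because clipping depends only on the gradient norm.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy trainer, NOT DyNet's class. It only mirrors the idea of
// managing the step size through a public learning_rate member.
struct ToyTrainer {
  double learning_rate;
  double clip_threshold;

  // Clips by gradient norm only, then applies one plain SGD step.
  // Returns the clipping factor that was applied (1.0 means no clipping).
  double update(std::vector<double>& params,
                const std::vector<double>& grad) const {
    double sq = 0.0;
    for (double g : grad) sq += g * g;
    const double gg = std::sqrt(sq);  // gradient L2 norm
    const double gscale = (gg > clip_threshold) ? clip_threshold / gg : 1.0;
    for (std::size_t i = 0; i < params.size(); ++i) {
      params[i] -= learning_rate * gscale * grad[i];
    }
    return gscale;
  }
};
```

A custom schedule would then assign to learning_rate before each update, for example halving it every few epochs, and the clipping factor for a given gradient stays the same.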

@neubig neubig closed this as completed Jul 14, 2017