Add inverse sqrt learning rate scheduler #21495
Conversation
Thanks for adding this! I just have one comment.
src/transformers/optimization.py (Outdated)
Note: this implementation is adapted from
https://github.com/google-research/big_vision/blob/f071ce68852d56099437004fd70057597a95f6ef/big_vision/utils.py#L930
This should be a code comment rather than part of the doc; the user reading the documentation won't really care about this.
What does this PR do?
Adds the original inverse sqrt learning rate scheduler from Vaswani et al. (2017).
It is argued that this scheduler achieves the best performance when scaling ViTs over indefinite training durations (Zhai et al. (2022)).
This PR adds a `get_inverse_sqrt_schedule` function and also updates the tests in `tests/optimization/test_optimization.py` and the docs. The implementation is adapted from:
The timescale equals the number of warmup steps by default, as in:
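For reference, the schedule described above (linear warmup followed by inverse square root decay, shifted so the multiplier is continuous at the end of warmup) can be sketched as a plain step-to-multiplier function. This is a minimal sketch, not the exact code from the PR; the function and argument names below mirror the description but are illustrative:

```python
import math


def inverse_sqrt_lambda(current_step, num_warmup_steps, timescale=None):
    """Return the learning-rate multiplier for `current_step`.

    Linear warmup from 0 to 1 over `num_warmup_steps`, then decay
    proportional to 1/sqrt(step), shifted so the multiplier equals
    1.0 exactly at the end of warmup.
    """
    if timescale is None:
        # Default: timescale equals the number of warmup steps.
        timescale = num_warmup_steps
    if current_step < num_warmup_steps:
        return current_step / max(1, num_warmup_steps)
    # Shift the step so the decay curve passes through 1.0 at
    # current_step == num_warmup_steps.
    shift = timescale - num_warmup_steps
    return 1.0 / math.sqrt((current_step + shift) / timescale)
```

In practice a lambda like this would be wrapped in a PyTorch `LambdaLR` scheduler, which multiplies the optimizer's base learning rate by the returned factor at every step.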
Before submitting
- Pull Request section?
- to it if that's the case.
- documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.