Add inverse sqrt learning rate scheduler #21495
Conversation
Thanks for adding this! I just have one comment.
src/transformers/optimization.py (Outdated)
Note: this implementation is adapted from
https://github.com/google-research/big_vision/blob/f071ce68852d56099437004fd70057597a95f6ef/big_vision/utils.py#L930
This should be a code comment rather than part of the doc; the user reading the documentation won't really care about this.
What does this PR do?
Adds the original inverse sqrt learning rate scheduler from Vaswani et al. (2017).
It is argued that this scheduler achieves the best performance when scaling ViTs over indefinite training durations (Zhai et al. (2022)).
This PR adds a `get_inverse_sqrt_schedule` function and also updates the tests in `tests/optimization/test_optimization.py` and the docs. The implementation is adapted from:
The timescale equals the number of warmup steps by default, as in:
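For reference, the schedule described above (linear warmup followed by inverse square root decay, shifted so the multiplier is continuous at the end of warmup) can be sketched as a plain step-to-multiplier function. This is a minimal sketch, not the exact code from the PR; the function and argument names below mirror the description but are illustrative:

```python
import math


def inverse_sqrt_lambda(current_step, num_warmup_steps, timescale=None):
    """Return the learning-rate multiplier for `current_step`.

    Linear warmup from 0 to 1 over `num_warmup_steps`, then decay
    proportional to 1/sqrt(step), shifted so the multiplier equals
    1.0 exactly at the end of warmup.
    """
    if timescale is None:
        # Default: timescale equals the number of warmup steps.
        timescale = num_warmup_steps
    if current_step < num_warmup_steps:
        return current_step / max(1, num_warmup_steps)
    # Shift the step so the decay curve passes through 1.0 at
    # current_step == num_warmup_steps.
    shift = timescale - num_warmup_steps
    return 1.0 / math.sqrt((current_step + shift) / timescale)
```

In practice a lambda like this would be wrapped in a PyTorch `LambdaLR` scheduler, which multiplies the optimizer's base learning rate by the returned factor at every step.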
Before submitting
- Pull Request section?
- to it if that's the case.
- documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.