This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

fix a bug when using fp16 training & gradient clipping #5426

Merged
merged 3 commits into allenai:main
Oct 7, 2021

Conversation

YKX-A
Contributor

@YKX-A YKX-A commented Sep 30, 2021

Fixes # .

Changes proposed in this pull request:

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.
  • codecov/patch reports high test coverage (at least 90%).
    You can find this under the "Actions" tab of the pull request once the other checks have finished.

@epwalsh
Member

epwalsh commented Oct 1, 2021

@YKX-A can you run `black .` from the repo root to fix the formatting?

Member

@epwalsh epwalsh left a comment


This looks great other than the minor formatting issues!

# 1. We have to unscale the gradient before clipping
if self._scaler is not None:
optimizer_state = self._scaler._per_optimizer_states[id(self.optimizer)]
# 2. The `unscale_` should't be performed more than once per optimizer per step call,
Member

Suggested change
# 2. The `unscale_` should't be performed more than once per optimizer per step call,
# 2. The `unscale_` shouldn't be performed more than once per optimizer per step call,

Contributor Author

Oh, that's my ignorance 😰. Is it correct now?
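For context, PyTorch's AMP documentation prescribes exactly this order: call `scaler.unscale_(optimizer)` before `torch.nn.utils.clip_grad_norm_`, and `unscale_` raises an error if invoked more than once for the same optimizer between steps, which is why the fix checks the per-optimizer state. Why the order matters can be shown in a standalone sketch (plain Python with made-up numbers, not the PR's actual code):

```python
import math

def clip_by_norm(grads, max_norm):
    """Scale the gradient vector down if its L2 norm exceeds max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        grads = [g * max_norm / total for g in grads]
    return grads

scale = 65536.0           # hypothetical fp16 loss-scale factor
true_grads = [3.0, 4.0]   # unscaled gradient, L2 norm = 5
scaled_grads = [g * scale for g in true_grads]

# Wrong order: clipping the still-scaled gradients measures the inflated
# norm (5 * 65536), so after unscaling the update is vanishingly small.
clipped_then_unscaled = [g / scale
                         for g in clip_by_norm(scaled_grads, max_norm=1.0)]

# Right order: unscale first, then clip; the true norm of 5 is clipped to 1.
unscaled_then_clipped = clip_by_norm([g / scale for g in scaled_grads],
                                     max_norm=1.0)
```

With the wrong order the effective gradient norm ends up at `1 / scale` instead of the intended `max_norm`, silently stalling fp16 training rather than crashing.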

@YKX-A
Contributor Author

YKX-A commented Oct 4, 2021

@epwalsh

Member

@epwalsh epwalsh left a comment


LGTM! Thanks @YKX-A 🙂

@epwalsh epwalsh merged commit 17ef1aa into allenai:main Oct 7, 2021