Sharded Accelerator 1/n: Expose clip gradients to plugins via abstract class #4639
Conversation
…e within accelerator to clip gradients
Need to sort out APEX FP16 (with opt level set to O2 all parameters are in FP16, so I need to change the epsilon to match). EDIT: done :)
…gradients), add override for O2 support apex
Great! Just one question about clipping for CPU.
Smaller comments, LGTM
…tor. Default to standard clip function for vanilla torch
Great PR!
 def _clip_gradients(self, optimizer, grad_clip_val):
     if self.trainer.amp_backend:
         self.trainer.precision_connector.backend.clip_gradients(grad_clip_val, optimizer)
     else:
-        parameters = model.parameters()
-
-        max_norm = grad_clip_val
-        norm_type = float(2.0)
-
-        if isinstance(parameters, torch.Tensor):
-            parameters = [parameters]
-        parameters = list(filter(lambda p: p.grad is not None, parameters))
-
-        if norm_type == math.inf:
-            total_norm = max(p.grad.data.abs().max() for p in parameters)
-        else:
-            device = parameters[0].device
-            out = torch.empty(len(parameters), device=device)
-            for i, p in enumerate(parameters):
-                torch.norm(p.grad.data.to(device), norm_type, out=out[i])
-            total_norm = torch.norm(out, norm_type)
-
-        eps = EPSILON_FP16 if self.trainer.precision == 16 else EPSILON
-        clip_coef = torch.tensor(max_norm, device=device) / (total_norm + eps)
-        clip_coef = torch.min(clip_coef, torch.ones_like(clip_coef))
-        for p in parameters:
-            p.grad.data.mul_(clip_coef.to(p.grad.data.device))
+        model = self.trainer.get_model()
+        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=grad_clip_val, norm_type=2.0)
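For context, the replacement relies on torch.nn.utils.clip_grad_norm_, which computes the total gradient norm and rescales all gradients in place, covering what the removed manual code did (apart from the precision-specific epsilon). A minimal, self-contained sketch of that call, using a toy model purely as a placeholder:

import torch

# Toy model and backward pass, purely illustrative.
model = torch.nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).sum()
loss.backward()

# clip_grad_norm_ returns the total 2-norm of all gradients before clipping
# and scales every gradient in place so the combined norm is at most max_norm.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5, norm_type=2.0)
print(total_norm)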
I wonder if it would make sense to have a 32-bit precision plugin for the default case.
Then we wouldn't need this if/else block and could just call
self.trainer.precision_connector.backend.clip_gradients(grad_clip_val, optimizer)
just as a refactoring idea for the future.
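A minimal sketch of that refactoring idea, assuming a hypothetical Precision32Plugin class (not an actual Lightning class): a default full-precision backend would wrap the same torch call, so the accelerator could always delegate and drop the if/else.

import torch

class Precision32Plugin:
    # Hypothetical default full-precision backend, mirroring the accelerator's else branch.
    def __init__(self, trainer):
        self.trainer = trainer

    def clip_gradients(self, grad_clip_val, optimizer):
        # Clip the whole model's gradients to a maximum 2-norm, exactly as the
        # accelerator currently does when no AMP backend is configured.
        model = self.trainer.get_model()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=grad_clip_val, norm_type=2.0)

# With this registered as precision_connector.backend, _clip_gradients could always call:
#   self.trainer.precision_connector.backend.clip_gradients(grad_clip_val, optimizer)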
Talking about this block (GitHub selected more):
def _clip_gradients(self, optimizer, grad_clip_val):
    if self.trainer.amp_backend:
        self.trainer.precision_connector.backend.clip_gradients(grad_clip_val, optimizer)
    else:
        model = self.trainer.get_model()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=grad_clip_val, norm_type=2.0)
I fully agree: a precision-32 plugin makes the most sense, because the accelerators shouldn't need to care about these details (it fits @justusschock's refactor proposal of making the accelerators hardware-only).
Out of scope for this PR, but definitely low-hanging fruit to reduce the likelihood of issues in the accelerators.
I'm going to do this in my refactor proposal :)
I'm a bit confused about what is meant to be overridden for the Accelerator versus the plugin: one seems to override clip_gradients in the child class, the other _clip_gradients.
Codecov Report
@@            Coverage Diff            @@
##           master   #4639     +/-   ##
=========================================
+ Coverage      90%      93%      +3%
=========================================
  Files         116      117       +1
  Lines        8881     8898      +17
=========================================
+ Hits         8012     8264     +252
+ Misses        869      634     -235
…t class (#4639)
* Added abstract precision plugin to expose clip_gradients function, use within accelerator to clip gradients
* Exclude model from override, keep optimizer (needed for sharded clip gradients), add override for O2 support apex
* Fix doc
* Applied codereview changes
* Refactored clip function to encapsulate tpu changes with tpu accelerator. Default to standard clip function for vanilla torch
* Pass correct grad clip val
* Moved var to property
* Apply code review suggestions
Ties to #4178
What does this PR do?
To ensure we have the appropriate behaviour when clipping gradients, I've made gradient clipping the responsibility of the precision plugin when one exists in the accelerator. I've also simplified the logic to reflect what should happen in the AMP/Apex cases, using the built-in torch clip_grad_norm_ function. This is required because the sharded accelerator will need a custom clip-gradient function :)
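As an illustration of the description above, here is a minimal, hypothetical sketch of an abstract precision plugin exposing clip_gradients, plus an Apex-style override that uses a larger epsilon because with opt level O2 all parameters are FP16. The class names, constants, and method bodies are assumptions for illustration, not the exact code added by this PR.

from abc import ABC, abstractmethod

import torch

EPSILON = 1e-6        # assumed epsilon for full precision
EPSILON_FP16 = 1e-5   # assumed larger epsilon when parameters are FP16 (Apex O2)


class PrecisionPlugin(ABC):
    """Abstract base class that exposes gradient clipping to the accelerator."""

    @abstractmethod
    def clip_gradients(self, grad_clip_val, optimizer):
        ...


class ApexPlugin(PrecisionPlugin):
    """Sketch of an Apex backend: clips the optimizer's own parameters and
    widens the epsilon, since opt level O2 keeps every parameter in FP16."""

    def clip_gradients(self, grad_clip_val, optimizer):
        parameters = [
            p for group in optimizer.param_groups
            for p in group["params"] if p.grad is not None
        ]
        # Total 2-norm across all gradients, then rescale in place if it exceeds the limit.
        total_norm = torch.norm(
            torch.stack([torch.norm(p.grad.detach(), 2.0) for p in parameters]), 2.0
        )
        clip_coef = grad_clip_val / (total_norm + EPSILON_FP16)
        if clip_coef < 1:
            for p in parameters:
                p.grad.detach().mul_(clip_coef)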
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃