
[RFC] Gradient clipping hooks in the LightningModule #6346

Closed
carmocca opened this issue Mar 4, 2021 · 11 comments · Fixed by #9584
Labels: design (Includes a design discussion), feature (Is an improvement or enhancement), help wanted (Open to be worked on), refactor

Comments

@carmocca
Contributor

carmocca commented Mar 4, 2021

🚀 Feature

Add clipping hooks to the LightningModule

Motivation

It's currently very difficult to customize the gradient clipping logic.

Pitch

class LightningModule:
    def clip_gradients(self, optimizer, optimizer_idx):
        ...

The default implementation would be the same as we currently provide, where the trainer's clipping flags are used.

Maybe those would be deprecated in favor of LightningModule properties.
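
A minimal sketch of such a default, assuming the hook simply forwards the trainer's flags to the accelerator's existing clipping logic (the attribute and argument names below are illustrative, not a final API):

class LightningModule:
    def clip_gradients(self, optimizer, optimizer_idx):
        # default: keep today's behaviour by forwarding the trainer's clipping flags
        self.trainer.accelerator.clip_gradients(
            optimizer,
            clip_val=self.trainer.gradient_clip_val,
            gradient_clip_algorithm=self.trainer.gradient_clip_algorithm,
        )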

class LightningOptimizer:
    def step(self, closure=None):
        # fall back to a no-op closure when none is given
        if closure is None:
            closure = do_nothing_closure

        def wrapper_closure():
            closure()
            # clip right before the real optimizer step
            self._trainer.call_hook("clip_gradients", self.optimizer)

        self.optimizer.step(closure=wrapper_closure)
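
Here do_nothing_closure is assumed to be a trivial helper along these lines:

def do_nothing_closure():
    # placeholder closure for optimizers that are stepped without one
    return None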

We need to evaluate the limitations, since clipping is currently tied to the plugins.

Additional context

This would fix #5096, #6123 (comment), #5671, and #5982, and would allow new clipping techniques to be implemented easily without having to merge them into Lightning.
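
For example, with the proposed hook a user could clip each optimizer's gradients differently without touching Lightning internals; a hypothetical override using the signature from the pitch above:

from torch import nn
from pytorch_lightning import LightningModule


class LitGAN(LightningModule):
    def __init__(self):
        super().__init__()
        self.generator = nn.Linear(32, 32)
        self.discriminator = nn.Linear(32, 1)

    def clip_gradients(self, optimizer, optimizer_idx):
        if optimizer_idx == 0:
            # generator optimizer: clip by global norm
            nn.utils.clip_grad_norm_(self.generator.parameters(), max_norm=1.0)
        else:
            # discriminator optimizer: clip element-wise by value
            nn.utils.clip_grad_value_(self.discriminator.parameters(), clip_value=0.5)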

cc: @rohitgr7 who has been pushing for this for a while

@carmocca carmocca added feature Is an improvement or enhancement help wanted Open to be worked on refactor design Includes a design discussion labels Mar 4, 2021
@carmocca carmocca added this to the 1.3 milestone Mar 4, 2021
@tchaton
Contributor

tchaton commented Mar 4, 2021

@SeanNaren @awaelchli Any thoughts?

@rohitgr7
Contributor

rohitgr7 commented Mar 4, 2021

Thanks @carmocca for bringing this up ☺️. This has actually been on my TODO list, but I kept pushing it back because of the ongoing refactors. It would be great to have this, since it's part of optimization just like optimizer_step.

@edenlightning edenlightning removed this from the v1.3 milestone Apr 27, 2021
@edenlightning edenlightning added this to the v1.4 milestone May 9, 2021
@edenlightning
Contributor

@rohitgr7 friendly ping?

@edenlightning edenlightning modified the milestones: v1.4, v1.5 Jul 1, 2021
@ericharper
Contributor

Being able to customize gradient clipping is still really important to the NeMo team. I hope these hooks can be added soon.

@rohitgr7
Contributor

Yes, working on it. Will update soon.

@ananthsub
Contributor

ananthsub commented Sep 17, 2021

@rohitgr7 @carmocca Some of the issues we ought to resolve here:

Using self.trainer.accelerator as part of the default implementation in the LightningModule is too fragile: someone will override the whole method with a custom implementation, and it will fail when different trainer flags are used. At the same time, it's not good to have two different places where gradient clipping is implemented: once in the plugin and once in the LightningModule.

It also goes against the principle of #7315

@tchaton
Contributor

tchaton commented Sep 22, 2021

  • on_after_backward

Hey @ananthsub,

1) I believe we could pass the clip_val and gradient_clip_algorithm as extra arguments to the function. However, users might ignore both if they have a custom implementation.

2) DeepSpeed and FSDP don't support gradient clipping, as the gradients are concatenated. I don't think we should block the feature entirely for two plugins that aren't mature enough for this feature yet.

  1. The PrecisionPlugin should raise a MisconfigurationException if the clipping values are provided and the plugin doesn't support them. If the user overrides clip_gradients, we would just emit a warning (see the sketch after this list).

  2. on_after_backward is actually the wrong place to implement this: it won't work with accumulate_grad_batches since it is called after every backward call; clipping should happen in the on_before_optimizer_step hook instead. The fact that we ourselves are unsure where to apply gradient clipping makes me think users will be too. A dedicated hook seems simpler.

  3. Should we do loop -> lightning module -> trainer -> accelerator, or loop -> accelerator -> lightning module? I would prefer the first one, as it makes maintenance simpler and the code more minimal, but I don't have a strong opinion there.
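
A rough sketch of point 1, assuming a clip_gradients extension point on the precision plugin (the class name and signature here are illustrative, not the final API):

from pytorch_lightning.plugins import PrecisionPlugin
from pytorch_lightning.utilities.exceptions import MisconfigurationException


class NoClippingPrecisionPlugin(PrecisionPlugin):  # hypothetical plugin that cannot clip
    def clip_gradients(self, optimizer, clip_val, gradient_clip_algorithm=None):
        if clip_val is not None and clip_val > 0:
            # the Trainer clipping flags were set, but this plugin cannot honour them
            raise MisconfigurationException(
                "This plugin does not support gradient clipping via the Trainer flags."
            )
        # if the user overrides LightningModule.clip_gradients instead, Lightning
        # would only emit a warning rather than raising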

Best,
T.C

@carmocca
Contributor Author

+1 to passing the existing arguments to the hook so that they can be used for a default implementation

@rohitgr7
Contributor

+1 to passing the existing arguments to the hook so that they can be used for a default implementation

  1. So are you suggesting keeping the trainer arguments and adding arguments to the new clip_gradients hook? I'd prefer not to, since it might confuse users a little bit, like having the same hparams in two different locations, but that's just a personal opinion.

  2. Regarding the loop -> lightning module -> trainer -> accelerator part: I think we have the same workflow for optimizer_step, so this looks OK to me.

@rohitgr7
Contributor

OK, it seems like keeping the trainer arguments is a good option, and we can pass them to this hook directly as defaults.

For now I can think of two ways to design the API.

  1. Let users override clip_gradients directly and call super() when they want the built-in behaviour:

def clip_gradients(self, opt, opt_idx, clip_val, clip_algo):
    if opt_idx == 0:
        # use the internal implementation via super()
        super().clip_gradients(opt, clip_val, clip_algo)
    else:
        # implement your own clipping here
        ...

On this one, @awaelchli pointed out that setting the values in the hook's signature and then calling super() is not something we have anywhere else in Lightning, so it would be very unfamiliar to users.

  2. Or we can put the current implementation inside clip_gradients and make configure_gradient_clipping available to users:

def configure_gradient_clipping(self, optimizer, optimizer_idx, clip_val, clip_algo):
    if optimizer_idx == 0:
        # Lightning will handle the clipping here
        self.clip_gradients(optimizer, clip_val, clip_algo)
    else:
        # implement your own clipping here
        ...
but open to suggestions :)
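
With option 2, the default behaviour (when the user overrides nothing) could just pass the trainer values through, so the existing flags keep working; a minimal sketch using the names above:

class LightningModule:
    def configure_gradient_clipping(self, optimizer, optimizer_idx, clip_val, clip_algo):
        # default: clip with the values coming from the Trainer flags
        self.clip_gradients(optimizer, clip_val, clip_algo)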

@tchaton
Contributor

tchaton commented Sep 27, 2021

Hey @rohitgr7,

Interesting, it is a great point!

I like option 2.

@ananthsub @carmocca Any thoughts?

Best,
T.C
