[RFC] Gradient clipping hooks in the LightningModule #6346
Comments
@SeanNaren @awaelchli Any thoughts?
Thanks @carmocca for bringing this up
@rohitgr7 friendly ping?
Being able to customize gradient clipping is still really important to the NeMo team. I hope these hooks can be added soon.
yes, working on it.. will update soon.
@rohitgr7 @carmocca Some of the issues we ought to resolve here: how the hook should interact with the existing trainer flags (using both is confusing, and it also goes against the principle of #7315), and how plugins that handle clipping internally fit in.
Hey @ananthsub,
1) I believe we could pass the clip_val and gradient_clip_algorithm as extra arguments to the function. Users with a custom implementation are free to ignore both.
2) DeepSpeed and FSDP don't support gradient clipping as the gradients are concatenated. I don't think we should block this feature entirely for 2 plugins that aren't mature enough for it yet.
Best,
+1 to passing the existing arguments to the hook so that they can be used for a default implementation
ok seems like keeping the trainer arguments is a good option, and we can pass them to this hook directly as defaults. For now I can think of 2 ways for the API design.

Option 1:

```python
def clip_gradients(self, opt, opt_idx, clip_val, clip_algo):
    # if someone wants to use the internal implementation, call super()
    if opt_idx == 0:
        super().clip_gradients(opt, clip_val, clip_algo)
    else:
        ...  # implement your own
```

On this one, @awaelchli pointed out that setting the values in the hook's signature and then calling super() is not something we have anywhere in Lightning, so it will be very unfamiliar to users.

Option 2:

```python
def configure_gradient_clipping(self, optimizer, optimizer_idx, clip_val, clip_algo):
    if optimizer_idx == 0:
        self.clip_gradients(optimizer, clip_val, clip_algo)  # lightning will handle this
    else:
        ...  # implement your own
```

Open to suggestions :)
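For illustration, here is a minimal sketch of how a user might override the option-2 hook. Only `configure_gradient_clipping` and `self.clip_gradients` come from the proposal above; the class name, the GAN-style optimizer split, and the clamp bound are hypothetical:

```python
from pytorch_lightning import LightningModule


class MyGAN(LightningModule):
    def configure_gradient_clipping(self, optimizer, optimizer_idx, clip_val, clip_algo):
        if optimizer_idx == 0:
            # generator: defer to Lightning's built-in clipping,
            # driven by the trainer flags passed in as defaults
            self.clip_gradients(optimizer, clip_val, clip_algo)
        else:
            # discriminator: custom elementwise value clamp instead
            for group in optimizer.param_groups:
                for p in group["params"]:
                    if p.grad is not None:
                        p.grad.clamp_(-0.5, 0.5)
```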
Hey @rohitgr7, interesting, that's a great point! I like option 2. @ananthsub @carmocca Any thoughts? Best,
🚀 Feature
Add clipping hooks to the LightningModule
Motivation
It's currently very difficult to customize the gradient clipping logic.
Pitch
The default implementation would be the same as we currently provide, where the trainer's clipping flags are used.
Maybe those would be deprecated in favor of LightningModule properties.
We need to evaluate the limitations, since clipping is currently tied to the plugins.
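As a rough sketch (assuming the option-2 naming from the discussion above; this is not the actual implementation), the default could simply forward the trainer's flags to the existing internal routine:

```python
# Default hook body inside LightningModule (sketch):
def configure_gradient_clipping(self, optimizer, optimizer_idx, clip_val, clip_algo):
    # clip_val / clip_algo would default to the trainer's
    # gradient_clip_val / gradient_clip_algorithm flags,
    # so the out-of-the-box behavior is unchanged.
    self.clip_gradients(optimizer, clip_val, clip_algo)
```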
Additional context
This would fix #5096, #6123 (comment), #5671, #5982, and make it easy to implement new clipping techniques without having to merge them into Lightning.
cc: @rohitgr7 who has been pushing for this for a while
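As an example of a technique that could then live entirely in user code, here is a simplified (tensor-wise rather than unit-wise) take on adaptive gradient clipping (Brock et al., 2021), written against the proposed hook; the clipping factor and epsilons are illustrative choices:

```python
from pytorch_lightning import LightningModule


class AGCModel(LightningModule):
    def configure_gradient_clipping(self, optimizer, optimizer_idx, clip_val, clip_algo):
        # Scale each gradient so its norm stays below a fraction
        # of the corresponding parameter's norm.
        clipping_factor = 0.01  # illustrative value
        eps = 1e-3
        for group in optimizer.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                p_norm = p.detach().norm().clamp_(min=eps)
                g_norm = p.grad.detach().norm()
                max_norm = clipping_factor * p_norm
                if g_norm > max_norm:
                    p.grad.mul_(max_norm / (g_norm + 1e-6))
```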