Add gradient checkpointing to Whisper Flax #22954
Very nice @versae! Just a few minor nits, but otherwise this PR is looking good!
init_cache=init_cache,
output_attentions=output_attentions,
deterministic=deterministic,
attention_mask,
Note to reviewer: `remat` does not support keyword arguments, hence the need to switch to positional arguments.
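As a rough illustration of why the call sites switch to positional arguments, here is a minimal sketch. It uses plain `jax.checkpoint` rather than the Flax wrapper touched in this PR, and `layer` is a hypothetical toy function, not Whisper code. The Python flag is marked static through `static_argnums`, which refers to positional indices, so the flag is passed by position.

```python
import jax
import jax.numpy as jnp


def layer(x, deterministic):
    # Toy stand-in for a transformer layer (hypothetical, not Whisper code).
    # `deterministic` is a plain Python bool used for Python-level control flow.
    h = jnp.tanh(x)
    if not deterministic:
        h = 0.5 * h  # placeholder for dropout-style behaviour
    return h * x


# static_argnums indexes positional arguments: index 1 is `deterministic`.
checkpointed_layer = jax.checkpoint(layer, static_argnums=(1,))


def make_loss(fn):
    # The flag is passed by position, not as deterministic=True.
    return lambda x: jnp.sum(fn(x, True))


x = jnp.linspace(-1.0, 1.0, 8)
grad_plain = jax.grad(make_loss(layer))(x)
grad_remat = jax.grad(make_loss(checkpointed_layer))(x)
```

Both gradients should agree up to numerical tolerance; the checkpointed version only changes what is stored versus recomputed during the backward pass.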
Thanks for the review, @sanchit-gandhi! Should be all good now 😃.
Amazing @versae! Requesting final review before we can get this merged 🤗
Thanks for your contribution!
Thank you! I learnt a lot 🤓
* Add gradient checkpointing to Whisper Flax
* self.gradient_checkpointing only needed in nn.Module, removing unnecessary comments
What does this PR do?
Adds gradient_checkpointing to Flax Whisper models. It uses `flax.linen.remat` and follows on PRs #13657 and #17994.
Who can review?
@sanchit-gandhi @peregilk