
Add the on_before_optimizer_step hook #8048

Conversation

@ddrevicky (Contributor) commented Jun 20, 2021

What does this PR do?

Adds the on_before_optimizer_step hook, which behaves as on_after_backward previously did: it is called only once before the optimizer step, after all backward passes, and after on_after_backward (see the sketch below).

Co-authored-by: @carmocca

Continues #8328
Fixes #7924
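
For illustration, a minimal sketch of how a user could override the new hook from a LightningModule to inspect gradients once they are fully accumulated (a sketch only, not code from this PR; the exact signature at merge time may differ, and the rest of the module is elided):

import torch
from pytorch_lightning import LightningModule

class LitModel(LightningModule):
    def on_before_optimizer_step(self, optimizer, optimizer_idx):
        # Runs once per optimizer step, after all backward passes
        # (including gradient accumulation), so gradients are final here.
        grads = [p.grad.norm() for p in self.parameters() if p.grad is not None]
        if grads:
            self.log("grad_norm", torch.stack(grads).norm())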

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

@ddrevicky ddrevicky changed the title from "WIP: Bug/7924 on after backward should always run" to "[WIP]: Bug/7924 on after backward should always run" Jun 20, 2021
@ddrevicky ddrevicky marked this pull request as draft June 20, 2021 23:23
codecov bot commented Jun 20, 2021

Codecov Report

Merging #8048 (683d1a5) into master (1c825a2) will decrease coverage by 4%.
The diff coverage is 33%.

@@           Coverage Diff           @@
##           master   #8048    +/-   ##
=======================================
- Coverage      92%     88%    -4%     
=======================================
  Files         214     214            
  Lines       13900   13910    +10     
=======================================
- Hits        12814   12237   -577     
- Misses       1086    1673   +587     

@ddrevicky (Contributor, Author) commented
Hi @carmocca, this is still a draft but I wanted to get your feedback.

Question 1

I tried putting on_before_optimizer_step before this line https://github.com/PyTorchLightning/pytorch-lightning/blob/cdcc483e9b7a79de3e5a7ac9c1e9dfd12ab77f4f/pytorch_lightning/loops/training_batch_loop.py#L187

so it looks like this:

if self.trainer.lightning_module.automatic_optimization:
    self.trainer.call_hook('on_before_optimizer_step', batch_idx, optimizer, opt_idx)  # Problem: called first
    self.optimizer_step(optimizer, opt_idx, batch_idx, closure)  # on_after_backward is called later, from within the closure

while calling self.on_after_backward(batch_idx, result.loss) right after this line https://github.com/PyTorchLightning/pytorch-lightning/blob/cdcc483e9b7a79de3e5a7ac9c1e9dfd12ab77f4f/pytorch_lightning/loops/training_batch_loop.py#L571

and putting it outside the if not self.should_accumulate(): condition. If that's what you meant, there is a problem: on_before_optimizer_step is called first, and on_after_backward is only called afterwards, from within the closure, which is not what we want here if I understand correctly. I therefore put on_before_optimizer_step into the closure, exactly where on_after_backward currently sits (which makes sense to me, since the whole point of the issue is that on_after_backward currently behaves like on_before_optimizer_step); see the sketch below. Is that fine with you? It seemed to work when I tried it with accumulate_grad_batches.
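
For context, a toy, runnable model of the ordering described above (hook and function names mirror the snippet in this comment, not the actual Lightning internals):

calls = []

def call_hook(name):
    calls.append(name)

def should_accumulate():
    return False  # pretend this is the final micro-batch of the accumulation window

def closure():
    call_hook('backward')           # the backward pass runs inside the closure
    call_hook('on_after_backward')  # so this hook fires inside it as well
    if not should_accumulate():
        # placing the new hook here makes it fire once, after the final
        # accumulated backward pass and just before the weight update
        call_hook('on_before_optimizer_step')

def optimizer_step(closure):
    closure()                       # optimizer.step() invokes the closure internally
    call_hook('optimizer.step')

optimizer_step(closure)
assert calls == ['backward', 'on_after_backward', 'on_before_optimizer_step', 'optimizer.step']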

Question 2

Could you clear up for me what you meant by this quote (from the original issue)?

Regardless, maybe we shouldn't call it in that case as it won't get called when using manual optimization without AMP.
In those cases, it should be replaced with before_optimizer_step.

I would like to know whether I should modify the precision plugins in any way with regard to on_before_optimizer_step and on_after_backward, and how exactly, because I am not clear on how those plugins work.
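
For reference, a minimal manual-optimization training_step (standard Lightning API; compute_loss is a hypothetical helper), shown only to illustrate that under manual optimization the backward pass goes through manual_backward and the precision plugin, which is where these hooks would have to be invoked:

def training_step(self, batch, batch_idx):
    # assumes self.automatic_optimization = False
    opt = self.optimizers()
    loss = self.compute_loss(batch)  # compute_loss: hypothetical helper

    opt.zero_grad()
    # manual_backward routes through the trainer and precision plugin,
    # so that is where on_after_backward / on_before_optimizer_step
    # would need to be called under manual optimization
    self.manual_backward(loss)
    opt.step()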

TODO:

I will most likely have to update the hook tests, since they are failing, and also do a once-over to check that the new hook is called everywhere it should be.

@carmocca carmocca mentioned this pull request Jul 7, 2021
@ddrevicky (Contributor, Author) commented
No problem @carmocca. If it's fine by you, please finish this PR once you're done with #8071; it will be simpler than keeping me in the loop. Thanks 🙂.

@carmocca carmocca changed the title from "[WIP]: Bug/7924 on after backward should always run" to "Add the on_before_optimizer_step hook" Jul 8, 2021
@carmocca carmocca added this to the v1.4 milestone Jul 8, 2021
@carmocca carmocca self-assigned this Jul 8, 2021
@carmocca carmocca added the design (Includes a design discussion) and priority: 0 (High priority task) labels Jul 8, 2021
pep8speaks commented Jul 8, 2021

Hello @ddrevicky! Thanks for updating this PR.

Line 128:13: W503 line break before binary operator

Comment last updated at 2021-07-09 10:03:12 UTC
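
For reference, W503 flags a line break placed before a binary operator; an illustrative snippet (not from this PR):

first_value, second_value = 1, 2

# flagged by W503: the continuation line starts with the operator
total = (first_value
         + second_value)

# one conventional fix: break after the operator instead
total = (first_value +
         second_value)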

@carmocca carmocca marked this pull request as ready for review July 9, 2021 10:06
@carmocca carmocca enabled auto-merge (squash) July 9, 2021 10:16
@tchaton (Contributor) left a comment

LGTM!

@carmocca carmocca merged commit 1b06edf into Lightning-AI:master Jul 9, 2021
Labels
design (Includes a design discussion), priority: 0 (High priority task)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

on_after_backward should always run after backward
6 participants