Refactor plugins backward #8328

Merged · 6 commits into master from refactor/plugins-backward on Jul 8, 2021
Conversation

@carmocca (Contributor) commented on Jul 7, 2021

What does this PR do?

  • Add pre_backward and post_backward hooks to the precision plugin, mirroring the ones on the training type plugin (a minimal sketch of this flow follows below).
  • These hooks allow the plugins to modify the closure_loss.
  • Refactor the precision backward API to reduce code duplication across plugins and clean up the structure.
  • Properly type the optimizer arguments as Optional, since they are not available during manual optimization.
  • Docs fixes.
  • Always call the on_after_backward hook after backward, regardless of should_accumulate.
  • Always call the on_after_backward hook after backward when using manual optimization.
  • Always call the LightningModule.backward hook. Previously, under manual optimization, it was only called for Apex.

Part of #7740, which validates that the hook order is correct.
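
For illustration, here is a minimal, self-contained sketch of the pre_backward -> backward -> post_backward flow mentioned in the first bullet. The ToyPrecisionPlugin class and run_backward driver are made-up names for this sketch, not the actual pytorch_lightning API; the real plugin hooks take additional arguments (the model, the optimizer, etc.).

```python
import torch


class ToyPrecisionPlugin:
    """Illustrative stand-in for a precision plugin after this refactor (not the real API)."""

    def pre_backward(self, closure_loss: torch.Tensor) -> torch.Tensor:
        # A mixed-precision plugin could, for example, scale the loss here.
        return closure_loss

    def backward(self, closure_loss: torch.Tensor) -> None:
        # The plugin owns the actual backward call and no longer returns a value.
        closure_loss.backward()

    def post_backward(self, closure_loss: torch.Tensor) -> torch.Tensor:
        # The (possibly modified) loss is handed back to the caller.
        return closure_loss.detach()


def run_backward(plugin: ToyPrecisionPlugin, closure_loss: torch.Tensor) -> torch.Tensor:
    """Drives the hook order described in the bullet points above."""
    closure_loss = plugin.pre_backward(closure_loss)
    plugin.backward(closure_loss)
    return plugin.post_backward(closure_loss)


if __name__ == "__main__":
    w = torch.nn.Parameter(torch.ones(3))
    loss = (w ** 2).sum()
    final_loss = run_backward(ToyPrecisionPlugin(), loss)
    print(w.grad, final_loss)  # gradients are populated; the returned loss is detached
```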

Does your PR introduce any breaking changes? If yes, please list them.

EXPERIMENTAL API:

  • The backward hook of the precision plugin no longer takes a should_accumulate flag.
  • The backward hook of the precision plugin no longer returns a value.
  • The pre_backward and post_backward hooks of the training type plugin no longer take optimizer arguments.

STABLE API:

  • on_after_backward was previously skipped while gradients were still being accumulated (it was only called when not should_accumulate). It is now always called after backward (see the callback sketch right after this list).
  • With mixed precision, on_after_backward was previously only called with the unscaled loss. This is no longer the case.
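
As a concrete illustration of the first point, the callback sketch below simply counts how often the hook fires; with Trainer(accumulate_grad_batches=4) it now increments four times per optimizer step instead of once. This is a sketch assuming the 1.4-era Callback.on_after_backward(trainer, pl_module) signature.

```python
from pytorch_lightning import Callback


class BackwardCallCounter(Callback):
    """Counts on_after_backward calls.

    Before this PR the hook was skipped while gradients were still being
    accumulated; after it, the hook fires on every backward call.
    """

    def __init__(self):
        self.backward_calls = 0

    def on_after_backward(self, trainer, pl_module):
        self.backward_calls += 1
```

Attach it with, for example, Trainer(accumulate_grad_batches=4, callbacks=[BackwardCallCounter()]) to observe the difference.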

NOTES:

  • The LightningModule.backward hook will still receive the optimizer arguments, which are now passed through *args and **kwargs, so no user changes are required (see the override sketch after these notes).
  • A follow-up PR (Add the on_before_optimizer_step hook #8048) will implement on_before_optimizer_step, which will have the unscaled loss and will only be called when not accumulating, so users who relied on the previous behaviour can simply switch to the new hook.
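
To make the first note concrete, a typical backward override keeps working unchanged. The snippet below is a sketch assuming the 1.4-era LightningModule.backward(loss, optimizer, optimizer_idx, *args, **kwargs) signature; only the override is shown.

```python
from pytorch_lightning import LightningModule


class MyModule(LightningModule):
    def backward(self, loss, optimizer, optimizer_idx, *args, **kwargs):
        # The optimizer arguments still arrive here; internally they are now
        # forwarded through *args/**kwargs, so this override needs no changes.
        loss.backward(*args, **kwargs)
```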

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • [n/a] Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)
  • Did you list all the breaking changes introduced by this pull request?

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

@carmocca added the bug (Something isn't working) and refactor labels on Jul 7, 2021
@carmocca added this to the v1.4 milestone on Jul 7, 2021
@carmocca self-assigned this on Jul 7, 2021
@codecov bot commented on Jul 7, 2021

Codecov Report

Merging #8328 (c1ae485) into master (c4353ea) will decrease coverage by 4%.
The diff coverage is 73%.

@@           Coverage Diff           @@
##           master   #8328    +/-   ##
=======================================
- Coverage      92%     88%    -4%     
=======================================
  Files         213     214     +1     
  Lines       13791   13891   +100     
=======================================
- Hits        12732   12223   -509     
- Misses       1059    1668   +609     

Resolved review threads on: CHANGELOG.md, pytorch_lightning/accelerators/accelerator.py (two threads), pytorch_lightning/core/lightning.py
@justusschock (Member) left a comment

LGTM.

Just added some notes on potential breaking changes on master, which I think are fine if the follow-up is done quickly :)

Resolved review threads on: pytorch_lightning/accelerators/accelerator.py, tests/plugins/test_amp_plugins.py, CHANGELOG.md (outdated), pytorch_lightning/core/lightning.py
@carmocca merged commit eb6d991 into master on Jul 8, 2021
@carmocca deleted the refactor/plugins-backward branch on July 8, 2021, 14:02
@carmocca mentioned this pull request on Jul 8, 2021
@mergify bot added the ready (PRs ready to be merged) label on Jul 21, 2021
@kandluis (Contributor) commented on Jul 24, 2021

So one interesting non-backwards-compatible change here is that on_after_backward is now called before the gradients are clipped/normalized (if gradient_clip_val is set in the Trainer).

I want to confirm that this is an intentional design choice. As a short-term mitigation, we've changed some of our downstream modules to use the on_before_optimizer_step hook rather than on_after_backward.

Is this lifecycle detail documented somewhere that I might have missed?

Edit: To clarify, this is about the Callback hook; I've not confirmed the behavior of the LightningModule hooks.

FYI @ananthsub

@carmocca (Contributor, Author) commented on Jul 26, 2021

Hi @kandluis!

You are correct; what you pointed out is an intentional design choice. The change isn't obvious from this PR alone because it was split across two separate PRs.

The second part is in #8048; see the associated CHANGELOG updates.

This is also mentioned in this PR's description:

A follow-up PR (Add the on_before_optimizer_step hook #8048) will implement on_before_optimizer_step, which will have the unscaled loss and will only be called when not accumulating, so users who relied on the previous behaviour can simply switch to the new hook.
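
For anyone doing the same migration, the sketch below moves gradient-dependent logic from on_after_backward to on_before_optimizer_step, which runs once per optimizer step after accumulation has finished. It assumes the 1.4-era callback API (on_before_optimizer_step was added in #8048); the class and attribute names are illustrative.

```python
import torch
from pytorch_lightning import Callback


class GradNormInspector(Callback):
    """Migration sketch: inspect gradients right before the optimizer step."""

    def __init__(self):
        self.last_total_norm = None

    # This logic used to live in on_after_backward; that hook now also fires on
    # accumulation steps, so code that needs the fully accumulated gradients
    # moves here instead.
    def on_before_optimizer_step(self, trainer, pl_module, optimizer, optimizer_idx):
        grads = [p.grad.detach() for p in pl_module.parameters() if p.grad is not None]
        if grads:
            self.last_total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
```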

Labels: bug (Something isn't working), ready (PRs ready to be merged), refactor