
Loop Refactor 1/N - Training Loop #7871

Merged · 543 commits · Jun 15, 2021

Conversation

awaelchli
Contributor

@awaelchli awaelchli commented Jun 7, 2021

What does this PR do?

Part of #7700

Introduces the new loop interface.
The old classes will be removed ASAP in a follow-up.

co-author: @justusschock

How to review this PR

  1. Review the new loop base class found in pytorch_lightning/loops/base.py
  2. Review the new training loop classes
    • EpochLoop: runs over epochs
    • TrainingLoop: runs over all batches in a single epoch
    • BatchLoop: runs over a single batch (could include looping over several optimizers or truncated backprop steps).
  3. Review the changes in the Trainer class file. Here we have some re-routing from old to new; this will be cleaned up later.

The new training loop classes reorganize the old TrainLoop class.
Important: Changes requested on code lines carried over 1:1 from the old TrainLoop class will be dismissed; we cannot address those comments here. More refactor iterations will be required to simplify all existing helper functions.
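To make the interface described above concrete, here is a minimal illustrative sketch of the nested run/advance loop structure. This is not the actual PR code; the class and method names follow the description above, but all bodies are simplified assumptions.

```python
# Illustrative sketch (not the actual PR code): a minimal version of the
# loop interface, with the three nested training loops described above.

class Loop:
    """Base loop: `run` repeatedly calls `advance` until `done` is True."""

    @property
    def done(self) -> bool:
        raise NotImplementedError

    def reset(self) -> None:
        pass

    def advance(self) -> None:
        raise NotImplementedError

    def on_run_start(self) -> None:
        pass

    def on_run_end(self):
        return None

    def run(self):
        self.reset()
        self.on_run_start()
        while not self.done:
            self.advance()
        return self.on_run_end()


class BatchLoop(Loop):
    """Runs over a single batch (could loop over optimizers internally)."""

    def __init__(self):
        self.processed = False

    @property
    def done(self):
        return self.processed

    def reset(self):
        self.processed = False

    def advance(self):
        # ... forward/backward/optimizer step would happen here ...
        self.processed = True


class TrainingLoop(Loop):
    """Runs over all batches in a single epoch by delegating to BatchLoop."""

    def __init__(self, num_batches):
        self.num_batches = num_batches
        self.batch_idx = 0
        self.batch_loop = BatchLoop()

    @property
    def done(self):
        return self.batch_idx >= self.num_batches

    def reset(self):
        self.batch_idx = 0

    def advance(self):
        self.batch_loop.run()
        self.batch_idx += 1


class EpochLoop(Loop):
    """Outermost loop: runs over epochs."""

    def __init__(self, max_epochs, num_batches):
        self.max_epochs = max_epochs
        self.current_epoch = 0
        self.epoch_loop = TrainingLoop(num_batches)

    @property
    def done(self):
        return self.current_epoch >= self.max_epochs

    def advance(self):
        self.epoch_loop.run()
        self.current_epoch += 1
```

The key design point is that every level shares one tiny contract (`done`, `reset`, `advance`, plus `on_run_start`/`on_run_end` hooks), so loops nest uniformly and each level can be reviewed in isolation.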

TODO:

More follow-up work is listed in this issue tracker: #7938

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

Fixes #7034

justusschock and others added 3 commits June 14, 2021 17:45
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
…train' into refactor/loops/loops_everywhere_train
Contributor

@kaushikb11 kaushikb11 left a comment


What a kickass PR!

@tchaton
Contributor

tchaton commented Jun 14, 2021

Hey @awaelchli,

It is not a blocker for this PR, but there is a small inconsistency in the loop right now:
`on_train_epoch_start` is called in the FitLoop's `on_advance_start`,
while `on_train_epoch_end` is called in the epoch-level training loop's `on_run_end`.

Best,
T.C

@awaelchli
Contributor Author

Great observation, @tchaton! Moved to `on_run_start`.
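The fix discussed above can be sketched as follows: the two epoch-level hooks are fired symmetrically from the `on_run_start`/`on_run_end` pair of the same loop, instead of from two different loops. This is a simplified illustration with assumed names, not the actual PR code.

```python
# Illustrative sketch (assumed names, not the actual PR code): firing the
# epoch hooks symmetrically from one loop's on_run_start/on_run_end pair.

class TrainingEpochLoopSketch:
    def __init__(self):
        self.calls = []

    def on_run_start(self):
        # "on_train_epoch_start" now fires here ...
        self.calls.append("on_train_epoch_start")

    def on_run_end(self):
        # ... mirroring "on_train_epoch_end", which fires here.
        self.calls.append("on_train_epoch_end")

    def run(self):
        self.on_run_start()
        # ... advance over all batches of the epoch ...
        self.on_run_end()
```

Pairing the hooks in one loop makes the call order obvious from a single class, which is exactly the kind of consistency the new loop structure is meant to enable.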

Resolved review threads: CHANGELOG.md, pytorch_lightning/loops/fit_loop.py, pytorch_lightning/loops/base.py
@tchaton tchaton mentioned this pull request Jun 15, 2021
@awaelchli awaelchli enabled auto-merge (squash) June 15, 2021 12:00
@awaelchli awaelchli merged commit 971908a into master Jun 15, 2021
@awaelchli awaelchli deleted the refactor/loops/loops_everywhere_train branch June 15, 2021 12:55
@carmocca carmocca mentioned this pull request Jun 21, 2021
Comment on lines +205 to +206
if epoch_output is None:
return
Contributor

@carmocca carmocca Sep 3, 2021


@awaelchli do you remember why this was necessary?

edit: checking here #9298

Contributor Author


There is a good reason, something along the lines of my other comment. I'm pretty sure a test will fail without this; I would not have added it otherwise. But I should have added a comment explaining it. Deciphering the handling of these edge cases was definitely a central point of frustration during these loop refactors.


Comment on lines +177 to +179
if self.batches_seen == 0:
# dataloader/iterator did not produce a batch
return
Contributor

@carmocca carmocca Sep 3, 2021


@awaelchli do you remember why this was necessary?

edit: checking here #9298

Contributor Author


Yes. Basically, if the epoch is empty we don't want to run the epoch-end hooks or invoke any logging. When does this happen? A user can stop training or skip an epoch by returning a signal from one of the hooks; I don't remember the exact use case. I'm pretty sure that if you remove this guard, a test will fail and tell you exactly why.
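The behavior explained above can be sketched in isolation: when the dataloader yields nothing, the epoch-end path is skipped entirely. This is a simplified, self-contained illustration with assumed names, not the actual loop code.

```python
# Illustrative sketch (assumed names): why the `batches_seen == 0` guard
# matters. An empty epoch must not trigger epoch-end hooks or logging.

def run_epoch(dataloader, on_epoch_end):
    batches_seen = 0
    for batch in dataloader:
        batches_seen += 1
        # ... training step for `batch` would happen here ...
    if batches_seen == 0:
        # dataloader/iterator did not produce a batch: skip the
        # epoch-end hook and any logging, as the guard above does
        return None
    return on_epoch_end(batches_seen)
```

With the guard, `run_epoch([], hook)` returns `None` without ever calling the hook; without it, every empty epoch would still run epoch-end hooks and log results for data that was never seen.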

@mergify bot added the `ready` (PRs ready to be merged) label Sep 3, 2021
Labels
design (Includes a design discussion) · feature (Is an improvement or enhancement) · ready (PRs ready to be merged)
Projects
None yet

9 participants