[Fix] Ensure we set the eval/train flag correctly on accelerator model #6877
Conversation
great catch!!! let's add comments here for when to use self.model vs when to directly use self.lightning_module
Thanks @ananthsub! The difference between the names can be quite confusing, and in most cases the choice is clearer; if you want to access …

Ah, great point, forgot …

Sounds good to me!

Thanks @tchaton, just a slight mistake: I should be referring to the module hooks.
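The `self.model` vs `self.lightning_module` distinction discussed in this thread can be sketched in plain Python. The names below (`Wrapper`, `unwrap_lightning_module`, `LinearStub`) are illustrative stand-ins, not Lightning's actual internals: the idea is that `self.model` is whatever the accelerator wrapped, while `self.lightning_module` always resolves to the bare user module.

```python
class LinearStub:
    """Stand-in for the user's LightningModule."""
    pass


class Wrapper:
    """Stand-in for an accelerator wrapper such as DDP or ShardedDataParallel,
    which stores the original module under `.module`."""
    def __init__(self, module):
        self.module = module


def unwrap_lightning_module(model):
    # Walk through any nesting of wrappers until the bare module is reached.
    while isinstance(model, Wrapper):
        model = model.module
    return model


inner = LinearStub()
wrapped = Wrapper(Wrapper(inner))       # e.g. sharding on top of DDP
print(unwrap_lightning_module(wrapped) is inner)  # True
```

In this sketch, code that must toggle state on the whole wrapped stack would use the wrapper (`self.model`), while code that only needs the user's hooks or attributes would use the unwrapped module (`self.lightning_module`).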
looks good <3
LGTM!
@SeanNaren the test is failing
The test showed that we were calling the predict hook twice; I've remedied this, and it now follows the same behaviour as `run_evaluation` and `run_train`.
Codecov Report
```
@@            Coverage Diff            @@
##           master    #6877      +/-  ##
=========================================
- Coverage      91%      86%       -6%
=========================================
  Files         193      193
  Lines       12299    12539      +240
=========================================
- Hits        11252    10779      -473
- Misses       1047     1760      +713
```
[Fix] Ensure we set the eval/train flag correctly on accelerator model (#6877)

* Ensure we move the model to eval mode before running evaluation
* Ensure we set the flag appropriately across all stages
* Add test, move hooks logic
* Apply same fix to the validate loop
* Update pytorch_lightning/trainer/trainer.py
* Fix function name
* Fix order, add predict
* Shorten the name
* Fix input dm, drop duplicate on predict start hook call, as it's called in the setup function
* Use hook, remove double call

(cherry picked from commit 742c48e)
What does this PR do?
Fixes #6876.
This PR is ready; more information below.

We also require facebookresearch/fairscale#587 to be merged and included in the next release, but since we do not yet rely on the upstream release, this fix won't change any FairScale-related tests on CI. It will likely also block #6152 as we move from our own fork to the latest FairScale release.

In terms of testing, once #6152 is merged to deprecate the current Pipe implementation, this fix will come to fruition, as the FairScale version will rely on the upstream release.
Overall, however, I noticed that in many places we call `train`/`eval` on just the LightningModule, not the accelerator-wrapped model. In the case of custom implementations like SDP, this is an issue. I swapped this to use `self.model`, which should recursively set the lightning module when it is wrapped.

Before submitting
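As background for the `self.model` change described above, here is a minimal pure-Python sketch that mimics `torch.nn.Module`'s `train()`/`eval()` semantics (the `Module` class below is a stand-in, not torch's real class): the training flag propagates down into child modules, so flipping it on the accelerator wrapper also flips it on the wrapped LightningModule, while flipping it on the inner module alone leaves the wrapper untouched.

```python
class Module:
    """Stand-in mimicking torch.nn.Module's train()/eval() recursion."""
    def __init__(self):
        self.training = True
        self._children = []

    def train(self, mode=True):
        # Like torch, set the flag on self and recurse into all children.
        self.training = mode
        for child in self._children:
            child.train(mode)
        return self

    def eval(self):
        return self.train(False)


lightning_module = Module()
accelerator_model = Module()               # e.g. a DDP/sharded wrapper
accelerator_model._children.append(lightning_module)

accelerator_model.eval()                   # flag propagates downward...
print(lightning_module.training)           # False

lightning_module.train()                   # ...but not upward
print(accelerator_model.training)          # False: wrapper flag untouched
```

This is why calling `eval()` on only the LightningModule is insufficient when the accelerator has wrapped it: the wrapper's own `training` flag stays stale.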
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃