.fit() returns last not best weights in ddp_spawn #2565
@@ -559,9 +560,13 @@ def ddp_train(self, process_idx, q, model, is_master=False, proc_offset=0):
         torch.cuda.empty_cache()

         if self.global_rank == 0 and q is not None:
             rank_zero_warn('cleaning up ddp environment...')
             q.put(self.checkpoint_callback.best_model_path)
Could we add a None check here for checkpoint_callback? The user can set it to None if they want.
See #2547
fixed!
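A minimal sketch of the kind of guard requested above. This is not the code merged in this PR: `best_path_or_none` and the stand-in callback object are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


def best_path_or_none(checkpoint_callback):
    """Return the best checkpoint path, or None when checkpointing is disabled."""
    # The user may have set checkpoint_callback to None, so guard the attribute access.
    if checkpoint_callback is None:
        return None
    return checkpoint_callback.best_model_path


# Hypothetical stand-in for a checkpoint callback; only the attribute name
# best_model_path comes from the diff above.
fake_callback = SimpleNamespace(best_model_path="checkpoints/best.ckpt")

with_callback = best_path_or_none(fake_callback)
without_callback = best_path_or_none(None)
```

With a guard like this, the value put on the queue is either a path or None, and the receiving side has to handle both cases.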
Could you explain this PR a bit? Just curious, since it touches code I worked on and I want to make sure I understand it. :) Is this only for "testing", as in unit/integration tests?
In ddp_spawn the model is only updated in a subprocess, so when .fit() ends the original model in the main process is still untrained.
@williamFalcon OK, so you use the queue to send the best weights path back to the main process in case the user wants the best weights for testing. Got it.
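The behaviour described in this exchange can be reproduced with the standard library alone. The sketch below is an illustration under stated assumptions, not Lightning's implementation: `TinyModel`, `train_in_child`, `fit`, and the checkpoint path are all hypothetical, and the "fork" start method is used only so the example runs as a plain script on POSIX systems.

```python
import multiprocessing as mp


class TinyModel:
    """Stand-in for a model; `weight` plays the role of trainable parameters."""

    def __init__(self):
        self.weight = 0.0


def train_in_child(model, q):
    # Runs in the child process, so `model` here is the child's own copy.
    model.weight = 1.0  # pretend that training updated the weights
    # Hypothetical path; a real run would have saved a checkpoint file here.
    q.put("checkpoints/best.ckpt")


def fit(model):
    # Assumption: "fork" is chosen for this sketch only (POSIX systems).
    ctx = mp.get_context("fork")
    q = ctx.SimpleQueue()
    proc = ctx.Process(target=train_in_child, args=(model, q))
    proc.start()
    best_path = q.get()  # read before join so the child never blocks on put
    proc.join()
    return best_path


model = TinyModel()
best_path = fit(model)
# The parent's model is unchanged; only the path crossed the process boundary.
```

After `fit` returns, the parent still holds the untrained object, which is why the path is sent back: the main process can use it to load the best weights from disk.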
Hello @williamFalcon! Thanks for updating this PR.
Comment last updated at 2020-07-09 15:22:21 UTC
Codecov Report
@@           Coverage Diff           @@
##           master   #2565    +/-   ##
=======================================
+ Coverage      87%     91%     +4%
=======================================
  Files          70      70
  Lines        5703    5718     +15
=======================================
+ Hits         4960    5209    +249
+ Misses        743     509    -234
Fixes #2547