deprecate passing ModelCheckpoint instance to Trainer(checkpoint_callback=...) #4336
Conversation
Hello @awaelchli! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-10-30 03:12:23 UTC
```diff
@@ -85,7 +85,7 @@ class Trainer(
     def __init__(
         self,
         logger: Union[LightningLoggerBase, Iterable[LightningLoggerBase], bool] = True,
-        checkpoint_callback: Union[ModelCheckpoint, bool] = True,
+        checkpoint_callback: bool = True,
```
Q: Why can we not just remove the `checkpoint_callback` argument altogether?
A: Lightning philosophy is to provide good defaults, and this includes checkpointing. We need a way to turn it off though, i.e., `checkpoint_callback=False`.
What about keeping the `checkpoint_callback` parameter but changing its default value to `False`? I think it will be pretty annoying to have to set `checkpoint_callback=False` every time you pass a custom `ModelCheckpoint` via `callbacks`. And I think most people use a custom `ModelCheckpoint` instead of just `checkpoint_callback=True`.
> What about keeping the `checkpoint_callback` parameter but changing its default value to `False`?

That doesn't solve the problem I'm trying to solve here, which is eliminating the ambiguity when restoring the state of the trainer. See the answer to the 2nd FAQ question.

> pretty annoying to have to set `checkpoint_callback=False` every time you pass a custom `ModelCheckpoint`

With this PR's proposal, the value will be ignored if you pass in a custom one. `False` is only needed when you want to disable checkpointing completely. I believe I have this covered in a test.
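The described semantics can be pictured as a minimal sketch. This is not the actual Lightning implementation; `configure_checkpoint_callbacks` and the stand-in `ModelCheckpoint` class are hypothetical names used only to illustrate "custom instance wins, flag only toggles the default":

```python
class ModelCheckpoint:
    """Stand-in for pytorch_lightning.callbacks.ModelCheckpoint."""

def configure_checkpoint_callbacks(checkpoint_callback: bool, callbacks: list) -> list:
    """Hypothetical sketch of the boolean-flag semantics proposed in this PR."""
    if any(isinstance(cb, ModelCheckpoint) for cb in callbacks):
        # A user-supplied ModelCheckpoint wins; the boolean flag is ignored.
        return callbacks
    if checkpoint_callback:
        # Lightning's "good defaults": add a default ModelCheckpoint when none was supplied.
        return callbacks + [ModelCheckpoint()]
    return callbacks  # checkpointing disabled completely

custom = ModelCheckpoint()
assert configure_checkpoint_callbacks(True, [custom]) == [custom]   # custom kept, flag ignored
assert configure_checkpoint_callbacks(False, [custom]) == [custom]  # still ignored
assert len(configure_checkpoint_callbacks(True, [])) == 1           # default added
assert configure_checkpoint_callbacks(False, []) == []              # fully disabled
```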
I agree with @carmocca, this is super confusing when adding my own checkpoint callback. Given how loose the default checkpoint callback is, and with the coming customizations, I'd rather drop the `checkpoint_callback` arg altogether and force everything to be configured through `callbacks`. Given the callback implementation already exists, I personally don't think it's much of an ask for people to instantiate the checkpoint callback (and confirm their settings while doing so) and pass it along to the trainer.

I also think that's a nice message for users: "see how extensible this framework is" vs "look at all the magic this trainer configures for you which you can't change".

Even if that's not in this PR, it feels inevitable that `checkpoint_callback=False` would eventually be the new default, and then later we could drop the arg altogether.
Yes, fine with me; I don't have a strong preference here.

Note that this PR does NOT close the discussion on whether there should be a `checkpoint_callback` arg or not. I'm simply restricting what can be passed to the argument. It looks like a lot of API change, but it is really more of a bugfix.

@williamFalcon @teddykoker remove ambiguity by making `checkpoint_callback` boolean. sg?
```diff
@@ -144,7 +144,6 @@ def __scale_batch_reset_params(trainer, model, steps_per_trial):
     trainer.weights_summary = None  # not needed before full run
     trainer.logger = DummyLogger()
     trainer.callbacks = []  # not needed before full run
-    trainer.checkpoint_callback = False  # required for saving
```
I removed these from the Tuner because `ModelCheckpoint` now entirely lives in the callbacks list, and the callbacks list is already properly backed up by the Tuner.
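The backup this refers to follows a stash-mutate-restore pattern. A simplified sketch, not the actual Tuner code; `run_with_reset` and the `SimpleNamespace` trainer are hypothetical stand-ins:

```python
from types import SimpleNamespace

def run_with_reset(trainer, trial):
    """Sketch: stash the attributes the Tuner mutates, run the trial, then restore them."""
    backed_up = {"logger": trainer.logger, "callbacks": list(trainer.callbacks)}
    trainer.callbacks = []  # ModelCheckpoint lives here too, so it is reset with the rest
    try:
        return trial(trainer)
    finally:
        trainer.logger = backed_up["logger"]
        trainer.callbacks = backed_up["callbacks"]

trainer = SimpleNamespace(logger="TensorBoardLogger", callbacks=["my_checkpoint_cb"])
assert run_with_reset(trainer, lambda t: len(t.callbacks)) == 0  # callbacks cleared during trial
assert trainer.callbacks == ["my_checkpoint_cb"]                 # restored afterwards
```

With this in place there is no need for a separate `trainer.checkpoint_callback = False` reset, since the checkpoint callback is restored along with everything else in `callbacks`.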
cc @SkafteNicki
Codecov Report
```
@@           Coverage Diff           @@
##           master   #4336   +/-   ##
======================================
+ Coverage      91%     93%      +2%
======================================
  Files         113     111       -2
  Lines        8323    8138     -185
======================================
- Hits         7588    7563      -25
+ Misses        735     575     -160
```
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
So the change makes `checkpoint_callback` a `bool`, and to add args to it (monitor, top_k, dir, etc.) a custom one is to be passed into the trainer with `callbacks=[checkpoint_callback]`?

If so, the warning in the trainer docs, `Only user defined callbacks (ie: Not ModelCheckpoint)`, needs to be removed as well (trainer/__init__.py line 490).

@s-rog Yes, correct. I have already updated the docs you mention on another branch, which will follow up on this one. I didn't want to make the PR too big, so I'm keeping the docs updates separate.
…back=...) (#4336)

* first attempt
* update tests
* support multiple
* test bugfix
* changelog
* pep
* pep
* import order
* import
* improve test for resuming
* test
* update test
* add references test
* docstring suggestion deprecation
* paramref

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
(cherry picked from commit d1234c5)
What does this PR do?

Pitch: Deprecate passing a `ModelCheckpoint` instance directly to the `checkpoint_callback` argument. Allow only bool.

FAQ

Q: Why can we not just remove the `checkpoint_callback` argument altogether?

A: Lightning philosophy is to provide good defaults, and this includes checkpointing. We need a way to turn it off though, i.e., `checkpoint_callback=False`.

Q: Why can't we keep it the way it is?

A: When we add `Trainer(checkpoint_callback=mycustomcallback)`, internally Trainer adds it to `self.callbacks`. A problem arises when we reload the Trainer using `resume_from_checkpoint="..."`. In this case, we would end up with a conflict and a decision needs to be made: do we load the callback that was saved in the checkpoint, or do we overwrite it with the one that is passed in to Trainer? We can try to handle this case and make the decision for the user (#4027), or we go with this PR here to remove the ambiguity altogether.

Alternatives

- Remove `checkpoint_callback`. This implies no default checkpointing.
- Remove `checkpoint_callback` and, if `trainer.callbacks` is empty, add a ModelCheckpoint automatically. If any other callbacks need to be passed in, we need to supply a ModelCheckpoint manually.

TODOs for this PR:

- Fixes #4014 automatically
- Maybe fixes #4386 (need to reproduce first)
- Closes #3990
- Closes #4027
- Related to #4335, as it is a good step in the direction of allowing multiple model checkpoints.
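The resume conflict from the 2nd FAQ answer can be made concrete with a toy sketch. All names here are hypothetical illustrations, not Lightning code; `saved_state` stands for the `ModelCheckpoint` state stored inside the checkpoint file:

```python
class ModelCheckpoint:
    """Toy stand-in tracking only a monitor key and a best score."""
    def __init__(self, monitor="val_loss"):
        self.monitor = monitor
        self.best_model_score = None

# State of the ModelCheckpoint that was saved inside the checkpoint file.
saved_state = {"monitor": "val_loss", "best_model_score": 0.12}

# The instance the user passes when re-creating the Trainer with resume_from_checkpoint.
user_cb = ModelCheckpoint(monitor="val_acc")

# Option A: load the saved state -> the user's monitor choice is silently overwritten.
restored = ModelCheckpoint(monitor=saved_state["monitor"])
restored.best_model_score = saved_state["best_model_score"]

# Option B: keep the user's instance -> the tracked best score is lost.
assert user_cb.best_model_score is None

# The two options disagree; this is the ambiguity the PR removes by
# making checkpoint_callback a plain bool.
assert restored.monitor != user_cb.monitor
```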