
deprecate passing ModelCheckpoint instance to Trainer(checkpoint_callback=...) #4336

Merged 17 commits from feature/ckpt-callback-bool into master on Oct 30, 2020

Conversation

awaelchli (Contributor) commented Oct 24, 2020

What does this PR do?

Pitch: Deprecate passing a ModelCheckpoint instance directly to the checkpoint_callback argument. Allow only bool.

FAQ

Q: Why can we not just remove the checkpoint_callback argument altogether?
A: The Lightning philosophy is to provide good defaults, and this includes checkpointing. We still need a way to turn it off, i.e., checkpoint_callback=False.

Q: Why can't we keep it the way it is?
A: When we call Trainer(checkpoint_callback=mycustomcallback), the Trainer internally adds it to self.callbacks.
A problem arises when we reload the Trainer using resume_from_checkpoint="...". In this case we end up with a conflict, and a decision needs to be made: do we load the callback state that was saved in the checkpoint, or do we overwrite it with the one passed to the Trainer? We can try to handle this case and make the decision for the user (#4027), or we can go with this PR and remove the ambiguity altogether. A minimal before/after sketch is shown below.
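
A minimal sketch of the usage change this PR proposes (the monitor="val_loss" value is illustrative only, not part of the PR):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Deprecated by this PR: passing an instance through checkpoint_callback
trainer = Trainer(checkpoint_callback=ModelCheckpoint(monitor="val_loss"))

# After this PR, checkpoint_callback is a plain on/off switch ...
trainer = Trainer(checkpoint_callback=True)   # default checkpointing (the default)
trainer = Trainer(checkpoint_callback=False)  # disable checkpointing entirely

# ... and custom instances go through the callbacks list instead
trainer = Trainer(callbacks=[ModelCheckpoint(monitor="val_loss")])
```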


TODOs for this PR:

  • update docs -> follow-up PR

Fixes #4014 automatically
Maybe Fixes #4386 (need to reproduce first)
Closes #3990
Closes #4027
Related to #4335, as it is a good step in the direction of allowing multiple model checkpoints.

awaelchli added the labels bug (Something isn't working), discussion (In a discussion stage), checkpointing (Related to checkpointing), and callback on Oct 24, 2020
pep8speaks commented Oct 24, 2020

Hello @awaelchli! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-30 03:12:23 UTC

@@ -85,7 +85,7 @@ class Trainer(
     def __init__(
         self,
         logger: Union[LightningLoggerBase, Iterable[LightningLoggerBase], bool] = True,
-        checkpoint_callback: Union[ModelCheckpoint, bool] = True,
+        checkpoint_callback: bool = True,
carmocca (Contributor) commented:

> Q: Why can we not just remove the checkpoint_callback argument altogether?
> A: The Lightning philosophy is to provide good defaults, and this includes checkpointing. We still need a way to turn it off, i.e., checkpoint_callback=False.

What about keeping the checkpoint_callback parameter but changing its default value to False? I think it will be pretty annoying to have to set checkpoint_callback=False every time you pass a custom ModelCheckpoint via callbacks, and I suspect most people use a custom ModelCheckpoint rather than just checkpoint_callback=True.

awaelchli (Contributor, Author) replied:

> What about keeping the checkpoint_callback parameter but changing its default value to False?

That doesn't solve the problem I'm trying to solve here, which is eliminating ambiguity when restoring the state of the Trainer; see the answer to the second FAQ question.

> pretty annoying to have to set checkpoint_callback=False every time you pass a custom ModelCheckpoint

With this PR's proposal, the value is ignored if you pass in a custom one; False is only needed when you want to disable checkpointing completely. I believe I have this covered in a test (see the sketch below).
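
A rough sketch of what such a test could look like; this is not the PR's actual test code, and the exact warning category raised during the deprecation window is an assumption:

```python
import pytest
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint


def test_passing_instance_warns(tmpdir):
    # During the deprecation window, passing an instance should still work
    # but emit a deprecation warning (assumed to be a DeprecationWarning here).
    with pytest.deprecated_call():
        Trainer(default_root_dir=tmpdir, checkpoint_callback=ModelCheckpoint())


def test_false_disables_checkpointing(tmpdir):
    # checkpoint_callback=False should leave no ModelCheckpoint in the callbacks.
    trainer = Trainer(default_root_dir=tmpdir, checkpoint_callback=False)
    assert not any(isinstance(c, ModelCheckpoint) for c in trainer.callbacks)
```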

ananthsub (Contributor) commented Oct 29, 2020:

I agree with @carmocca; this is super confusing when adding my own checkpoint callback. Given how loose the default checkpoint callback is, and with the coming customizations, I'd rather drop the checkpoint_callback arg altogether and force everything to be configured through callbacks. Since the callback implementation already exists, I personally don't think it's much of an ask for people to instantiate the checkpoint callback (and confirm their settings while doing so) and pass it along to the Trainer.

I also think that's a nicer message for users: "see how extensible this framework is" vs. "look at all the magic this trainer configures for you which you can't change".

Even if that's not in this PR, it feels inevitable that checkpoint_callback=False will eventually become the new default, and that later we could drop the arg altogether.

awaelchli (Contributor, Author) commented Oct 29, 2020:
Yes, fine with me; I don't have a strong preference here.
Note that this PR does NOT close the discussion on whether there should be a checkpoint_callback arg or not; I'm simply restricting what can be passed to the argument.

It looks like a lot of API change, but it is really more of a bugfix.

edenlightning (Contributor) commented:
@williamFalcon @teddykoker remove ambiguity by making checkpoint_callback a boolean. Sound good?

@awaelchli awaelchli added this to the 1.0.x milestone Oct 28, 2020
@awaelchli awaelchli changed the title [wip] deprecate passing ModelCheckpoint instance to Trainer(checkpoint_callback=...) [skip ci] deprecate passing ModelCheckpoint instance to Trainer(checkpoint_callback=...) Oct 28, 2020
@@ -144,7 +144,6 @@ def __scale_batch_reset_params(trainer, model, steps_per_trial):
     trainer.weights_summary = None  # not needed before full run
     trainer.logger = DummyLogger()
     trainer.callbacks = []  # not needed before full run
-    trainer.checkpoint_callback = False  # required for saving
awaelchli (Contributor, Author) commented Oct 28, 2020:
I removed this from the Tuner because ModelCheckpoint now lives entirely in the callbacks list, and the Tuner already backs that list up properly.

codecov bot commented Oct 28, 2020

Codecov Report

Merging #4336 into master will increase coverage by 2%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #4336    +/-   ##
=======================================
+ Coverage      91%     93%    +2%     
=======================================
  Files         113     111     -2     
  Lines        8323    8138   -185     
=======================================
- Hits         7588    7563    -25     
+ Misses        735     575   -160     

Lightning-AI deleted a comment from awaelchli Oct 29, 2020
awaelchli and others added 2 commits October 29, 2020 22:07 (Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>; Co-authored-by: Jeff Yang <ydcjeff@outlook.com>)
s-rog (Contributor) left a comment:
So the change makes checkpoint_callback a bool, and to give it arguments (monitor, top_k, dir, etc.) a custom instance is to be passed into the Trainer with callbacks=[checkpoint_callback]?

If so, the warning in the trainer docs, "Only user defined callbacks (ie: Not ModelCheckpoint)", needs to be removed as well (trainer/__init__.py line 490).

awaelchli (Contributor, Author) commented Oct 30, 2020:

@s-rog Yes, correct. I have already updated the docs you mention on another branch, which will follow up on this one; I didn't want to make this PR too big, so I'm keeping the docs updates separate. A sketch of the new usage is below.
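
A sketch of the pattern s-rog describes (the monitor and save_top_k values are illustrative only):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# All checkpointing options are configured on the callback itself ...
checkpoint_callback = ModelCheckpoint(monitor="val_loss", save_top_k=3)

# ... which is passed through the callbacks list, not checkpoint_callback.
trainer = Trainer(callbacks=[checkpoint_callback])
```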

awaelchli merged commit d1234c5 into master Oct 30, 2020
awaelchli deleted the feature/ckpt-callback-bool branch October 30, 2020 03:47
Borda pushed a commit that referenced this pull request Nov 4, 2020
deprecate passing ModelCheckpoint instance to Trainer(checkpoint_callback=...) (#4336)

* first attempt
* update tests
* support multiple
* test bugfix
* changelog
* pep
* pep
* import order
* import
* improve test for resuming
* test
* update test
* add references test
* docstring suggestion deprecation
* paramref

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
(cherry picked from commit d1234c5)
Labels: bug (Something isn't working), callback, checkpointing (Related to checkpointing), priority: 0 (High priority task)
8 participants