
Not saving checkpoint when monitor is None and save_top_k is -1 #6096

Closed
ruotianluo opened this issue Feb 20, 2021 · 3 comments · Fixed by #6136

ruotianluo (Contributor) commented on Feb 20, 2021:

🐛 Bug

When monitor is None, current will be None here:
https://github.com/PyTorchLightning/pytorch-lightning/blob/6bc4490d01aed21c2d52f884d4afbeaa24a47ca0/pytorch_lightning/callbacks/model_checkpoint.py#L553

Because of that, check_monitor_top_k will return False:
https://github.com/PyTorchLightning/pytorch-lightning/blob/6bc4490d01aed21c2d52f884d4afbeaa24a47ca0/pytorch_lightning/callbacks/model_checkpoint.py#L340

_update_best_and_save also does not accept a None current; it raises an error here:
https://github.com/PyTorchLightning/pytorch-lightning/blob/6bc4490d01aed21c2d52f884d4afbeaa24a47ca0/pytorch_lightning/callbacks/model_checkpoint.py#L605
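
A minimal runnable sketch of the failure path described above (paraphrased from the report; the names and bodies are illustrative, not the actual library source):

# Paraphrase of the reported code path; all names here are illustrative.
def check_monitor_top_k(current):
    # With monitor=None, no metric is looked up, so current stays None and
    # the checkpoint can never qualify as a "top k" model.
    if current is None:
        return False
    return True  # the real code compares current against the best k models


monitor = None
monitor_candidates = {"val_loss": 0.42}
current = monitor_candidates.get(monitor)  # -> None, since monitor is None

assert check_monitor_top_k(current) is False  # so the save is skipped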

Currently, checkpointing is tied to validation, which is not necessarily always the case. I just want to save a checkpoint every k iterations; a workaround sketch is given below.
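
A hedged workaround sketch for the "save every k iterations" use case. EveryKSteps is a hypothetical callback written for illustration, not a library class (later Lightning releases added an every_n_train_steps argument to ModelCheckpoint for this, but it postdates this issue):

import os

from pytorch_lightning.callbacks import Callback


class EveryKSteps(Callback):
    """Illustrative callback: unconditionally save a checkpoint every k training steps."""

    def __init__(self, k, dirpath):
        self.k = k
        self.dirpath = dirpath

    def on_train_batch_end(self, trainer, pl_module, *args):
        step = trainer.global_step
        if step > 0 and step % self.k == 0:
            # Trainer.save_checkpoint writes a full checkpoint to the given path
            trainer.save_checkpoint(os.path.join(self.dirpath, f"step={step}.ckpt"))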

Please reproduce using the BoringModel

Sorry, no time to reproduce for now.

To Reproduce

Use the following BoringModel and post it here

Expected behavior

The checkpoint should always be saved if save_top_k == -1.

Environment

Note: Bugs with code are solved faster! Colab Notebook should be made public!

You can get the script and run it with:

wget https://raw.githubusercontent.com/PyTorchLightning/pytorch-lightning/master/tests/collect_env_details.py
# For security purposes, please check the contents of collect_env_details.py before running it.
python collect_env_details.py
  • PyTorch Version (e.g., 1.0):
  • OS (e.g., Linux):
  • How you installed PyTorch (conda, pip, source):
  • Build command you used (if compiling from source):
  • Python version:
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

ruotianluo added the bug (Something isn't working) and help wanted (Open to be worked on) labels on Feb 20, 2021
Borda added the checkpointing (Related to checkpointing) and priority: 0 (High priority task) labels on Feb 20, 2021
Borda added this to the 1.2.x milestone on Feb 20, 2021
carmocca (Contributor) commented:

When monitor is None, the _save_last_checkpoint function is the one that saves the model (even if save_last is True), not _update_best_and_save.

I just manually checked and it seems to work properly

import os

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint
# BoringModel lives in the repo's test helpers; this import path matches
# PL 1.2-era sources and may differ between versions.
from tests.helpers import BoringModel


def test_bug(tmpdir):
    model_ckpt = ModelCheckpoint(dirpath=tmpdir, monitor=None, save_top_k=-1, save_last=False)
    model = BoringModel()
    trainer = Trainer(
        default_root_dir=tmpdir,
        logger=False,
        max_epochs=3,
        callbacks=[model_ckpt],
    )
    trainer.fit(model)

    print(os.listdir(tmpdir))
    # ['epoch=0-step=63.ckpt', 'epoch=2-step=191.ckpt', 'epoch=1-step=127.ckpt']

Can you elaborate on what the problem is?

carmocca removed the priority: 0 (High priority task) label on Feb 21, 2021
ruotianluo (Contributor, Author) commented:

I see the code doing that. Interesting, let me check with my code.

ruotianluo (Contributor, Author) commented on Feb 21, 2021:

@carmocca I see, this is because you set save_last to False. If you set save_last to True, the checkpoints will always be saved as last.ckpt.
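
A hedged variation of the test above (reusing its imports) that should reproduce the reported behavior; the expected directory listing reflects the reporter's claim, not verified output:

def test_bug_save_last(tmpdir):
    # Same setup as the test above, but with save_last=True: per the report,
    # every save then goes to last.ckpt, so per-epoch checkpoints are lost.
    model_ckpt = ModelCheckpoint(dirpath=tmpdir, monitor=None, save_top_k=-1, save_last=True)
    model = BoringModel()
    trainer = Trainer(
        default_root_dir=tmpdir,
        logger=False,
        max_epochs=3,
        callbacks=[model_ckpt],
    )
    trainer.fit(model)

    print(os.listdir(tmpdir))  # reported: only 'last.ckpt'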
