best_model_path does not retrieve the path to the best monitor checkpoint file (#12485)
Comments
Didn't get you here; can you explain more about what the actual issue is? If there is no monitor, best_model_path should be set to nothing.
Well, yeah, that's true. Or you can pass the checkpoint path directly: trainer.test(..., ckpt_path=checkpoint_callback2.best_model_path)
Maybe we could extend it a little. cc @carmocca wdyt?
I agree with raising a warning in this case.
I mean, when I have multiple ModelCheckpoints and one of them has a monitor: if I want to get the best monitored checkpoint via trainer.test(..., ckpt_path="best"), I have to put the monitored ModelCheckpoint first in the callback list, right?
So we wouldn't be able to filter by the instances that have an actual monitor.
If the monitor is None, why do we need to save it anyway? My idea was to still raise a warning, just to let users know that there are multiple checkpoint callbacks and that "best" is taken from the first one.
As I said in my previous message: "so that passing ckpt_path='best' still works for them". People want to be able to pass ckpt_path='best'. This behavior could be changed if we have #11912.
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!
🐛 Bug
If there is more than one ModelCheckpoint and the first one in the callback list does NOT include a monitor, then self.checkpoint_callback.best_model_path will be wrong: it is not the path to the best checkpoint according to the monitor.
Related code:
https://github.com/PyTorchLightning/pytorch-lightning/blob/b2e98d61661fca80b87e1e2b49cd301d29667ce5/pytorch_lightning/trainer/trainer.py#L2342-L2353
To Reproduce
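A minimal plain-Python sketch of the behavior. The ModelCheckpoint and Trainer classes below are illustrative stand-ins, not the real Lightning classes; they only mimic how the checkpoint_callback property in the linked code returns the first ModelCheckpoint found in the callback list, regardless of whether it has a monitor.

```python
class ModelCheckpoint:
    """Stand-in for pytorch_lightning.callbacks.ModelCheckpoint."""

    def __init__(self, monitor=None, best_model_path=""):
        self.monitor = monitor
        self.best_model_path = best_model_path


class Trainer:
    """Stand-in for pytorch_lightning.Trainer; only models the lookup."""

    def __init__(self, callbacks):
        self.callbacks = callbacks

    @property
    def checkpoint_callback(self):
        # Mirrors the linked Trainer code: return the FIRST ModelCheckpoint
        # in the callback list, even if it has no monitor.
        for cb in self.callbacks:
            if isinstance(cb, ModelCheckpoint):
                return cb
        return None


# First callback has no monitor (e.g. it just saves the last epoch);
# the second one actually tracks val_loss.
cb_last = ModelCheckpoint(monitor=None, best_model_path="epoch=9.ckpt")
cb_best = ModelCheckpoint(monitor="val_loss", best_model_path="epoch=3.ckpt")

trainer = Trainer(callbacks=[cb_last, cb_best])

# ckpt_path="best" resolves through trainer.checkpoint_callback, so it
# picks cb_last and returns the monitor-less path instead of cb_best's.
print(trainer.checkpoint_callback.best_model_path)  # -> epoch=9.ckpt
```

Swapping the order of the two callbacks changes the result, which is the ordering dependence described in the comments above.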
Expected behavior
best_model_path should always point to the best checkpoint according to the monitored metric.
Environment
- GPU:
- A100-SXM4-40GB
- A100-SXM4-40GB
- A100-SXM4-40GB
- A100-SXM4-40GB
- A100-SXM4-40GB
- A100-SXM4-40GB
- A100-SXM4-40GB
- A100-SXM4-40GB
- available: True
- version: 11.3
- numpy: 1.20.1
- pyTorch_debug: False
- pyTorch_version: 1.10.2+cu113
- pytorch-lightning: 1.5.10
- tqdm: 4.63.0
- OS: Linux
- architecture:
- 64bit
-
- processor: x86_64
- python: 3.7.10
- version: #1 SMP Fri Mar 19 10:07:22 CST 2021
cc @carmocca @awaelchli @ninginthecloud @jjenniferdai @rohitgr7