
WandbLogger does not mark uploaded model as 'artifact' #4903

Closed

kyoungrok0517 opened this issue Nov 30, 2020 · 10 comments · Fixed by #6231
Labels
3rd party (Related to a 3rd-party) · bug (Something isn't working) · help wanted (Open to be worked on) · logger (Related to the Loggers) · priority: 1 (Medium priority task) · won't fix (This will not be worked on)

Comments

kyoungrok0517 commented Nov 30, 2020

🐛 Bug

I'm using WandbLogger with the latest pytorch-lightning==1.0.8. It seems that the trained checkpoint is treated as a mere file rather than a model artifact, even though I turned on log_model=True. It would be much more convenient to use the model artifact from other scripts, so I hope pytorch-lightning can do this automatically.
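For reference, a minimal sketch of the setup described above (the project name and Trainer arguments are illustrative, not taken from this report):

```python
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

# Hypothetical project name; log_model=True is the option discussed in this issue.
wandb_logger = WandbLogger(project="my-project", log_model=True)
trainer = pl.Trainer(logger=wandb_logger, max_epochs=10)
# trainer.fit(model)
# At the end of the run the checkpoint files are uploaded to W&B,
# but only as plain files under the run, not as a model artifact.
```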

Environment

* CUDA:
        - GPU:
                - GeForce RTX 3090
        - available:         True
        - version:           11.0
* Packages:
        - numpy:             1.19.2
        - pyTorch_debug:     True
        - pyTorch_version:   1.7.0
        - pytorch-lightning: 1.0.8
        - tensorboard:       2.3.0
        - tqdm:              4.51.0
* System:
        - OS:                Linux
        - architecture:
                - 64bit
                - ELF
        - processor:         x86_64
        - python:            3.8.5
        - version:           #60-Ubuntu SMP Fri Nov 6 10:37:59 UTC 2020
@kyoungrok0517 added the bug and help wanted labels on Nov 30, 2020
@Borda added the 3rd party and logger labels on Nov 30, 2020
Borda (Member) commented Nov 30, 2020

@borisdayma mind having a look?

borisdayma (Contributor) commented:

This is a great idea!
Right now, wandb uploads the saved model folder at the end of the run, which may include one or several models (when saving checkpoints).

We could upload artifacts at the end of the run or as the model trains.
I think it may make more sense to upload at the end of training, to avoid uploading the model unnecessarily many times, for example if we are only interested in the final version or only the top 5 best models.

Finally, I think each artifact name should be related to the run id, so if we want to supersede a specific artifact we will need to use the same run id (knowing that W&B still lets you attach any number of aliases, such as latest, best, etc., to each artifact).

Let me know if you have any comments before I try to implement it.
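A rough sketch of the naming and aliasing idea, using the plain wandb API (the artifact name, checkpoint path, and aliases here are illustrative):

```python
import wandb

run = wandb.init(project="my-project")

# Tie the artifact name to the run id so re-running with the same run id produces
# new versions of the same artifact instead of a brand-new artifact.
artifact = wandb.Artifact(name=f"model-{run.id}", type="model")
artifact.add_file("checkpoints/epoch=4-step=500.ckpt")  # hypothetical checkpoint path
run.log_artifact(artifact, aliases=["latest", "best"])
run.finish()
```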

kyoungrok0517 (Author) commented:

Thanks for the response. I think following the checkpoint settings (e.g. preserving the best top-k) would be nice, as you mentioned in your comment. Then we could try the models from another script using the wandb API (e.g. automatic download).
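A sketch of how another script could pull a logged model back down with the wandb API (the entity, project, and artifact name below are placeholders):

```python
import wandb

run = wandb.init(project="my-project", job_type="evaluation")

# "model-abc123:best" is a placeholder; use the name/alias the training run actually logged.
artifact = run.use_artifact("my-entity/my-project/model-abc123:best", type="model")
artifact_dir = artifact.download()  # checkpoint files are downloaded locally
print(artifact_dir)
```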

stale bot commented Jan 8, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

The stale bot added the won't fix label on Jan 8, 2021
borisdayma (Contributor) commented:

Keep issue active

The stale bot removed the won't fix label on Jan 8, 2021
tchaton (Contributor) commented Jan 18, 2021

Hey @borisdayma,

Did you start working on it?

Best,
T.C

borisdayma (Contributor) commented:

I haven't had the time yet, but it's definitely on my TODO list.
I have a PR in progress and I'd prefer to wait for it to be merged first.

@edenlightning added the priority: 1 label on Feb 9, 2021
borisdayma (Contributor) commented Feb 26, 2021

I am now working on it and I just realized that #5537 introduced a change in behavior at this line.

Models were supposed to be saved in the W&B run folder only when log_model=True (since those files may be automatically uploaded at the end of a run).

With artifacts, we can let PL save the files directly where it wants and upload models as artifacts separately.
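A sketch of that separation, with PL's ModelCheckpoint saving wherever it is configured to and the kept checkpoints then being logged as artifacts afterwards (paths, the top-k value, and the monitored metric are illustrative):

```python
import pytorch_lightning as pl
import wandb
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import WandbLogger

# PL saves checkpoints where it is told to, independent of the W&B run folder ...
checkpoint_callback = ModelCheckpoint(dirpath="checkpoints/", save_top_k=5, monitor="val_loss")
wandb_logger = WandbLogger(project="my-project", log_model=True)
trainer = pl.Trainer(logger=wandb_logger, callbacks=[checkpoint_callback], max_epochs=10)
# trainer.fit(model)

# ... and the kept checkpoints are uploaded as a model artifact separately
# (done here in user code purely for illustration).
artifact = wandb.Artifact(name=f"model-{wandb_logger.experiment.id}", type="model")
for path in checkpoint_callback.best_k_models:
    artifact.add_file(path)
wandb_logger.experiment.log_artifact(artifact, aliases=["latest"])
```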

stale bot commented Mar 28, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

The stale bot added the won't fix label on Mar 28, 2021
The stale bot closed this as completed on Apr 5, 2021
borisdayma (Contributor) commented:

PR has now been merged so this issue should be closed!
