W&B ID reset on training completion #1852
Fixes the bug where the W&B ID is always the same, so new runs keep overwriting the old logs. Bug report: ultralytics#1851
👋 Hello @TommyZihao, thank you for submitting a 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:
- ✅ Verify your PR is up-to-date with origin/master. If your PR is behind origin/master update by running the following, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature # <----- replace 'feature' with local branch name
git rebase upstream/master
git push -u origin -f
- ✅ Verify all Continuous Integration (CI) checks are passing.
- ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee
This fixes the bug where the W&B ID is always the same and new runs keep overwriting the old logs. The new code has been tested on my server.
@AyushExel could you take a look at this W&B ID update PR please? Thanks!
@TommyZihao Thanks for bringing this up. About the error you're seeing: does it happen only when resuming old runs? If so, that's the intended behavior. The logger is designed to append the visualizations to the old run when it is resumed.
It happens when starting new training runs. I tried all kinds of ways to fix this bug, including changing the wandb account, the server, the dataset, and the client computer, and even re-downloading the whole repository and weights. But it happens every time I run train.py. The ID is always the same, and the wandb visualization only has one curve in one color. A new curve is appended to the old curve instead of being created as a separate curve with another color.
I think we can add another argument, --resume, that lets the user choose whether to get a new W&B ID, and so a new visualization with a new color, or to continue the old run and resume training.
My PR works fine in my situation.
Your channel looks great. Let us know when the video is out.
We already have a --resume argument that continues the old run. Does this solution work with that? If it does, we can merge this.
@glenn-jocher This is very strange. This problem doesn't occur in any other case. I tested this manually by initializing multiple runs in a Colab, and all of them had unique IDs. Can you think of any recent changes you made to the logging feature that might have caused this?
@glenn-jocher I found the cause for this error. This happens because the yolov5s.pt model that gets downloaded before training has
So every time transfer learning is done on the yolov5s.pt model, it'll detect the same ID. This problem doesn't occur when training from scratch.
Exactly, thank you.
Yes, your PR works for training, but it doesn't take the --resume functionality into account. Currently, if you resume a run, the metrics and visualizations are logged to the run being resumed, but your PR generates a new ID in every case, even when resuming a previous run. If you can make some changes to incorporate the resume feature, that'd be great.
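The resume-aware behavior requested here can be sketched as a small helper. This is only an illustration, not the code in train.py: the name `select_wandb_id` and the plain-dict checkpoint are assumptions made for the example; only the `wandb_id` key and the --resume flag come from this discussion.

```python
from typing import Optional


def select_wandb_id(ckpt: Optional[dict], resume: bool) -> Optional[str]:
    """Return the W&B run ID to reuse, or None to request a fresh run.

    Hypothetical helper: `ckpt` stands in for a loaded checkpoint dict.
    """
    if resume and ckpt is not None:
        # Resuming: keep logging into the run stored in the checkpoint.
        return ckpt.get("wandb_id")
    # New run (from scratch or transfer learning): ignore any stored ID.
    return None


pretrained = {"wandb_id": "abc123", "epoch": -1}
print(select_wandb_id(pretrained, resume=False))  # None
print(select_wandb_id(pretrained, resume=True))   # abc123
```

The returned value would then be passed as the run ID when the W&B run is initialized, so that a value of None leaves W&B free to generate a new one.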
Fix for ultralytics#1851: if we train from yolov5s.pt, the program generates a new unique W&B ID. Otherwise, the program keeps the old behavior, so we can still use the --resume argument.
Fix the bug of duplicate W&B ID
@TommyZihao this solution will work for this particular model only (yolov5s) and not for others. Any update to those models will break the code, as the ID might be different. I was thinking of logic that sets
I have checked and this works for all cases and there's no need for an extra @glenn-jocher what do you think about this solution?
Oh! This is probably due to the recent v4.0 update, which includes new models that may be the first official models logged in W&B. I wonder if this is also occurring in ultralytics/yolov3 then. The proper fix would be to strip the W&B ID after training fully completes. I can add this here, where the optimizers are similarly stripped from the fully trained checkpoints. Lines 397 to 398 in 69be8e7
@TommyZihao @AyushExel ok I think this is all set. The problem was that the new models in the v4.0 release yesterday contained wandb_id's from their training. I've updated this PR to leave train.py alone and instead strip wandb_id's from fully trained models. A fully trained model is not meant to be --resumed, but it can be used as a pretrained model to transfer learn or train another model, in which case a new W&B ID should be generated. Not included in this PR: I will need to manually strip the W&B IDs from the 4 pretrained models hosted in https://github.com/ultralytics/yolov5/releases/tag/v4.0. I will do this shortly, and then the problem will be solved for all future users. @TommyZihao to fix your specific model you simply need to set the wandb_id to None, or you can delete your local models and let a fixed model autodownload.
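As a rough sketch of the idea, stripping a finished checkpoint amounts to clearing its training-only entries. The example below uses a plain dict in place of a real checkpoint so that it runs without PyTorch; the helper name `strip_checkpoint` and the key list are assumptions for illustration and may not match what `strip_optimizer()` in utils/general.py actually does.

```python
import copy

# Entries cleared when a checkpoint is finalized. 'wandb_id' is the key
# discussed in this PR; the rest of the list is an assumption.
STRIP_KEYS = ("optimizer", "wandb_id")


def strip_checkpoint(ckpt: dict, keys=STRIP_KEYS) -> dict:
    """Return a copy of `ckpt` with training-only entries set to None."""
    stripped = copy.copy(ckpt)  # shallow copy so the caller's dict is untouched
    for key in keys:
        stripped[key] = None
    return stripped


final = {"model": "weights...", "optimizer": {"lr": 0.01}, "wandb_id": "abc123"}
clean = strip_checkpoint(final)
print(clean["wandb_id"])  # None
print(final["wandb_id"])  # abc123
```

With a real model file, the dict would presumably be loaded from and saved back to the .pt file around this step. Setting the entries to None rather than deleting them means later lookups of the key still succeed.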
@TommyZihao there's actually a built-in function to reset problematic official models. If you git pull following this PR merge, then the following command will do this for all four models.
Models have been updated now, so autodownloaded models should now show @TommyZihao @AyushExel thank you for spotting this issue and for your contributions! Let us know if you spot any other issues.
* Update train.py
  Fix the bug of always the same W&B ID and continue overwrite with the old logging. BUG report ultralytics#1851
* Fix the bug of duplicate W&B ID
  fix the bug of ultralytics#1851 If we had trained on yolov5s.pt, the program will generate a new unique W&B ID. If we hadn't, the program will keep the old code, we can still use --resume aug.
* Update general.py
* revert train.py changes
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Improved optimizer stripping function for finalizing model training.
📊 Key Changes
strip_optimizer() function to remove additional data from the model file.
🎯 Purpose & Impact