Hi @LuletterSoul, I've noticed! However, I can't spare enough time right now to optimize the training framework or pipeline to resolve these problems. I'll add this to the TODO list and work on it later, hopefully before too long.
@wondervictor Haha, it's OK. I really like this project and hope it gets better from here. Maybe the mmyolo training framework is not well optimized, and as a result the training process is slow. I also spent time trying to locate the two issues above, but mmyolo is really too complicated for me.
@LuletterSoul, Super thanks!! I'll move on to this issue after I fix the fine-tuning bugs. We do have plans to get rid of mmyolo. If you have any new findings or questions related to this issue, I would greatly appreciate it if you could update them under this issue, as it will provide me with valuable guidance.
I am reproducing yolo-worldv2 using 8 GPUs on 1 node. The 100% occupancy seems to result in slower training times overall.
In addition, I found that the time taken by each forward pass is unstable: sometimes it takes 0.6X, sometimes 1.X. Which step in the training process could cause this instability?
For example, one forward pass takes 0.6XX in epoch 83. However, it takes 1.XXX in epoch 62, almost double. Is there any way to shorten or stabilize the time of a single forward pass?
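One way to narrow down where the jitter comes from is to time the wait for each batch separately from the forward pass itself. The snippet below is only a rough sketch, not part of YOLO-World or mmyolo: `profile_steps`, `summarize`, `step_fn`, and `sync` are hypothetical names made up for illustration. It assumes the training loop can be reduced to "fetch a batch, run a step"; on GPU you would pass `torch.cuda.synchronize` as `sync`, because CUDA kernels run asynchronously and wall-clock timings without a sync can be misleading.

```python
import time
from statistics import mean, pstdev


def profile_steps(batches, step_fn, sync=None):
    """Time data loading and compute separately for each iteration.

    batches: any iterable yielding batches (e.g. a DataLoader).
    step_fn: callable that runs the forward pass on one batch.
    sync:    optional callable that blocks until queued device work is done
             (e.g. torch.cuda.synchronize when running on GPU).
    """
    records = []
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting on the data pipeline
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)
        if sync is not None:
            sync()  # make sure the device has actually finished the step
        t2 = time.perf_counter()
        records.append({"data": t1 - t0, "compute": t2 - t1})
    return records


def summarize(records):
    """Return mean, standard deviation, and max for each timed phase."""
    if not records:
        return {}
    out = {}
    for key in ("data", "compute"):
        values = [r[key] for r in records]
        out[key] = {"mean": mean(values), "std": pstdev(values), "max": max(values)}
    return out


if __name__ == "__main__":
    # Stand-in workload: a slow "data loader" and a trivial "forward" step.
    def fake_batches(n=5):
        for i in range(n):
            time.sleep(0.01)  # simulated loading/augmentation delay
            yield i

    stats = summarize(profile_steps(fake_batches(), lambda batch: batch * 2))
    for phase, s in stats.items():
        print(phase, s)
```

If the `data` column shows the large spikes, the slowdown is more likely in loading or augmentation than in the model; if `compute` varies, the forward pass itself (or contention on the GPU) would be the next place to look.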