I'm running YOLOv5 on 16 GPUs on an HPC machine with 4 GPUs per node:
srun python train.py --epochs 400 --data /home/my_dataset/data/bc.yaml --weights "" --cfg models/hub/yolov5x6.yaml --hyp runs/evolve/exp17/hyp_evolve.yaml --cache --device 0,1,2,3 2>&1 | tee out_training
The training is running on 4 nodes, but it seems to be split into two parts:
Logging results to runs/train/exp147
Starting training for 400 epochs...
Image sizes 640 train, 640 val
Using 16 dataloader workers
Logging results to runs/train/exp146
I see two training folders, exp146 and exp147, where the training is storing results. Why are there two folders? Which one should I consider? This is the first time I have seen this behaviour. Thanks.
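
One possible way to narrow this down is to check whether srun is starting several independent copies of train.py rather than one distributed job. The sketch below is only a diagnostic under stated assumptions: it assumes SLURM exports SLURM_PROCID and SLURM_NTASKS to each task, and that a distributed launcher such as torchrun would export RANK and WORLD_SIZE. The function name describe_launch and the three mode labels are hypothetical, chosen for illustration.

```python
import os


def describe_launch(env=None):
    """Classify how the current process appears to have been launched.

    Assumptions (not confirmed by the post): SLURM sets SLURM_PROCID and
    SLURM_NTASKS per task; a DDP launcher sets RANK and WORLD_SIZE.
    """
    env = os.environ if env is None else env

    slurm_tasks = int(env.get("SLURM_NTASKS", 1))
    slurm_rank = int(env.get("SLURM_PROCID", 0))
    ddp_rank = int(env.get("RANK", -1))        # -1 means "not set"
    world_size = int(env.get("WORLD_SIZE", 1))

    if ddp_rank != -1 and world_size > 1:
        # A distributed launcher provided rank information.
        mode = "distributed"
    elif slurm_tasks > 1:
        # Several SLURM tasks, but no DDP rank: each task may act as a
        # standalone run and pick its own results folder.
        mode = "independent-copies"
    else:
        mode = "single-process"

    return {
        "mode": mode,
        "slurm_rank": slurm_rank,
        "slurm_tasks": slurm_tasks,
        "ddp_rank": ddp_rank,
        "world_size": world_size,
    }


if __name__ == "__main__":
    # Run under the same srun invocation as the training command.
    print(describe_launch())
```

If this is run with the same srun options as the training command and more than one task reports "independent-copies", that would be consistent with separate runs each writing to its own exp folder, though it does not prove that this is the cause.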