
Suggestion to reduce Training time for YoloX-S, YoloX-M & YoloX-L #770

Closed
ajtvsv07 opened this issue Oct 5, 2021 · 9 comments

Comments

@ajtvsv07

ajtvsv07 commented Oct 5, 2021

Hi All,

I am trying to train the YoloX-M model on the COCO dataset, and a single epoch takes around 2 hours 30 minutes with 6 V100 GPUs.
The ETA is shown as approximately 30 days.

Command used to train the model:
python tools/train.py -n yolox-m -d 6 -b 48 --fp16 -o

I tried increasing the data workers and the batch size, but the training time does not drop much; the best I can get is about 2 hours 20 minutes.
I also tried adding --cache to the train command:
python tools/train.py -n yolox-m -d 6 -b 48 --fp16 -o --cache

I still do not see a significant reduction in training time.

Any suggestions on how to reduce it to 10-15 minutes per epoch?

Note:
YOLOv5 training time on the COCO dataset is 10-15 minutes per epoch with 6 V100 GPUs.

Regards,
Arunjeyan TVSV

@ajtvsv07
Author

ajtvsv07 commented Oct 5, 2021

I increased the data workers count and added --cache to the training command.

yolox/exp/yolox_base.py:
self.data_num_workers = 20  # was 4

My training time has come down to 1 hour 30 minutes per epoch.
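For reference, the same change does not require editing yolox/exp/yolox_base.py in place: tools/train.py also accepts a custom experiment file via -f, and the files in exps/default/ follow this pattern. A minimal sketch, assuming the YOLOX-M settings from exps/default/yolox_m.py and only changing data_num_workers (the file name is hypothetical):

# my_yolox_m_exp.py -- hypothetical file name; attribute names mirror yolox_base.py
from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super().__init__()
        self.depth = 0.67            # YOLOX-M scaling, as in exps/default/yolox_m.py
        self.width = 0.75
        self.data_num_workers = 20   # default in yolox_base.py is 4

Training would then be launched with -f pointing at this file instead of -n yolox-m, keeping the other flags unchanged.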

@jackhu-bme

I've run into the same problem. I'm training on a custom dataset (5k medical images, much smaller and easier than COCO), but my compute is far more limited. I wonder if I could increase the learning rate and reduce the number of training epochs (a sketch of this idea follows below) and still reach adequate performance.
Any other suggestions?
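A minimal sketch of that idea in the same custom-Exp style; max_epoch, no_aug_epochs and basic_lr_per_img are attributes defined in yolox_base.py, but the values below are illustrative assumptions for a small dataset, not tested settings:

from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super().__init__()
        self.num_classes = 1                 # set to the custom dataset's class count
        self.max_epoch = 100                 # default is 300
        self.no_aug_epochs = 10              # final epochs without mosaic/mixup (default 15)
        self.basic_lr_per_img = 0.02 / 64.0  # actual lr = basic_lr_per_img * total batch size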

@ajtvsv07
Author

@Joker316701882, can you suggest some techniques to reduce our training time?

@FateScript
Member

What are your data time and train time if you don't change num_workers? @ajtvsv07

@ajtvsv07
Author

ajtvsv07 commented Oct 13, 2021

I am trying to reproduce the results on the COCO dataset.

Train time: 2 hours 30 minutes per epoch.
What do you mean by data time? @FateScript

@FateScript
Member

I am trying to reproduce the results on the COCO dataset.

Train time: 2 hours 30 minutes per epoch. What do you mean by data time? @FateScript

It's in the log info on your terminal, where the loss values are also logged. Could you please check it?

@ajtvsv07
Author

ajtvsv07 commented Oct 13, 2021

@FateScript found it.
data time : 0.063s
iter time : 4.817s

2021-10-04 17:56:09.762 | INFO | yolox.core.trainer:after_iter:238 - epoch: 1/300, iter: 10/2465, mem: 13175Mb, iter_time: 4.817s, data_time: 0.063s, total_loss: 15.4, iou_loss: 4.6, l1_loss: 0.0, conf_loss: 8.5, cls_loss: 2.3, lr: 4.937e-09, size: 640, ETA: 41 days, 5:25:29
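As a rough sanity check, the 41-day ETA in that log line follows directly from the iteration time, the iterations per epoch, and the 300-epoch schedule shown in the log:

# Back-of-the-envelope ETA check based on the log line above.
iter_time = 4.817        # seconds per iteration
iters_per_epoch = 2465   # "iter: 10/2465"
epochs = 300             # "epoch: 1/300"

total_seconds = iter_time * iters_per_epoch * epochs
print(total_seconds / 86400)   # ~41.2 days, matching the logged ETA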

@FateScript
Member

@FateScript found it. data time : 0.063s iter time : 4.817s

2021-10-04 17:56:09.762 | INFO | yolox.core.trainer:after_iter:238 - epoch: 1/300, iter: 10/2465, mem: 13175Mb, iter_time: 4.817s, data_time: 0.063s, total_loss: 15.4, iou_loss: 4.6, l1_loss: 0.0, conf_loss: 8.5, cls_loss: 2.3, lr: 4.937e-09, size: 640, ETA: 41 days, 5:25:29

Your iter time is too long; this might be caused by limited computation power or a wrong environment setting.
A normal training log can be found here. I suspect that you are using multi-GPU training and some devices might take more time; a rough timing sketch follows below.
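One quick way to see whether raw compute alone explains the long iter_time is to time the model forward pass in isolation. A minimal sketch, assuming the YOLOX repo is installed and using a per-GPU batch of 8 (48 images over 6 GPUs) at 640x640; this is only a rough diagnostic, not an official benchmark:

import time
import torch
from yolox.exp import get_exp   # same helper tools/train.py uses to resolve "yolox-m"

exp = get_exp(None, "yolox-m")
model = exp.get_model().cuda().eval()
dummy = torch.randn(8, 3, 640, 640, device="cuda")

with torch.no_grad():
    model(dummy)                 # warm-up
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(10):
        model(dummy)
    torch.cuda.synchronize()
    print((time.time() - start) / 10, "s per forward pass")

If this number is small compared with the logged iter_time, the bottleneck is more likely data loading, cross-GPU synchronization, or CPU contention.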

@ajtvsv07
Author

Thanks @FateScript @Joker316701882 for your logs.
The cause was limited CPU core availability. Now the iter time has dropped to about 1 s and the training time has drastically reduced to