Training on non-squared images #126

100330706 · 2019-03-12T10:27:00Z

Hi! Is there any way to use this PyTorch implementation of YOLOv3 to train on 3840x384 images without resizing them into squares? The image width is very important for our problem, since the objects we are trying to detect are very small.

Regards.

glenn-jocher · 2019-03-12T12:58:01Z

We have a companion yolov3 repository that we worked on last year for the xview competition, where images were up to 5000 x 5000 pixels. For that we used sliding-window inference rather than letterbox inference. We haven't tried adapting it directly to rectangular training or inference, though. Also keep in mind that all images in a batch need to share exactly the same size.

https://github.com/ultralytics/xview-yolov3
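For readers unfamiliar with sliding-window inference: the large image is cut into overlapping fixed-size tiles, the detector runs on each tile, and detections are shifted back by the tile offset. Below is a minimal sketch of only the tiling step, in plain Python. The tile size (608) and overlap (64) are illustrative assumptions, not values taken from the xview-yolov3 repository, and merging duplicate detections across overlaps (e.g. with NMS) is left out.

```python
from itertools import product


def tile_origins(length, tile, overlap):
    """Start offsets of `tile`-sized windows covering [0, length) along one axis.

    Consecutive windows overlap by at least `overlap` pixels; the last window
    is shifted back so it ends exactly at the image border.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if tile >= length:
        return [0]  # one window covers the whole axis
    step = tile - overlap
    origins = list(range(0, length - tile, step))
    origins.append(length - tile)  # final window flush with the border
    return origins


def sliding_windows(height, width, tile=608, overlap=64):
    """Yield (y0, x0, y1, x1) windows that together cover a height x width image."""
    for y0, x0 in product(tile_origins(height, tile, overlap),
                          tile_origins(width, tile, overlap)):
        yield y0, x0, min(y0 + tile, height), min(x0 + tile, width)


# Each window would be cropped and run through the detector; its boxes are then
# shifted by (x0, y0) back into full-image coordinates before merging.
windows = list(sliding_windows(5000, 5000))
print(len(windows), windows[-1])
```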

glenn-jocher · 2019-04-18T23:09:19Z

@100330706 did you make any progress on non-square images? I was thinking rectangular inference would be very useful in our iOS app to reduce power consumption and increase FPS, as about 44% of the processing is currently wasted when running inference on letterboxed 1080p or 4k video formats:

1920 * 1080 / (1920 * 1920) = 0.5625
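The same arithmetic applies to any frame size: when a w x h frame is letterboxed into a square whose side equals the longer edge, only w*h / max(w, h)^2 of the pixels carry image content. A small sketch (the function name is hypothetical):

```python
def useful_fraction(width, height):
    """Fraction of a square letterbox (side = longer edge) covered by the image."""
    side = max(width, height)
    return (width * height) / (side * side)


for w, h in [(1920, 1080), (3840, 2160)]:
    frac = useful_fraction(w, h)
    print(f"{w}x{h}: {frac:.4f} useful, {1 - frac:.1%} padding")
```

Both 1080p and 4K have a 16:9 aspect ratio, so both waste the same share of a square letterbox.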

100330706 · 2019-04-20T11:21:53Z

@glenn-jocher Yes. I replaced the img_size parameter with two parameters, img_size_height and img_size_width, and changed the parts of the code that used it (removed the letterbox and modified the create_grid function). I have tested it with 1920x384 pixel crops, and since the network is fully convolutional it seems to work fine. The GPU memory savings are massive. I can upload the code if you want; the downside is that I am using an old version of this repository (2-3 months old).
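The change described here works because a YOLO output grid does not have to be square: with stride s, an H x W input gives an (H/s) x (W/s) grid, and each cell offset is added to the predicted center before scaling back to pixels. The plain-Python sketch below only illustrates that idea; it is not the repository's create_grid function, the function names are hypothetical, and a real implementation would build these offsets as tensors.

```python
import math


def grid_shape(img_h, img_w, stride):
    """(ny, nx) grid cells produced by a YOLO output layer for an img_h x img_w input."""
    if img_h % stride or img_w % stride:
        raise ValueError("input height and width must be multiples of the stride")
    return img_h // stride, img_w // stride


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_center(tx, ty, cx, cy, stride):
    """Pixel center of a box predicted in grid cell (cx, cy), using the YOLOv3
    parameterization b = (sigmoid(t) + c) * stride. Nothing here assumes nx == ny."""
    return (sigmoid(tx) + cx) * stride, (sigmoid(ty) + cy) * stride


# A 1920x384 crop (width x height) at YOLOv3's three output strides:
for s in (32, 16, 8):
    print(s, grid_shape(384, 1920, s))  # (ny, nx)
```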

glenn-jocher · 2019-04-21T12:02:43Z

Yes, this seems like a good idea, though you would still need to letterbox to the nearest multiple of 32, no?

Also, for training, the input images would need to be sorted by aspect ratio so that similarly shaped images can be batched together. This would remove much of the randomness that comes with shuffle=True in the dataloader. I wonder what effect this would have?
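One possible way to do this: sort image indices by aspect ratio, cut the sorted list into consecutive batches so each batch holds similarly shaped images, then shuffle the order of the batches (rather than the individual images) each epoch to keep some randomness. This is a hedged sketch of that idea only; the function name and the seeding scheme are assumptions, and it works on a plain list of (width, height) shapes rather than a real dataset.

```python
import random


def aspect_ratio_batches(shapes, batch_size, seed=None):
    """Group image indices into batches of similar height/width ratio.

    shapes: list of (width, height) tuples, one per image.
    Images are sorted by aspect ratio and chunked, so every batch can be
    letterboxed to one shared rectangular size; only the batch order is shuffled.
    """
    order = sorted(range(len(shapes)), key=lambda i: shapes[i][1] / shapes[i][0])
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    random.Random(seed).shuffle(batches)
    return batches


shapes = [(1280, 720), (640, 640), (1920, 1080), (480, 640), (1024, 1024), (720, 1280)]
for batch in aspect_ratio_batches(shapes, batch_size=2, seed=0):
    print([shapes[i] for i in batch])
```

The trade-off raised above is visible here: images are never mixed across aspect-ratio groups, so the within-batch composition is fixed from epoch to epoch.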

glenn-jocher · 2019-04-21T21:48:22Z

@100330706 I've applied updates for rectangular inference now, which speed things up by about 35% on zidane.jpg on CPU. I still letterbox, but only up to the next multiple of 32, i.e. rather than 416x416, the default zidane.jpg letterbox is now 256x416, about 38% fewer pixels!
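The shape computation described here can be sketched as follows: scale the image so its longer side equals img_size, then pad each side only up to the next multiple of 32. The function name is hypothetical, and the example assumes zidane.jpg is 1280x720 (width x height), which reproduces the 256x416 shape quoted above.

```python
import math


def rect_letterbox_shape(height, width, img_size=416, stride=32):
    """Return the (height, width) an image is letterboxed to for rectangular inference.

    The longer side is scaled to img_size; each side is then padded up to the
    next multiple of `stride` instead of up to a full img_size x img_size square.
    """
    r = img_size / max(height, width)
    new_h, new_w = round(height * r), round(width * r)
    return (math.ceil(new_h / stride) * stride,
            math.ceil(new_w / stride) * stride)


h, w = rect_letterbox_shape(720, 1280)  # assumed zidane.jpg size (H x W)
print(h, w, f"{1 - (h * w) / (416 * 416):.0%} fewer pixels than 416x416")
```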

I haven't applied this to training yet, as it needs additional code to sort the training images by shape so I can group together batches of similar aspect ratios. I imagine this would likewise speed up most training by around 1/3.

100330706 · 2019-04-21T21:51:58Z

> Yes, this seems like a good idea, though you would still need to letterbox to the nearest multiple of 32, no?

Yes, the image size, even though it is rectangular in this case, has to be a multiple of 32. Since my dataset only contains fixed-size 1920x400 px images, rather than letterboxing I simply rescale all of them to 1920x384 px so both dimensions are multiples of 32, as the distortion is minimal. But as you said, for a more general scenario, letterboxing is also an option.
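For the fixed 1920x400 images mentioned here, the two options can be compared per axis: the largest multiple of 32 below the size (the resize target) versus the smallest one above it (the letterbox target). A plain-Python sketch with a hypothetical helper name:

```python
def stride_options(size, stride=32):
    """Compare resizing down vs. padding up to a multiple of `stride` along one axis."""
    down = (size // stride) * stride            # resize target (slight squash)
    up = -(-size // stride) * stride            # letterbox target (adds padding)
    return {
        "resize_to": down,
        "scale_factor": down / size,            # < 1 means the axis is compressed
        "pad_to": up,
        "padding_fraction": (up - size) / up,   # share of the padded axis that is empty
    }


print(stride_options(400))   # height of the 1920x400 images
print(stride_options(1920))  # width is already a multiple of 32
```

For a height of 400 px this gives a resize to 384 (scale 0.96) versus padding to 416 (16 of 416 rows empty), so both options change the height by only a few percent.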

> Also, for training, the input images would need to be sorted by aspect ratio so that similarly shaped images can be batched together. This would remove much of the randomness that comes with shuffle=True in the dataloader. I wonder what effect this would have?

As I said above, I didn't need aspect-ratio batch sorting for my specific problem, so I didn't try it. However, your idea seems good, and it would certainly make training easier. I guess shuffling would be affected somewhat, but that should not be a problem if the dataset is big enough.

100330706 · 2019-04-21T21:53:18Z

> @100330706 I've applied updates for rectangular inference now, which speed things up by about 35% on zidane.jpg on CPU. I still letterbox, but only up to the next multiple of 32, i.e. rather than 416x416, the default zidane.jpg letterbox is now 256x416, about 38% fewer pixels!
>
> I haven't applied this to training yet, as it needs additional code to sort the training images by shape so I can group together batches of similar aspect ratios. I imagine this would likewise speed up most training by around 1/3.

@glenn-jocher Wow, I just saw this. The results are pretty impressive. Thanks for continuing to update the repo!

glenn-jocher · 2019-04-22T14:01:04Z

I've created tutorial #232 to explain this change. I'll close this issue and update the tutorial with rectangular training (and testing) as well in the future when it's completed.

The speedup is enormous BTW, well worth the effort. I feel a bit stupid now, spending all that GPU time doing convolutions of letterboxed padded areas for nothing.

joel5638 · 2020-08-12T05:00:47Z

> @glenn-jocher Yes. I replaced the img_size parameter with two parameters, img_size_height and img_size_width, and changed the parts of the code that used it (removed the letterbox and modified the create_grid function). I have tested it with 1920x384 pixel crops, and since the network is fully convolutional it seems to work fine. The GPU memory savings are massive. I can upload the code if you want; the downside is that I am using an old version of this repository (2-3 months old).

@100330706 I'm trying to do the same with width and height. Can you share the code where imgsize is changed to height and width? It would help me. Thank you.

glenn-jocher · 2023-11-14T19:10:32Z

@joel5638 it's great that you are also working on this! I suggest taking a good look at yolov3/models.py, in particular the __init__ functions of the various models (e.g. YOLOv3), and considering updates to the forward functions and to other parts of the code, such as the letterbox() function in util.py. Good luck!
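Once img_size becomes a (height, width) pair, the letterbox step has to fit each image inside a rectangle rather than a square. The geometry can be sketched as below; this is not the repository's letterbox() function, and its name, argument order, and return values are assumptions. It only computes the resize ratio and padding, leaving the actual resize and pad to whatever image library is in use.

```python
def rect_letterbox_params(src_h, src_w, dst_h, dst_w):
    """Fit a src_h x src_w image inside dst_h x dst_w, preserving aspect ratio.

    Returns the scale ratio, the resized (h, w), and (top, bottom, left, right)
    padding that centers the resized image in the target rectangle.
    """
    r = min(dst_h / src_h, dst_w / src_w)
    new_h, new_w = round(src_h * r), round(src_w * r)
    pad_h, pad_w = dst_h - new_h, dst_w - new_w
    top, left = pad_h // 2, pad_w // 2
    return r, (new_h, new_w), (top, pad_h - top, left, pad_w - left)


# A 1080x1920 frame into a hypothetical 384x640 (height x width) network input:
ratio, resized, padding = rect_letterbox_params(1080, 1920, 384, 640)
print(resized, padding)
```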

glenn-jocher mentioned this issue Apr 18, 2019 in Ideas to Reduce iOS App Power Consumption #224 (Closed)

glenn-jocher closed this as completed Apr 22, 2019

Iffa-Intel mentioned this issue Dec 22, 2021 in Opencv4.5.4 Inference result error openvinotoolkit/openvino#9347 (Closed)
