
Training on non-squared images #126

Closed
100330706 opened this issue Mar 12, 2019 · 10 comments

Comments

@100330706

Hi! Is there any way to use this PyTorch implementation of YOLOv3 to train on 3840x384 images without resizing them into squares? The width of the image is very important for our problem, since the objects we are trying to detect are very small.

Regards.

@glenn-jocher
Member

glenn-jocher commented Mar 12, 2019

We have a companion yolov3 repository that we worked on last year for the xView competition, where the images were up to 5000 x 5000 pixels. For that we used a sliding-window inference method rather than letterbox inference. We haven't tried to adapt it directly to rectangular training or inference though. Also keep in mind that all images in a batch need to share exactly the same size.

https://github.com/ultralytics/xview-yolov3

@glenn-jocher
Member

@100330706 did you make any progress on non-square images? I was thinking rectangular inference would be very useful in our iOS app to reduce power consumption and increase FPS, as about 44% of the processing is currently wasted when running inference on letterboxed 1080p or 4k video formats:

1920 * 1080 / (1920 * 1920)
Out[2]: 0.5625
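For comparison, an illustrative calculation (not from the thread): if the 1080p frame were padded only up to the next stride-32 multiple of its height (1088) instead of out to a full square, almost none of the processing would be wasted.

```python
# square letterbox: 1920 x 1080 padded to 1920 x 1920
useful_square = 1920 * 1080 / (1920 * 1920)  # 0.5625 -> ~44% of the pixels are padding

# rectangular letterbox: pad the height only, up to the next multiple of 32 (1088)
useful_rect = 1920 * 1080 / (1920 * 1088)    # ~0.993 -> <1% of the pixels are padding
```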

@100330706
Author

@glenn-jocher Yes. I changed the img_size parameter into two parameters, img_size_height and img_size_width, and updated the parts of the code where this parameter was used (removed the letterbox and modified the create_grid function). I have tested it with crops of 1920x384 pixels and, since the network is fully convolutional, it seems to work fine. The gain in GPU memory is massive. I can upload the code if you want; the downside is that I am using an old version of this repository (2-3 months old).
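A minimal sketch of what the grid construction might look like with separate height and width, assuming PyTorch; the helper name create_grid_xy and its signature are illustrative, not the repo's exact API:

```python
import torch

def create_grid_xy(img_hw=(384, 1920), stride=32, device='cpu'):
    """Return a (1, 1, ny, nx, 2) grid of cell offsets for a rectangular input.

    img_hw is (height, width); both must be divisible by the layer stride.
    Hypothetical helper, loosely modelled on the repo's create_grid function.
    """
    ny, nx = img_hw[0] // stride, img_hw[1] // stride  # e.g. 12 x 60 for 384 x 1920 at stride 32
    yv, xv = torch.meshgrid(torch.arange(ny), torch.arange(nx), indexing='ij')
    # stack as (x, y) to match the order of the YOLO box predictions
    return torch.stack((xv, yv), dim=2).view(1, 1, ny, nx, 2).float().to(device)

# one grid per YOLO layer stride on a 384 x 1920 crop
grids = {s: create_grid_xy((384, 1920), s) for s in (32, 16, 8)}
```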

@glenn-jocher
Member

Yes, this seems like a good idea, though you would still need to letterbox to the nearest multiple-of-32 image size, no?

Also for training, the input images would need to be sorted by aspect ratio so that you could batch together similarly shaped images. This would reduce a lot of the randomness that comes with shuffle=True in the dataloader. I wonder what effect this would have?
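A rough sketch of the batching idea, assuming NumPy; the function and variable names here are illustrative, not from the repo:

```python
import numpy as np

def rect_batch_shapes(wh, batch_size=16, img_size=416, stride=32):
    """Sort images by aspect ratio and pick one padded shape per batch.

    wh: (N, 2) array of original image (width, height) pairs.
    Returns the sort order and an array of per-batch (height, width) target
    shapes, each rounded up to a multiple of `stride`. Illustrative only.
    """
    wh = np.asarray(wh, dtype=np.float64)
    ar = wh[:, 1] / wh[:, 0]                  # height / width aspect ratio
    order = np.argsort(ar)                    # group similarly shaped images together
    ar = ar[order]

    n_batches = int(np.ceil(len(wh) / batch_size))
    shapes = np.ones((n_batches, 2))          # (height, width) scale factors, default square
    for i in range(n_batches):
        batch_ar = ar[i * batch_size:(i + 1) * batch_size]
        lo, hi = batch_ar.min(), batch_ar.max()
        if hi < 1:                            # every image is wider than tall: shrink height
            shapes[i] = [hi, 1]
        elif lo > 1:                          # every image is taller than wide: shrink width
            shapes[i] = [1, 1 / lo]
    batch_shapes = (np.ceil(shapes * img_size / stride) * stride).astype(int)
    return order, batch_shapes
```

Mixed-aspect batches fall back to the square img_size, so only batches whose images all lean the same way get a smaller padded shape.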

@glenn-jocher
Member

@100330706 I've applied updates for rectangular inference now, which speeds things up about 35% on zidane.jpg on CPU. I still letterbox; I simply letterbox to the next highest multiple of 32, i.e. rather than 416x416, the default zidane.jpg letterbox is now 256 x 416, about 38% fewer pixels!

I haven't applied this to training yet, as it needs additional code to sort the training images by shape so I can group together batches of similar aspect ratios. I imagine this would likewise speed up most training by around 1/3.
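As a rough illustration of the approach (assuming OpenCV; letterbox_rect is a hypothetical name, not the repo's letterbox() itself, which may differ in details such as interpolation and padding colour):

```python
import cv2

def letterbox_rect(img, max_size=416, stride=32, color=(128, 128, 128)):
    """Resize so the longest side equals max_size, then pad each side only up
    to the next multiple of `stride`, instead of padding out to a full square.
    """
    h0, w0 = img.shape[:2]
    r = max_size / max(h0, w0)                        # scale so the long side fits
    h, w = int(round(h0 * r)), int(round(w0 * r))
    img = cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR)

    # pad height and width up to the next multiple of the network stride
    dh, dw = (stride - h % stride) % stride, (stride - w % stride) % stride
    top, bottom = dh // 2, dh - dh // 2
    left, right = dw // 2, dw - dw // 2
    return cv2.copyMakeBorder(img, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=color)

# e.g. a 1280 x 720 zidane.jpg at max_size=416 resizes to 416 x 234,
# then pads to 256 x 416 (height x width) rather than 416 x 416
```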

@100330706
Author

> Yes, this seems like a good idea, though you would still need to letterbox to the nearest multiple-of-32 image size, no?

Yes, the image size, even though it is rectangular in this case, has to be a multiple of 32. Since the dataset I am dealing with only has fixed-size images of 1920x400 px, rather than letterboxing I just rescale all of them to 1920x384 px to achieve the 32-multiple compatibility, since the distortion is minimal. But as you commented, for a more general scenario letterboxing can also be an option.

> Also for training, the input images would need to be sorted by aspect ratio so that you could batch together similarly shaped images. This would reduce a lot of the randomness that comes with shuffle=True in the dataloader. I wonder what effect this would have?

As I said above, batch-sorting w.r.t. the aspect ratio is something I did not need for my specific problem, so I didn't try it. However, your point seems sound, and doing this would certainly ease the training. I guess the shuffling will be affected somewhat by this, but it should not be a problem if the dataset is big enough.

@100330706
Author

> @100330706 I've applied updates for rectangular inference now, which speeds things up about 35% on zidane.jpg on CPU. I still letterbox; I simply letterbox to the next highest multiple of 32, i.e. rather than 416x416, the default zidane.jpg letterbox is now 256 x 416, about 38% fewer pixels!
>
> I haven't applied this to training yet, as it needs additional code to sort the training images by shape so I can group together batches of similar aspect ratios. I imagine this would likewise speed up most training by around 1/3.

@glenn-jocher Wow, I just saw this. The results are pretty impressive. Thanks for keeping the repo updated!

@glenn-jocher
Member

I've created tutorial #232 to explain this change. I'll close this issue and update the tutorial with rectangular training (and testing) as well in the future when it's completed.

The speedup is enormous BTW, well worth the effort. I feel a bit stupid now for spending all that GPU time running convolutions over letterbox-padded areas for nothing.

@joel5638

> @glenn-jocher Yes. I changed the img_size parameter into two parameters, img_size_height and img_size_width, and updated the parts of the code where this parameter was used (removed the letterbox and modified the create_grid function). I have tested it with crops of 1920x384 pixels and, since the network is fully convolutional, it seems to work fine. The gain in GPU memory is massive. I can upload the code if you want; the downside is that I am using an old version of this repository (2-3 months old).

@100330706 I'm trying to do the same with width and height. Can you share the code where img_size is changed to height and width? It would help me. Thank you.

@glenn-jocher
Member

@joel5638 that's great that you are also working on this! I suggest taking a good look at the yolov3/models.py code: check the __init__ functions of the model classes, e.g. YOLOv3, and consider updating the forward functions and other parts of the code such as the letterbox() function in util.py. Good luck!
