Training on non-squared images #126
Comments
We have a companion yolov3 repository that we worked on last year for the xview competition, where the images were up to 5000 x 5000 pixels. For this we used sliding-window inference rather than letterbox inference. We haven't tried to adapt directly to rectangular training or inference, though. Also keep in mind that all images in a batch need to share exactly the same size.
@100330706 did you make any progress on non-square images? I was thinking rectangular inference would be very useful in our iOS app to reduce power consumption and increase FPS, as about 44% of the processing is currently wasted when running inference on letterboxed 1080p or 4K video: 1920 * 1080 / (1920 * 1920) = 0.5625, so only about 56% of the letterboxed area is actual image.
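As a quick check of that figure, the wasted fraction of a square letterbox can be computed directly. This is only a minimal sketch; the function name `letterbox_waste` is hypothetical and not part of the repository.

```python
def letterbox_waste(width: int, height: int) -> float:
    """Fraction of a square letterbox (side = longer image edge) that is padding."""
    side = max(width, height)
    return 1.0 - (width * height) / (side * side)


# A 16:9 frame in a square letterbox: 1 - 0.5625 = 0.4375, i.e. about 44% padding.
print(letterbox_waste(1920, 1080))
```

The same ratio applies to 4K (3840 x 2160), since only the aspect ratio matters.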
@glenn-jocher Yes. I replaced the img_size parameter with two parameters, img_size_height and img_size_width, and changed the parts of the code that depend on it (removed the letterbox and modified the create_grid function). I have tested it with crops of 1920x384 pixels, and since the network is fully convolutional it seems to work fine. The gain in GPU memory is massive. I can upload the code if you like; the downside is that I am using an old version of this repository (2-3 months old).
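To illustrate what a height/width-aware grid might look like, here is a rough sketch in plain Python. It is not the repository's create_grid function; `grid_shape` and `make_grid_offsets` are hypothetical names, and the stride of 32 is assumed to be the coarsest YOLOv3 output stride.

```python
def grid_shape(img_height: int, img_width: int, stride: int = 32):
    """Number of grid rows and columns for a rectangular input at a given stride."""
    if img_height % stride or img_width % stride:
        raise ValueError("image height and width must be multiples of the stride")
    return img_height // stride, img_width // stride


def make_grid_offsets(img_height: int, img_width: int, stride: int = 32):
    """Per-cell x and y offsets, as two nested lists of shape (rows, cols)."""
    ny, nx = grid_shape(img_height, img_width, stride)
    grid_x = [[x for x in range(nx)] for _ in range(ny)]
    grid_y = [[y for _ in range(nx)] for y in range(ny)]
    return grid_x, grid_y


# A 1920x384 crop (width x height) gives a 12 x 60 grid instead of a square one.
print(grid_shape(384, 1920))
```

The only real change from the square case is that the row and column counts are computed separately instead of from a single img_size.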
Yes, this seems like a good idea, though you would still need to letterbox to the nearest multiple of 32, no? Also, for training, the input images would need to be sorted by aspect ratio so that you could batch together similarly shaped images. This would reduce a lot of the randomness that comes with
@100330706 I've applied updates for rectangular inference now, which speeds things up about 35% on zidane.jpg on CPU. I still letterbox, but only to the next highest multiple of 32: rather than 416x416, the default zidane.jpg letterbox is now 256 x 416, about 38% fewer pixels! I haven't applied this to training yet, as it needs additional code to sort the training images by shape so I can group together batches of similar aspect ratios. I imagine this would likewise speed up most training by around 1/3.
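The padded shape described above can be reproduced with a few lines of arithmetic. This is a sketch rather than the repository's letterbox code: `rect_letterbox_shape` is a hypothetical helper, and the 1280 x 720 input size is an assumption about a 16:9 image such as zidane.jpg.

```python
import math


def rect_letterbox_shape(width: int, height: int, img_size: int = 416, stride: int = 32):
    """Scale the longer side to img_size, then pad each side up to a multiple of stride.

    Returns (padded_height, padded_width).
    """
    scale = img_size / max(width, height)
    scaled_w = round(width * scale)
    scaled_h = round(height * scale)
    padded_w = math.ceil(scaled_w / stride) * stride
    padded_h = math.ceil(scaled_h / stride) * stride
    return padded_h, padded_w


h, w = rect_letterbox_shape(1280, 720)   # assumed 16:9 input
print(h, w)                              # 256 416
print(1 - (h * w) / (416 * 416))         # fraction of pixels saved vs. 416x416
```

With these inputs the short side scales to 234 pixels and is padded to 256, which matches the 256 x 416 shape quoted in the comment.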
Yes, the image size has to be a multiple of 32, even when it is rectangular. Since the dataset I am dealing with only has fixed-size images of 1920x400 px, rather than letterboxing I just rescale all of them to 1920x384 px to achieve 32x compatibility, since the distortion is minimal. But as you commented, for a more general scenario, letterboxing is also an option.
As I said above, batch sorting by aspect ratio is something I did not need for my specific problem, so I didn't try it. However, your point seems sound, and doing this should make training easier. I guess that the shuffling will be somewhat affected by this, but that should not be a problem if the dataset is big enough. |
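One way the aspect-ratio batching discussed here could work is sketched below. It is not code from the repository: `rect_batches` and its parameters are hypothetical, and it assumes each image is scaled so its longer side equals `img_size` before the batch is padded to a shared multiple of 32.

```python
import math


def rect_batches(shapes, batch_size, img_size=416, stride=32):
    """Group images by aspect ratio and pick one padded shape per batch.

    shapes: list of (width, height) tuples.
    Returns a list of (image_indices, (batch_height, batch_width)).
    """
    # Sort image indices by height/width so neighbours have similar shapes.
    order = sorted(range(len(shapes)), key=lambda i: shapes[i][1] / shapes[i][0])
    batches = []
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        scaled_h, scaled_w = [], []
        for i in idx:
            w, h = shapes[i]
            scale = img_size / max(w, h)
            scaled_w.append(round(w * scale))
            scaled_h.append(round(h * scale))
        # Every image in the batch is padded to the largest extent in the batch.
        batch_h = math.ceil(max(scaled_h) / stride) * stride
        batch_w = math.ceil(max(scaled_w) / stride) * stride
        batches.append((idx, (batch_h, batch_w)))
    return batches


shapes = [(1280, 720), (720, 1280), (1920, 1080), (1000, 1000)]
for indices, shape in rect_batches(shapes, batch_size=2):
    print(indices, shape)
```

In this toy example the two 16:9 images share a 256 x 416 batch, while the square and portrait images end up in a 416 x 416 batch. Shuffling the order of the batches each epoch, rather than the individual images, is one possible way to keep some randomness, though the thread does not confirm how this was eventually handled.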
@glenn-jocher Wow, I just saw this. The results are pretty impressive. Thanks for continuing to update the repo!
I've created tutorial #232 to explain this change. I'll close this issue and update the tutorial with rectangular training (and testing) as well in the future when it's completed. The speedup is enormous BTW, well worth the effort. I feel a bit stupid now, spending all that GPU time doing convolutions of letterboxed padded areas for nothing.
@100330706 I'm trying to do the same with width and height. Can you share the code where imgsize is changed to height and width? It would help me. Thank you.
@joel5638 that's great that you are also working on this! I suggest taking a good look at the
Hi! Is there any way to use this pytorch implementation of YOLOv3 to train on 3840x384 images without resizing images into squares? The width of the image is very important for our problem since the objects we are trying to detect are very small.
Regards.