RECTANGULAR INFERENCE #232
Rectangular training example in the works, first batch of COCO. This is a bit complicated, as we need to letterbox all images in the batch to the same size, and some of the images are being pulled simultaneously by parallel dataloader workers. So part of this process is determining a priori the batch index that each image belongs to.
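The a-priori batch assignment described above can be sketched in a few lines (a simplified illustration, not the repo's exact code; `bi` and `nb` are assumed names):

```python
import numpy as np

n = 10          # number of images in the dataset
batch_size = 4  # images per batch

# Assign each image a batch index up front, so parallel dataloader
# workers agree on which batch (and therefore which letterbox shape)
# every image belongs to.
bi = np.floor(np.arange(n) / batch_size).astype(int)
nb = bi[-1] + 1  # total number of batches

print(bi.tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
print(nb)           # 3
```

With the batch index fixed ahead of time, each batch's letterbox shape can also be precomputed once, instead of being negotiated between workers at load time.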
Rectangular training results are on Line 64 in 7e6e189.
Rectangular training can be accessed here: Lines 146 to 148 in 7e6e189
Rectangular inference is now working in our latest iDetection iOS App build! This is a screenshot recorded today at 192x320, running inference on vertical 4k, 16:9 aspect ratio iPhone video. This pushes performance to a real-time 30 FPS! This means we now have YOLOv3-SPP running in real time on an iPhone Xs using rectangular inference, a worldwide first as far as we know.
Hi @glenn-jocher, I'm trying rectangular training with
I'd expect the shapes to be the nearest multiples of 32 for both image dimensions. What should be
I also noticed that images look rectangular in the
@MOHAMEDELDAKDOUKY training uses a mosaic loader, which loads 4 images at a time into a mosaic. You can disable this on this line: Line 408 in e27b124
Yes, I disabled it but the images are still squares of
@MOHAMEDELDAKDOUKY your repo may be out of date. git clone a new version and try again.
@MOHAMEDELDAKDOUKY test.py's dataloader uses rectangular inference. Use the same settings in train.py.
Well, the issue was that I used a
Thanks for your reply!
@MOHAMEDELDAKDOUKY ah, of course. The batch is padded to the minimum rectangle of the entire group of images, so one square image may cause the whole batch to be square. Rectangular dataloading also always runs in the same order, since the images are sorted by increasing aspect ratio.
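A tiny illustration of the minimum-rectangle point above, with assumed (h, w) shapes after scaling each long side to 640:

```python
# Each tuple is an image shape (h, w); the last image is square.
batch = [(384, 640), (480, 640), (640, 640)]

# The minimum rectangle that contains every image in the batch:
hmax = max(h for h, w in batch)
wmax = max(w for h, w in batch)

# One square image forces the whole batch back to a square shape.
print((hmax, wmax))  # (640, 640)
```

Drop the square image and the same computation gives (480, 640), a genuinely rectangular batch.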
It seems that letterbox computes the ratio and padding, and scale_coords computes them again. Would it be faster to compute them only once? Line 149 in 5d73b19
@mozpp yes, the intention is that if
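For context, the ratio and padding can always be recomputed from the two image shapes when mapping boxes back to the original frame; below is a simplified sketch of that recomputation (not the repo's exact `scale_coords`):

```python
import numpy as np

def scale_coords(img_shape, coords, orig_shape):
    """Rescale xyxy boxes from a letterboxed img_shape (h, w)
    back to the original orig_shape (h, w)."""
    gain = min(img_shape[0] / orig_shape[0], img_shape[1] / orig_shape[1])
    pad_x = (img_shape[1] - orig_shape[1] * gain) / 2  # width padding
    pad_y = (img_shape[0] - orig_shape[0] * gain) / 2  # height padding
    coords = coords.astype(float).copy()
    coords[:, [0, 2]] -= pad_x   # remove horizontal padding
    coords[:, [1, 3]] -= pad_y   # remove vertical padding
    coords[:, :4] /= gain        # undo the resize
    return coords

# A full-frame box in a 256x416 letterboxed image maps back to 1280x720.
boxes = np.array([[0, 11, 416, 245]])
print(scale_coords((256, 416), boxes, (720, 1280)))  # ~[[0, 0, 1280, 720]]
```

Caching the gain and pad from letterbox would save this small recomputation, at the cost of threading those values through the pipeline.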
If I use this project to convert YOLOv3 or YOLOv3-SPP models to ONNX, does the exported ONNX model support rectangular inference?
@chouxianyu yes. iDetection on iOS runs with rectangular inference using the PyTorch > ONNX > CoreML export pipeline.
@feixiangdekaka I don't understand your question.
hello,
@WZMIAOMIAO the ImageNet mean.
@glenn-jocher Isn't the ImageNet mean [123.68, 116.78, 103.94]?
@WZMIAOMIAO sure, sum those numbers and divide by 3. We use this because some functions prepopulate with a scalar rather than a vector.
Sorry, I can't understand this. What if I use [0, 0, 0] to fill the border?
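The arithmetic behind the scalar fill value, for the record (and padding with [0, 0, 0] should also work in practice; 114 is simply a single gray value near the dataset mean):

```python
# Collapse the per-channel ImageNet means to one scalar,
# usable where a function prepopulates a border with a scalar
# rather than a per-channel vector.
means = [123.68, 116.78, 103.94]
fill = int(sum(means) / 3)
print(fill)  # 114
```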
Hello, thank you for the good writing.
@HYUNKYEONG specify the long side; the short side is resolved automatically, i.e.
Thank you for your answer. I would like to detect images of size 800*600 as well.
@HYUNKYEONG YOLOv5 P5 models have a minimum stride constraint of 32; P6 models require a minimum stride of 64.
I'm very confused about rectangular training. In yolov5/utils/dataloaders.py, lines 545-568, it fills the images in the same batch into a square shape. Why is it called rectangular training? What do the rectangular images above show? Thank you in advance.
@autograd500 rectangular training refers to the process of letterboxing images in a batch to a common size with a minimum viable multiple of 32 for the most square image. The term "rectangular" here is used to indicate that the images in the batch may have different dimensions, resulting in a rectangular shape after letterboxing. The images shown in the example demonstrate this process, where each image is letterboxed to the same size within the batch. This approach is used to optimize training speed, especially for datasets with mixed aspect ratios like COCO. I hope this clarifies the concept for you. Let me know if you have any further questions.
Thanks for the answer, I still have the following questions:
When maxi < 1, the images within the batch have s[1]/s[0] < 1. But the final image size is [img_size * maxi, img_size], where s[0] < s[1]. So the aspect ratio of the images changed after letterboxing?
@autograd500 yes, in rectangular training, the aspect ratio of the images within the batch can be adjusted during the process of letterboxing. The code you shared is responsible for setting the training image shapes based on the aspect ratios of the images. When
So, to answer your question, yes, the aspect ratio of the images can change after letterboxing to achieve a consistent shape within the batch. Let me know if you have any further questions.
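A simplified sketch of the batch-shape logic under discussion (modeled on `utils/dataloaders.py`; the real code also applies a pad term, and the function name here is hypothetical):

```python
import numpy as np

def rect_batch_shapes(wh, batch_size, img_size=640, stride=32):
    """Compute one letterbox shape (h, w) per batch from image (w, h) sizes."""
    wh = np.array(wh, dtype=float)
    ar = wh[:, 1] / wh[:, 0]            # aspect ratio h/w per image
    order = ar.argsort()                # load images in increasing aspect ratio
    ar = ar[order]
    nb = int(np.ceil(len(wh) / batch_size))
    shapes = np.ones((nb, 2))           # default: square [1, 1]
    for i in range(nb):
        ari = ar[i * batch_size:(i + 1) * batch_size]
        mini, maxi = ari.min(), ari.max()
        if maxi < 1:                    # every image is wide: shrink height
            shapes[i] = [maxi, 1]
        elif mini > 1:                  # every image is tall: shrink width
            shapes[i] = [1, 1 / mini]
    return np.ceil(shapes * img_size / stride).astype(int) * stride, order

# Four 1280x720 (w, h) images -> a single 384x640 (h, w) batch shape.
print(rect_batch_shapes([(1280, 720)] * 4, batch_size=4)[0])  # [[384 640]]
```

Note the shape is taken from the *most square* image in the batch (maxi when all ratios are below 1), so less-square images in the same batch get extra padding rather than a changed aspect ratio, and mixed wide/tall batches fall back to a square shape.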
To achieve a consistent shape within the batch, it could also set the image shape to [img_size, maxi * img_size]. In this case, the aspect ratio of the images can stay consistent. I intuitively feel that this process is better, because the proportions of the images are not broken. Why not set the image shape to [img_size, maxi * img_size]? Doesn't the aspect ratio of the images matter?
@autograd500 thank you for your question and suggestion. The aspect ratio of the images does indeed matter in object detection tasks. When training models like YOLOv3, maintaining the original aspect ratio of the images can help preserve the proportions of objects in the scene. The current approach of setting the image shape to
However, your idea of setting the image shape to
Thank you for your contribution and for raising this point. It's valuable feedback that could be explored further in future enhancements. Let us know if you have any more questions or suggestions.
There are no questions for the time being; if any come up, I will consult you again. Thank you very much for your reply!
@autograd500 hi there! You're welcome, I'm glad I could help. If you have any more questions or need further assistance in the future, please don't hesitate to reach out. Have a great day!
@glenn-jocher If I want to train a model with input images of size 512x288 and I want the model's input to be fixed, similar to 640x640, what should I do? Why does --rect cause each batch to have different widths and heights? Aren't neural network inputs supposed to be a fixed size? Thank you.
Hi there! To train a model with a fixed input size of 512x288, you will need to modify the
The
If you require fixed dimensions for all inputs, simply setting

python train.py --img 512 288 --batch-size 16 --data dataset.yaml --weights yolov3.pt

Hope this clears up your query! Let me know if there's anything else you'd like to discuss. 🌟
@glenn-jocher Sorry, it doesn't work. It shows
@glenn-jocher Excuse me, do you have any idea about this question?
@Chenplushao hey there! It looks like you tried to specify separate width and height using
If you need different dimensions and want them to be fixed, you'll have to modify the model configuration file and adjust the input dimensions directly there, as YOLO typically expects square inputs. The other option is to ensure your images are resized to be square while maintaining their aspect ratio through padding before training. If you have any other questions or need further clarification, feel free to ask. Happy coding! 😊
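The padding arithmetic for the square letterboxing suggested above can be sketched as follows (shape math only, with a hypothetical helper name; a real pipeline would also resize the pixels, e.g. with cv2):

```python
def square_letterbox_geometry(h, w, size=416):
    """Return the resized (h, w) and the (top, bottom, left, right) padding
    needed to letterbox an h x w image into a size x size square."""
    r = size / max(h, w)                  # scale so the long side == size
    nh, nw = round(h * r), round(w * r)   # resized shape, aspect preserved
    top = (size - nh) // 2                # center the image vertically...
    bottom = size - nh - top
    left = (size - nw) // 2               # ...and horizontally
    right = size - nw - left
    return (nh, nw), (top, bottom, left, right)

# A 720x1280 frame letterboxed into a 416x416 square:
print(square_letterbox_geometry(720, 1280))  # ((234, 416), (91, 91, 0, 0))
```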
Thank you sir! Have a nice day!
@Chenplushao You're welcome, and thank you! If you need any more help down the line, don't hesitate to reach out. Have a fantastic day! 😊
Rectangular inference is implemented by default in detect.py. This reduces inference time proportionally to the amount of letterboxed area padded onto a square image vs a 32-minimum-multiple rectangular image. On zidane.jpg, for example, CPU inference time (on a 2018 MacBook Pro) drops from 1.01 s to 0.63 s, a 37% reduction, corresponding to a 38% reduction in image area (416x416 to 256x416).

Square Inference
Letterboxes to 416x416 squares.

Rectangular Inference
Letterboxes to 416 along the longest image dimension and pads the shorter dimension to the minimum multiple of 32.
zidane.jpg
bus.jpg
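The numbers in the zidane.jpg example above can be reproduced directly, assuming a 1280x720 source image (a standalone sketch, not the repo's code):

```python
import math

def rect_inference_shape(h, w, long_side=416, stride=32):
    """Scale the long side to long_side, then round each side up
    to the nearest multiple of stride."""
    r = long_side / max(h, w)
    return (math.ceil(h * r / stride) * stride,
            math.ceil(w * r / stride) * stride)

h, w = rect_inference_shape(720, 1280)
print((h, w))                               # (256, 416)
print(round(1 - (h * w) / (416 * 416), 2))  # 0.38 -> ~38% less area than 416x416
```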