Want to figure out critical algorithm of Detect layer #471
❔ Question

Hi,

I want to figure out the intuition of bbox detection.

In yolov3, we can find that the output can be written as:

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)

So, in yolov5, I look into the src code (yolov5/models/yolo.py, Lines 21 to 38 in 1e95337) and try to formularize it:

b_x = (σ(t_x) × 2 − 0.5) + c_x
b_y = (σ(t_y) × 2 − 0.5) + c_y
b_w = p_w × (σ(t_w) × 2)²
b_h = p_h × (σ(t_h) × 2)²

Am I right?

Comments
@ChristopherSTAN yes this looks correct! Typically this would be written as 2σ() rather than σ() × 2 though.
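For reference, the corresponding decode in the Detect() forward pass looks roughly like this (paraphrased from models/yolo.py around that commit; the grid/stride bookkeeping is simplified here):

```python
import torch

def detect_decode(x, grid, anchor_grid, stride):
    # x: raw layer output (..., 4+) -> sigmoid everything, then decode boxes
    y = x.sigmoid()
    y[..., 0:2] = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride  # xy: cell-relative center
    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid       # wh: bounded anchor multiple
    return y
```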
@glenn-jocher Awesome!
The original yolo/darknet box equations have a serious flaw. Width and Height are completely unbounded as they are simply out = exp(in), which is dangerous, as it can lead to runaway gradients, instabilities, NaN losses and ultimately a complete loss of training. yolov3 suffers from this problem as well as yolov4. For yolov5 I made sure to patch this error by sigmoiding all model outputs, while also ensuring that the centerpoint remained unchanged, i.e. fcn(0) = 1, so nominal zero outputs from the model would cause the nominal anchor size to be used. The current equation constrains anchor multiples from a minimum of 0 to a maximum of 4, and the anchor-target matching has also been updated to be width-height multiple based, with a nominal upper threshold hyperparameter of 4.0. The original thread is ultralytics/yolov3#168
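A minimal sketch contrasting the two wh formulations (function names here are illustrative, not the repo's actual API):

```python
import torch

def yolov3_wh(t_wh, anchor_wh):
    # Unbounded: exp() can overflow for large inputs, destabilizing training
    return torch.exp(t_wh) * anchor_wh

def yolov5_wh(t_wh, anchor_wh):
    # Bounded: (2*sigmoid(x))**2 lies in (0, 4) and equals 1 at x = 0,
    # so a zero output reproduces the nominal anchor size
    return (2 * torch.sigmoid(t_wh)) ** 2 * anchor_wh

t = torch.tensor([-10.0, 0.0, 10.0])
anchor = torch.tensor([10.0, 10.0, 10.0])
print(yolov3_wh(t, anchor))  # ~[4.5e-04, 10.0, 2.2e+05] -> unbounded blowup
print(yolov5_wh(t, anchor))  # ~[8.2e-08, 10.0, 39.996] -> capped near 4x anchor
```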
@ChristopherSTAN BTW, you mentioned you were experimenting with lowering hyp['anchor_t']. You can accomplish this by modifying the exponent in the equation to 1.0, which is mathematically equivalent to removing it altogether:

y[..., 2:4] = (y[..., 2:4] * 2) ** 1.0 * self.anchor_grid[i]  # wh
            = y[..., 2:4] * 2 * self.anchor_grid[i]           # wh

This change would need to occur in two places: 1) the Detect() module, 2) the compute_loss() box calculation (Lines 472 to 475 in 1e95337).
@glenn-jocher I am afraid I have not considered so much LOL. Maybe you are talking about another DL pro. (Apparently I am not, for now.) But I will try!
@glenn-jocher I followed your idea and set hyp['anchor_t'] = 3.0; will it work?
@ChristopherSTAN don't worry, the idea is pretty simple. A neuron can control outputs in a certain range defined by the above equations, the default being 0-4. If you reduce the hyperparameter that controls the matching threshold to 2.0, then boxes are only matched to anchors that are less than 2x the anchor size and greater than 1/2x the anchor size. So if an anchor size is 10 pixels, then that neuron can match labels between 5-20 pixels in size, but it can output a box shape from 0-40 pixels in size. So it is wasting 5/8 of its output span. It has to fit all of its output between 5-20, which by definition gives it less fine control for tiny corrections, which will reduce mAP. So for best results, you want the neuron to have output authority over the entire training space you want it to predict. Even with the default settings, I see I am wasting a bit of training space. With default settings, the 10 pixel anchor neuron only matches labels between 2.5-40 pixels while it can output 0-40, so I am currently wasting about 6% of the output space.
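A quick sketch of that arithmetic (anchor_t here stands for the matching-threshold hyperparameter discussed above):

```python
def output_span_waste(anchor_px, anchor_t, max_multiple=4.0):
    """Fraction of a neuron's output span that can never match a label."""
    match_lo, match_hi = anchor_px / anchor_t, anchor_px * anchor_t  # matchable label sizes
    out_lo, out_hi = 0.0, anchor_px * max_multiple                   # reachable output sizes
    usable = min(match_hi, out_hi) - max(match_lo, out_lo)
    return 1 - usable / (out_hi - out_lo)

print(output_span_waste(10, 2.0))  # 0.625  -> 5/8 wasted, as described above
print(output_span_waste(10, 4.0))  # 0.0625 -> ~6% wasted with default settings
```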
Yes, any value will work here, you just need to experiment with what produces the best mAP. If you lower these values, though, then it would also make sense to adjust the wh equations. For a 3.0 limit you might adjust the exponent from 2.0 to about 1.6 (since 2^1.6 ≈ 3) to fully capture the output space:

y[..., 2:4] = (y[..., 2:4] * 2) ** 1.6 * self.anchor_grid[i]  # wh
@glenn-jocher For now I am thinking about whether I can adjust it to perform well on my datasets, where there are lots of overlapping and medium objects. Can I consider that decreasing this parameter (2, 1.73, ...) also limits the size of the output bounding boxes?
You should look at your labels.png to see your size distribution. Yes, changing the exponent in the box equations from 2.0 to 1.6 will limit your output space from 0-4 to 0-3. This would presumably be paired with an increase in anchor count, otherwise recall would suffer.
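Since 2·σ(x) tops out at 2, the exponent p sets the maximum anchor multiple at 2^p; a quick check:

```python
def max_anchor_multiple(p):
    # (2 * sigmoid(x)) ** p approaches 2 ** p as x -> +inf
    return 2.0 ** p

for p in (2.0, 1.6, 1.0):
    print(p, round(max_anchor_multiple(p), 2))
# 2.0 4.0  -> default 0-4 output space
# 1.6 3.03 -> roughly the 0-3 space discussed above
# 1.0 2.0  -> linear form, 0-2 space
```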
@glenn-jocher Here is my labels.png:
@ChristopherSTAN yes these look pretty typical. You have some very large class imbalances as well. Or wait, it looks like your bar chart is plotted incorrectly, as there are 15 bins but it only goes up to 13. Looks like a plotting bug. TODO: Fix labels.png bar chart.
Pushed a commit 4ffd977 for improved plotting. No bug found in current plotting.
Hi, dear Glenn, I think it is a good time for your team to formalize your work, write it up as a paper, and SHOCK the world. It is really interesting to read your code.
@ChristopherSTAN haha, yes we do need to produce a publication, but we are still exploring design changes. Hopefully around the end of the year we can send something to arxiv.
@ChristopherSTAN I have an idea: you could try modifying the activation function in the Conv() layer (line 24) from LeakyReLU(0.1) to Swish() or Mish() to see if this helps wheat training. I've never tried this, but it may be possible to still start from pretrained weights when you do this (Lines 18 to 31 in 5e970d4).

EDIT: You'll have to reduce your batch size, as these will consume much greater GPU RAM when training.
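A minimal sketch of that swap, loosely modeled on the Conv() module referenced above (treat the exact module layout as an assumption):

```python
import torch
import torch.nn as nn

class Mish(nn.Module):
    # Mish activation: x * tanh(softplus(x))
    def forward(self, x):
        return x * torch.tanh(nn.functional.softplus(x))

class Conv(nn.Module):
    # Standard convolution block: Conv2d -> BatchNorm -> activation
    def __init__(self, c1, c2, k=1, s=1, g=1, act=True):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        # Original used nn.LeakyReLU(0.1); swap in Mish() (or nn.SiLU() for Swish)
        self.act = Mish() if act else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```

Because this only swaps the activation and adds no new parameters, pretrained checkpoints should still load; only the activation behavior (and GPU memory use) changes.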
@glenn-jocher Interesting! Edit: I plan to upload some scripts. I am not sure how to name this operation; maybe we can call it "enriching data" or something else. BTW, I am using EfficientDet in the Wheat competition.
@glenn-jocher That's my way: I take the classes that intersect between COCO and my dataset, and add those COCO images into my train set.
@glenn-jocher Now I understand your feeling when training on COCO.
@ChristopherSTAN intersecting classes, that's a good term. Yes, this would be very useful. OpenImages V5/6 have a lot of intersecting classes with COCO. Yes, COCO can be very slow to train on, unfortunately.
I would point out that this is not something you want to do in the long run, depending on the actual images of your own dataset. The COCO dataset may help the model to generalise on the objects, but usually the test dataset, and the real world on which you are going to use your trained model, will have their own specifics.

For the problems I am solving, I have also used the COCO dataset for the specific classes I am training. However, I am also decreasing the COCO images in my dataset once I have a new batch of real images annotated. And, obviously, one thing you need to make sure is not happening is having any COCO images in your val/test set if these are not in accordance with your actual real scenarios. This can screw up your model evaluation pretty badly.
@dlawrences Thanks for your suggestions! It is my first time adding COCO images into my train set. And I have a similar thought on the test set to yours: I do not add extra images into the val set, because I still want the test set and dev set to have the same distribution. Thanks again!
@glenn-jocher I plan to try what this pro said: ultralytics/yolov3#1098 (comment). Try Leaky ReLU first and then Mish.
@glenn-jocher Bravo! I first trained yolov5x on a mixed dataset (30K COCO images + 3K from a small dataset) for nearly 50 epochs, then trained 150 epochs on only the 3K dataset. It took me from 0.67 to 0.71 mAP on the test set!
@ChristopherSTAN oh, that's a big jump! What was the increase due to? The COCO pretraining? Mish/Swish was also mentioned above, or perhaps you used your intersecting classes idea?
I don't see much improvement with Mish/Swish. Then I thought about it a bit: because data are so important to deep learning models, the external data improved the modeling ability of the model, so when training on the original custom dataset we see a great improvement. Notably, this is a single model on fold 0, yet it outperforms my ensemble of models on 5 folds. With this observation, I will keep the pretrained model, resume it with k-fold training, and then ensemble. So, at last, thanks a lot for your great repo and your hard work on the COCO dataset.
@ChristopherSTAN @glenn-jocher Can anyone explain why setting the target confidence = 1.0 hurts the accuracy? And why does the equation below give better accuracy?

tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * giou.detach().clamp(0).type(tobj.dtype)  # giou ratio
@abhiagwl4262 you may want to experiment both ways. The current implementation sets object confidence to the observed IoU.
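In that line, model.gr linearly blends a hard 1.0 label with the observed IoU; a small illustration (the gr and IoU values are made up):

```python
import torch

def obj_target(iou, gr):
    # gr = 0.0 -> classic hard label of 1.0; gr = 1.0 -> pure IoU target
    return (1.0 - gr) + gr * iou.clamp(0)

iou = torch.tensor([0.2, 0.6, 0.9])
print(obj_target(iou, 0.0))  # tensor([1.0000, 1.0000, 1.0000])
print(obj_target(iou, 1.0))  # tensor([0.2000, 0.6000, 0.9000])
print(obj_target(iou, 0.5))  # tensor([0.6000, 0.8000, 0.9500])
```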
@glenn-jocher I actually experimented and found that setting the target confidence to 1.0 gives me a significant drop in accuracy. Do you have any intuition behind this observation?
@abhiagwl4262 the intention with the current implementation is to assist NMS in reducing lower-quality boxes.
@glenn-jocher The predicted box can be 0-4.0 times the anchor, so you basically have an upper bound of 4.0 and a lower bound of 1/4.0 on the anchor-GT ratio. Why are you applying a lower bound? What is the significance of that?
@abhiagwl4262 the matching algorithm is attempting to match targets with suitable anchors. The matches should be neither too large nor too small, so we use upper and lower bounds on the ratio to achieve this. Without the lower bounds, all anchors would match with small objects (we only want the small anchors to match with small objects).
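A sketch in the spirit of that matching rule (tensor shapes simplified; anchor_t defaults to the 4.0 threshold mentioned above):

```python
import torch

def anchor_match(target_wh, anchor_wh, anchor_t=4.0):
    # Per-dimension ratio of target size to anchor size
    r = target_wh / anchor_wh
    # Symmetric bound: the worst-case ratio (too big or too small) must stay under anchor_t
    return torch.max(r, 1.0 / r).max(dim=-1).values < anchor_t

anchor = torch.tensor([10.0, 10.0])
targets = torch.tensor([[8.0, 12.0],   # close to anchor -> matches
                        [50.0, 50.0],  # 5x too large -> rejected by upper bound
                        [2.0, 2.0]])   # 5x too small -> rejected by lower bound
print(anchor_match(targets, anchor))   # tensor([ True, False, False])
```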
@glenn-jocher Can you give a little idea of how you chose the values for the yolo layer loss balancing [4.0 for the small-object layer, 1.0 for the medium-object layer, 0.4 for the large-object layer]?
empirical results
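For context, a sketch of how such per-layer weights typically enter the objectness loss (the structure here is assumed, not the repo's exact code):

```python
import torch
import torch.nn as nn

balance = [4.0, 1.0, 0.4]  # P3/small, P4/medium, P5/large objectness weights
bce = nn.BCEWithLogitsLoss()

def objectness_loss(preds, targets):
    # Up-weight the small-object layer: it has the most grid cells and the
    # hardest targets, so its objectness signal is boosted relative to P5
    return sum(w * bce(p, t) for w, p, t in zip(balance, preds, targets))

preds = [torch.randn(1, 3, s, s) for s in (80, 40, 20)]  # e.g. 640px input, strides 8/16/32
targets = [torch.zeros_like(p) for p in preds]
print(objectness_loss(preds, targets))
```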
Thanks for the great repo!
Thanks!
@violet17 the equation and intersection points were chosen for stability and for their suitability in replacing the unstable yolov3/v4 wh method.
@glenn-jocher But what is the purpose of offsetting by -0.5? If you are expanding the output space from 0-1 to 0-2 and you offset by -0.5, the mid-point will be equal to 0.5 and not 0. So given x = 0 you get 0.5 output. Wouldn't it be more logical to offset by -1 to get 0 when given 0? With the current formulation, if the network predicts t_x = 0 and the cell offset is 0.5, then the output will be 1, while intuitively it seems like it should perhaps be 0.5, the 0 value of the cell. Perhaps I misunderstand?
@joelcma you want a reference input to create a reference output for stability and ease of training. The average input (due to batchnorm) will be zero, and the average object will be in the middle (i.e. at 0.5) of a grid cell. We expand the output space to allow for predictions near 0 and 1 without stressing the sigmoid inputs to extremes.
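A tiny numeric check of that xy behavior, using the 2σ(x) − 0.5 form discussed above:

```python
import torch

def xy_center(t_xy, cell_xy):
    # 2*sigmoid(t) - 0.5 spans (-0.5, 1.5) and equals 0.5 at t = 0,
    # so a zero network output places the box at the middle of its grid cell
    return 2 * torch.sigmoid(t_xy) - 0.5 + cell_xy

print(xy_center(torch.tensor(0.0), 0.0))    # tensor(0.5000) -> cell center
print(xy_center(torch.tensor(-10.0), 0.0))  # ~ -0.5 -> can reach just past the cell edge
print(xy_center(torch.tensor(10.0), 0.0))   # ~  1.5 -> same on the other side
```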
@glenn-jocher Thank you for taking the time to answer! And sorry, but I have another question :D What is the benefit of using a sigmoid over a bounded ReLU in this case?
@joelcma the benefit of any model architecture selection would be driven by empirical results, i.e. 'it works better'.
I have tried it for export: !python export.py --weights /content/drive/MyDrive/best.pt --include "coreml" Does anyone know how I can convert correctly and get the bounding boxes, scores, and other outputs?
@dhiman10 after running
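A hedged sketch of inspecting and running an exported CoreML model with coremltools (the file path, image size, and input name are assumptions; check your model's actual I/O description, and note that coremltools prediction runs on macOS):

```python
import coremltools as ct
from PIL import Image

# Load the exported model; the path depends on where export.py wrote it
model = ct.models.MLModel("best.mlmodel")
print(model.input_description)   # inspect the real input name/size first
print(model.output_description)  # inspect the real output names

img = Image.open("test.jpg").resize((640, 640))  # match the export image size
out = model.predict({"image": img})              # key must match the model's input name
# The raw output is the Detect tensor (xywh + conf + class scores);
# confidence thresholding and NMS still have to be applied downstream.
print({k: getattr(v, "shape", v) for k, v in out.items()})
```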