Want to figure out critical algorithm of Detect layer #471
❔ Question

Hi,

I want to figure out the intuition of bbox detection.

In yolov3, we can find that the output can be written as:

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)

So, in yolov5, I look into the src code (yolov5/models/yolo.py, Lines 21 to 38 in 1e95337) and try to formularize it:

b_x = (σ(t_x) × 2 − 0.5) + c_x
b_y = (σ(t_y) × 2 − 0.5) + c_y
b_w = p_w × (σ(t_w) × 2)²
b_h = p_h × (σ(t_h) × 2)²

Am I right?

Comments
@ChristopherSTAN yes this looks correct! Typically this would be written as 2σ() rather than σ() × 2 though.
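For reference, the corresponding decode in the Detect() forward pass looks roughly like this (paraphrased from models/yolo.py around that commit; the grid/stride bookkeeping is simplified here):

```python
import torch

def detect_decode(x, grid, anchor_grid, stride):
    # x: raw layer output (..., 4+) -> sigmoid everything, then decode boxes
    y = x.sigmoid()
    y[..., 0:2] = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride  # xy: cell-relative center
    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid       # wh: bounded anchor multiple
    return y
```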
@glenn-jocher Awesome!
The original yolo/darknet box equations have a serious flaw. Width and Height are completely unbounded as they are simply out = exp(in), which is dangerous, as it can lead to runaway gradients, instabilities, NaN losses and ultimately a complete loss of training. yolov3 suffers from this problem as well as yolov4. For yolov5 I made sure to patch this error by sigmoiding all model outputs, while also ensuring that the centerpoint remained unchanged, i.e. fcn(0) = 1, so nominal zero outputs from the model would cause the nominal anchor size to be used. The current equation constrains anchor multiples from a minimum of 0 to a maximum of 4, and the anchor-target matching has also been updated to be width-height multiple based, with a nominal upper threshold hyperparameter of 4.0. The original thread is ultralytics/yolov3#168
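A minimal sketch contrasting the two wh formulations (function names here are illustrative, not the repo's actual API):

```python
import torch

def yolov3_wh(t_wh, anchor_wh):
    # Unbounded: exp() can overflow for large inputs, destabilizing training
    return torch.exp(t_wh) * anchor_wh

def yolov5_wh(t_wh, anchor_wh):
    # Bounded: (2*sigmoid(x))**2 lies in (0, 4) and equals 1 at x = 0,
    # so a zero output reproduces the nominal anchor size
    return (2 * torch.sigmoid(t_wh)) ** 2 * anchor_wh

t = torch.tensor([-10.0, 0.0, 10.0])
anchor = torch.tensor([10.0, 10.0, 10.0])
print(yolov3_wh(t, anchor))  # ~[4.5e-04, 10.0, 2.2e+05] -> unbounded blowup
print(yolov5_wh(t, anchor))  # ~[8.2e-08, 10.0, 39.996] -> capped near 4x anchor
```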
@ChristopherSTAN BTW, you mentioned you were experimenting with lowering hyp['anchor_t']. You can accomplish this by modifying the exponent in the equation to 1.0, which is mathematically equivalent to removing it altogether:

y[..., 2:4] = (y[..., 2:4] * 2) ** 1.0 * self.anchor_grid[i]  # wh
            = y[..., 2:4] * 2 * self.anchor_grid[i]           # wh

This change would need to occur in two places: 1) the Detect() module, 2) the compute_loss() box calculation (Lines 472 to 475 in 1e95337).
@glenn-jocher I am afraid I have not considered so much LOL. Maybe you are talking about another DL pro. (Apparently I am not, for now.) But I will try!
@glenn-jocher I followed your idea and set hyp['anchor_t'] = 3.0; will it work?
@ChristopherSTAN don't worry, the idea is pretty simple. A neuron can control outputs in a certain range defined by the above equations, the default being 0-4. If you reduce the hyperparameter that controls the matching threshold to 2.0, then boxes are only matched to anchors that are less than 2x the anchor size and greater than 1/2x the anchor size. So if an anchor size is 10 pixels, then that neuron can match labels between 5-20 pixels in size, but it can output a box shape from 0-40 pixels in size. So it is wasting 5/8 of its output span. It has to fit all of its output between 5-20, which by definition gives it less fine control for tiny corrections, which will reduce mAP. So for best results, you want the neuron to have output authority over the entire training space you want it to predict. Even with the default settings, I see I am wasting a bit of training space. With default settings, the 10 pixel anchor neuron only matches labels between 2.5-40 pixels while it can output 0-40, so I am currently wasting about 6% of the output space.
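A quick sketch of that arithmetic (anchor_t here stands for the matching-threshold hyperparameter discussed above):

```python
def output_span_waste(anchor_px, anchor_t, max_multiple=4.0):
    """Fraction of a neuron's output span that can never match a label."""
    match_lo, match_hi = anchor_px / anchor_t, anchor_px * anchor_t  # matchable label sizes
    out_lo, out_hi = 0.0, anchor_px * max_multiple                   # reachable output sizes
    usable = min(match_hi, out_hi) - max(match_lo, out_lo)
    return 1 - usable / (out_hi - out_lo)

print(output_span_waste(10, 2.0))  # 0.625  -> 5/8 wasted, as described above
print(output_span_waste(10, 4.0))  # 0.0625 -> ~6% wasted with default settings
```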
Yes, any value will work here, you just need to experiment with what produces the best mAP. If you lower these values, though, then it would also make sense to adjust the wh equations. For a 3.0 limit you might adjust the exponent from 2.0 to about 1.6 (since 2^1.6 ≈ 3) to fully capture the output space:

y[..., 2:4] = (y[..., 2:4] * 2) ** 1.6 * self.anchor_grid[i]  # wh
@glenn-jocher For now I am thinking about whether I can adjust it to perform well on my datasets, where there are lots of overlapping and medium objects. Can I consider that decreasing this parameter (2, 1.73, ...) also limits the size of the output bounding boxes?
You should look at your labels.png to see your size distribution. Yes, changing the exponent in the box equations from 2.0 to 1.6 will limit your output space from 0-4 to 0-3. This would presumably be paired with an increase in anchor count, otherwise recall would suffer.
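Since 2·σ(x) tops out at 2, the exponent p sets the maximum anchor multiple at 2^p; a quick check:

```python
def max_anchor_multiple(p):
    # (2 * sigmoid(x)) ** p approaches 2 ** p as x -> +inf
    return 2.0 ** p

for p in (2.0, 1.6, 1.0):
    print(p, round(max_anchor_multiple(p), 2))
# 2.0 4.0  -> default 0-4 output space
# 1.6 3.03 -> roughly the 0-3 space discussed above
# 1.0 2.0  -> linear form, 0-2 space
```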
@glenn-jocher Here is my labels.png:
@ChristopherSTAN yes these look pretty typical. You have some very large class imbalances as well. Or wait, it looks like your bar chart is plotted incorrectly, as there are 15 bins but it only goes up to 13. Looks like a plotting bug. TODO: Fix labels.png bar chart.
Pushed a commit 4ffd977 for improved plotting. No bug found in current plotting.
Hi, dear Glenn, I think it is a good time for your team to formalize your work, write it up as a paper, and SHOCK the world. It is really interesting to read your code.
@ChristopherSTAN haha, yes we do need to produce a publication, but we are still exploring design changes. Hopefully around the end of the year we can send something to arxiv.
@ChristopherSTAN I have an idea: you could try modifying the activation function in the Conv() layer (line 24) from LeakyReLU(0.1) to Swish() or Mish() to see if this helps wheat training. I've never tried this, but it may be possible to still start from pretrained weights when you do this (Lines 18 to 31 in 5e970d4).

EDIT: You'll have to reduce your batch size, as these will consume much greater GPU RAM when training.
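A minimal sketch of that swap, loosely modeled on the Conv() module referenced above (treat the exact module layout as an assumption):

```python
import torch
import torch.nn as nn

class Mish(nn.Module):
    # Mish activation: x * tanh(softplus(x))
    def forward(self, x):
        return x * torch.tanh(nn.functional.softplus(x))

class Conv(nn.Module):
    # Standard convolution block: Conv2d -> BatchNorm -> activation
    def __init__(self, c1, c2, k=1, s=1, g=1, act=True):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        # Original used nn.LeakyReLU(0.1); swap in Mish() (or nn.SiLU() for Swish)
        self.act = Mish() if act else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```

Because this only swaps the activation and adds no new parameters, pretrained checkpoints should still load; only the activation behavior (and GPU memory use) changes.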
@glenn-jocher Interesting! Edit: I plan to upload some scripts. I am not sure how to name this operation; maybe we can call it "enriching data" or something else. BTW, I am using EfficientDet in the Wheat competition.
@glenn-jocher That's my way: I take the classes that intersect between COCO and my dataset, and add those COCO images into my train set.
@glenn-jocher Now I understand your feeling when training on COCO.
@ChristopherSTAN intersecting classes, that's a good term. Yes, this would be very useful. OpenImages V5/6 have a lot of intersecting classes with COCO. Yes, COCO can be very slow to train on, unfortunately.
I would point out that this is not something you want to do in the long run, depending on the actual images of your own dataset. The COCO dataset may help the model to generalise on the objects, but usually the test dataset, and the real world on which you are going to use your trained model, will have their own specifics.

For the problems I am solving, I have also used the COCO dataset for the specific classes I am training. However, I am also decreasing the COCO images in my dataset once I have a new batch of real images annotated. And, obviously, one thing you need to make sure is not happening is having any COCO images in your val/test set if these are not in accordance with your actual real scenarios. This can screw up your model evaluation pretty badly.
@dlawrences Thanks for your suggestions! It is my first time adding COCO images into my train set. And I have a similar thought on the test set to yours: I do not add extra images into the val set, because I still want the test set and dev set to have the same distribution. Thanks again!
@glenn-jocher I plan to try what this pro said: ultralytics/yolov3#1098 (comment). Try Leaky ReLU first and then Mish.
@glenn-jocher Bravo! I first trained yolov5x on a mixed dataset (30K COCO images + 3K from a small dataset) for nearly 50 epochs, then trained 150 epochs on only the 3K dataset. It took me from 0.67 to 0.71 mAP on the test set!
@ChristopherSTAN oh, that's a big jump! What was the increase due to? The COCO pretraining? Mish/Swish was also mentioned above, or perhaps you used your intersecting classes idea?
I don't see much improvement with Mish/Swish. Then I thought about it a bit: because data are so important to deep learning models, the external data improved the modeling ability of the model, so when training on the original custom dataset we see a great improvement. Notably, this is a single model on fold 0, yet it outperforms my ensemble of models on 5 folds. With this observation, I will keep the pretrained model, resume it with k-fold training, and then ensemble. So, at last, thanks a lot for your great repo and your hard work on the COCO dataset.
@ChristopherSTAN @glenn-jocher Can anyone explain why setting the target confidence = 1.0 hurts the accuracy? And why does the equation below give better accuracy?

tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * giou.detach().clamp(0).type(tobj.dtype)  # giou ratio
@abhiagwl4262 you may want to experiment both ways. The current implementation sets object confidence to the observed IoU.
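In that line, model.gr linearly blends a hard 1.0 label with the observed IoU; a small illustration (the gr and IoU values are made up):

```python
import torch

def obj_target(iou, gr):
    # gr = 0.0 -> classic hard label of 1.0; gr = 1.0 -> pure IoU target
    return (1.0 - gr) + gr * iou.clamp(0)

iou = torch.tensor([0.2, 0.6, 0.9])
print(obj_target(iou, 0.0))  # tensor([1.0000, 1.0000, 1.0000])
print(obj_target(iou, 1.0))  # tensor([0.2000, 0.6000, 0.9000])
print(obj_target(iou, 0.5))  # tensor([0.6000, 0.8000, 0.9500])
```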
@glenn-jocher I actually experimented and found that setting the target confidence to 1.0 gives me a significant drop in accuracy. Do you have any intuition behind this observation?
@abhiagwl4262 the intention with the current implementation is to assist NMS in reducing lower-quality boxes.
@glenn-jocher The predicted box can be 0-4.0 times the anchor, so you basically have an upper bound of 4.0 and a lower bound of 1/4.0 on the anchor-GT ratio. Why are you applying a lower bound? What is the significance of that?
@abhiagwl4262 the matching algorithm is attempting to match targets with suitable anchors. The matches should be neither too large nor too small, so we use upper and lower bounds on the ratio to achieve this. Without the lower bounds, all anchors would match with small objects (we only want the small anchors to match with small objects).
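A sketch in the spirit of that matching rule (tensor shapes simplified; anchor_t defaults to the 4.0 threshold mentioned above):

```python
import torch

def anchor_match(target_wh, anchor_wh, anchor_t=4.0):
    # Per-dimension ratio of target size to anchor size
    r = target_wh / anchor_wh
    # Symmetric bound: the worst-case ratio (too big or too small) must stay under anchor_t
    return torch.max(r, 1.0 / r).max(dim=-1).values < anchor_t

anchor = torch.tensor([10.0, 10.0])
targets = torch.tensor([[8.0, 12.0],   # close to anchor -> matches
                        [50.0, 50.0],  # 5x too large -> rejected by upper bound
                        [2.0, 2.0]])   # 5x too small -> rejected by lower bound
print(anchor_match(targets, anchor))   # tensor([ True, False, False])
```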
@glenn-jocher Can you give a little idea of how you chose the values for the yolo layer loss balancing [4.0 for the small-object layer, 1.0 for the medium-object layer, 0.4 for the large-object layer]?
empirical results
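For context, a sketch of how such per-layer weights typically enter the objectness loss (the structure here is assumed, not the repo's exact code):

```python
import torch
import torch.nn as nn

balance = [4.0, 1.0, 0.4]  # P3/small, P4/medium, P5/large objectness weights
bce = nn.BCEWithLogitsLoss()

def objectness_loss(preds, targets):
    # Up-weight the small-object layer: it has the most grid cells and the
    # hardest targets, so its objectness signal is boosted relative to P5
    return sum(w * bce(p, t) for w, p, t in zip(balance, preds, targets))

preds = [torch.randn(1, 3, s, s) for s in (80, 40, 20)]  # e.g. 640px input, strides 8/16/32
targets = [torch.zeros_like(p) for p in preds]
print(objectness_loss(preds, targets))
```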
Thanks for the great repo!
Thanks!
@violet17 the equation and intersection points were chosen for stability and for their suitability in replacing the unstable yolov3/v4 wh method.
@glenn-jocher But what is the purpose of offsetting by -0.5? If you are expanding the output space from 0-1 to 0-2 and you offset by -0.5, the mid-point will be equal to 0.5 and not 0. So given x = 0 you get 0.5 output. Wouldn't it be more logical to offset by -1 to get 0 when given 0? With the current formulation, if the network predicts t_x = 0 and the cell offset is 0.5, then the output will be 1, while intuitively it seems like it should perhaps be 0.5, the 0 value of the cell. Perhaps I misunderstand?
@joelcma you want a reference input to create a reference output for stability and ease of training. The average input (due to batchnorm) will be zero, and the average object will be in the middle (i.e. at 0.5) of a grid cell. We expand the output space to allow for predictions near 0 and 1 without stressing the sigmoid inputs to extremes.
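A tiny numeric check of that xy behavior, using the 2σ(x) − 0.5 form discussed above:

```python
import torch

def xy_center(t_xy, cell_xy):
    # 2*sigmoid(t) - 0.5 spans (-0.5, 1.5) and equals 0.5 at t = 0,
    # so a zero network output places the box at the middle of its grid cell
    return 2 * torch.sigmoid(t_xy) - 0.5 + cell_xy

print(xy_center(torch.tensor(0.0), 0.0))    # tensor(0.5000) -> cell center
print(xy_center(torch.tensor(-10.0), 0.0))  # ~ -0.5 -> can reach just past the cell edge
print(xy_center(torch.tensor(10.0), 0.0))   # ~  1.5 -> same on the other side
```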
@glenn-jocher Thank you for taking the time to answer! And sorry, but I have another question :D What is the benefit of using a sigmoid over a bounded ReLU in this case?
@joelcma the benefit of any model architecture selection would be driven by empirical results, i.e. 'it works better'.
I have tried it for export: !python export.py --weights /content/drive/MyDrive/best.pt --include "coreml" Does anyone know how I can convert correctly and get the bounding boxes, scores, and other outputs?
@dhiman10 after running
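A hedged sketch of inspecting and running an exported CoreML model with coremltools (the file path, image size, and input name are assumptions; check your model's actual I/O description, and note that coremltools prediction runs on macOS):

```python
import coremltools as ct
from PIL import Image

# Load the exported model; the path depends on where export.py wrote it
model = ct.models.MLModel("best.mlmodel")
print(model.input_description)   # inspect the real input name/size first
print(model.output_description)  # inspect the real output names

img = Image.open("test.jpg").resize((640, 640))  # match the export image size
out = model.predict({"image": img})              # key must match the model's input name
# The raw output is the Detect tensor (xywh + conf + class scores);
# confidence thresholding and NMS still have to be applied downstream.
print({k: getattr(v, "shape", v) for k, v in out.items()})
```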