Possible AutoAnchor reversal in v2.0 #447
I have a similar problem with yolov5s, and I'm not sure why it predicts a small bounding box. Next, I will train on the default anchors instead of calculating them during training. I suspect the anchors play a role here, because the proposed anchors for my dataset are smaller.
Can setting a higher iou-threshold help?
@123456789mojtaba do not use a bug label for training results that you don't understand.
@123456789mojtaba @priteshgohil First, without looking at your training results.png it is impossible to say whether you have trained properly, so displaying anecdotal evidence of improper training on a custom dataset, out of context, allows no one to properly help you. Second, YOLOv5s is naturally the smallest and least accurate model; if your goal is accuracy, 5s obviously should not be your first choice. You can see a comparison in our readme table: https://github.com/ultralytics/yolov5#pretrained-checkpoints
@glenn-jocher There is no doubt about the dataset or the training. The problem occurs even with YOLOv5l. As I predicted, the fault was the calculated anchor boxes, because the check_anchors() function gives smaller anchor values for my dataset. I get very good results with the default anchors. I will update training results.png and the prediction results by Saturday 25.07.2020.
@priteshgohil Hmm, that's strange. check_anchors() is supposed to check your anchors to make sure they are aligned to your stride order, i.e. they should both be large to small or small to large depending on your head.
@priteshgohil Ah, never mind: check_anchors() recomputes new anchors if needed based on your dataset BPR. You can disable it with python train.py --noautoanchor
@glenn-jocher Thank you!!

[image: YOLOv5s results with auto anchors]
[image: YOLOv5s results without auto anchors (i.e. --noautoanchor)]
@priteshgohil Ah, interesting. Yes, the second is definitely better. Can you report your anchors for both using:

AutoAnchor (actually any anchor evolution using our code) works under the assumption that the objects are spread across a range of sizes relative to the model output strides 8, 16 and 32. In theory, if your labels are composed solely of larger or smaller objects, then some output layers may be better off being completely removed or ignored than being assigned anchors far outside their receptive field size. In practice, though, it is difficult to determine actual receptive field dimensions.
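The size-vs-stride assumption above can be illustrated with a toy anchor-fit check, loosely modeled on AutoAnchor's width/height ratio metric (a simplified sketch, not the repo's exact code; the function name is illustrative, and the 4.0 threshold mirrors the default anchor_t hyperparameter):

```python
def anchor_fit(label_wh, anchor_wh, thr=4.0):
    """Return True if the label's w/h are within a factor `thr` of the anchor's."""
    rw = label_wh[0] / anchor_wh[0]
    rh = label_wh[1] / anchor_wh[1]
    # worst-case ratio, symmetric in over- and under-sizing
    worst = max(rw, 1 / rw, rh, 1 / rh)
    return worst < thr

# A small 12x15 px object fits the smallest default P3/8 anchor,
# but is far outside the range of the largest P5/32 anchor.
print(anchor_fit((12, 15), (10, 13)))    # True
print(anchor_fit((12, 15), (373, 326)))  # False
```

If every label in a dataset fails this check for a given layer's anchors, that layer contributes little, which is the "better off removed or ignored" scenario described above.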
Hi @glenn-jocher, thank you for explaining. So we have labels.png generated during training, which is really cool. Can you explain (or link to something about) how to interpret this image? I have the following values.

With autoAnchors, the console output during training was:

I have one question here: are these the newly calculated anchors? If yes, then why don't they match the following anchors saved in the model? I think the larger anchor group is divided by 8 and the smaller group by 32, whereas it should be the opposite, right? Correct me if I'm wrong.

Without autoAnchors, these anchors match the values in the yolov5s.yaml file.
@priteshgohil Anchors displayed using this command are in stride units. You are using a pre-v2.0 version of the repo, so your anchors are reversed compared to v2.0 anchors, but this is not a problem.

yolov5s.yaml:

```yaml
# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32
```

yolov5s anchors:

```python
print(torch.load('yolov5s.pt')['model'].model[-1].anchors)
# tensor([[[ 1.25000,  1.62500],
#          [ 2.00000,  3.75000],
#          [ 4.12500,  2.87500]],
#         [[ 1.87500,  3.81250],
#          [ 3.87500,  2.81250],
#          [ 3.68750,  7.43750]],
#         [[ 3.62500,  2.81250],
#          [ 4.87500,  6.18750],
#          [11.65625, 10.18750]]], dtype=torch.float16)
```

You have two anchor computations that both look similar, but they do not correspond to your autoanchor model output. Since your code is out of date, there are likely issues with it that have already been resolved. I would git clone the most recent repo and repeat your experiment, using all default settings (changing nothing except running with and without autoanchor). It looks like you only need about 30 training epochs to make a comparison.
Hi @glenn-jocher, yes you are right, thank you :). The problem is solved with the most recent pull; results are good after the latest git pull. The problem in v2.0 was the reversed anchors: the k-means computed anchors were divided by the wrong stride values (32, 16, 8 instead of 8, 16, 32). However, I am also able to get the perfect result in v2.0 by replacing the following line (utils.py Line 99 in 7f8471e) with:

```python
m.anchors[:] = new_anchors.clone().view_as(m.anchors) / torch.flip(m.stride.to(m.anchors.device).view(-1, 1, 1), [0, 1])  # loss
```
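The effect of that flip can be sketched without PyTorch: when the anchor rows are listed large-to-small while the strides run small-to-large, pairing them positionally divides by the wrong stride, and reversing the divisors (what flipping the stride tensor accomplishes) restores sensible stride units. A sketch using the default yolov5s anchors, reversed to mimic the pre-v2.0 yaml order (names here are illustrative):

```python
# Anchors listed large-to-small (pre-v2.0 yaml order), strides small-to-large
anchors_px = [
    [(116, 90), (156, 198), (373, 326)],  # largest anchors first
    [(30, 61), (62, 45), (59, 119)],
    [(10, 13), (16, 30), (33, 23)],
]
strides = [8, 16, 32]

# Wrong: positional pairing divides the large anchors by the small stride
wrong = [[(w / s, h / s) for w, h in layer]
         for layer, s in zip(anchors_px, strides)]

# Fix (equivalent to flipping the stride tensor): reverse the divisors
fixed = [[(w / s, h / s) for w, h in layer]
         for layer, s in zip(anchors_px, reversed(strides))]

print(wrong[0][0])  # (14.5, 11.25) -- 116/8, far too large for a P3 cell
print(fixed[0][0])  # (3.625, 2.8125) -- 116/32, sensible stride units
```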
@priteshgohil I don't understand. Are you saying that utils.py L99 in 7f8471e (current master) needs changing?
L99 is the line that divides the anchors from pixels to strides. L100 right after it is supposed to check the anchor order and reverse them if necessary. Perhaps this region of the code should be updated to make it more robust to different scenarios. For now it should work fine with the public architectures offered (I'm training several models currently that rely on autoanchor, and they are training correctly).
Hi @glenn-jocher. Sorry for creating the misunderstanding. The current master (7f8471e) is perfectly fine and doesn't need any changes. The problem was when I was using the old version and yolov5s.yaml was using the following order of anchors,

So, L100 in utils.py will correct the order, but I guess it should be done before L99, which would then divide by the correct stride values (correct me if I'm wrong). In my old version of the repo, L99 had the following values for the tensors, where it is necessary to flip either the dividing tensor or the new anchor tensor.

After L99:

After L100:

So do you see the problem? The anchors were divided by the wrong values, and check_anchor_order at L100 only changes their order.
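A minimal sketch of what an order check in the spirit of check_anchor_order does (simplified; the real function operates on the Detect module's tensors): compare the direction of growth of mean anchor area against the stride order, and reverse the anchor layers if the two disagree. Applied to pixel anchors before the stride division, this avoids the mispairing described above.

```python
def mean_area(layer):
    return sum(w * h for w, h in layer) / len(layer)

def check_anchor_order(anchors, strides):
    """Reverse anchor layers if their area order disagrees with stride order."""
    da = mean_area(anchors[-1]) - mean_area(anchors[0])  # anchor area delta
    ds = strides[-1] - strides[0]                        # stride delta
    if da * ds < 0:  # opposite signs: order mismatch, so flip the anchors
        anchors = anchors[::-1]
    return anchors

# Pre-v2.0-style yaml with anchors listed large-to-small
reversed_anchors = [
    [(116, 90), (156, 198), (373, 326)],
    [(30, 61), (62, 45), (59, 119)],
    [(10, 13), (16, 30), (33, 23)],
]
fixed = check_anchor_order(reversed_anchors, [8, 16, 32])
print(fixed[0])  # [(10, 13), (16, 30), (33, 23)] -- smallest anchors now at P3/8
```

Running the check before dividing by stride means each layer is subsequently paired with its correct divisor, which is exactly the ordering issue this comment describes.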
@priteshgohil Yes, I believe you are correct that we should adjust the order in conjunction with the strides to keep them both synchronized. The evolved anchors are sorted from small to large before being attached to the model and then divided by stride, which in v2.0 model yamls is also always small to large. But I just finished my training with a v2.0 autoanchor model, and while the training mAPs performed well (better than the official model, actually), when I test the saved model I get about half the mAP expected. So it seems something is still not quite right.
@priteshgohil I've taken a quick look, and I am very confused about what could be wrong. The same EMA gets passed to test.py during training as is saved each epoch, so there should not be any differences: if the EMA performs at x mAP during training, then test.py should produce the same results independently. Just to be clear: were you able to train a v2.0 model using AutoAnchor, observe good training results, and also, separately, once training was complete, observe good test.py results using best.pt or last.pt?
@glenn-jocher Yes, I completed training and I observe that the results are almost the same as yolov5s trained on the previous version without autoanchor, with just a little boost on the specific class category that is more frequent than the other object categories in my dataset. Results.png is almost the same as the one I posted earlier in this issue, except that the minimum objectness for both training and validation is 0.1 instead of 0.05.
@priteshgohil Ok, thanks. Maybe the problem is only in my dev branch then.
@priteshgohil Trying to figure out the status of this issue: are you still seeing any problems in the current code, or would you say the original issue appears resolved now?
@glenn-jocher I don't see any problem now. I even tried altering the order of anchors in
@priteshgohil Ok, great, thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
TODO removed as original issue appears resolved. |
@priteshgohil Hi, I found a problem in the latest version of the code, and I have the same opinion as you.
Hey guys, I have trained YOLOv5 on VisDrone for cars and pedestrians, but it detects some cars and pedestrians with two bounding boxes instead of one. Does anyone know the problem?