Matrix Nets: A New Deep Architecture for Object Detection - mAP of 47.8@0.5...0.95 on MS COCO, #3772
https://arxiv.org/abs/1908.04646v2

xNets can be applied to any backbone, similar to FPNs. xNets map objects with different sizes and aspect ratios into layers where the sizes and the aspect ratios of the objects within their layers are nearly uniform. Hence, xNets provide a scale and aspect ratio aware architecture. We leverage xNets to enhance key-points based object detection. Our architecture achieves mAP of 47.8 on MS COCO, which is higher than any other single-shot detector while using half the number of parameters and training 3x faster than the next best architecture.

Is this the only feature necessary to implement MatrixNet, or do you need something else?
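The core idea of xNets (mapping each object to the matrix layer whose width and height downsampling roughly match the object's shape) can be sketched as follows. This is only an illustration of the paper's concept; the function name, base size, and clamping bounds are assumptions, not the authors' code:

```python
import math

def matrix_cell(w, h, base=24, max_idx=5):
    """Pick the xNet matrix cell (i, j) for a box of size w x h.

    Layer (i, j) halves the width i-1 times and the height j-1 times
    relative to the base layer, so each cell ends up holding objects
    whose normalized size and aspect ratio are nearly uniform.
    `base` (pixels handled by cell (1, 1)) and `max_idx` are assumed.
    """
    i = min(max_idx, max(1, 1 + round(math.log2(max(w, 1) / base))))
    j = min(max_idx, max(1, 1 + round(math.log2(max(h, 1) / base))))
    return i, j
```

A wide 96x24 box then lands in a different cell than a square 24x24 box, which is how the matrix separates aspect ratios as well as scales.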
@AlexeyAB Hello,
@WongKinYiu Hi, I added support for
Try to make a network, and if it works well with increased accuracy, I will add stride_x & stride_y for the convolutional layer.
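For reference, a layer with unequal strides would look something like this in a cfg file (a hypothetical fragment assuming the new stride_x/stride_y parameters; the filter count and activation are placeholders):

```ini
# Hypothetical fragment: downsample width twice as fast as height,
# as the off-diagonal MatrixNet layers require.
[convolutional]
batch_normalize=1
filters=256
size=3
stride_x=2
stride_y=1
pad=1
activation=leaky
```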
@AlexeyAB Hello, thank you very very much.
It seems that MatrixNet (different strides) and TridentNet (different dilations) are very promising approaches for generalizing across different object sizes and aspect ratios.
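For intuition, the two approaches grow the receptive field differently: stride changes the output resolution, while dilation spreads the kernel taps over a wider input window. A kernel with k taps and dilation d spans k + (k-1)(d-1) input pixels, which is easy to check (hypothetical helper, not from either paper):

```python
def effective_extent(k: int, d: int) -> int:
    """Span (in input pixels) of a k-tap dilated conv kernel:
    k taps separated by gaps of d-1 pixels."""
    return k + (k - 1) * (d - 1)

# TridentNet's branches share weights but use dilations 1, 2, 3,
# so a 3x3 kernel spans 3, 5, and 7 pixels respectively.
print([effective_extent(3, d) for d in (1, 2, 3)])  # [3, 5, 7]
```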
@WongKinYiu Hi, what progress have you made? I added stride_x=... stride_y=... for the convolutional layer, so you can try to make a MatrixNet.
https://arxiv.org/pdf/1908.04646v2.pdf
@WongKinYiu
@AlexeyAB Hello, the version with different-stride max pooling layers is now training for 200k epochs.
@WongKinYiu Are you training a MatrixNet with the original ResNeXt-101 backbone, or something else?
@WongKinYiu Thanks, do you know about "shared 3x3 convolutions"? #3772 (comment)
@AlexeyAB Yes, I think it is similar to TridentNet.
@LukeAI Hello.
Which one? :) Darknet-53?
@WongKinYiu Thanks. You can try to train 3 models:
And compare the results. I added
@AlexeyAB Thank you for your advice. I will have free GPUs in two weeks.
maxpool version:
@WongKinYiu Thanks for the results!
@WongKinYiu Is "Original" the original MatrixNet from https://arxiv.org/abs/1908.04646v2 ?
No, "Original" means a YOLOv3-based model without the feature proposed by MatrixNet. You can see the figure in #3772 (comment).
The cfg files are based on yolov3-tiny: yolov3-tiny_3l(15.778BFLOPs).cfg.txt. For the conv version, just replace the maxpool layers with conv layers with shared weights.
@WongKinYiu Thanks. What mAP@0.5 did you get for yolov3-tiny_3l_maxpoolmatrixnet_addanchors(15.787BFLOPs).cfg.txt ?
Have you come across the original MatrixNet backbone anywhere (not just yolov3/tiny with non-uniform strides)? If yes, can you share it?
@AlexeyAB Hello, I have not found the original MatrixNet backbone. In the paper, they only show the concept of MatrixNet and apply it to CornerNet without giving details.
@WongKinYiu Yes, I think you are right. The darknet training set (2014) is 117k images, and it tests on 5k images. The 2017 set, I believe, trains on 120k images and tests on 20k. So testing will take 4x longer on 2017, but I believe most of the training data is the same.
I've updated my mAP section, so the jump between YOLOv3 and YOLOv3-SPP is clearer. We are not at the level of ASFF yet, but it's encouraging to see that we are not too far away either. https://github.com/ultralytics/yolov3#map
The main takeaway from the ASFF paper is Figure 1. It shows that while some of the new work coming out claims very high mAPs, it does so at the expense of inference time, so in this sense I like the ASFF approach of adding to YOLOv3 to obtain results with minimal hits to FPS.
@glenn-jocher Yes, there are too many tricks which are hard to implement in darknet...
What features do you mean? I added several fixes so that ASFF and BiFPN (from EfficientDet) can be implemented there. I hope there are no bugs. Instead of Softmax, we use
ASFF-like:
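An ASFF-style fusion step can be sketched in a few lines of numpy: per-level weights are normalized to be positive and sum to one, then the (already resized) level features are blended. This is an illustrative sketch under assumptions, not the darknet implementation; `asff_fuse` and the scalar-per-level weighting are mine (the real module learns per-pixel weights):

```python
import numpy as np

def asff_fuse(features, logits):
    """Blend same-shape feature maps with softmax-normalized weights.

    features: list of (C, H, W) arrays already resized to one scale.
    logits:   one learnable scalar per level (per-pixel in the paper).
    """
    z = np.exp(logits - np.max(logits))  # numerically stable softmax
    w = z / z.sum()                      # positive, sums to 1
    return sum(wi * fi for wi, fi in zip(w, features))
```

With equal logits this reduces to a plain average of the levels; training moves the weights toward the most informative scale at each location.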
@AlexeyAB Hello, for example AutoAugment, Fast AutoAugment, autograd, deformable conv, and multi-task training... In my project, I need to do detection, segmentation, and tracking simultaneously. By the way, I will create an ASFF-like cfg in the coming days.
Which of these tasks do you consider the most promising? Yes, instance segmentation (YOLACT) is still in the to-do list: https://github.com/AlexeyAB/darknet/projects/6
As a tracker: #3042. I will think about whether I can do a fast "Joint Detection and Embedding for fast multi-object tracking".
For now, the simplest way is just to add moving-object data augmentation, so we will be able to train conv-LSTM models on non-sequential datasets like MS COCO; this will increase detection accuracy on video, but will still require an additional tracker for track_ids.
The highest priorities were: #4264 #4346 #4382 #4203 #3830
Do you think Deformable Conv is better than Deformable Kernels? #4066
For me, "Joint Detection and Embedding for fast multi-object tracking". Oh, I treat deformable kernels as deformable conv v3.
Did you try to compare them? How slow are deformable convolutions? They add only ~+1% AP, but maybe they are very slow?
I have not tried it. At ICCV, the authors of TridentNet showed a fast version of TridentNet.
They don't use Deformable-conv in the
But we should implement To implement
It will be They use
Resnet101-TridentNet-Fast scale-aware
@AlexeyAB Hello, they released their long paper and code.
@WongKinYiu Thanks! MatrixNet achieves only 3-4 FPS, while CSPResNeXt-50-PANet-SPP-optimal achieves 30-40 FPS with about the same AP. I added something like CenterNet that uses 3 x [Gaussian_yolo] layers (left-top, right-bottom, center) for each final resolution. #3229 (comment) But I'm not sure it will give a good result.
@WongKinYiu Also, I fixed a bug with Tensor Cores, and it seems any groups value larger than
@AlexeyAB Thanks a lot, I am doing an ablation study of
Yes, I find that bad detections usually appear on overlapped objects in my project. It seems
I will think about it. I created an issue: #4701. Also, what do you think about "Repulsion Loss: Detecting Pedestrians in a Crowd" (CVPR 2018), without NMS at all? https://arxiv.org/abs/1711.07752v2 and #3113
@WongKinYiu What normalizer did you use in the ASFF module? In my experiments, ASFF got NaN when using relu for weight normalization, and changing to softmax resolved it; the reason could be something like the dead neurons that happen with relu.
@Kyuuki93
Is it about ~2x slower?
@WongKinYiu Also use
Try to use
@AlexeyAB Thanks a lot. NaN is not yet solved for the ASFF model (RFB without ASFF does not get NaN).
I added
Try to train with
@AlexeyAB OK, thanks a lot.
I just checked your cfg-files:
still not stable enough for
Update your code at least to this fix: 2862b28
At what number of iterations does it go to NaN? Also try
I use https://github.com/AlexeyAB/darknet/tree/61499b27a4e24656a0f84bb83b92df95b0917f74 for training. I have already cleaned the results, but it got NaN at < 35k epochs with
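The NaN failure mode discussed above is easy to reproduce in isolation: when every weight logit is negative, ReLU zeroes all of them and the normalizing division becomes 0/0, while softmax always yields a valid distribution. A minimal numpy demonstration (function names are illustrative, not from the darknet code):

```python
import numpy as np

def relu_normalize(logits):
    w = np.maximum(logits, 0.0)
    return w / w.sum()          # 0/0 -> NaN when all logits <= 0

def softmax(logits):
    z = np.exp(logits - np.max(logits))
    return z / z.sum()          # always positive, sums to 1

with np.errstate(invalid="ignore"):
    bad = np.array([-0.3, -1.2, -0.7])
    print(np.isnan(relu_normalize(bad)).all())  # True
    print(np.isclose(softmax(bad).sum(), 1.0))  # True
```

Adding a small epsilon to the ReLU denominator avoids the division by zero, but the weights then sit at zero with zero gradient, which matches the dead-neuron explanation above.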
Could this repo support max pooling layers with different x and y strides?
I would like to implement a state-of-the-art object detector.
Thanks.