Add ASFF(Adaptively Spatial Feature Fusion) layers in Head for YoloV5 and some attention modules #2349

Closed · wants to merge 3 commits

Conversation


@positive666 positive666 commented Mar 3, 2021

First of all, thanks for the help this project has given me! Given the successful results of the YOLOv3-ASFF paper, which offers good interpretability and suits the YOLO series well, I added the ASFF network layer to improve recognition performance.
Due to limited hardware resources, I cannot run full training evaluations on the COCO dataset, but following the principle of the paper, learning adaptive per-scale fusion weights is a promising alternative structure. The changes are minimal and have been integrated into the code. The four structures (s, m, l, x) are placed in four yaml files, which can be chosen freely.
The added yaml files are therefore PANet+ASFF (or FPN+ASFF), which can be combined freely after modularization. I applied this to cigarette detection and advertising-logo detection, both involving small targets. Due to limited datasets, I can only continue to collect data and further sort and analyze the results.
Anchor-free and NMS-free transformer approaches may be a direction for follow-up development; I will conduct more experiments and changes in the future.
For details, see #2348.
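
For readers less familiar with ASFF, here is a minimal PyTorch sketch of the core fusion step: per-pixel softmax weights are learned to blend pyramid levels. The class name `ASFFHead` and the assumption that all inputs are already resized to one resolution and channel width are illustrative, not the PR's exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASFFHead(nn.Module):
    """Fuse three pyramid levels with per-pixel learned softmax weights."""

    def __init__(self, ch):  # ch: channel count shared by all three inputs
        super().__init__()
        # one 1x1 conv per level produces a scalar weight logit per pixel
        self.weight_convs = nn.ModuleList(nn.Conv2d(ch, 1, 1) for _ in range(3))

    def forward(self, feats):  # feats: list of 3 maps, each (B, ch, H, W)
        logits = torch.cat([conv(f) for conv, f in zip(self.weight_convs, feats)], 1)
        w = F.softmax(logits, dim=1)  # (B, 3, H, W); weights sum to 1 per pixel
        return sum(w[:, i:i + 1] * f for i, f in enumerate(feats))
```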

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Integration of new architectural components and loss functions into YOLOv5 models.

📊 Key Changes

  • Introduced Involution, CoordAtt, CBAM, ASFFV5, MHSA, BottleneckTransformer, and various other neural network modules.
  • Added Focal EIoU loss and options for different IoU types in the bbox_iou function (GIoU, DIoU, CIoU, and EIoU); a hedged sketch of the EIoU term follows this list.
  • New .yaml files for YOLOv5 models with specific architectural changes like CBAM (Convolutional Block Attention Module) and CoordAtt (Coordinate Attention).
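
For reference, a hedged sketch of the EIoU term mentioned above, following the Focal-EIoU paper's formula for boxes in (x1, y1, x2, y2) format; the PR's actual bbox_iou signature may differ:

```python
import torch

def bbox_eiou(box1, box2, eps=1e-7):
    """EIoU = IoU - center-distance term - width term - height term."""
    # intersection and union
    iw = (torch.min(box1[..., 2], box2[..., 2]) - torch.max(box1[..., 0], box2[..., 0])).clamp(0)
    ih = (torch.min(box1[..., 3], box2[..., 3]) - torch.max(box1[..., 1], box2[..., 1])).clamp(0)
    inter = iw * ih
    w1, h1 = box1[..., 2] - box1[..., 0], box1[..., 3] - box1[..., 1]
    w2, h2 = box2[..., 2] - box2[..., 0], box2[..., 3] - box2[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # smallest enclosing box
    cw = torch.max(box1[..., 2], box2[..., 2]) - torch.min(box1[..., 0], box2[..., 0])
    ch = torch.max(box1[..., 3], box2[..., 3]) - torch.min(box1[..., 1], box2[..., 1])
    # squared center distance (the /4 comes from using corner sums as doubled centers)
    rho2 = ((box1[..., 0] + box1[..., 2] - box2[..., 0] - box2[..., 2]) ** 2
            + (box1[..., 1] + box1[..., 3] - box2[..., 1] - box2[..., 3]) ** 2) / 4
    return (iou - rho2 / (cw ** 2 + ch ** 2 + eps)
                - (w1 - w2) ** 2 / (cw ** 2 + eps)
                - (h1 - h2) ** 2 / (ch ** 2 + eps))
```

The Focal-EIoU variant then weights the resulting loss by an IoU^γ factor so that training focuses on high-quality boxes.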

🎯 Purpose & Impact

  • 🎲 Increased flexibility by offering different architectural features for model customization.
  • 🔍 Improved accuracy potential through advanced attention mechanisms and loss functions.
  • 🚀 Enhanced capability to fine-tune models based on specific application requirements, potentially leading to better performance in real-world tasks.

These changes allow researchers and developers to experiment with cutting-edge components in object detection models and may lead to improved detection accuracy and efficiency.

@github-actions github-actions bot commented

👋 Hello @positive666, thank you for submitting a 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify your PR is up-to-date with origin/master. If your PR is behind origin/master an automatic GitHub actions rebase may be attempted by including the /rebase command in a comment body, or by running the following code, replacing 'feature' with the name of your local branch:
```bash
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature  # <----- replace 'feature' with local branch name
git rebase upstream/master
git push -u origin -f
```
  • ✅ Verify all Continuous Integration (CI) checks are passing.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee

@glenn-jocher (Member) commented

@positive666 very interesting, thanks for the PR! Were you able to test the changes against the baseline models?

@positive666 positive666 changed the title Add ASFF(Adaptively Spatial Feature Fusion) layers in Head for YoloV5 Add ASFF(Adaptively Spatial Feature Fusion) layers in Head for YoloV5 and some attention modules Mar 7, 2021

positive666 commented Mar 7, 2021

@glenn-jocher Okay, I will continue to verify the change. The current idea is to provide some plug-and-play lightweight modules, and I'm still studying transformers as replacements for convolutional blocks, as in the paper.
In addition, I selectively added CBAM (the channel and spatial attention module). I also saw a detection competition where CBAM was used as a point-gaining technique, so I hope you can give some suggestions; I will train this change too, and when I have conclusions I will ask you for advice. For example, my changes to v5s are shown below:

```yaml
backbone:
[[-1, 1, Focus, [64, 3]], # 0-P1/2
[-1, 1, Conv, [128, 3, 2]], # 1-P2/4
[-1, 3, C3, [128]],
[-1, 1, Conv, [256, 3, 2]], # 3-P3/8
[-1, 9, C3, [256]],
[-1, 1, Conv, [512, 3, 2]], # 5-P4/16 #[128,256,3,2]
[-1, 9, C3, [512]],
[-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
[-1, 1, SPP, [1024, [5, 9, 13]]],
[-1, 3, C3, [1024, False]], # 9
[-1, 3, CBAM, [1024]],
]
head:
[[-1, 1, Conv, [512, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 6], 1, Concat, [1]], # cat backbone P4
[-1, 3, C3, [512, False]], # 14

[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 4], 1, Concat, [1]], # cat backbone P3
[-1, 3, C3, [256, False]], # 18 (P3/8-small)
[-1, 1, CBAM, [256]],

[-1, 1, Conv, [256, 3, 2]],
[[-1, 15], 1, Concat, [1]], # cat head P4
[-1, 3, C3, [512, False]], # 22 (P4/16-medium) [256, 256, 1, False]
[-1, 1, CBAM, [512]],

[-1, 1, Conv, [512, 3, 2]], #[256, 256, 3, 2]
[[-1, 11], 1, Concat, [1]], # cat head P5
[-1, 3, C3, [1024, False]], # 26 (P5/32-large) [512, 512, 1, False]
[-1, 1, CBAM, [1024]],

[[19, 23, 27], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
]
```
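
For context, here is a minimal CBAM sketch consistent with how the yaml above uses it: `CBAM(c)` preserves the channel count, so it can be inserted between layers. This follows the standard CBAM paper formulation and is not necessarily the PR's exact implementation:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, channel-preserving."""

    def __init__(self, c, reduction=16, kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(c, c // reduction, 1), nn.ReLU(),
                                 nn.Conv2d(c // reduction, c, 1))
        self.spatial = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x):
        # channel attention: shared MLP over avg- and max-pooled descriptors
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # spatial attention: 7x7 conv over channel-wise avg and max maps
        sa = torch.sigmoid(self.spatial(torch.cat([x.mean(1, keepdim=True),
                                                   x.amax(1, keepdim=True)], 1)))
        return x * sa
```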


glenn-jocher commented Mar 8, 2021

@positive666 sounds good. My main recommendation would be to package ASFF into the Detect() layer, since you have all of the inputs you need already inside Detect(), and rename it DetectASFF(). Then you don't need to publish new model yamls, anyone could just replace a Detect() layer with a DetectASFF() layer in any yaml.

I think that would be ideal in terms of architecture, and then in terms of testing YOLOv5m on VOC is probably a good place to start (fast but generalizes well across datasets and model sizes). The VOC training commands are here, they are only 50 epochs since they start from the COCO-trained models:
https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb?hl=en#scrollTo=BSgFCAcMbk1R
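
A hedged sketch of what that refactor could look like, reusing the hypothetical ASFFHead from the earlier sketch. Detect is YOLOv5's real head class, but DetectASFF, the 1x1 resampling convs, and the nearest-neighbor resizing here are assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

from models.yolo import Detect  # YOLOv5's detection head

class DetectASFF(Detect):
    """Drop-in Detect() replacement that fuses all levels before prediction."""

    def __init__(self, nc=80, anchors=(), ch=()):
        super().__init__(nc, anchors, ch)
        # 1x1 convs match every source level's channels to each target level
        self.resample = nn.ModuleList(
            nn.ModuleList(nn.Conv2d(cj, ci, 1) for cj in ch) for ci in ch)
        self.asff = nn.ModuleList(ASFFHead(c) for c in ch)  # see sketch above

    def forward(self, x):
        fused = []
        for i, xi in enumerate(x):
            # project and resize every level to level i, then fuse adaptively
            feats = [F.interpolate(self.resample[i][j](xj), size=xi.shape[2:],
                                   mode='nearest') for j, xj in enumerate(x)]
            fused.append(self.asff[i](feats))
        return super().forward(fused)
```

With something like this, any yaml could swap `Detect` for `DetectASFF` in its final line without other changes, which is the point of the suggestion.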

@positive666 positive666 force-pushed the master branch 2 times, most recently from 3b92f43 to 70586bc Compare March 8, 2021 03:08
@positive666 (Author) commented:

> @positive666 sounds good. My main recommendation would be to package ASFF into the Detect() layer, since you have all of the inputs you need already inside Detect(), and rename it DetectASFF(). Then you don't need to publish new model yamls, anyone could just replace a Detect() layer with a DetectASFF() layer in any yaml.
>
> I think that would be ideal in terms of architecture, and then in terms of testing YOLOv5m on VOC is probably a good place to start (fast but generalizes well across datasets and model sizes). The VOC training commands are here, they are only 50 epochs since they start from the COCO-trained models:
> https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb?hl=en#scrollTo=BSgFCAcMbk1R

@glenn-jocher Haha, yes, after all this is only a feature fusion layer, so I will modify it now; thank you for your suggestions. In addition, I will test the attention mechanism modules on VOC.

@positive666 positive666 force-pushed the master branch 5 times, most recently from a3a46a2 to 1ab9a56 Compare March 8, 2021 08:55
@positive666 (Author) commented:

@glenn-jocher Hello, I posted the previous results in issue #2348; they are roughly in line with the conclusions of some other users. The current experiments did not improve the test mAP when training YOLOv5s, but this aroused my interest: why can't the learned weight layers deliver gains? I see two difficulties in the detection problem:

  1. The definition and sampling of positive samples. YOLOv5 increases the number of positive anchor samples through cross-grid prediction and shape-matching strategies. This does speed up network convergence, but does it also introduce some low-quality positive anchors into the loss calculation?
  2. In detection, classification and regression are only weakly correlated. A detection with a high classification score may not have an accurately localized box, and an accurate box may come with low classification confidence; at inference time this degrades performance. Essentially, we need a strategy to rank boxes, because good boxes are in fact being thrown away. Referring to the IoU-aware idea, I want to add a DIoU prediction branch to the head to strengthen the connection between classification and regression: a sigmoid output predicts the DIoU (or IoU) in [0, 1], and at inference we multiply the confidence by this prediction (a minimal sketch follows this list).
     On the basis of the above, then add the attention mechanism modules.
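
As a concrete illustration of point 2, a minimal sketch of the inference-time blending, in the spirit of IoU-aware detectors; the function name and the α-blend are assumptions, not code from this PR:

```python
import torch

def iou_aware_conf(obj_logit, iou_logit, alpha=0.5):
    """Blend objectness with a predicted-IoU branch at inference time."""
    obj = torch.sigmoid(obj_logit)  # usual objectness score
    iou = torch.sigmoid(iou_logit)  # branch trained to regress DIoU/IoU into [0, 1]
    # geometric blend: alpha=0 recovers plain objectness, alpha=1 pure predicted IoU
    return obj.pow(1 - alpha) * iou.pow(alpha)
```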

@glenn-jocher (Member) commented:

@positive666 yes these are good questions. In object detection it is true that detection (obj and cls) and regression (box) are often competing interests, and steps to improve one may hurt the other. It's a tricky balancing act that I suppose any multi-component loss model has to deal with that simpler tasks (classification) don't have to worry about.

@positive666 positive666 force-pushed the master branch 6 times, most recently from 6979f53 to 5637868 Compare March 19, 2021 10:36
@positive666 (Author) commented:

> @positive666 yes these are good questions. In object detection it is true that detection (obj and cls) and regression (box) are often competing interests, and steps to improve one may hurt the other. It's a tricky balancing act that I suppose any multi-component loss model has to deal with that simpler tasks (classification) don't have to worry about.

Yes, after many ablation experiments I can confirm that directly adding ASFF and CBAM does not help. I recently added EIoU and Focal-EIoU losses and coordinate attention, a new CVPR 2021 attention module; haha, wait until I finish the experiments to see the results.
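
For reference, a hedged sketch of coordinate attention (Hou et al., CVPR 2021) as mentioned above; the PR's CoordAtt implementation may differ in details (e.g. the paper uses h-swish rather than ReLU):

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Factorized attention along the H and W axes, channel-preserving."""

    def __init__(self, c, reduction=32):
        super().__init__()
        m = max(8, c // reduction)
        self.conv1 = nn.Sequential(nn.Conv2d(c, m, 1), nn.BatchNorm2d(m), nn.ReLU())
        self.conv_h = nn.Conv2d(m, c, 1)
        self.conv_w = nn.Conv2d(m, c, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        # encode long-range context along each spatial direction separately
        xh = x.mean(3, keepdim=True)                       # (B, C, H, 1)
        xw = x.mean(2, keepdim=True).permute(0, 1, 3, 2)   # (B, C, W, 1)
        y = self.conv1(torch.cat([xh, xw], dim=2))         # shared transform
        yh, yw = y.split([h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                      # (B, C, H, 1)
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * ah * aw
```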

@Henry0528 commented:
@positive666 Hello, I saw you replaced CIoU with EIoU in loss.py. I tried this change with my own training data but found the result worse than with CIoU. I want to know whether you have gained any improvement on your own dataset.

@SkalskiP SkalskiP changed the base branch from master to develop May 24, 2021 20:11
@positive666 positive666 force-pushed the master branch 2 times, most recently from 2ece9f2 to 1beb5bb Compare June 2, 2021 09:10

positive666 commented Jun 3, 2021 via email


positive666 commented Jun 3, 2021 via email

@glenn-jocher glenn-jocher deleted the branch ultralytics:develop June 8, 2021 08:22