[Enhancement] Support yolo-world model

CVHub520 · Feb 27, 2024 · 83a73dd · 83a73dd
1 parent a4bba9f
commit 83a73dd
Show file tree

Hide file tree

Showing 10 changed files with 349 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -72,8 +72,9 @@
 ## 🥳 What's New [⏏️](#📄-table-of-contents)
 
 - Feb. 2024:
+  - ✨✨✨ Support [YOLO-World](https://github.com/AILab-CVC/YOLO-World) model.
   - 🤗 Release the latest version [2.3.2](https://github.com/CVHub520/X-AnyLabeling/releases/tag/v2.3.2) 🤗
-  - ✨✨✨ Support [YOLOv9](https://github.com/WongKinYiu/yolov9) model.
+  - Support [YOLOv9](https://github.com/WongKinYiu/yolov9) model.
   - Support the conversion from a horizontal bounding box to a rotated bounding box.
   - Supports label deletion and renaming. For more details, please refer to the [document](./docs/zh_cn/user_guide.md).
   - Support for quick tag correction is available; please refer to this [document](./docs/en/user_guide.md) for guidance.

diff --git a/README_zh-CN.md b/README_zh-CN.md
@@ -70,8 +70,9 @@
 ## 🥳 新功能 [⏏️](#📄-目录)
 
 - 2024年2月：
+  - ✨✨✨ 支持[YOLO-World](https://github.com/AILab-CVC/YOLO-World)模型。
   - 🤗 发布[2.3.2](https://github.com/CVHub520/X-AnyLabeling/releases/tag/v2.3.2)最新版本 🤗
-  - ✨✨✨ 支持[YOLOv9](https://github.com/WongKinYiu/yolov9)模型。
+  - 支持[YOLOv9](https://github.com/WongKinYiu/yolov9)模型。
   - 支持将水平框一键转换为旋转框。
   - 支持批量标签删除及重命名，详情可参考[用户手册](./docs/zh_cn/user_guide.md)。
   - 支持快速标签纠正功能，详情可参考[用户手册](./docs/zh_cn/user_guide.md)。

diff --git a/anylabeling/configs/auto_labeling/models.yaml b/anylabeling/configs/auto_labeling/models.yaml
@@ -84,6 +84,8 @@
   config_file: ":/yolo_nas_m.yaml"
 - model_name: "yolo-nas-s-r20230615"
   config_file: ":/yolo_nas_s.yaml"
+- model_name: "yolow_l-r20240227"
+  config_file: ":/yolow_l.yaml"
 - model_name: "yolov5l-r20230520"
   config_file: ":/yolov5l.yaml"
 - model_name: "yolov5_car_plate-r20230112"

diff --git a/anylabeling/configs/auto_labeling/yolow_l.yaml b/anylabeling/configs/auto_labeling/yolow_l.yaml
@@ -0,0 +1,86 @@
+type: yolow
+name: yolow_l-r20240227
+display_name: YOLO-World-L Tencent
+model_path: https://github.com/CVHub520/X-AnyLabeling/releases/download/v2.3.3/yolow-l.onnx
+confidence_threshold: 0.05
+classes:
+  - person
+  - bicycle
+  - car
+  - motorcycle
+  - airplane
+  - bus
+  - train
+  - truck
+  - boat
+  - traffic light
+  - fire hydrant
+  - stop sign
+  - parking meter
+  - bench
+  - bird
+  - cat
+  - dog
+  - horse
+  - sheep
+  - cow
+  - elephant
+  - bear
+  - zebra
+  - giraffe
+  - backpack
+  - umbrella
+  - handbag
+  - tie
+  - suitcase
+  - frisbee
+  - skis
+  - snowboard
+  - sports ball
+  - kite
+  - baseball bat
+  - baseball glove
+  - skateboard
+  - surfboard
+  - tennis racket
+  - bottle
+  - wine glass
+  - cup
+  - fork
+  - knife
+  - spoon
+  - bowl
+  - banana
+  - apple
+  - sandwich
+  - orange
+  - broccoli
+  - carrot
+  - hot dog
+  - pizza
+  - donut
+  - cake
+  - chair
+  - couch
+  - potted plant
+  - bed
+  - dining table
+  - toilet
+  - tv
+  - laptop
+  - mouse
+  - remote
+  - keyboard
+  - cell phone
+  - microwave
+  - oven
+  - toaster
+  - sink
+  - refrigerator
+  - book
+  - clock
+  - vase
+  - scissors
+  - teddy bear
+  - hair drier
+  - toothbrush
diff --git a/anylabeling/services/auto_labeling/model_manager.py b/anylabeling/services/auto_labeling/model_manager.py
@@ -206,6 +206,7 @@ def load_custom_model(self, config_file):
                 "rtmdet_pose",
                 "depth_anything",
                 "yolov9",
+                "yolow",
             ]
         ):
             self.new_model_status.emit(
@@ -439,6 +440,28 @@ def _load_model(self, model_id):
                     )
                 )
                 return
+        elif model_config["type"] == "yolow":
+            from .yolow import YOLOW
+
+            try:
+                model_config["model"] = YOLOW(
+                    model_config, on_message=self.new_model_status.emit
+                )
+                self.auto_segmentation_model_unselected.emit()
+            except Exception as e:  # noqa
+                self.new_model_status.emit(
+                    self.tr(
+                        "Error in loading model: {error_message}".format(
+                            error_message=str(e)
+                        )
+                    )
+                )
+                print(
+                    "Error in loading model: {error_message}".format(
+                        error_message=str(e)
+                    )
+                )
+                return
         elif model_config["type"] == "yolov5_seg":
             from .yolov5_seg import YOLOv5_Seg
 

diff --git a/anylabeling/services/auto_labeling/utils/points_conversion.py b/anylabeling/services/auto_labeling/utils/points_conversion.py
@@ -232,6 +232,31 @@ def rbox2poly(obboxes):
         *order, 8
     )
 
+def denormalize_bbox(bbox, input_shape, image_shape):
+    """
+    Denormalizes bounding box coordinates from input_shape to image_shape.
+    
+    Parameters:
+    - bbox: Normalized bounding box coordinates [xmin, ymin, xmax, ymax]
+    - input_shape: The shape of the input image used during normalization (e.g., [640, 640])
+    - image_shape: The shape of the original image (e.g., [height, width])
+    
+    Returns:
+    - Denormalized bounding box coordinates [xmin, ymin, xmax, ymax]
+    """
+    xmin, ymin, xmax, ymax = bbox
+
+    # Denormalize x-coordinates
+    denorm_xmin = int(xmin * image_shape[1] / input_shape[1])
+    denorm_xmax = int(xmax * image_shape[1] / input_shape[1])
+
+    # Denormalize y-coordinates
+    denorm_ymin = int(ymin * image_shape[0] / input_shape[0])
+    denorm_ymax = int(ymax * image_shape[0] / input_shape[0])
+
+    denormalized_bbox = [denorm_xmin, denorm_ymin, denorm_xmax, denorm_ymax]
+
+    return denormalized_bbox
 
 def rescale_box(input_shape, boxes, image_shape, kpts=False):
     """Rescale the output to the original image shape"""

diff --git a/anylabeling/services/auto_labeling/yolow.py b/anylabeling/services/auto_labeling/yolow.py
@@ -0,0 +1,57 @@
+import logging
+
+from PyQt5 import QtCore
+
+from anylabeling.app_info import __preferred_device__
+from anylabeling.views.labeling.shape import Shape
+from anylabeling.views.labeling.utils.opencv import qt_img_to_rgb_cv_img
+from .types import AutoLabelingResult
+from .__base__.yolo import YOLO
+from .utils import denormalize_bbox
+
+
+class YOLOW(YOLO):
+    """https://github.com/AILab-CVC/YOLO-World"""
+
+    def postprocess(self, outputs, image_shape):
+        num_objs, bboxes, scores, class_ids = [out[0] for out in outputs]
+        bboxes = [denormalize_bbox(bbox, self.input_shape, image_shape) for bbox in bboxes]
+        return num_objs, bboxes, scores, class_ids
+
+    def predict_shapes(self, image, image_path=None):
+        """
+        Predict shapes from image
+        """
+
+        if image is None:
+            return []
+
+        try:
+            image = qt_img_to_rgb_cv_img(image, image_path)
+        except Exception as e:  # noqa
+            logging.warning("Could not inference model")
+            logging.warning(e)
+            return []
+
+        blob = self.preprocess(image, upsample_mode="resize")
+        outputs = self.net.get_ort_inference(blob, extract=False)
+        _, bboxes, scores, class_ids = self.postprocess(outputs, image.shape[:2])
+
+        shapes = []
+        for bbox, score, cls_id in zip(bboxes, scores, class_ids):
+            if score < self.conf_thres or (int(cls_id) == -1):
+                continue
+            xmin, ymin, xmax, ymax = bbox
+            rectangle_shape = Shape(
+                label=str(self.classes[int(cls_id)]),
+                shape_type="rectangle",
+            )
+            rectangle_shape.add_point(QtCore.QPointF(xmin, ymin))
+            rectangle_shape.add_point(QtCore.QPointF(xmax, ymin))
+            rectangle_shape.add_point(QtCore.QPointF(xmax, ymax))
+            rectangle_shape.add_point(QtCore.QPointF(xmin, ymax))
+            shapes.append(rectangle_shape)
+
+        result = AutoLabelingResult(shapes, replace=True)
+
+        return result
diff --git a/docs/en/custom_model.md b/docs/en/custom_model.md
@@ -246,4 +246,30 @@ Refer to this [tutorial](https://github.com/CVHub520/sam-hq).
 
 - [InternImage](https://github.com/OpenGVLab/InternImage)
 
-InternImage introduces a large-scale convolutional neural network (CNN) model, leveraging deformable convolution as the core operator to achieve a large effective receptive field, adaptive spatial aggregation, and reduced inductive bias, leading to stronger and more robust pattern learning from massive data. It outperforms current CNNs and vision transformers on benchmarks
+InternImage introduces a large-scale convolutional neural network (CNN) model, leveraging deformable convolution as the core operator to achieve a large effective receptive field, adaptive spatial aggregation, and reduced inductive bias, leading to stronger and more robust pattern learning from massive data. It outperforms current CNNs and vision transformers on benchmarks
+
+> Paper: InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions</br>
+> Affiliation: Shanghai AI Laboratory, Tsinghua University, Nanjing University, etc.</br>
+> Published: CVPR 2023</br>
+
+Refer to this [tutorial](../../tools/export_internimage_model_onnx.py).
+
+- [EdgeSAM](https://github.com/chongzhou96/EdgeSAM)
+
+`EdgeSAM` is an accelerated variant of the Segment Anything Model (SAM), optimized for efficient execution on edge devices with minimal compromise in performance. It achieves a 40-fold speed increase compared to the original SAM, and outperforms MobileSAM, being 14 times as fast when deployed on edge devices while enhancing the mIoUs on COCO and LVIS by 2.3 and 3.2 respectively. EdgeSAM is also the first SAM variant that can run at over 30 FPS on an iPhone 14.
+
+> Paper: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM</br>
+> Affiliation: S-Lab, Nanyang Technological University, Shanghai Artificial Intelligence Laboratory.</br>
+> Published: Arxiv 2023</br>
+
+Refer to this [tutorial](https://github.com/chongzhou96/EdgeSAM/blob/master/scripts/export_onnx_model.py).
+
+- [YOLO-World](https://github.com/AILab-CVC/YOLO-World)
+
+`YOLO-World` enhances the YOLO series by incorporating vision-language modeling, achieving efficient open-scenario object detection with impressive performance on various tasks.
+
+> Paper: Real-Time Open-Vocabulary Object Detection</br>
+> Affiliation: Tencent AI Lab, ARC Lab, Tencent PCG, Huazhong University of Science and Technology.</br>
+> Published: Arxiv 2024</br>
+
+Refer to this [tutorial](../../tools/export_yolow_onnx.py).
diff --git a/docs/zh_cn/custom_model.md b/docs/zh_cn/custom_model.md
@@ -264,4 +264,14 @@ InternImage introduces a large-scale convolutional neural network (CNN) model, l
 > 单位：S-Lab, Nanyang Technological University, Shanghai Artificial Intelligence Laboratory.</br>
 > 发表：Arxiv 2023</br>
 
-参考此[教程](https://github.com/chongzhou96/EdgeSAM/blob/master/scripts/export_onnx_model.py).
+参考此[教程](https://github.com/chongzhou96/EdgeSAM/blob/master/scripts/export_onnx_model.py).
+
+- [YOLO-World](https://github.com/AILab-CVC/YOLO-World)
+
+`YOLO-World` enhances the YOLO series by incorporating vision-language modeling, achieving efficient open-scenario object detection with impressive performance on various tasks.
+
+> 论文：Real-Time Open-Vocabulary Object Detection</br>
+> 单位：Tencent AI Lab, ARC Lab, Tencent PCG, Huazhong University of Science and Technology.</br>
+> 发表：Arxiv 2024</br>
+
+参考此[教程](../../tools/export_yolow_onnx.py).