Roadmap of YOLO-World #109

wondervictor · 2024-03-07T14:07:59Z

taofuyu · 2024-03-08T02:17:43Z

torch.einsum() should be replaced with torch.matmul() and torch.sum(), because einsum() is not supported on most edge devices.
For example, I rewrote this code:
x = torch.einsum('bchw,bkc->bkhw', x, w)
as:
batch, channel, height, width = x.shape
_, k, _ = w.shape
x = x.permute(0, 2, 3, 1) # bchw->bhwc
x = x.reshape(batch, -1, channel) # bhwc->b(hw)c
w = w.permute(0, 2, 1) # bkc->bck
x = torch.matmul(x, w) # b(hw)c @ bck -> b(hw)k
x = x.reshape(batch, height, width, k) # b(hw)k->bhwk
x = x.permute(0, 3, 1, 2) # bhwk->bkhw
It may be ugly, but it can be deployed.
@wondervictor
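One way to check that the rewrite is exact is to compare it numerically with einsum. The sketch below uses NumPy as a stand-in for PyTorch (np.einsum, np.matmul, transpose and reshape follow the same index semantics for this case); the function names are hypothetical. It also shows a variant with fewer layout changes: multiplying w (b, k, c) by x flattened to (b, c, hw) yields (b, k, hw) directly, which in PyTorch would correspond to torch.matmul(w, x.flatten(2)). Whether a given edge runtime prefers one form over the other is not established in this thread.

```python
import numpy as np


def einsum_reference(x, w):
    # x: (b, c, h, w) feature map, w: (b, k, c) per-image text embeddings
    return np.einsum('bchw,bkc->bkhw', x, w)


def matmul_permute(x, w):
    # Same steps as the rewrite above: bchw -> b(hw)c, multiply by bck, restore bkhw
    batch, channel, height, width = x.shape
    k = w.shape[1]
    x2 = x.transpose(0, 2, 3, 1).reshape(batch, -1, channel)  # b(hw)c
    out = np.matmul(x2, w.transpose(0, 2, 1))  # (b, hw, c) @ (b, c, k) -> (b, hw, k)
    return out.reshape(batch, height, width, k).transpose(0, 3, 1, 2)


def matmul_flatten(x, w):
    # Variant without permuting x: (b, k, c) @ (b, c, hw) -> (b, k, hw)
    batch, channel, height, width = x.shape
    out = np.matmul(w, x.reshape(batch, channel, height * width))
    return out.reshape(batch, -1, height, width)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 5, 7))
w = rng.standard_normal((2, 3, 8))
ref = einsum_reference(x, w)
print(np.allclose(ref, matmul_permute(x, w)), np.allclose(ref, matmul_flatten(x, w)))  # True True
```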
@wondervictor

wondervictor · 2024-03-08T02:27:28Z

@taofuyu Good idea, got it!

mio410 · 2024-03-08T02:54:38Z

@wondervictor May I ask where I should make changes if I want to try other text encoders, for example replacing the CLIP text encoder with BEiT-3? Thank you!

wondervictor · 2024-03-08T02:58:12Z

@mio410 Good idea. We do plan to use stronger text encoders (e.g., CLIP-Large), and we are currently waiting for compute resources to pre-train them. BEiT-3 is a good choice and we are considering it. BTW, what model size do you need most right now? I can prioritize that.

mio410 · 2024-03-08T03:06:39Z

Besides, I'd like to try using a CLIP model in a different language to see if I can use prompts in that language for open vocabulary detection. Is this possible?

mio410 · 2024-03-08T03:16:24Z

I'm looking forward to your work! If possible, I'd like to try open vocabulary detection in other languages. Could you help me with that?

taofuyu · 2024-03-08T03:28:31Z

> @wondervictor May I ask where should I modify if I want to try using the effect of other text encoders, such as changing the text encoder of CLIP to BEIT-3. Thank you!

here
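For the encoder-swap question, the text encoder is selected by the text_model entry of the backbone config (see wondervictor's config reply further down). The fragment below is a hedged sketch, assuming the config defines text_model_name and a text_channels variable that sizes the text embeddings consumed by the neck and head. Pointing model_name at a larger Hugging Face CLIP checkpoint such as openai/clip-vit-large-patch14 changes the embedding width (768 for ViT-L/14 versus 512 for ViT-B/32), so that width has to change too, and the detector would need retraining because released weights expect the original width. A non-CLIP encoder such as BEiT-3 would need its own text backbone class, which this sketch does not cover.

```python
# Hypothetical edit to a YOLO-World config (mmengine-style), not a tested recipe.
text_model_name = 'openai/clip-vit-large-patch14'  # larger CLIP text encoder
text_channels = 768  # assumed variable; CLIP ViT-L/14 projects text to 768-d

model = dict(
    backbone=dict(
        _delete_=True,
        type='MultiModalYOLOBackbone',
        image_model={{_base_.model.backbone}},
        text_model=dict(
            type='HuggingCLIPLanguageBackbone',
            model_name=text_model_name,  # only this line selects the encoder
            frozen_modules=['all'])))
```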

dikapiliao1 · 2024-03-26T09:29:15Z

YOLO-World relies on CLIP text embeddings for re-parameterization. If we replaced CLIP with a larger model like GPT-4, would it understand more, similar to Sora's powerful ability to understand images?
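To make "re-parameterization" concrete, here is a rough illustration (an assumption about the general idea, not YOLO-World's actual head code): once the vocabulary is fixed, its text embeddings can be computed once offline and stored as the weight of a 1x1 convolution, so no text encoder runs at inference. The NumPy function names are hypothetical, and any normalization or scaling the real head applies is omitted.

```python
import numpy as np


def detect_scores_online(feats, text_embeds):
    # feats: (b, c, h, w); text_embeds: (k, c) produced by the text encoder at run time
    return np.einsum('bchw,kc->bkhw', feats, text_embeds)


def fold_vocabulary(text_embeds):
    # Offline step: store the fixed vocabulary as a 1x1-conv weight of shape (k, c, 1, 1)
    return text_embeds[:, :, None, None].copy()


def detect_scores_reparam(feats, conv_weight):
    # A 1x1 convolution is a per-pixel matmul over channels
    b, c, h, w = feats.shape
    k = conv_weight.shape[0]
    out = conv_weight.reshape(k, c) @ feats.reshape(b, c, h * w)  # (k, c) @ (b, c, hw) -> (b, k, hw)
    return out.reshape(b, k, h, w)


rng = np.random.default_rng(0)
feats = rng.standard_normal((1, 16, 4, 4))
vocab = rng.standard_normal((3, 16))  # stand-in for 3 precomputed class embeddings
weight = fold_vocabulary(vocab)
print(np.allclose(detect_scores_online(feats, vocab), detect_scores_reparam(feats, weight)))  # True
```

Under this view, a bigger text model only changes the offline step that produces the embeddings; the deployed detector still sees a fixed weight tensor.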

wondervictor · 2024-03-26T09:45:34Z

Hi @dikapiliao1, it's a nice idea and we plan to do it.

xianhonghuang · 2024-03-28T03:40:52Z

If I want to switch to a different visual backbone, where can I change it?

wondervictor · 2024-03-28T03:46:44Z

@xianhonghuang Replace the image_model config according to your needs:

backbone=dict(
    _delete_=True,
    type='MultiModalYOLOBackbone',
    image_model={{_base_.model.backbone}},
    text_model=dict(
        type='HuggingCLIPLanguageBackbone',
        model_name=text_model_name,
        frozen_modules=['all'])),

xianhonghuang · 2024-03-28T03:54:14Z

Do you mean changing this part: _base_ = ('../../third_party/mmyolo/configs/yolov8/'
'yolov8_l_syncbn_fast_8xb16-500e_coco.py')?
I would like to switch to the YOLOv7 backbone first.

wondervictor · 2024-03-28T04:07:04Z

Hi @xianhonghuang, you can directly override the backbone dictionary configs, e.g., change it to YOLOv7Backbone. BTW, please open a new issue to discuss this question; this issue is for new features and suggestions.
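A hedged sketch of what "override the backbone dictionary" could look like inside a YOLO-World config, assuming mmyolo's YOLOv7Backbone is registered and accepts an arch setting such as 'L'. Only the image_model entry changes; the text model stays as in the config above. The neck's input channels are not filled in here because they must match whatever YOLOv7Backbone actually outputs, which should be taken from the corresponding mmyolo YOLOv7 config rather than guessed.

```python
# Hypothetical override, not a verified config: swap the YOLOv8 image
# backbone for mmyolo's YOLOv7Backbone while keeping the CLIP text model.
model = dict(
    backbone=dict(
        _delete_=True,
        type='MultiModalYOLOBackbone',
        image_model=dict(
            type='YOLOv7Backbone',  # registered by mmyolo
            arch='L'),              # assumed arch setting; check mmyolo for options
        text_model=dict(
            type='HuggingCLIPLanguageBackbone',
            model_name=text_model_name,
            frozen_modules=['all'])),
    # The neck's in_channels must be updated to the channels YOLOv7Backbone
    # outputs; copy them from mmyolo's YOLOv7 config instead of YOLOv8's.
)
```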

RudyCheng · 2024-04-19T01:15:28Z

The config yolo_world_v2_xl_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py does not match its model weights.

wondervictor · 2024-04-28T08:41:19Z

@RudyCheng, it has been resolved.

xiyuan27 · 2024-05-02T13:31:09Z

[Object detection on document images] Are there any specialized optimization strategies or support for object detection in vertical domains, specifically document images such as invoices and passports?

thgpddl · 2024-06-24T13:44:30Z

Why does image_demo.py append an empty string after splitting the input text on ","? For example, if text=cat,dog,man, after processing it becomes text=cat,dog,man," ".
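To make the behavior in the question concrete, the sketch below reproduces the described parsing: split on commas, strip whitespace, and append one extra single-space prompt. It is reconstructed from the question, parse_prompts is a hypothetical name, and it may not match the demo's exact code. The thread itself does not explain why the extra entry is added.

```python
from typing import List


def parse_prompts(text: str) -> List[List[str]]:
    # Split on commas, strip surrounding whitespace, wrap each class name in its
    # own list, then append one extra single-space prompt as the final entry.
    return [[t.strip()] for t in text.split(',')] + [[' ']]


print(parse_prompts('cat,dog,man'))  # [['cat'], ['dog'], ['man'], [' ']]
```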

spacewalk01 · 2024-07-04T02:38:29Z

Any progress on a TensorRT implementation? Thanks.

PrinceP · 2024-08-04T10:40:23Z

@spacewalk01 https://github.com/PrinceP/tensorrt-cpp-for-onnx?tab=readme-ov-file#yolo-world (without dynamic batch support)

myb1314yxy · 2024-10-06T12:37:40Z

How should I use YOLO-World to detect unknown classes in my own dataset? How should the dataset be split and prepared, and do I need to predefine all known classes in a YAML file?

wondervictor added the enhancement (New feature or request), help wanted (Extra attention is needed), and discussions (The issue might be helpful or contains useful information) labels on Mar 7, 2024

wondervictor pinned this issue Mar 7, 2024

wondervictor self-assigned this Mar 8, 2024


wondervictor commented Mar 7, 2024 (edited):

TODO List (Community Version)
