Roadmap of YOLO-World #109
torch.einsum() should be replaced by torch.matmul() and torch.sum(), because einsum() is not supported by most edge devices.
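A minimal sketch of the suggested replacement, assuming the einsum in question is an image-text similarity of the form `'bchw,bkc->bkhw'` (image features `x` of shape B×C×H×W against K text embeddings `w` of shape B×K×C). The function names are hypothetical. It shows two einsum-free equivalents: one built on `torch.matmul`, one on broadcasting plus `torch.sum`.

```python
import torch


def einsum_reference(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # x: image features (B, C, H, W); w: text embeddings (B, K, C)
    return torch.einsum('bchw,bkc->bkhw', x, w)


def matmul_version(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    b, c, h, wd = x.shape
    # (B, K, C) @ (B, C, H*W) -> (B, K, H*W)
    out = torch.matmul(w, x.reshape(b, c, h * wd))
    # restore the spatial layout: (B, K, H, W)
    return out.reshape(b, w.shape[1], h, wd)


def mul_sum_version(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # broadcast (B, 1, C, H, W) * (B, K, C, 1, 1) -> (B, K, C, H, W),
    # then reduce over the channel dimension C
    return (x.unsqueeze(1) * w[:, :, :, None, None]).sum(dim=2)
```

The matmul form maps onto a single batched GEMM, which export toolchains generally handle well; the broadcast-and-sum form avoids reshapes but materializes a larger intermediate tensor.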
@taofuyu Good idea, got it!
@wondervictor Where should I modify the code if I want to try other text encoders, for example replacing CLIP's text encoder with BEiT-3? Thank you!
@mio410 Good idea. We do plan to use better and stronger text encoders (e.g., CLIP-Large), and we are currently queuing for computation resources to pre-train them. BEiT-3 is a good choice and we are considering it. BTW, which model size do you need most right now? I can prioritize that.
I'm looking forward to your work! If possible, I'd like to try open-vocabulary detection in other languages. Could you help me with that?
YOLO-World relies on CLIP text embeddings for reparameterization. If we replaced CLIP with a larger model such as ChatGPT-4, would it understand more, similar to Sora's powerful ability to understand images?
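To illustrate what reparameterizing with text embeddings can mean, here is a minimal sketch, assuming the offline vocabulary is folded into a 1×1 convolution whose weights are the L2-normalized text embeddings. `build_reparam_head` is a hypothetical name, and the sketch ignores the image-feature normalization, logit scale, and bias that a real contrastive head may apply. Because the embeddings become fixed weights, a different or larger text encoder would only change the precomputed weight values, as long as the embedding width C matches the image features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def build_reparam_head(text_embeds: torch.Tensor) -> nn.Conv2d:
    # text_embeds: (K, C), precomputed offline by a text encoder (assumed CLIP)
    k, c = text_embeds.shape
    conv = nn.Conv2d(c, k, kernel_size=1, bias=False)
    with torch.no_grad():
        # each output channel k scores one vocabulary entry: sum_c w[k, c] * x[c]
        conv.weight.copy_(F.normalize(text_embeds, dim=-1).view(k, c, 1, 1))
    return conv
```

At inference time the text encoder is no longer needed; only the convolution with the baked-in vocabulary runs.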
Hi @dikapiliao1, it's a nice idea and we plan to do it. |
If I want to switch to a different visual backbone, where should I change it?
@xianhonghuang Replace this part of the config:
backbone=dict(
    _delete_=True,
    type='MultiModalYOLOBackbone',
    image_model={{_base_.model.backbone}},
    text_model=dict(
        type='HuggingCLIPLanguageBackbone',
        model_name=text_model_name,
        frozen_modules=['all'])),
For example, by changing _base_ = ('../../third_party/mmyolo/configs/yolov8/'?
Hi @xianhonghuang, you can directly override the backbone dictionary configs, e.g., change it to YOLOv7Backbone. BTW, it's suggested to open a new issue to discuss this question and this issue aims for new features and suggestions. |
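Following that suggestion, a backbone override might look like the fragment below. It is a sketch under assumptions: `YOLOv7Backbone` with `arch='L'` is taken from MMYOLO, the `text_model_name` value is assumed to be the CLIP checkpoint used elsewhere in the config, and the neck and head channel settings would likely also need to match the new backbone's output channels. `_delete_=True` tells the MMEngine config system to replace the inherited backbone dict instead of merging into it.

```python
# Config fragment (MMEngine style); the values below are illustrative assumptions.
text_model_name = 'openai/clip-vit-base-patch32'  # assumed CLIP checkpoint

model = dict(
    backbone=dict(
        _delete_=True,  # drop the inherited backbone instead of merging into it
        type='MultiModalYOLOBackbone',
        # assumption: swap the YOLOv8 image backbone for MMYOLO's YOLOv7Backbone
        image_model=dict(type='YOLOv7Backbone', arch='L'),
        text_model=dict(
            type='HuggingCLIPLanguageBackbone',
            model_name=text_model_name,
            frozen_modules=['all'])))
```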
The config yolo_world_v2_xl_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py does not match its model weights.
@RudyCheng, it has been resolved. |
[Object detection on document images] Are there any specialized optimization strategies or support for object detection in vertical domains, specifically document images such as invoices and passports?
Why does image_demo.py append an empty string after splitting the input text on ","? For example, if text=cat,dog,man, after processing it becomes text=cat,dog,man," ".
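The behavior described in the question can be reproduced with a small sketch. `parse_prompts` is a hypothetical name, not the function in image_demo.py; it only mirrors the reported input and output and does not explain why the extra prompt is added.

```python
def parse_prompts(text: str) -> list[str]:
    # hypothetical helper: split the comma-separated prompt string
    prompts = [t.strip() for t in text.split(',')]
    # append a single-space prompt, as reported in the question
    prompts.append(' ')
    return prompts
```

For example, `parse_prompts("cat,dog,man")` yields four prompts, the last being `' '`.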
Any progress on a TensorRT implementation? Thanks!
@spacewalk01 See https://github.com/PrinceP/tensorrt-cpp-for-onnx?tab=readme-ov-file#yolo-world (without dynamic batch support).
How should I use YOLO-World to detect unknown classes in my own dataset? How should the dataset be split and prepared, and do I need to predefine all known classes in a YAML file?
This issue will be kept open and pinned for a long time, as we hope to hear everyone's opinions, suggestions, and needs!
We want to make YOLO-World stronger and encourage more diverse applications, especially practical ones. We are open to all ideas and suggestions. YOLO-World is under active development, and we are working hard on both upstream pre-training and downstream deployment tools. Our team is currently small, so we hope you can give us some time and contribute your experience or help when you can!
If you have a good idea or need, just reply to this issue and @ me. I will respond promptly when I see it, and consider adding it to the TODO list.
TODO List (Community Version)
🎯: High priority or on-going.
torch.einsum (👍 thanks to @taofuyu for replacing einsum() with other ops, #118)
mask-refine (Fine-tuning YOLO-World-S on COCO cannot be reproduced, and validation mAP shows a downward trend #160; finetuning on custom dataset #72; fine-tuning YOLO-World-L on COCO cannot be reproduced #76)