How to detect any object using caption inputs? #334

LuletterSoul · 2024-05-20T07:02:03Z

Hello, thank you for sharing this excellent work. Currently, the model computes the similarity between text tokens and image features and selects the top-1 match as the class. If I input a single sentence as the text (similar to Grounding DINO), will the model still work correctly? If so, how should it be modified?
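For readers following along, the top-1 matching described above can be sketched roughly as below. This is a toy illustration under stated assumptions, not YOLO-World's actual code: the helper names `cosine` and `top1_labels` and the 2-D embeddings are hypothetical, and cosine similarity stands in for whatever similarity the model really uses.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def top1_labels(region_feats, text_embeds):
    """For each region feature, return (index of best-matching text, score)."""
    labels = []
    for feat in region_feats:
        scores = [cosine(feat, t) for t in text_embeds]
        best = max(range(len(scores)), key=scores.__getitem__)
        labels.append((best, scores[best]))
    return labels

# Toy embeddings (assumed values): one vector per class prompt.
text_embeds = [[1.0, 0.0], [0.0, 1.0]]    # e.g. "dog", "cat"
region_feats = [[0.9, 0.1], [0.2, 0.8]]   # two candidate regions

per_class = top1_labels(region_feats, text_embeds)

# With a whole caption encoded as ONE embedding, every region can only
# receive index 0, so top-1 no longer distinguishes objects in the sentence.
single_caption = top1_labels(region_feats, [[1.0, 1.0]])
```

In this sketch, a single-sentence input collapses to one candidate label, which is why caption input would need some extra handling rather than being passed in as a single class prompt.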

wondervictor · 2024-05-20T07:18:06Z

You can find it at #315 (comment). Given the many requests for caption input, I'll add it to the demo.

wondervictor mentioned this issue May 20, 2024

Roadmap of YOLO-World #109

