Hello, thank you for sharing this excellent work. Currently, the model computes the similarity between text tokens and image features and selects the top-1 match as the class. If I input a single sentence as the text (similar to Grounding DINO), will the model still work correctly? If so, how should it be modified?
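To make the concern concrete, here is a minimal sketch of the top-1 selection described above. It is not this repository's code: the function names (`top1_labels`, `match_sentence`), the toy 2-D vectors, and the threshold value are all hypothetical. It only shows that with a single text embedding the top-1 index is always 0, so some other criterion, such as a similarity threshold, would be needed to tell matching regions from non-matching ones.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top1_labels(region_feats, text_embeds):
    """For each region, return the index of the most similar text embedding."""
    labels = []
    for r in region_feats:
        sims = [cosine(r, t) for t in text_embeds]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels

def match_sentence(region_feats, sentence_embed, threshold=0.5):
    """Hypothetical alternative: keep regions whose similarity to one
    sentence embedding exceeds a threshold, instead of taking a top-1."""
    return [i for i, r in enumerate(region_feats)
            if cosine(r, sentence_embed) > threshold]

# Toy data (assumed, for illustration only).
regions = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
class_embeds = [[1.0, 0.1], [0.1, 1.0]]   # one embedding per class name
sentence = [[1.0, 0.1]]                   # a single sentence embedding

multi = top1_labels(regions, class_embeds)   # top-1 over several classes
single = top1_labels(regions, sentence)      # top-1 over one sentence
kept = match_sentence(regions, sentence[0], threshold=0.9)

print(multi, single, kept)
```

With several class embeddings the top-1 index separates the regions, but with one sentence every region gets index 0, which is why I suspect the selection step would need to change.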