
OWL-ViT image-conditioned input: Is it possible to input multiple images? #1143

Open
liu-bohan opened this issue Jan 13, 2025 · 0 comments


I want to run several image-conditioned queries against a single target image in one pass.

I am using the image-conditioned detection code from OWL_ViT_minimal_example.ipynb:

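```python
# query_embedding has shape [d]; [None, None, ...] expands it to [1, 1, d].
target_class_predictions = class_predictor(
    image_features=feature_map.reshape(b, h * w, d),  # [batch, patches, d]
    query_embeddings=query_embedding[None, None, ...],  # [batch, queries, d]
)
```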
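A minimal sketch of how several query embeddings could be packed into the `[batch, queries, d]` layout named in the comment above. It assumes each query image yields one embedding of shape `[d]`, as `query_embedding` does in the snippet. `stack_query_embeddings` and `toy_class_logits` are hypothetical helpers, and the tensors are random stand-ins. The toy class head is a plain dot product. The real class head is more involved; the toy version only mimics the output layout assumed here, which is one logit per (patch, query) pair.

```python
import numpy as np

def stack_query_embeddings(embeddings):
    """Hypothetical helper: stack Q embeddings of shape [d] into [1, Q, d].

    Generalizes query_embedding[None, None, ...] (one query) to Q queries.
    """
    stacked = np.stack(embeddings, axis=0)  # [Q, d]
    return stacked[None, ...]               # [1, Q, d]

def toy_class_logits(image_features, query_embeddings):
    """Stand-in for the class head: dot product of every patch with every query.

    image_features:   [batch, patches, d]
    query_embeddings: [batch, queries, d]
    returns logits:   [batch, patches, queries]
    """
    return np.einsum("bpd,bqd->bpq", image_features, query_embeddings)

# Random stand-ins for the notebook's tensors (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
b, h, w, d = 1, 4, 4, 8
feature_map = rng.normal(size=(b, h, w, d))
query_embeddings = stack_query_embeddings([rng.normal(size=d) for _ in range(3)])

logits = toy_class_logits(feature_map.reshape(b, h * w, d), query_embeddings)
print(query_embeddings.shape)  # (1, 3, 8)
print(logits.shape)            # (1, 16, 3)
```

Under this assumption, column `q` of the last axis belongs to the `q`-th embedding passed to `stack_query_embeddings`. That ordering is what lets you later recover a label for each box.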

It looks like I can pass multiple image queries through the single `query_embeddings` argument. However, once I have the predicted logits and boxes, I can't figure out which label (query) each predicted box corresponds to.
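A minimal sketch of one way to recover a label per box. It rests on several assumptions. The logits for one image are assumed to have shape `[patches, queries]`. The box head is assumed to return `[patches, 4]` boxes in the same patch order. Scores are assumed to come from a sigmoid over the logits. The "label" of a box is then the index of the query embedding it scores highest against. `decode_multi_query` and `score_threshold` are hypothetical names, and the logits and boxes are made-up toy values.

```python
import numpy as np

def decode_multi_query(pred_logits, pred_boxes, score_threshold=0.1):
    """Hypothetical helper: assign each box the query it best matches.

    Assumed layouts (single image, batch axis already dropped):
      pred_logits: [patches, queries], one logit per (patch, query) pair
      pred_boxes:  [patches, 4], same patch order as pred_logits
    Returns the kept boxes, their query indices (labels) and scores.
    """
    scores_all = 1.0 / (1.0 + np.exp(-pred_logits))  # sigmoid per (patch, query)
    labels = scores_all.argmax(axis=-1)               # best query index per patch
    scores = scores_all.max(axis=-1)                  # score of that best query
    keep = scores >= score_threshold                  # drop weak boxes
    return pred_boxes[keep], labels[keep], scores[keep]

# Toy example: 4 patches, 2 queries (values are made up).
pred_logits = np.array([
    [3.0, -2.0],   # patch 0: strong match for query 0
    [-4.0, -5.0],  # patch 1: matches nothing
    [-1.0, 2.5],   # patch 2: strong match for query 1
    [-3.0, -6.0],  # patch 3: matches nothing
])
pred_boxes = np.array([
    [0.2, 0.2, 0.1, 0.1],
    [0.5, 0.5, 0.2, 0.2],
    [0.7, 0.3, 0.1, 0.2],
    [0.9, 0.9, 0.1, 0.1],
])

boxes, labels, scores = decode_multi_query(pred_logits, pred_boxes, score_threshold=0.5)
print(labels)  # [0 1]
```

Taking the argmax gives each box at most one label. If a box should be allowed to match several queries, you could instead threshold `scores_all` directly and use `np.nonzero` to list every (patch, query) pair above the threshold.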
