OWL-ViT image-conditioned input: Is it possible to input multiple images? #1143

liu-bohan · 2025-01-13T19:11:26Z

I want to run detection for multiple image-conditioned queries on a single target image in one pass.

I use the code from OWL_ViT_minimal_example.ipynb for Image-conditioned detection.

target_class_predictions = class_predictor(
    image_features=feature_map.reshape(b, h * w, d),
    query_embeddings=query_embedding[None, None, ...],  # [batch, queries, d]
)

It looks like I can pass multiple image queries through the single query_embeddings argument. However, once I get the predicted logits and bboxes, I can't figure out which labels and bboxes correspond to which query.
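To make the question concrete, here is a minimal sketch of the mapping I am trying to recover. It assumes (not confirmed) that the class predictor returns logits of shape [batch, num_boxes, num_queries] when query_embeddings has shape [batch, queries, d], so that the last axis indexes the stacked queries. The helpers `stack_queries` and `assign_queries` and the toy numbers are hypothetical and only illustrate the idea with plain NumPy; they are not part of the notebook.

```python
import numpy as np


def stack_queries(query_list):
    """Stack per-query embeddings of shape [d] into [1, num_queries, d].

    Hypothetical helper: the position of each embedding in the list is
    what would later serve as its label index.
    """
    return np.stack(query_list, axis=0)[None, ...]


def assign_queries(logits, boxes, threshold=0.5):
    """Assign each predicted box to its best-scoring query.

    Assumes logits has shape [num_boxes, num_queries] and boxes has
    shape [num_boxes, 4] (one image, batch axis already removed).
    """
    scores = 1.0 / (1.0 + np.exp(-logits))  # sigmoid over raw logits
    labels = scores.argmax(axis=-1)         # index of the best query per box
    best = scores.max(axis=-1)              # score of that best query
    keep = best >= threshold                # drop boxes matching no query
    return boxes[keep], labels[keep], best[keep]


# Toy stand-in for the model output: 3 boxes, 2 stacked image queries.
toy_logits = np.array([[4.0, -3.0], [-2.0, 3.0], [-5.0, -4.0]])
toy_boxes = np.array(
    [[0.2, 0.2, 0.1, 0.1], [0.7, 0.7, 0.2, 0.2], [0.5, 0.5, 0.3, 0.3]]
)

kept_boxes, kept_labels, kept_scores = assign_queries(toy_logits, toy_boxes)
for box, label, score in zip(kept_boxes, kept_labels, kept_scores):
    print(f"query {label}: box={box.tolist()} score={score:.3f}")
```

If that shape assumption holds, the label of a box would simply be the position of the matching embedding in the stacked query list, and the box at the same index along the num_boxes axis would be its prediction.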
