👀 TopViewRS: Vision-Language Models as Top-View Spatial Reasoners

A novel evaluation benchmark for spatial reasoning of vision-language models.

📄 [Arxiv] · 🕸️ [Project Page] · 🤗 [Data]

Key takeaways

  • Define the top-view spatial reasoning task for VLMs via 4 carefully designed tasks of increasing complexity, encompassing 9 distinct fine-grained sub-tasks with questions structured to probe different model abilities.
  • Collect the TopViewRS dataset (Top-View Reasoning in Space), comprising 11,384 multiple-choice questions paired with either photo-realistic or semantic top-view maps of real-world scenes.
  • Investigate 10 VLMs from different model families and sizes, highlighting the performance gap compared to human annotators.


Dataset

Part of the benchmark is now available on Hugging Face: https://huggingface.co/datasets/chengzu/topviewrs.

Code

Coming soon.

Citation

If you find TopViewRS useful, please cite our paper:

@misc{li2024topviewrs,
      title={TopViewRS: Vision-Language Models as Top-View Spatial Reasoners}, 
      author={Chengzu Li and Caiqi Zhang and Han Zhou and Nigel Collier and Anna Korhonen and Ivan Vulić},
      year={2024},
      eprint={2406.02537},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
