👀 TopViewRS: Vision-Language Models as Top-View Spatial Reasoners

A novel evaluation benchmark for spatial reasoning of vision-language models.

📄 [Arxiv] · 🕸️ [Project Page] · 🤗 [Data]

Key takeaways

  • Define the top-view spatial reasoning task for VLMs via 4 carefully designed tasks of increasing complexity, encompassing 9 distinct fine-grained sub-tasks with questions structured to probe different model abilities.
  • Collect the TopViewRS dataset (Top-View Reasoning in Space), comprising 11,384 multiple-choice questions paired with either photo-realistic or semantic top-view maps of real-world scenes.
  • Investigate 10 VLMs from different model families and sizes, highlighting the performance gap compared to human annotators.


Dataset

Part of the benchmark is now available on Hugging Face: https://huggingface.co/datasets/chengzu/topviewrs.

Code

Coming soon.

Citation

If you find TopViewRS useful, please cite our paper:

@misc{li2024topviewrs,
      title={TopViewRS: Vision-Language Models as Top-View Spatial Reasoners}, 
      author={Chengzu Li and Caiqi Zhang and Han Zhou and Nigel Collier and Anna Korhonen and Ivan Vulić},
      year={2024},
      eprint={2406.02537},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
