Add MultipanelVQA and POPE vision-language scenarios #2517

ImKeTT · 2024-03-28T07:23:23Z

Hello, this PR adds two vision-language scenarios to VHELM: MultipanelVQA (https://arxiv.org/abs/2401.15847) and the POPE benchmark (https://aclanthology.org/2023.emnlp-main.20/).

MultipanelVQA has two subjects (synthetic or real-world) and two values of question_type (multiple-choice or open). I use get_short_answer_generation_adapter_spec for open-ended generation and get_multiple_choice_joint_adapter_spec for multiple-choice questions. For both scenarios, I use get_exact_match_metric_specs for evaluation.
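The adapter selection described above can be sketched as a dispatch on question_type. This is a minimal illustration, not the code in this PR: the `AdapterSpec` dataclass and the two builder functions below are simplified, hypothetical stand-ins that only borrow the names of the HELM helpers mentioned above, and the `method` strings are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterSpec:
    """Hypothetical, simplified stand-in for an adapter spec."""

    method: str


def get_short_answer_generation_adapter_spec() -> AdapterSpec:
    # Stand-in only: the real HELM helper has its own arguments and fields.
    return AdapterSpec(method="generation")


def get_multiple_choice_joint_adapter_spec() -> AdapterSpec:
    # Stand-in only: the real HELM helper has its own arguments and fields.
    return AdapterSpec(method="multiple_choice_joint")


def select_adapter_spec(question_type: str) -> AdapterSpec:
    """Pick an adapter spec from the question_type run argument."""
    if question_type == "open":
        return get_short_answer_generation_adapter_spec()
    if question_type == "multiple-choice":
        return get_multiple_choice_joint_adapter_spec()
    raise ValueError(f"Unsupported question_type: {question_type!r}")
```

Raising on an unknown question_type, rather than falling back to a default, keeps a typo in a run entry from silently selecting the wrong adapter.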

Here's a screenshot after running ./pre-commit.sh

Here are several screenshots and the scenario_state.json files from toy runs on the two scenarios (Qwen-VL-Chat on 25 instances):
POPE

pope_scenario_state.json

MultipanelVQA-real-world

mpvqa-real-open-scenario_state.json
mpvqa-real-mc-scenario_state.json

MultipanelVQA-synthetic

mpvqa-syn-open-scenario_state.json
mpvqa-syn-mc-scenario_state.json

Please let me know how I can improve it.
Thanks!

teetone

@ImKeTT Thanks for adding these! I had a few minor comments. Could you also add the conf file you used for the runs to the PR description?

src/helm/benchmark/run_specs/vlm_run_specs.py

src/helm/benchmark/static/schema_vlm.yaml

ImKeTT · 2024-03-29T15:29:30Z

Thanks for reviewing, @teetone! I've reframed POPE as an MCQA task and added more detailed descriptions for these two scenarios.
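One way to picture the reframing: each POPE item is a yes/no question about whether an object appears in the image, so it can be posed as a two-option multiple-choice question. The sketch below is illustrative only; the `PopeItem` type, the `to_mcqa` helper, and the A/B option lettering are assumptions, not the code in this PR.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PopeItem:
    """Hypothetical container for one POPE question (image omitted)."""

    question: str
    answer: str  # expected to be "yes" or "no"


OPTIONS = ("Yes", "No")
LETTERS = ("A", "B")


def to_mcqa(item: PopeItem) -> Tuple[str, str]:
    """Return (prompt text, letter of the correct option)."""
    lines = [item.question]
    lines += [f"{letter}. {option}" for letter, option in zip(LETTERS, OPTIONS)]
    normalized = item.answer.strip().lower()
    lowered_options = [option.lower() for option in OPTIONS]
    if normalized not in lowered_options:
        raise ValueError(f"Unexpected answer: {item.answer!r}")
    return "\n".join(lines), LETTERS[lowered_options.index(normalized)]
```

With this framing, an exact-match metric only has to compare the predicted option against the reference option.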

Here are the configuration files I used for this PR.
For MultipanelVQA

entries: [
    {description: "multipanelvqa:subject=synthetic,question_type=multiple-choice,model=qwen/qwen-vl-chat", priority: 1}
    {description: "multipanelvqa:subject=synthetic,question_type=open,model=qwen/qwen-vl-chat", priority: 1}
    {description: "multipanelvqa:subject=real-world,question_type=multiple-choice,model=qwen/qwen-vl-chat", priority: 1}
    {description: "multipanelvqa:subject=real-world,question_type=open,model=qwen/qwen-vl-chat", priority: 1}
    ]
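The four entries above are the cross product of the two subjects and the two question types. As a small sketch, the description strings could be generated rather than written by hand; `build_descriptions` is a hypothetical helper, not part of HELM.

```python
from itertools import product
from typing import List

SUBJECTS = ["synthetic", "real-world"]
QUESTION_TYPES = ["multiple-choice", "open"]


def build_descriptions(model: str) -> List[str]:
    """Build one run description per (subject, question_type) pair."""
    return [
        f"multipanelvqa:subject={subject},question_type={question_type},model={model}"
        for subject, question_type in product(SUBJECTS, QUESTION_TYPES)
    ]


if __name__ == "__main__":
    for description in build_descriptions("qwen/qwen-vl-chat"):
        print(description)
```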

For POPE

entries: [
    {description: "pope:model=qwen/qwen-vl-chat", priority: 1}
    ]
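Each description string above has the shape `name:key=value,key=value`. The following sketch shows how such a string breaks down into a scenario name and its arguments; `parse_description` is a hypothetical helper written for illustration and is not HELM's own parser.

```python
from typing import Dict, Tuple


def parse_description(description: str) -> Tuple[str, Dict[str, str]]:
    """Split 'name:k1=v1,k2=v2' into ('name', {'k1': 'v1', 'k2': 'v2'})."""
    name, _, arg_str = description.partition(":")
    args: Dict[str, str] = {}
    for pair in arg_str.split(","):
        if not pair:
            continue  # tolerate a description with no arguments
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"Expected key=value, got: {pair!r}")
        args[key] = value
    return name, args


if __name__ == "__main__":
    print(parse_description("pope:model=qwen/qwen-vl-chat"))
```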

src/helm/benchmark/static/schema_vlm.yaml

teetone · 2024-03-30T07:11:26Z

Thanks @ImKeTT! Could you address one last comment in schema_vlm.yaml?

ImKeTT · 2024-03-30T09:01:39Z

Sure, I think it's ready to go now, thanks @teetone !

add multipanelvqa and pope vision-language scenario

1886a45

teetone self-requested a review March 29, 2024 07:36

teetone approved these changes Mar 29, 2024

src/helm/benchmark/run_specs/vlm_run_specs.py (outdated, resolved)

src/helm/benchmark/static/schema_vlm.yaml (outdated, resolved)

reframe pope to the mcqa task

eeb79a0

teetone reviewed Mar 30, 2024

src/helm/benchmark/static/schema_vlm.yaml (outdated, resolved)

modify metric_groups

4f5ee39

teetone merged commit b29fb5e into stanford-crfm:main Mar 31, 2024
6 checks passed

ImKeTT deleted the multipanel-and-pope branch April 9, 2024 15:21
