In this folder, we provide the example code we use to evaluate IDEFICS, the open-source MLLM published by Hugging Face. As it requires no extra dependency beyond the `transformers` library, we use it as an example to show how the three tasks in Q-Bench work.
To run the IDEFICS demo for Q-Bench, first install the dependencies:

```shell
pip install pillow
pip install "transformers>=4.33.1"
```
and then run

```shell
python example_code_for_idefics/a1_perception_demo.py
```

or

```shell
python example_code_for_idefics/a2_description_demo.py
```

to run IDEFICS on LLVisionQA and LLDescribe, respectively.
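For reference, the core of the perception demo is a standard `transformers` generation call on an IDEFICS checkpoint. Below is a minimal sketch of such a call, assuming the public `HuggingFaceM4/idefics-9b-instruct` checkpoint; the question text is a placeholder, and the exact prompt template and checkpoint used by `a1_perception_demo.py` may differ:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, IdeficsForVisionText2Text

checkpoint = "HuggingFaceM4/idefics-9b-instruct"
processor = AutoProcessor.from_pretrained(checkpoint)
model = IdeficsForVisionText2Text.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder image and question; the demo script reads them from the LLVisionQA release.
image = Image.open("2415943374.jpg")
prompts = [[
    "User:",
    image,
    "Which part of this image is out of focus? (Choose from A. the left part; B. the right part; C. none.)",
    "<end_of_utterance>",
    "\nAssistant: The answer is",
]]

inputs = processor(prompts, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=16)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```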
As our evaluation is submission-based, we kindly ask that incoming models be submitted with similarly simple demo files so that we can conduct the evaluation. Before submitting, make sure that your demo can successfully answer the visual question related to `2415943374.jpg` and describe the low-level information of `midjourney_lowstep_036.jpg`.
**New:** In addition to the standard multi-choice questions (MCQ) used for evaluation, we also provide a close-set inference (PPL-based) script for the perception (A1) task, and we allow models to submit both results/scripts. For close-set inference, please refer to our demo scripts.
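As a rough illustration of what close-set (PPL-based) inference means here: instead of letting the model generate an answer freely, each candidate option is appended to the prompt, the language-modeling loss of the resulting sequence is computed, and the option with the lowest loss (i.e., lowest perplexity) is taken as the prediction. The sketch below assumes the IDEFICS model/processor objects from the previous snippet; the helper name, prompt template, and example inputs are illustrative, and the official close-set script may instead score only the answer tokens:

```python
import torch

def closed_set_predict(model, processor, image, question, options):
    """Return the option whose completion yields the lowest LM loss (perplexity proxy)."""
    losses = []
    for option in options:
        prompt = [[
            "User:",
            image,
            question,
            "<end_of_utterance>",
            f"\nAssistant: The answer is {option}.",
        ]]
        inputs = processor(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            # Full-sequence loss is used as a simple proxy; masking the shared prompt
            # tokens and scoring only the answer tokens would be more precise.
            out = model(**inputs, labels=inputs["input_ids"])
        losses.append(out.loss.item())
    return options[losses.index(min(losses))]

# Example usage (placeholder question and options):
# answer = closed_set_predict(model, processor, image,
#                             "Which part of the image is blurred?",
#                             ["the left part", "the right part", "none"])
```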
As the 7 IQA databases we use in Q-Bench are open-source, and there are no known strategies for MLLMs to overfit the IQA scores of such a large number of images, the Assessment (A3) ability can be evaluated directly by any user (including the publishers of MLLMs). To run the evaluation code, download all IQA datasets listed in IQA databases, additionally run `pip install scipy` in shell, and run the script as follows:

```shell
python example_code_for_idefics/a3_assessment_all.py
```
The results will be stored in the `IQA_outputs/idefics` directory.
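For context, a common way to turn an MLLM into an IQA scorer, along the lines of the strategy described in the Q-Bench paper, is to take a softmax over the logits of the "good" and "poor" tokens at the position where the model would state the image quality, then correlate the resulting scores with ground-truth MOS using SRCC/PLCC from `scipy`. Below is a minimal sketch under those assumptions; the function name, prompt wording, and token handling are illustrative and may differ from `a3_assessment_all.py`:

```python
import torch
from scipy.stats import spearmanr, pearsonr

def quality_score(model, processor, image):
    """Softmax probability of "good" versus "poor" as a scalar quality score in [0, 1]."""
    prompt = [[
        "User:",
        image,
        "Rate the quality of the image.",
        "<end_of_utterance>",
        "\nAssistant: The quality of the image is",
    ]]
    inputs = processor(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    # Token ids for the single-word continuations "good" / "poor"; this is an
    # approximation of the token handling in the released script.
    good_id = processor.tokenizer("good", add_special_tokens=False).input_ids[-1]
    poor_id = processor.tokenizer("poor", add_special_tokens=False).input_ids[-1]
    probs = torch.softmax(next_token_logits[[good_id, poor_id]], dim=0)
    return probs[0].item()

# After scoring a whole IQA dataset, correlate predictions with ground-truth MOS:
# pred_scores, mos = [...], [...]   # placeholder lists of floats
# print("SRCC:", spearmanr(pred_scores, mos)[0])
# print("PLCC:", pearsonr(pred_scores, mos)[0])
```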
Please contact any of the first authors of this paper for queries.
- Haoning Wu, haoning001@e.ntu.edu.sg
- Zicheng Zhang, zzc1998@sjtu.edu.cn
- Erli Zhang, ezhang005@e.ntu.edu.sg
If you find our work interesting, please feel free to cite our paper:
```bibtex
@article{wu2023qbench,
    title={Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision},
    author={Wu, Haoning and Zhang, Zicheng and Zhang, Erli and Chen, Chaofeng and Liao, Liang and Wang, Annan and Li, Chunyi and Sun, Wenxiu and Yan, Qiong and Zhai, Guangtao and Lin, Weisi},
    year={2023},
}
```