Can your MLLM understand highly abstract image aesthetics like humans do? Come and test it on AesBench!
Multimodal Large Language Models (MLLMs) are developing rapidly, advancing human-machine interaction and collaboration in daily life. However, their capacity to understand image aesthetics remains largely unexplored, which may impede the application of advanced MLLMs in real-world scenarios such as art design and image generation. To address this gap, we introduce AesBench, an expert benchmark for systematically evaluating the aesthetic understanding capabilities of MLLMs. We first collect high-quality annotations from aesthetic experts and, based on these, build a benchmark dataset for aesthetics understanding. We also design a set of integrative criteria that evaluate MLLMs from four shallow-to-deep perspectives: perception (AesP), empathy (AesE), assessment (AesA), and interpretation (AesI). We hope this work encourages the community to pursue deeper investigations into the untapped potential of MLLMs in image aesthetics understanding.
- [2024/07/26] AesBench was reported at the VIVO Imaging Event. 🎉🎉🎉
- [2024/07/18] The Leaderboard is now accessible at the Homepage. 🔥🔥🔥
- [2024/06/19] We have integrated AesBench into the evaluation toolkit VLMEvalKit, providing a highly convenient testing solution (see the usage sketch after this list)! 🔥🔥🔥
- [2024/01/18] The Database of AesBench is now available on Hugging Face! 🤗🤗🤗
- [2024/01/17] We have released the Evaluation Database and Code of AesBench! Check Here for more details. 🚩🚩🚩
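As a quick start, below is a minimal sketch of how an evaluation run via VLMEvalKit might look. The dataset name `AesBench_VAL` and model name `llava_v1.5_7b` are assumptions; consult the VLMEvalKit documentation for the exact identifiers supported by your installation.

```bash
# Install VLMEvalKit (assumed repo path; see the toolkit's README for authoritative steps)
git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit
pip install -e .

# Evaluate an open-source model on AesBench.
# The dataset/model identifiers below are assumptions; run
# `python run.py --help` to list the names supported by your installation.
python run.py --data AesBench_VAL --model llava_v1.5_7b --verbose
```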
The AesBench Leaderboard is continuously updated.
- Supported closed-source commercial models
| GPT-4V | GPT-4o | Gemini-1.0-Pro | Claude3-Opus | BlueImage-GPT |
| :---: | :---: | :---: | :---: | :---: |
- Supported open-source models
| MiniCPM-L3-2.5 | Q-Instruct | InstructBLIP | MiniGPT-4 | MiniGPT-v2 |
| :---: | :---: | :---: | :---: | :---: |
| IDEFICS_Instruct | GLM | Otter | TinyGPT-v | Qwen-VL |
| LLaVA | LLaVA-1.5 | mPLUG-Owl2 | ShareGPT4V | SPHINX-MoE |
📌 TO DO
- [x] BlueImage-GPT
- [ ] LLaVA-1.6
- [ ] OmniLMM-12B
If there are any other models you would like to see tested on AesBench, please contact us.
- Please see our release for details.
We sincerely thank the 32 aesthetic experts who participated in the subjective experiments. Their rich aesthetic experience and conscientious attitude make the benchmark results more reliable. Some of the contributors are listed below:
Wei Liu (educator), Xin Liu (researcher), Luxia Chen (educator), Tianjiao Gu (educator), Dahai Tian (educator), Ziyan Ou (art student)
We extend our heartfelt thanks to our team members for their invaluable assistance in collecting the data and deploying the MLLMs. Some of the collaborators are listed below:
Zhichao Duan, Pangu Xie, Xinrui Xu, Yanxin Shi
If you find our work interesting, please feel free to cite our paper:
@article{AesBench,
  title={AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception},
  author={Huang, Yipo and Yuan, Quan and Sheng, Xiangfei and Yang, Zhichao and Wu, Haoning and Chen, Pengfei and Yang, Yuzhe and Li, Leida and Lin, Weisi},
  journal={arXiv preprint arXiv:2401.08276},
  year={2024}
}