- EVEv1 - Unveiling Encoder-Free Vision-Language Models (NeurIPS 2024, 2024/09)
- EVEv2 - EVEv2: Improved Baselines for Encoder-Free Vision-Language Models (ArXiv 2025, 2025/02)
- Can we remove the vision encoder from VLMs?
- How can we transfer an LLM into an encoder-free VLM efficiently and stably?
- How can we bridge the performance gap between encoder-free and encoder-based VLMs?
[2025/02/09] 🔥🔥🔥 The paper, weights, and code of EVEv2 are released! 💥
[2024/09/26] Our EVE has been accepted by NeurIPS 2024 as a spotlight! 💥
[2024/06/18] The paper, weights, and code of EVE are released! 💥
- 🔥 Superior Capability: An encoder-free LVLM built from scratch that supports arbitrary image aspect ratios, outperforming its encoder-free counterparts and approaching existing modular encoder-based LVLMs.
- 🔥 Data Efficiency: We filter and recaption fewer than 100M publicly available samples from OpenImages, SAM, LAION, and Datacomp for pre-training.
- 🔥 Pioneering Route: We provide an efficient, transparent, and practical training strategy and procedure for developing a pure decoder-only architecture across modalities (a minimal conceptual sketch follows below).
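To make the encoder-free idea concrete, here is a minimal PyTorch sketch of the general technique, not EVE's actual implementation: raw image patches are linearly embedded and concatenated with text token embeddings, and a single decoder-only transformer processes the mixed sequence, so no pretrained vision encoder sits in front of the LLM. All class names, dimensions, and the tiny two-layer backbone are illustrative assumptions.

```python
# Minimal conceptual sketch of an encoder-free VLM (illustrative only; not
# EVE's implementation). Raw patches go straight into a decoder-only backbone.
import torch
import torch.nn as nn


class PatchEmbed(nn.Module):
    """Project raw image patches to token embeddings with a single linear map."""

    def __init__(self, patch_size: int = 14, in_chans: int = 3, dim: int = 256):
        super().__init__()
        # One linear projection per non-overlapping patch, expressed as a conv.
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (B, 3, H, W), with H and W divisible by patch_size; arbitrary
        # aspect ratios simply change the number of patches.
        x = self.proj(images)                # (B, dim, H/p, W/p)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)


class EncoderFreeVLM(nn.Module):
    """One decoder-only transformer over concatenated patch and text tokens."""

    def __init__(self, vocab_size: int = 1000, dim: int = 256, nhead: int = 8):
        super().__init__()
        self.tok_embed = nn.Embedding(vocab_size, dim)
        self.patch_embed = PatchEmbed(dim=dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=nhead, batch_first=True)
        # Stand-in for the causal decoder-only LLM backbone
        # (positional encodings omitted here for brevity).
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(dim, vocab_size, bias=False)

    def forward(self, text_ids: torch.Tensor, images: torch.Tensor) -> torch.Tensor:
        # Prepend image patch embeddings to text embeddings, then run one
        # unified causal pass over the mixed sequence.
        seq = torch.cat([self.patch_embed(images), self.tok_embed(text_ids)], dim=1)
        mask = nn.Transformer.generate_square_subsequent_mask(seq.size(1))
        return self.lm_head(self.backbone(seq, mask=mask))


if __name__ == "__main__":
    model = EncoderFreeVLM()
    text = torch.randint(0, 1000, (1, 8))
    images = torch.randn(1, 3, 224, 168)  # non-square aspect ratio is fine
    print(model(text, images).shape)      # (1, 192 patches + 8 text tokens, 1000)
```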
If EVE is helpful for your research, please consider giving us a star ⭐ and a citation 📝:
```bibtex
@article{diao2024EVE,
  title={Unveiling Encoder-Free Vision-Language Models},
  author={Diao, Haiwen and Cui, Yufeng and Li, Xiaotong and Wang, Yueze and Lu, Huchuan and Wang, Xinlong},
  journal={arXiv preprint arXiv:2406.11832},
  year={2024}
}

@article{diao2025EVEv2,
  title={EVEv2: Improved Baselines for Encoder-Free Vision-Language Models},
  author={Diao, Haiwen and Li, Xiaotong and Cui, Yufeng and Wang, Yueze and Deng, Haoge and Pan, Ting and Wang, Wenxuan and Lu, Huchuan and Wang, Xinlong},
  journal={arXiv preprint arXiv:2502.06788},
  year={2025}
}
```
The content of this project is licensed under the terms of the LICENSE file.