🚀 Our framework encapsulates the evolution of OOD detection and related tasks in the VLM era, fostering collaborative efforts across these communities 🤝
Atsuyuki Miyai¹ Jingkang Yang²,† Jingyang Zhang³ Yifei Ming⁴ Yueqian Lin³
Qing Yu¹,⁵ Go Irie⁶ Shafiq Joty⁴,² Yixuan Li⁷ Hai Li³ Ziwei Liu²,†
Toshihiko Yamasaki¹ Kiyoharu Aizawa¹
This is the repository for our survey paper. We hope the survey helps readers and participants better understand the demanding challenges in OOD detection and related topics in the VLM era.
This repository plays the following two roles:
- This repository provides an easily accessible list of the references in Table 2 of the paper. The list will keep growing as promising new works emerge. Please feel free to recommend relevant works via a pull request.
- We hope that this repository will become a discussion forum where readers can ask questions, raise concerns, and make constructive comments. Feel free to post your ideas in Issues.
We present generalized OOD detection v2, a framework that encapsulates the evolution of Anomaly Detection (AD), Novelty Detection (ND), Open-set Recognition (OSR), Out-of-distribution (OOD) detection, and Outlier Detection (OD) in the VLM era. Our framework reveals that, with some fields becoming inactive and others being integrated, the demanding challenges in the VLM era have become OOD detection and AD. In addition to the inter-field evolution, we highlight the significant shifts in definitions, problem settings, and benchmarks. Our work thus features a comprehensive review of the methodology for OOD detection, including an in-depth discussion of other related tasks to clarify their relationship to and influence on OOD detection. Finally, we explore the advancements in the emerging Large Vision Language Model (LVLM) era, represented by GPT-4V. We conclude this survey with open challenges and potential research directions for OOD detection in the VLM and LVLM era.
CLIP-based OOD Detection
We introduce methods for CLIP-based OOD detection and CLIP-based AD.
To provide diverse perspectives on OOD detection approaches, we have encompassed a wide range of methods, including preprints.
In the LVLM Era, OOD detection and related topics have evolved as follows:
This repository builds on the following resources: the generalized OOD detection v1 survey and the OpenOOD codebase.
If you have questions or find any mistakes, please open an issue mentioning @AtsuMiyai.
If you find our survey paper helpful for your research, please consider citing the following paper:
@article{miyai2024generalized2,
title={Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey},
author={Miyai, Atsuyuki and Yang, Jingkang and Zhang, Jingyang and Ming, Yifei and Lin, Yueqian and Yu, Qing and Irie, Go and Joty, Shafiq and Li, Yixuan and Li, Hai and Liu, Ziwei and Yamasaki, Toshihiko and Aizawa, Kiyoharu},
journal={arXiv preprint arXiv:2407.21794},
year={2024}
}
Please also consider citing our other projects that are closely related to this survey.
- Our Directly Related Projects
# generalized OOD detection framework v1, survey
@article{yang2024generalized,
title={Generalized out-of-distribution detection: A survey},
author={Yang, Jingkang and Zhou, Kaiyang and Li, Yixuan and Liu, Ziwei},
journal={IJCV},
pages={1--28},
year={2024},
}
# MCM (Zero-shot OOD detection)
@inproceedings{ming2022delving,
title={Delving into out-of-distribution detection with vision-language representations},
author={Ming, Yifei and Cai, Ziyang and Gu, Jiuxiang and Sun, Yiyou and Li, Wei and Li, Yixuan},
booktitle={NeurIPS},
year={2022}
}
# GL-MCM (Zero-shot OOD detection)
@article{miyai2023zero,
title={Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models},
author={Miyai, Atsuyuki and Yu, Qing and Irie, Go and Aizawa, Kiyoharu},
journal={arXiv preprint arXiv:2304.04521},
year={2023}
}
# PEFT-MCM (Few-shot OOD detection, Concurrent work with LoCoOp)
@article{ming2024does,
title={How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models?},
author={Ming, Yifei and Li, Yixuan},
journal={IJCV},
volume={132},
number={2},
pages={596--609},
year={2024},
}
# LoCoOp (Few-shot OOD detection, Concurrent work with PEFT-MCM)
@inproceedings{miyai2023locoop,
title={LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning},
author={Miyai, Atsuyuki and Yu, Qing and Irie, Go and Aizawa, Kiyoharu},
booktitle={NeurIPS},
year={2023}
}
# UPD
@article{miyai2024upd,
title={Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models},
author={Miyai, Atsuyuki and Yang, Jingkang and Zhang, Jingyang and Ming, Yifei and Yu, Qing and Irie, Go and Li, Yixuan and Li, Hai and Liu, Ziwei and Aizawa, Kiyoharu},
journal={arXiv preprint arXiv:2403.20331},
year={2024}
}
- Our Other Projects
# OpenOOD
@inproceedings{yang2022openood,
title={OpenOOD: Benchmarking generalized out-of-distribution detection},
author={Yang, Jingkang and Wang, Pengyun and Zou, Dejian and Zhou, Zitang and Ding, Kunyuan and Peng, Wenxuan and Wang, Haoqi and Chen, Guangyao and Li, Bo and Sun, Yiyou and others},
booktitle={NeurIPS Datasets and Benchmarks Track},
year={2022}
}
# OpenOOD v1.5 report
@article{zhang2023openood,
title={OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection},
author={Zhang, Jingyang and Yang, Jingkang and Wang, Pengyun and Wang, Haoqi and Lin, Yueqian and Zhang, Haoran and Sun, Yiyou and Du, Xuefeng and Zhou, Kaiyang and Zhang, Wayne and Li, Yixuan and Liu, Ziwei and Chen, Yiran and Li, Hai},
journal={arXiv preprint arXiv:2306.09301},
year={2023}
}