- [09.04] Benchmarking Large Language Models in Retrieval-Augmented Generation
- [11.14] RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge
- [11.16] ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
- [11.24] RAGAS: Automated Evaluation of Retrieval Augmented Generation