[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.
benchmark
evaluation
dataset
openai
hallucination
huggingface
huggingface-transformers
ceval
gpt-3
openai-api
hallucinations
gpt-4
large-language-models
llm
chatgpt
qwen
hallucination-evaluation
hallucination-detection
Updated Oct 8, 2024 - Python