Authors: Anisha Gunjal, Greg Durrett
Paper: Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification (arXiv:2406.20079) 📃
This work evaluates the impact of context and granularity on the factual verification of atomic claims generated by large language models (LLMs). We introduce a framework, termed molecular facts, in which claims are optimized for both completeness and brevity. Molecular facts are characterized by two principal attributes:
- Decontextuality - The ability of claims to be understood independently of additional contextual information.
- Minimality - The minimum amount of information required to ensure claims are self-sufficient.
We quantify the impact of decontextualization on minimality, then present a baseline methodology for generating molecular facts automatically, aiming to add the right amount of information.
An example of generating molecular facts is provided in demo.ipynb.
import os
openai_key = os.environ["OPENAI_API_KEY"]
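Reading the key with `os.environ[...]` raises a bare `KeyError` when the variable is unset. A minimal sketch of a friendlier lookup follows; the helper name `load_openai_key` is hypothetical and not part of this repository.

```python
import os
from typing import Mapping


def load_openai_key(env: Mapping[str, str] = os.environ,
                    name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; the repository itself reads
    os.environ["OPENAI_API_KEY"] directly.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable before running the demo."
        )
    return key
```

With this helper, `openai_key = load_openai_key()` could stand in for the direct lookup.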
- Step 1: Check for ambiguity in a claim
from src.utils import ambiguity_check
llm_response = "<llm-response>"  # placeholder: the long-form LLM response
claim = "<claim>"  # placeholder: a claim extracted from the LLM response
disambig_dict, _, _ = ambiguity_check(claim, openai_key=openai_key)
- Step 2: Decontextualize to generate molecular facts
from src.utils import decontextualize_ambiguity
disambig_decontext, _, _ = decontextualize_ambiguity(claim, disambig_dict, llm_response, openai_key=openai_key)
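The two steps can be chained into a single call. The sketch below is one possible wrapper, not part of the repository: the name `make_molecular_fact` is hypothetical, and the two step functions are passed in as arguments so the wrapper can be exercised without network access. It assumes only what the snippets above show, namely that each step returns a 3-tuple whose first element is the value of interest.

```python
from typing import Any, Callable, Tuple

# Each step is assumed to return a 3-tuple, as in the snippets above;
# only the first element is used here.
StepResult = Tuple[Any, Any, Any]


def make_molecular_fact(
    claim: str,
    llm_response: str,
    openai_key: str,
    check_fn: Callable[..., StepResult],
    decontext_fn: Callable[..., StepResult],
) -> Any:
    """Run the ambiguity check, then decontextualize the claim (hypothetical wrapper)."""
    # Step 1: check the claim for ambiguity.
    disambig_dict, _, _ = check_fn(claim, openai_key=openai_key)
    # Step 2: decontextualize using the ambiguity information and the full response.
    molecular_fact, _, _ = decontext_fn(
        claim, disambig_dict, llm_response, openai_key=openai_key
    )
    return molecular_fact
```

In the repository, `check_fn` and `decontext_fn` would be `ambiguity_check` and `decontextualize_ambiguity` from `src.utils`; passing stub functions instead makes it easy to test the control flow offline.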
If you find our work useful, please consider citing it:
@misc{gunjal2024molecular,
title={Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification},
author={Anisha Gunjal and Greg Durrett},
year={2024},
eprint={2406.20079},
archivePrefix={arXiv},
primaryClass={cs.CL}
}