# Modify APIs of RAGChecker #1

Merged · 6 commits · Jul 15, 2024 · showing changes from 2 commits
## README.md (80 changes: 58 additions & 22 deletions)

```bash
python -m spacy download en_core_web_sm
```


### Run the Checking Pipeline with CLI

Please process your own data into the same format as [examples/checking_inputs.json](./examples/checking_inputs.json). The only required annotation for each query is the ground-truth answer (`gt_answer`).

```json
{
"input_data": [
"results": [
{
"query_id": "<query id>", # string
"query": "<input query>", # string
Expand All @@ -59,59 +59,95 @@ Please process your own data with the same format as [examples/checking_inputs.j
}
```
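
As a minimal sketch of that preprocessing step (assuming your raw data carries a question, a gold answer, a model response, and retrieved chunks under your own field names — the input keys below are hypothetical placeholders), the conversion might look like:

```python
import json

# Hypothetical raw records from your own RAG system; only the output keys
# written below are what the checking pipeline expects, the input keys
# ("question", "gold", ...) are placeholders for your own schema.
raw_records = [
    {
        "question": "What's the longest river in the world?",
        "gold": "The Nile is the longest river in the world.",
        "model_answer": "The Nile is generally considered the longest river.",
        "chunks": ["The Nile is a major north-flowing river in northeastern Africa."],
    },
]

checking_inputs = {
    "results": [
        {
            "query_id": str(i),
            "query": rec["question"],
            "gt_answer": rec["gold"],
            "response": rec["model_answer"],
            "retrieved_context": [{"text": chunk} for chunk in rec["chunks"]],
        }
        for i, rec in enumerate(raw_records)
    ]
}

with open("my_checking_inputs.json", "w") as fp:
    json.dump(checking_inputs, fp, indent=2)
```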

If you are using the AWS Bedrock version of Llama 3 70B for both the claim extractor and the checker, use the following command to run the checking pipeline. The checking results, together with intermediate results, will be saved to `--output_path`:


```bash
python ragchecker/cli.py \
    --input_path=examples/checking_inputs.json \
    --output_path=examples/checking_outputs.json \
    --extractor_name=bedrock/meta.llama3-70b-instruct-v1:0 \
    --checker_name=bedrock/meta.llama3-70b-instruct-v1:0 \
    --batch_size_extractor=64 \
    --batch_size_checker=64 \
    --metrics all
```

Please refer to [RefChecker's guidance](https://github.com/amazon-science/RefChecker/tree/main?tab=readme-ov-file#choose-models-for-the-extractor-and-checker) for setting up the extractor and checker models.
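
The `--extractor_name` and `--checker_name` values follow the provider-prefixed identifiers described in that guidance. As an illustration only (the identifier strings below are assumptions; confirm the exact names your backend accepts in the linked guidance), pointing the evaluator at a different backend only changes the two name arguments:

```python
from ragchecker import RAGChecker

# The model identifiers below are assumptions for illustration; check
# RefChecker's guidance for the exact names your backend accepts.
evaluator = RAGChecker(
    extractor_name="openai/gpt-4",
    checker_name="openai/gpt-4",
    batch_size_extractor=32,
    batch_size_checker=32,
)
```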

It will output the values of the metrics as follows:

```json
Results for examples/checking_outputs.json:
{
    "overall": {
        "precision": 73.3,
        "recall": 62.5,
        "f1": 67.3
    },
    "retriever": {
        "claim_recall": 61.4,
        "context_precision": 87.5
    },
    "generator": {
        "context_utilization": 87.5,
        "noise_sensitivity_in_relevant": 22.5,
        "noise_sensitivity_in_irrelevant": 0.0,
        "hallucination": 4.2,
        "self_knowledge": 25.0,
        "faithfulness": 70.8
    }
}
```
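
To consume these numbers downstream instead of reading them off the console, a small sketch (assuming the file written to `--output_path` contains a `metrics` object shaped like the printout above — an assumption to verify against your actual output file):

```python
import json

with open("examples/checking_outputs.json") as fp:
    outputs = json.load(fp)

# The "metrics" key and its layout are assumed to mirror the printed report.
metrics = outputs.get("metrics", {})
print(metrics.get("overall", {}).get("f1"))
print(metrics.get("generator", {}).get("faithfulness"))
```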

### Run the Checking Pipeline with Python
```python
from ragchecker import RAGResults, RAGChecker


# initialize RAGResults from a json/dict
with open("examples/checking_inputs.json") as fp:
    rag_results = RAGResults.from_json(fp.read())

# set up the evaluator
evaluator = RAGChecker(
    extractor_name="bedrock/meta.llama3-70b-instruct-v1:0",
    checker_name="bedrock/meta.llama3-70b-instruct-v1:0",
    batch_size_extractor=32,
    batch_size_checker=32,
)

# evaluate results with selected metrics or groups of metrics, e.g., "retriever", "generator", "all"
evaluator.evaluate(rag_results, "all")
print(rag_results)

"""Output
RAGResults(
  2 RAG results,
  Metrics:
  {
    "overall": {
      "precision": 76.4,
      "recall": 62.5,
      "f1": 68.3
    },
    "retriever": {
      "claim_recall": 61.4,
      "context_precision": 87.5
    },
    "generator": {
      "context_utilization": 87.5,
      "noise_sensitivity_in_relevant": 19.1,
      "noise_sensitivity_in_irrelevant": 0.0,
      "hallucination": 4.5,
      "self_knowledge": 27.3,
      "faithfulness": 68.2
    }
  }
)
"""
```
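
Continuing from the block above, persisting the evaluated results might look like the following (both `to_json()` and the `metrics` attribute are assumptions mirroring the `from_json()` constructor shown earlier — verify them against the package before relying on them):

```python
# Continues from the previous block; rag_results has already been evaluated.
# to_json() and .metrics are assumed counterparts of from_json() above;
# verify against the ragchecker package before relying on them.
with open("examples/checking_outputs.json", "w") as fp:
    fp.write(rag_results.to_json())

print(rag_results.metrics)
```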

## Meta-Evaluation

Please refer to [data/meta_evaluation](./data/meta_evaluation/README.md) for the meta-evaluation of RAGChecker's effectiveness.
## examples/checking_inputs.json (2 changes: 1 addition & 1 deletion)

```json
{
    "results": [
        {
            "query_id": "0",
            "query": "What's the longest river in the world?",
```