This repository contains code to evaluate submissions to the SciFact leaderboard, hosted at https://leaderboard.allenai.org. SciFact data and modeling code can be found at https://github.com/allenai/scifact. A description of each file and directory follows.
- `evaluator/`: Contains evaluation code and environment.
  - `eval.py`: Evaluation script to be invoked by the leaderboard. In all leaderboard code, it is invoked with the `--verbose` flag, which reports P, R, and F1 (instead of just F1).
  - `Dockerfile`: Specifies the Docker environment to be used when running `eval.py` (see the Docker usage sketch below).
- `fixture/`: Contains test fixtures.
  - `predictions_dummy.jsonl`: "Dummy" prediction file for all 300 (hidden) test instances that can be submitted to the leaderboard as a test. This submission should not be publicly displayed on the leaderboard.
  - `expected_metrics_dummy.json`: Metrics for `predictions_dummy.jsonl`.
  - `gold_small.jsonl`: Gold labels for the first 10 dev set instances.
  - `predictions_small.jsonl`: VeriSci predictions on the first 10 dev set instances. To be used as a test to confirm correctness of the evaluation code.
  - `expected_metrics_small.json`: Expected results of running `python evaluator/eval.py`, using `gold_small.jsonl` as the `labels_file` and `predictions_small.jsonl` as the `preds_file` (an example invocation is sketched below).
- `test.sh`: Test that checks the correctness of the evaluator on `predictions_small.jsonl` (a sketch of such a check appears below).
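
To evaluate the small fixtures locally, an invocation along the following lines should work. The exact command-line syntax of `eval.py` is not documented here, so treating `labels_file` and `preds_file` as `--`-prefixed flags is an assumption; check `python evaluator/eval.py --help` for the actual interface.

```bash
# Sketch: evaluate the VeriSci predictions on the first 10 dev set instances.
# The flag names below are assumed from the parameter names mentioned above;
# consult `python evaluator/eval.py --help` for the real interface.
python evaluator/eval.py \
    --labels_file fixture/gold_small.jsonl \
    --preds_file fixture/predictions_small.jsonl \
    --verbose
```

With `--verbose`, the script reports precision, recall, and F1 rather than F1 alone; the output should agree with `fixture/expected_metrics_small.json`.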
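
The leaderboard runs `eval.py` inside the environment defined by `evaluator/Dockerfile`. A minimal sketch of reproducing that locally follows; the image tag `scifact-evaluator`, the `/work` mount point, and the `eval.py` flag syntax are all hypothetical.

```bash
# Build the evaluation image from the evaluator/ directory
# (the tag "scifact-evaluator" is a placeholder, not an official name).
docker build -t scifact-evaluator evaluator/

# Run the evaluator inside the container, mounting the repository at /work.
# Mount point and eval.py argument syntax are assumptions.
docker run --rm -v "$(pwd)":/work -w /work scifact-evaluator \
    python evaluator/eval.py \
        --labels_file fixture/gold_small.jsonl \
        --preds_file fixture/predictions_small.jsonl \
        --verbose
```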
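
The contents of `test.sh` are not reproduced here; the sketch below shows what a correctness check of that kind could look like: run the evaluator on the small fixtures, capture the metrics, and compare them against `fixture/expected_metrics_small.json`. Writing metrics to stdout and comparing with `diff` are assumptions, not the actual mechanics of `test.sh`.

```bash
#!/bin/bash
# Hypothetical correctness check in the spirit of test.sh.
# Assumes eval.py writes its metrics as JSON to stdout; the real script
# may capture and compare results differently.
set -e

python evaluator/eval.py \
    --labels_file fixture/gold_small.jsonl \
    --preds_file fixture/predictions_small.jsonl \
    --verbose > /tmp/metrics_small.json

if diff -q /tmp/metrics_small.json fixture/expected_metrics_small.json > /dev/null; then
    echo "PASS: metrics match expected_metrics_small.json"
else
    echo "FAIL: metrics differ from expected_metrics_small.json"
    exit 1
fi
```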