This repo contains the source code of the baseline models described in the following paper:
- "HoVer: A Dataset for Many-Hop Fact Extraction And Claim Verification", Findings of EMNLP, 2020.
The basic code structure was adapted from Transformers.
- PyTorch 1.4.0/1.6.0
- See `requirements.txt`.
- Run `download_data.sh` to download the HoVer dataset. We provide the top-100 Wikipedia articles retrieved by running DrQA on the HoVer dataset; they are already downloaded to `data/hover/tfidf_retrieved`.
- Prepare the data by running:
```
python prepare_data_for_doc_retrieval.py --data_split=dev --doc_retrieve_range=20
python prepare_data_for_doc_retrieval.py --data_split=train --doc_retrieve_range=20
```
This will add the top-20 TF-IDF retrieved documents to the data as candidates for the subsequent neural document-retrieval stage.
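As a minimal sketch of what this preparation step does (the function and field names here are our own illustration, not the repo's actual schema), each claim is paired with the top-k document titles from the TF-IDF ranking:

```python
# Illustrative sketch only: attach the top-k TF-IDF-ranked document
# titles to each example as candidates for neural document retrieval.
def add_doc_candidates(examples, tfidf_ranked, k=20):
    """examples: list of dicts with an 'id' key.
    tfidf_ranked: dict mapping example id -> doc titles sorted by TF-IDF score."""
    for ex in examples:
        ex["doc_candidates"] = tfidf_ranked[ex["id"]][:k]
    return examples

examples = [{"id": "c1", "claim": "Example claim."}]
ranked = {"c1": ["Doc_%d" % i for i in range(100)]}
out = add_doc_candidates(examples, ranked, k=20)
print(len(out[0]["doc_candidates"]))  # 20
```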
- Run `./train_scripts/train_doc_retrieval.sh`. The model checkpoints are saved in `out/hover/exp1.0/doc_retrieval`.
- Run the evaluation:
```
./eval_scripts/eval_doc_retrieval_on_dev.sh
./eval_scripts/eval_doc_retrieval_on_train.sh
```
This evaluates the model on both the training set and the dev set, because predictions for both are needed to construct the training/dev data for sentence selection.
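For intuition, document retrieval is typically scored by comparing the retrieved document set against the gold supporting documents. This is a hedged sketch of such a metric (the function name is ours, not the repo's API):

```python
# Illustrative set-overlap F1 between retrieved and gold documents.
def retrieval_f1(retrieved, gold):
    retrieved, gold = set(retrieved), set(gold)
    tp = len(retrieved & gold)  # true positives: docs in both sets
    if tp == 0:
        return 0.0
    p, r = tp / len(retrieved), tp / len(gold)
    return 2 * p * r / (p + r)

print(round(retrieval_f1(["A", "B", "C"], ["A", "B", "D"]), 3))  # 0.667
```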
- First, start the Stanford CoreNLP server in the background. We use CoreNLP to split sentences:
```
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment" -port 9000 -timeout 30000
```
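Once the server is up, it can be queried over HTTP: properties are passed as a JSON-encoded `properties` query parameter and the raw text is POSTed as the request body. A minimal client sketch (helper names are ours; this assumes the default server on `localhost:9000`):

```python
import json
import urllib.parse
import urllib.request

def build_corenlp_url(port=9000, annotators="tokenize,ssplit"):
    # CoreNLP server expects annotator settings in a JSON-encoded
    # 'properties' query parameter.
    props = {"annotators": annotators, "outputFormat": "json"}
    return "http://localhost:%d/?%s" % (
        port, urllib.parse.urlencode({"properties": json.dumps(props)}))

def split_sentences(text, port=9000):
    # POST the raw text; the JSON response has one entry per sentence.
    req = urllib.request.Request(
        build_corenlp_url(port), data=text.encode("utf-8"))
    with urllib.request.urlopen(req) as resp:
        doc = json.loads(resp.read().decode("utf-8"))
    return [" ".join(tok["word"] for tok in s["tokens"])
            for s in doc["sentences"]]

print("ssplit" in urllib.parse.unquote(build_corenlp_url()))  # True
```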
- Prepare the data by running:
```
python prepare_data_for_sent_retrieval.py --data_split=dev --sent_retrieve_range=5
python prepare_data_for_sent_retrieval.py --data_split=train --sent_retrieve_range=5
```
This will add the sentences from the top-5 retrieved documents as candidates for the subsequent sentence-selection stage.
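Conceptually, this step flattens the top-k retrieved documents (already split into sentences) into per-sentence candidates. A sketch under assumed data shapes (the tuple layout is illustrative, not the repo's schema):

```python
# Illustrative sketch: turn top-k retrieved documents into
# (doc_title, sentence_index, sentence) candidates for sentence selection.
def build_sent_candidates(retrieved_docs, k=5):
    """retrieved_docs: list of (title, [sentence, ...]) ranked by the retriever."""
    cands = []
    for title, sents in retrieved_docs[:k]:
        for i, sent in enumerate(sents):
            cands.append((title, i, sent))
    return cands

docs = [("Doc_A", ["S1.", "S2."]), ("Doc_B", ["S3."])]
print(len(build_sent_candidates(docs, k=5)))  # 3
```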
- Run `./train_scripts/train_sent_retrieval.sh`. The model checkpoints are saved in `out/hover/exp1.0/sent_retrieval`.
- Run the evaluation:
```
./eval_scripts/eval_sent_retrieval_on_dev.sh
./eval_scripts/eval_sent_retrieval_on_train.sh
```
This evaluates the model on both the training set and the dev set, because predictions for both are needed to construct the training/dev data for claim verification.
- Prepare the data by running:
```
python prepare_data_for_claim_verification.py --data_split=dev
python prepare_data_for_claim_verification.py --data_split=train
```
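As a rough illustration of what a claim-verification example might look like after this step, the claim is paired with the selected evidence sentences as a single text pair for a BERT-style classifier. The field names and formatting below are assumptions, not the repo's actual output format:

```python
# Illustrative sketch: assemble a (claim, evidence) pair for verification.
def make_verification_input(claim, evidence_sents, label=None):
    # Evidence sentences are concatenated into one context string;
    # a tokenizer would then encode (claim, context) as a sentence pair.
    context = " ".join(evidence_sents)
    return {"text_a": claim, "text_b": context, "label": label}

ex = make_verification_input(
    "Claim to verify.", ["Evidence one.", "Evidence two."])
print(ex["text_b"])  # Evidence one. Evidence two.
```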
- Run `./train_scripts/train_claim_verification.sh`. The model checkpoints are saved in `out/hover/exp1.0/claim_verification`.
- Run the evaluation:
```
./eval_scripts/eval_claim_verification_on_dev.sh
```
```
@inproceedings{jiang2020hover,
  title={{HoVer}: A Dataset for Many-Hop Fact Extraction And Claim Verification},
  author={Yichen Jiang and Shikha Bordia and Zheng Zhong and Charles Dognin and Maneesh Singh and Mohit Bansal},
  booktitle={Findings of the Conference on Empirical Methods in Natural Language Processing ({EMNLP})},
  year={2020}
}
```