Simran Arora, Patrick Lewis, Angela Fan, Jacob Kahn*, Christopher Ré*
This repository contains dataset resources and code for ConcurrentQA, a textual QA benchmark that requires concurrent retrieval over multiple data distributions and privacy scopes. It also contains result-analysis code and other resources for research in the private QA setting.
Clone the repository and initialize the MDR submodule as follows.
git clone git@github.com:facebookresearch/concurrentqa.git
cd concurrentqa
cd multihop_dense_retrieval
git submodule init
git submodule update
Set up the environment as follows (following the MDR instructions). We encourage the use of conda environments. The commands below assume you start from the directory that contains your clone of concurrentqa.
conda create --name cqa python=3.6
conda activate cqa
cd concurrentqa/multihop_dense_retrieval/
bash setup.sh
If you are using CUDA 11, we find that the following changes to the above setup work well: 1) use python=3.7, and 2) in multihop_dense_retrieval/setup.sh, replace the faiss-gpu and PyTorch install commands with the following:
conda install faiss-gpu cudatoolkit=11.3 -c pytorch
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
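After applying the CUDA 11 changes, a quick sanity check can confirm that the expected packages landed in the cqa environment. The script below is our own minimal sketch, not part of the repository; the helper names module_version and check_cqa_env are hypothetical. It reports the torch and faiss versions, the CUDA version torch was built against (expected to be 11.3 given the cudatoolkit=11.3 lines above), and whether a GPU is visible. We assume faiss.get_num_gpus may be missing from CPU-only faiss builds, so the call is guarded.

```python
import importlib
from typing import Optional


def module_version(name: str) -> Optional[str]:
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except (ImportError, OSError):  # OSError covers broken CUDA shared libraries
        return None
    return getattr(module, "__version__", "unknown")


def check_cqa_env() -> dict:
    """Collect versions relevant to the CUDA 11 setup (torch, faiss)."""
    report = {"torch": module_version("torch"), "faiss": module_version("faiss")}
    if report["torch"] is not None:
        import torch
        # CUDA version torch was compiled against; "11.3" expected after the edits above.
        report["torch_cuda"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    if report["faiss"] is not None:
        import faiss
        # Assumption: get_num_gpus may be absent in CPU-only builds, so guard the call.
        get_num_gpus = getattr(faiss, "get_num_gpus", None)
        report["faiss_gpus"] = get_num_gpus() if get_num_gpus else 0
    return report


if __name__ == "__main__":
    for key, value in check_cqa_env().items():
        print(f"{key}: {value}")
```

Run it with `python` inside the activated cqa environment; a `None` entry means that package did not import.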
To download the train, dev, and test sets, the email and Wikipedia passage corpora, and model checkpoints, run:
bash scripts/download_cqa.sh
To download retriever and reader models trained on HotpotQA data, run:
bash scripts/download_hotpot.sh
The datasets can also be downloaded via Hugging Face:
We include instructions 1) for training and evaluating models on ConcurrentQA data in the absence of privacy concerns and 2) for evaluating performance under the PAIR privacy framework.
To run evaluation with the provided model checkpoints, use the script below; its output on the test split should resemble the scores that follow:
cd multihop_dense_retrieval
bash CQA_Scripts/MDR_Eval_CQA.sh
Retrieval scores on test split ...
Avg PR: 0.604375
Avg P-EM: 0.190625
Avg 1-Recall: 0.276875
Path Recall: 0.184375
bridge Questions num: 1400
Avg PR: 0.5985714285714285
Avg P-EM: 0.18785714285714286
Avg 1-Recall: 0.265
Path Recall: 0.18428571428571427
comparison Questions num: 200
Avg PR: 0.645
Avg P-EM: 0.21
Avg 1-Recall: 0.36
Path Recall: 0.185
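As a consistency check on the retrieval numbers above, the overall test-split scores equal the question-count-weighted average of the bridge (1400 questions) and comparison (200 questions) scores, which is consistent with averaging over 1600 test questions in total. The short script below only re-derives the printed values; it does not call the repository's evaluation code, and the helper name weighted_average is our own.

```python
import math

# Values copied verbatim from the evaluation output above (test split).
counts = {"bridge": 1400, "comparison": 200}
by_type = {
    "bridge": {"Avg PR": 0.5985714285714285, "Avg P-EM": 0.18785714285714286,
               "Avg 1-Recall": 0.265, "Path Recall": 0.18428571428571427},
    "comparison": {"Avg PR": 0.645, "Avg P-EM": 0.21,
                   "Avg 1-Recall": 0.36, "Path Recall": 0.185},
}
overall = {"Avg PR": 0.604375, "Avg P-EM": 0.190625,
           "Avg 1-Recall": 0.276875, "Path Recall": 0.184375}


def weighted_average(metric: str) -> float:
    """Average a per-type metric, weighting each question type by its count."""
    total = sum(counts.values())
    return sum(counts[qtype] * by_type[qtype][metric] for qtype in counts) / total


for metric, reported in overall.items():
    recomputed = weighted_average(metric)
    # Each overall score matches the weighted average up to float rounding.
    assert math.isclose(recomputed, reported, rel_tol=1e-9), metric
    print(f"{metric}: {recomputed:.6f}")
```

For example, Avg PR works out to (1400 × 0.59857… + 200 × 0.645) / 1600 = (838 + 129) / 1600 = 0.604375.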
Reader scores on test split ...
'em': 0.48875,
'f1': 0.5650013858314458,
'joint_em': 0.1175,
'joint_f1': 0.3439091595024459,
'sp_em': 0.154375,
'sp_f1': 0.4496642766955267
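The reader metrics above follow the HotpotQA naming (answer em/f1, supporting-fact sp_em/sp_f1, and joint_em/joint_f1). The sketch below assumes HotpotQA-style definitions, since MDR builds on HotpotQA; it is a simplified re-implementation for illustration (for instance, it omits HotpotQA's special handling of yes/no answers), and all function names are our own.

```python
import re
import string
from collections import Counter
from typing import Set, Tuple

SupportingFact = Tuple[str, int]  # (passage title, sentence index)
Scores = Tuple[float, float, float, float]  # (em, f1, precision, recall)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def answer_scores(prediction: str, gold: str) -> Scores:
    """Exact match and token-level F1 over normalized answer strings."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    em = float(pred_tokens == gold_tokens)
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return em, 0.0, 0.0, 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return em, 2 * precision * recall / (precision + recall), precision, recall


def sp_scores(pred: Set[SupportingFact], gold: Set[SupportingFact]) -> Scores:
    """Exact match and F1 over sets of (title, sentence) supporting facts."""
    tp, fp, fn = len(pred & gold), len(pred - gold), len(gold - pred)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return (1.0 if fp + fn == 0 else 0.0), f1, precision, recall


def joint_scores(answer: Scores, sp: Scores) -> Tuple[float, float]:
    """Joint EM multiplies the EMs; joint F1 multiplies precisions and recalls first."""
    joint_prec = answer[2] * sp[2]
    joint_rec = answer[3] * sp[3]
    denom = joint_prec + joint_rec
    joint_f1 = 2 * joint_prec * joint_rec / denom if denom > 0 else 0.0
    return answer[0] * sp[0], joint_f1


ans = answer_scores("Paris, France", "Paris")               # f1 = 2/3
sp = sp_scores({("A", 0), ("B", 1)}, {("A", 0), ("B", 2)})  # f1 = 0.5
print(ans[1], sp[1], joint_scores(ans, sp))                 # joint f1 = 1/3
```

Because joint precision and recall are products, a question's joint F1 can never exceed its answer F1 or its supporting-fact F1, which is consistent with joint_f1 sitting below both f1 and sp_f1 in the scores above.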
To train your own MDR model from scratch, use the script:
cd multihop_dense_retrieval
bash CQA_Scripts/MDR_end2end_CQA.sh
To evaluate under the PAIR privacy framework, set the desired privacy mode and retrieval mode in the script and run it as follows:
cd multihop_dense_retrieval
bash CQA_Scripts/MDR_PairBaselines.sh
Descriptions of privacy and retrieval modes are included in the script.
- Privacy modes include preserving document privacy (DOC_PRIVACY), query privacy (QUERY_PRIVACY), and no privacy.
- Retrieval modes include ranking the OVERALL top k passages after each hop (4combo_overallrank) and selecting the top passages from EACH DOMAIN after each hop (4combo_separaterank).
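To make the two retrieval modes concrete, here is a minimal single-hop sketch of the selection step. It assumes each domain (here Wikipedia and email) returns scored candidates and that the EACH DOMAIN mode keeps k passages per domain; the real per-domain budget, the meaning of the "4combo" prefix, and how the privacy modes restrict which index a query may reach are defined in CQA_Scripts/MDR_PairBaselines.sh and are not modeled here. Passage ids, scores, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Passage = Tuple[str, float]  # (hypothetical passage id, retrieval score)


def _top_k(passages: List[Passage], k: int) -> List[Passage]:
    """Return the k highest-scoring passages, best first."""
    return sorted(passages, key=lambda p: p[1], reverse=True)[:k]


def overall_rank(candidates: Dict[str, List[Passage]], k: int) -> List[Passage]:
    """OVERALL mode (sketch): pool all domains' candidates, keep the global top k."""
    pooled = [p for passages in candidates.values() for p in passages]
    return _top_k(pooled, k)


def separate_rank(candidates: Dict[str, List[Passage]], k: int) -> List[Passage]:
    """EACH DOMAIN mode (sketch): keep the top k from every domain independently."""
    selected: List[Passage] = []
    for domain in sorted(candidates):  # fixed domain order for reproducible output
        selected.extend(_top_k(candidates[domain], k))
    return selected


candidates = {
    "wikipedia": [("w1", 0.9), ("w2", 0.8), ("w3", 0.3)],
    "email": [("e1", 0.5), ("e2", 0.4)],
}
print(overall_rank(candidates, k=2))   # [('w1', 0.9), ('w2', 0.8)]
print(separate_rank(candidates, k=1))  # [('e1', 0.5), ('w1', 0.9)]
```

With pooled ranking, high-scoring Wikipedia passages can crowd out every email passage (as in the k=2 call), whereas per-domain ranking guarantees each corpus a share of the retrieved set regardless of how scores compare across domains.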
Please use the following BibTeX entry when citing the dataset:
@article{arora2022reasoning,
title={Reasoning over Public and Private Data in Retrieval-Based Systems},
author={Simran Arora and Patrick Lewis and Angela Fan and Jacob Kahn and Christopher Ré},
year={2023},
url={https://aclanthology.org/2023.tacl-1.51/},
journal={Transactions of the Association for Computational Linguistics},
}
If you use MDR, please also cite the Multi-Hop Dense Text Retrieval work.
ConcurrentQA and the related code are released under the MIT license. See LICENSE for more information.