This repository contains resources for the SIGIR 2020 paper:
Query Resolution for Conversational Search with Limited Supervision [pdf]
by N. Voskarides, D. Li, P. Ren, E. Kanoulas and M. de Rijke.
Download the data into the current folder.
Download and unzip the models into the current folder.
conda create -n quretec python=3.5
source activate quretec
pip install -r requirements.txt
In this example, we train QuReTeC using the QuAC gold resolutions.
BASE_DIR=./models/
DATA_DIR=./data/quac_canard/token_classification/
TRAIN_ON=train_gold_supervision
DEV_ON=dev_gold_supervision
MODEL_ID=XXX # provide an ID for the model to be trained here.
python -m run_ner --task_name ner --do_train --do_lower_case --data_dir $DATA_DIR --base_dir $BASE_DIR --bert_model bert-large-uncased --max_seq_length 300 --train_batch_size 4 --hidden_dropout_prob 0.4 --train_on $TRAIN_ON --dev_on $DEV_ON --model_id $MODEL_ID
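QuReTeC is trained as a term classifier over the conversation history. As a rough illustration of what gold supervision means here, the sketch below labels a history term as relevant (1) if it appears in the gold resolution but not in the current turn query, which is our reading of the labeling scheme described in the paper. The function derive_term_labels is hypothetical and not part of this repository; it uses lowercased whitespace tokenization and ignores any further preprocessing (e.g. punctuation handling) that the released data may reflect.

```python
from typing import List


def derive_term_labels(history: List[str], current_query: str,
                       gold_resolution: str) -> List[List[int]]:
    """Hypothetical helper: binary labels for every term of every history turn.

    A term gets label 1 if it occurs in the gold resolution of the current
    turn and does not already occur in the current turn query; otherwise 0.
    """
    current_terms = set(current_query.lower().split())
    gold_terms = set(gold_resolution.lower().split())
    labels = []
    for turn in history:
        # One label per whitespace token of this history turn.
        labels.append([
            1 if tok in gold_terms and tok not in current_terms else 0
            for tok in turn.lower().split()
        ])
    return labels


history = ["what is throat cancer", "is it treatable"]
current = "what are the symptoms"
gold = "what are the symptoms of throat cancer"
print(derive_term_labels(history, current, gold))
```

Here "throat" and "cancer" are labeled relevant because the gold resolution needs them and the current turn lacks them, while "what" is not, since it is already in the current query.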
In this example, we use a trained model to generate output and perform intrinsic evaluation on the TREC CAsT 2019 test data.
BASE_DIR=./models/
# model trained on QuAC gold resolutions (as in paper)
MODEL_ID=191790_50
DATA_DIR=./data/trec_cast_2019/token_classification/
DEV_ON=test_oracle_rewrite
python -m run_ner --task_name ner --do_eval --do_lower_case --data_dir $DATA_DIR --base_dir $BASE_DIR --dev_on $DEV_ON --model_id $MODEL_ID --no_cuda
...
[Token eval] P=76.6, R=80.3, F1=78.4
The above command generates the file: ./models/191790_50/eval_results_test_oracle_rewrite_epoch0.json
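The [Token eval] line reports precision, recall, and F1 over the history terms the model predicts as relevant. The sketch below shows one common way to compute such scores: micro-averaged over queries, comparing sets of predicted and gold terms. The function names token_prf and f1_from_pr are hypothetical, and run_ner may compute its scores differently (for example, over token positions rather than term sets). The second helper only checks that the reported F1 is the harmonic mean of the reported P and R.

```python
from typing import Iterable, Set, Tuple


def token_prf(predicted: Iterable[Set[str]],
              gold: Iterable[Set[str]]) -> Tuple[float, float, float]:
    """Hypothetical micro-averaged P/R/F1 (in percent) over per-query term sets."""
    tp = fp = fn = 0
    for pred_terms, gold_terms in zip(predicted, gold):
        tp += len(pred_terms & gold_terms)   # predicted and relevant
        fp += len(pred_terms - gold_terms)   # predicted but not relevant
        fn += len(gold_terms - pred_terms)   # relevant but missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return 100 * p, 100 * r, 100 * f1


def f1_from_pr(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * p * r / (p + r) if p + r else 0.0


predicted = [{"throat", "cancer"}, {"lung"}]
gold = [{"throat", "cancer"}, {"lung", "cancer"}]
print(token_prf(predicted, gold))
print(round(f1_from_pr(76.6, 80.3), 1))  # consistent with the logged F1=78.4
```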
To generate the query file for retrieval:
MODEL_OUTPUT_FILE=./models/191790_50/eval_results_test_oracle_rewrite_epoch0.json
RAW_QUERY_FILE=./data/trec_cast_2019/cast2019_test_annotated.tsv
OUTPUT_FILE=query_file_quretec.txt
python -m generate_query_files_for_trained_model --model_output_file $MODEL_OUTPUT_FILE --raw_query_file $RAW_QUERY_FILE --dataset_name cast --output_file $OUTPUT_FILE
The above script assumes that model_output_file and raw_query_file contain the same set of qids.
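Conceptually, the resolved query is the current turn query expanded with the history terms that QuReTeC predicts as relevant. The minimal sketch below illustrates that step together with the qid consistency check the script relies on. It assumes both inputs have already been loaded into dictionaries keyed by qid; the actual file formats and the internals of generate_query_files_for_trained_model are not shown here, and build_queries and check_same_qids are hypothetical helpers. It avoids f-strings so it also runs under the Python 3.5 environment created above.

```python
from typing import Dict, Iterable, List


def check_same_qids(model_qids: Iterable[str], raw_qids: Iterable[str]) -> None:
    """Hypothetical check: raise if the two sources do not cover the same qids."""
    model_set, raw_set = set(model_qids), set(raw_qids)
    missing = sorted(raw_set - model_set)
    extra = sorted(model_set - raw_set)
    if missing or extra:
        raise ValueError("qid mismatch: missing from model output={}, extra={}"
                         .format(missing, extra))


def build_queries(raw_queries: Dict[str, str],
                  predicted_terms: Dict[str, List[str]]) -> Dict[str, str]:
    """Hypothetical sketch: append predicted history terms to each raw query."""
    check_same_qids(predicted_terms.keys(), raw_queries.keys())
    resolved = {}
    for qid, query in raw_queries.items():
        seen = set(query.lower().split())
        additions = []
        for term in predicted_terms[qid]:
            # Skip terms already in the query and duplicate predictions.
            if term.lower() not in seen:
                seen.add(term.lower())
                additions.append(term)
        resolved[qid] = " ".join([query] + additions)
    return resolved


raw = {"31_2": "what are its symptoms"}
predicted = {"31_2": ["throat", "cancer", "throat"]}
print(build_queries(raw, predicted)["31_2"])
```

Failing loudly on a qid mismatch, rather than silently dropping queries, makes it obvious when the model output and the raw query file come from different runs or splits.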
You can find the preprocessed data and the output of QuReTeC and the baselines here.
@inproceedings{voskarides-2020-query,
Author = {Voskarides, Nikos and Li, Dan and Ren, Pengjie and Kanoulas, Evangelos and de Rijke, Maarten},
Booktitle = {SIGIR 2020: 43rd international ACM SIGIR conference on Research and Development in Information Retrieval},
Month = {July},
Publisher = {ACM},
Title = {Query Resolution for Conversational Search with Limited Supervision},
Year = {2020}}
If you have any questions, please contact Nikos Voskarides.