Explainable Conversational Question Answering over Heterogeneous Sources via Iterative Graph Neural Networks
This repository contains the code and data for our SIGIR 2023 paper on "Explainable Conversational Question Answering over Heterogeneous Sources via Iterative Graph Neural Networks", and builds upon the CONVINSE code for our SIGIR 2022 paper.
Our new approach, EXPLAIGNN, follows this general pipeline:
- Question Understanding (QU) -- creating an intent-explicit structured representation of a question and its conversational context
- Evidence Retrieval (ER) -- harnessing this frame-like representation to uniformly capture relevant evidences from different sources
- Heterogeneous Answering (HA) -- deriving the answer from this set of evidences from heterogeneous sources.
The focus of this work is on the answering phase. In this stage, a heterogeneous graph is constructed from the retrieved evidences and the corresponding entities, as the basis for applying graph neural networks (GNNs). The GNNs are applied iteratively to compute the best answers and supporting evidences in a small number of steps.
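To make the interplay of the three stages concrete, here is a minimal toy sketch in Python. Every function is a simplistic stand-in for the corresponding EXPLAIGNN component (the real QU model, retrievers, and iterative GNNs are far more involved), so treat the names and logic purely as illustration:
# Toy illustration of the QU -> ER -> HA flow; every function is a
# simplistic stand-in for the corresponding EXPLAIGNN component.

def question_understanding(question: str, history: list[str]) -> str:
    # Stand-in for QU: a real model generates an intent-explicit representation.
    return " ".join(history + [question])

def evidence_retrieval(intent_explicit_question: str, sources: list[str]) -> list[dict]:
    # Stand-in for ER: a real retriever queries KB, text, tables, and infoboxes.
    return [{"source": src, "text": f"evidence from {src}"} for src in sources]

def heterogeneous_answering(evidences: list[dict], intent_explicit_question: str) -> list[dict]:
    # Stand-in for HA: score evidences and iteratively prune the weakest half
    # until only a few candidates (answer + supporting evidences) remain.
    scored = [(len(set(ev["text"].split()) & set(intent_explicit_question.split())), ev)
              for ev in evidences]
    while len(scored) > 2:
        scored = sorted(scored, key=lambda pair: pair[0], reverse=True)[: len(scored) // 2]
    return [ev for _, ev in scored]

history = ["who composed the score for inception?", "Hans Zimmer"]
rep = question_understanding("when was it released?", history)
print(heterogeneous_answering(evidence_retrieval(rep, ["kb", "text", "table", "info"]), rep))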
Further details can be found on the EXPLAIGNN website and in the corresponding paper pre-print. An interactive demo will also follow soon!
If you use this code, please cite:
@inproceedings{christmann2023explainable,
  title={Explainable Conversational Question Answering over Heterogeneous Sources via Iterative Graph Neural Networks},
  author={Christmann, Philipp and Roy, Rishiraj Saha and Weikum, Gerhard},
  booktitle={SIGIR},
  year={2023}
}
All code was tested on Linux only.
We recommend installation via conda and provide the corresponding environment file in conda-explaignn.yml. Clone the repo and set up the environment via:
git clone https://github.com/PhilippChr/EXPLAIGNN.git
cd EXPLAIGNN/
conda env create --file conda-explaignn.yml
conda activate explaignn
pip install -e .
Alternatively, you can install the requirements via pip, using the requirements.txt file (not tested). In this case, further packages might be required to run the code on a GPU.
EXPLAIGNN makes use of CLOCQ for retrieving relevant evidences.
CLOCQ can be conveniently integrated via the publicly available API, using the client from the repo. If efficiency is a primary concern, it is recommended to directly run the CLOCQ code on the local machine (details are given in the repo).
In either case, it can be installed via:
make install_clocq
Optional: If you want to use or compare with QuReTeC or FiD, please follow the installation guides in the CONVINSE repo.
To initialize the repo (download data, benchmark, models), run:
bash scripts/initialize.sh
If you would like to reproduce the results of EXPLAIGNN for all sources (Table 1 in the SIGIR 2023 paper), or a specific source combination, run:
bash scripts/pipeline.sh --gold-answers config/convmix/explaignn.yml kb_text_table_info
The last parameter (kb_text_table_info) specifies the sources to be used, separated by underscores. For example, "kb_text_info" would evaluate EXPLAIGNN using evidences from KB, text, and infoboxes.
Note that EXPLAIGNN retrieves evidences on-the-fly by default. Since the evidences in the information sources can change quickly (e.g. Wikipedia receives many updates every day), results can easily change. A cache was implemented to improve reproducibility, and we provide a benchmark-related subset of Wikipedia (see details below).
For reproducing the results of EXPLAIGNN using the predicted answers for previous turns (Table 2 in the SIGIR 2023 paper), run:
bash scripts/pipeline.sh --pred-results config/convmix/explaignn.yml kb_text_table_info
Finally, for reproducing the results of EXPLAIGNN with different input source combinations (Table 3 in the SIGIR 2023 paper), run:
bash scripts/pipeline.sh --source-combinations config/convmix/explaignn.yml kb_text_table_info
The results will be logged in the out/convmix directory, and the metrics written to _results/convmix/explaignn.res.
To train a pipeline, just choose the config that represents the pipeline you would like to train, and run:
bash scripts/pipeline.sh --train [<PATH_TO_CONFIG>] [<SOURCES_STR>]
Example:
bash scripts/pipeline.sh --train config/convmix/explaignn.yml kb_text_table_info
If you create your own pipeline, it is recommended to test it once on an example, to verify that everything runs smoothly.
You can do that via:
bash scripts/pipeline.sh --example [<PATH_TO_CONFIG>]
and see the output file in out/<benchmark> for potential errors.
For standard evaluation, you can simply run:
bash scripts/pipeline.sh --gold-answers [<PATH_TO_CONFIG>] [<SOURCES_STR>]
Example:
bash scripts/pipeline.sh --gold-answers config/convmix/explaignn.yml kb_text_table_info
For evaluating with all source combinations, run:
bash scripts/pipeline.sh --source-combinations [<PATH_TO_CONFIG>] [<SOURCES_STR>]
Example:
bash scripts/pipeline.sh --source-combinations config/convmix/explaignn.yml kb_text_table_info
If you want to evaluate using the predicted answers of previous turns, you can run:
bash scripts/pipeline.sh --pred-answers [<PATH_TO_CONFIG>] [<SOURCES_STR>]
Example:
bash scripts/pipeline.sh --pred-answers config/convmix/explaignn.yml kb_text_table_info
By default, the EXPLAIGNN config and all sources will be used.
The results will be logged to out/<DATA>/<CMD>-<FUNCTION>-<CONFIG_NAME>.out, and the metrics are written to _results/<DATA>/<CONFIG_NAME>.res.
To adapt the pipeline, e.g. to improve individual parts of it, you can simply implement your own method that inherits from the respective pipeline component, create a corresponding config file, and add the module to the pipeline.py file. You can then use the commands outlined above to train and test the pipeline (a minimal sketch of this plug-in pattern follows the list below). Please see the documentation of the individual modules for further details:
- Distant Supervision
- Question Understanding (QU)
- Evidence Retrieval and Scoring (ERS)
- Heterogeneous Answering (HA)
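As an illustration of this plug-in pattern, a custom QU module could look roughly as follows. The base-class and method names below are placeholders chosen for the sketch, not the verified EXPLAIGNN interfaces; check the module documentation and pipeline.py for the actual classes and hooks:
# Hypothetical sketch of a custom pipeline module; class and method names are
# placeholders, not the actual EXPLAIGNN base classes.

class QuestionUnderstandingBase:
    # Stand-in for the real QU base class shipped with EXPLAIGNN.
    def __init__(self, config: dict):
        self.config = config

    def inference_on_turn(self, turn: dict) -> dict:
        raise NotImplementedError


class MyQuestionUnderstanding(QuestionUnderstandingBase):
    # Custom QU module: toy heuristic that prepends the conversation history
    # to the current question to form an intent-explicit representation.
    def inference_on_turn(self, turn: dict) -> dict:
        history = turn.get("history", [])
        turn["structured_representation"] = " ".join(history + [turn["question"]])
        return turn


module = MyQuestionUnderstanding(config={"qu": "my_custom_qu"})
print(module.inference_on_turn({"question": "When was it released?",
                                "history": ["Who directed Dune?", "Denis Villeneuve"]}))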
The ConvMix dataset can be downloaded (if not already done via the initialize script) via:
bash scripts/download.sh convmix
Then, the individual ConvMix data splits can be loaded via:
import json
with open ("_benchmarks/convmix/train_set_ALL.json", "r") as fp:
train_data = json.load(fp)
with open ("_benchmarks/convmix/dev_set_ALL.json", "r") as fp:
dev_data = json.load(fp)
with open ("_benchmarks/convmix/test_set_ALL.json", "r") as fp:
test_data = json.load(fp)
You can also load domain-specific versions by replacing "ALL" with either "books", "movies", "music", "soccer" or "tvseries".
The data will have the following format:
[
    // first conversation
    {
        "conv_id": "<INT>",
        "domain": "<STRING>",
        "questions": [
            // question 1 (complete)
            {
                "turn": 0,
                "question_id": "<STRING: QUESTION-ID>",
                "question": "<STRING: QUESTION>",
                "answers": [
                    {
                        "id": "<STRING: Wikidata ID of answer>",
                        "label": "<STRING: Item Label of answer>"
                    },
                ],
                "answer_text": "<STRING: textual form of answer>",
                "answer_src": "<STRING: source the worker found the answer>",
                "entities": [
                    {
                        "id": "<STRING: Wikidata ID of question entity>",
                        "label": "<STRING: Item Label of question entity>"
                    },
                ],
                "paraphrase": "<STRING: paraphrase of current question>"
            },
            // question 2 (incomplete)
            {
                "turn": 1,
                "question_id": "<STRING: QUESTION-ID>",
                "question": "<STRING: QUESTION>",
                "answers": [
                    {
                        "id": "<STRING: Wikidata ID of answer>",
                        "label": "<STRING: Item Label of answer>"
                    },
                ],
                "answer_text": "<STRING: textual form of answer>",
                "answer_src": "<STRING: source the worker found the answer>",
                "entities": [
                    {
                        "id": "<STRING: Wikidata ID of question entity>",
                        "label": "<STRING: Item Label of question entity>"
                    },
                ],
                "paraphrase": "<STRING: paraphrase of current question>",
                "completed": "<STRING: completed version of current incomplete question>"
            }
        ]
    },
    // second conversation
    {
        ...
    },
    // ...
]
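For example, with the splits loaded as above, the conversations and their questions can be traversed as follows (field names as in the format above):
# Iterate over the loaded ConvMix training split and print each question
# together with its gold answer labels.
for conversation in train_data:
    print(f"Conversation {conversation['conv_id']} (domain: {conversation['domain']})")
    for question in conversation["questions"]:
        answer_labels = [answer["label"] for answer in question["answers"]]
        print(f"  Turn {question['turn']}: {question['question']} -> {answer_labels}")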
Please make sure that...
- ...you use our dedicated Wikipedia dump, to have a comparable Wikipedia version (see further details below).
- ...you use the same Wikidata dump (2022-01-31), which can be conveniently accessed using the CLOCQ API available at https://clocq.mpi-inf.mpg.de (see further details below).
- ...you use the same evaluation method as EXPLAIGNN (as defined in explaignn/evaluation.py).
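For orientation, rank-based metrics such as P@1, MRR and Hit@5 can be computed along the following lines. This is only a generic sketch; explaignn/evaluation.py (in particular its answer matching against Wikidata IDs) remains the authoritative implementation:
# Generic sketch of rank-based metrics over a ranked list of predicted
# Wikidata IDs; explaignn/evaluation.py is the authoritative implementation.

def precision_at_1(ranked_ids: list[str], gold_ids: set[str]) -> float:
    return 1.0 if ranked_ids and ranked_ids[0] in gold_ids else 0.0

def mrr(ranked_ids: list[str], gold_ids: set[str]) -> float:
    for rank, candidate in enumerate(ranked_ids, start=1):
        if candidate in gold_ids:
            return 1.0 / rank
    return 0.0

def hit_at_5(ranked_ids: list[str], gold_ids: set[str]) -> float:
    return 1.0 if any(candidate in gold_ids for candidate in ranked_ids[:5]) else 0.0

# Example: the gold answer is ranked second.
ranked, gold = ["Q12345", "Q38111", "Q42"], {"Q38111"}
print(precision_at_1(ranked, gold), mrr(ranked, gold), hit_at_5(ranked, gold))  # 0.0 0.5 1.0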
For the EXPLAIGNN and CONVINSE projects, the Wikidata dump with the timestamp 2022-01-31 was used, which is currently also accessible via the CLOCQ API. Further information on how to retrieve evidences from Wikidata can be found in the ERS documentation.
Wikipedia evidences can be retrieved on-the-fly using the WikipediaRetriever package. However, we provide a ConvMix-related subset that can be downloaded via:
bash scripts/download.sh wikipedia
Note that this data is also downloaded by the default initialize script.
The dump is provided as a .pickle file and maps Wikidata item IDs (e.g. Q38111) to Wikipedia evidences.
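For instance, the dump can be loaded and inspected roughly as follows; the exact file path below is an assumption for illustration (check where scripts/download.sh placed the data on your system):
import pickle

# Assumed path for illustration; adjust to wherever the download script
# stored the Wikipedia dump.
WIKIPEDIA_DUMP_PATH = "_data/wikipedia/wikipedia_dump.pickle"

with open(WIKIPEDIA_DUMP_PATH, "rb") as fp:
    wikipedia_dump = pickle.load(fp)

# The dump maps Wikidata item IDs to the corresponding Wikipedia evidences.
evidences = wikipedia_dump.get("Q38111")  # example item ID from above
print(type(evidences))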
This ConvMix-related subset was created as follows. We added evidences retrieved from Wikipedia in March/April 2022 for the following Wikidata items:
- all answer entities in the ConvMix benchmark,
- all question entities in the ConvMix benchmark (as specified by crowdworkers),
- the top-20 disambiguations for each entity mention detected by CLOCQ, with the input strings being the intent-explicit forms generated for the ConvMix dataset by the EXPLAIGNN pipeline, or the baseline built upon the 'Prepend all' QU method,
- and whenever new Wikidata entities occurred (e.g. in the dynamic setup running the pipeline with predicted answers), we added the corresponding evidences to the dump.
We aimed to maximize the number of entities (and thus of evidences) here, to allow for a comparison with dense retrieval methods that is as fair as possible. Crawling the whole Wikipedia dump was out of scope (and Wikimedia strongly discourages this). In total, we collected ~230,000 entities for which we tried retrieving Wikipedia evidences. Note that for some of these, the result was empty.
Further information on how to retrieve evidences from Wikipedia can be found in the ERS documentation.
We tried our best to document the code of this project and to make it easy to use, and to test your custom implementations of the individual pipeline components. However, our strengths are not in software engineering, and there will very likely be suboptimal parts in the documentation and code. If you feel that some part of the documentation or code could be improved, or if you have other feedback, please do not hesitate to let us know! You can contact us via mail: pchristm@mpi-inf.mpg.de. Any feedback (also positive ;) ) is much appreciated!
The EXPLAIGNN project by Philipp Christmann, Rishiraj Saha Roy and Gerhard Weikum is licensed under the MIT license.