For a given query and n passages, this code extracts triples between passages (inter-passage and intra-passage) and the query. The dataset used is Open NQ, which consists of the queries from the original Natural Questions dataset and evidence documents retrieved with DPR (https://github.com/facebookresearch/DPR). You can get the dataset by running get-data.sh from https://github.com/facebookresearch/FiD.
What this code does
- Perform entity linking in given queries and passages (entity_linking.py).
- Obtain a list of relations that can exist between linked entities, using the Wikidata (KB) API (NQ_triple_extractor.py).
- Filter relations using TF-IDF scores.
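The filtering step can be illustrated with a minimal sketch. This README does not say what the TF-IDF scores are computed against, so the sketch assumes each candidate relation label is scored by TF-IDF cosine similarity to the query and only the top-scoring labels are kept. The function names (tfidf_vectors, filter_relations), the top_k parameter, and the whitespace tokenizer are hypothetical and are not taken from NQ_triple_extractor.py.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {}
        for term, count in counts.items():
            tf = count / len(tokens)
            # Smoothed inverse document frequency.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_relations(query, relation_labels, top_k=2):
    """Keep at most top_k relation labels with non-zero similarity to the query."""
    vectors = tfidf_vectors([query] + relation_labels)
    query_vec, label_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, v), label)
              for v, label in zip(label_vecs, relation_labels)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for score, label in scored[:top_k] if score > 0.0]


if __name__ == "__main__":
    query = "who is the author of the hitchhiker's guide"
    candidates = ["author", "place of birth", "publisher"]
    print(filter_relations(query, candidates, top_k=1))
```

Labels that share no term with the query get a score of zero and are always dropped, regardless of top_k.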
Entity linker from https://github.com/egerber/spaCy-entity-linker
Wikidata: https://qwikidata.readthedocs.io/en/stable/readme.html
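As a rough illustration of the relation lookup, the sketch below collects the Wikidata property IDs that link one entity to another. It works on the standard Wikidata entity JSON (a "claims" mapping from property ID to statements), which qwikidata can fetch with get_entity_dict_from_api. The helper name relations_between and the small hand-built SAMPLE_ENTITY fixture are hypothetical and only stand in for a real API response; this is not necessarily how NQ_triple_extractor.py does it.

```python
def relations_between(subject_entity, object_qid):
    """Return property IDs whose claims on the subject point at object_qid.

    subject_entity is a Wikidata entity dict. With network access it could
    be fetched like this:
        from qwikidata.linked_data_interface import get_entity_dict_from_api
        subject_entity = get_entity_dict_from_api("Q42")
    """
    properties = []
    for property_id, statements in subject_entity.get("claims", {}).items():
        for statement in statements:
            snak = statement.get("mainsnak", {})
            # Skip "somevalue" / "novalue" snaks, which carry no datavalue.
            if snak.get("snaktype") != "value":
                continue
            value = snak["datavalue"]["value"]
            # Item-valued claims store the target entity ID under "id".
            if isinstance(value, dict) and value.get("id") == object_qid:
                properties.append(property_id)
                break
    return properties


# Hand-built fixture (hypothetical, heavily trimmed): Q42 with one
# item-valued claim (P31 -> Q5) and one time-valued claim (P569).
SAMPLE_ENTITY = {
    "id": "Q42",
    "claims": {
        "P31": [{"mainsnak": {
            "snaktype": "value",
            "property": "P31",
            "datavalue": {"type": "wikibase-entityid",
                          "value": {"entity-type": "item", "id": "Q5"}},
        }}],
        "P569": [{"mainsnak": {
            "snaktype": "value",
            "property": "P569",
            "datavalue": {"type": "time",
                          "value": {"time": "+1952-03-11T00:00:00Z"}},
        }}],
    },
}

if __name__ == "__main__":
    print(relations_between(SAMPLE_ENTITY, "Q5"))
```

Calling the helper once per ordered pair of linked entities yields candidate (subject, relation, object) triples.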
pip install -r requirements.txt
python -m spacy download en_core_web_md
python -m spacy_entity_linker "download_knowledge_base"
Download the dataset from https://github.com/facebookresearch/FiD.
( ./NQ/test.json | ./NQ/dev.json | ./NQ/train.json )
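To check that the files are in place, a short sketch like the following iterates over each query and its first n passages. It assumes the FiD data layout, that is, a JSON list of examples, each with a "question" string and a "ctxs" list of passages carrying "title" and "text" fields. Verify this against your download. The function name iter_query_passages and the n_passages parameter are hypothetical.

```python
import json


def iter_query_passages(path, n_passages=5):
    """Yield (question, passages) pairs from a FiD-style JSON file.

    Assumes the file holds a list of dicts with "question" and "ctxs",
    where each ctx has "title" and "text" (an assumption, see above).
    """
    with open(path, encoding="utf-8") as f:
        examples = json.load(f)
    for example in examples:
        passages = [ctx["title"] + " " + ctx["text"]
                    for ctx in example["ctxs"][:n_passages]]
        yield example["question"], passages


if __name__ == "__main__":
    for question, passages in iter_query_passages("./NQ/dev.json", n_passages=3):
        print(question, len(passages))
        break
```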
python entity_linking.py
python NQ_triple_extractor.py