This repository is a fork of https://github.com/sberbank-ai/DetIE.
The original repository contains the code for the paper 'DetIE: Multilingual Open Information Extraction Inspired by Object Detection' by Michael Vasilkovsky, Anton Alekseev, Valentin Malykh, Ilya Shenbin, Elena Tutubalina, Dmitriy Salikhov, Mikhail Stepnov, Andrei Chertok and Sergey Nikolenko.
All results were obtained using a V100 GPU with CUDA 10.1.
Download the files bundle from here. Each of them should be put into the corresponding directory:
- folder version_243 (DetIE_LSOIE) should be copied to results/logs/default/version_243;
- folder version_263 (DetIE_IMoJIE) should be copied to results/logs/default/version_263;
- files imojie_train_pattern.json, lsoie_test10.json and lsoie_train10.json should be copied to data/wikidata.
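The copy steps above can also be scripted. The following is a minimal sketch, assuming the bundle has been unpacked into a single flat directory; that layout and the `place_bundle` helper are hypothetical and not part of the repository.

```python
import shutil
from pathlib import Path

# Names taken from the list above; the flat bundle layout is an assumption.
CHECKPOINT_DIRS = ("version_243", "version_263")
DATA_FILES = ("imojie_train_pattern.json", "lsoie_test10.json", "lsoie_train10.json")


def place_bundle(bundle_dir, repo_root):
    """Copy the unpacked bundle into the directories named in the list above."""
    bundle_dir, repo_root = Path(bundle_dir), Path(repo_root)
    logs_dir = repo_root / "results" / "logs" / "default"
    data_dir = repo_root / "data" / "wikidata"
    logs_dir.mkdir(parents=True, exist_ok=True)
    data_dir.mkdir(parents=True, exist_ok=True)

    # Checkpoint folders are copied recursively (dirs_exist_ok needs Python 3.8+).
    for name in CHECKPOINT_DIRS:
        shutil.copytree(bundle_dir / name, logs_dir / name, dirs_exist_ok=True)

    # Data files are copied one by one.
    for name in DATA_FILES:
        shutil.copy2(bundle_dir / name, data_dir / name)

    return logs_dir, data_dir
```

For example, `place_bundle("downloads/bundle", ".")` run from the repository root would populate both target directories, where `downloads/bundle` is a placeholder path.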
We suggest that you use the provided Dockerfile to deal with all the dependencies of this project.
E.g., clone this repository, then run:
cd DetIE/
docker build -t detie .
nvidia-docker run -v ../DetIE:/project -p 8808:8808 -it detie:latest bash
Once the Docker container starts, you are ready to work.
Alternatively, you can develop locally.
In this case, we recommend that you first create a venv.
Then install PyTorch with the appropriate CUDA version. Versions confirmed to work are >1.7.0 and <=1.11.0.
Afterwards, install all requirements:
pip install --upgrade -r context/requirements.txt
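To check whether an installed PyTorch falls inside the range stated above, a small version comparison is enough. This is only a sketch: the helper names are hypothetical, and local build suffixes such as `+cu101` are assumed to be irrelevant for the comparison.

```python
import re


def parse_version(version):
    """Turn a string such as '1.10.0+cu113' into (1, 10, 0), ignoring the build suffix."""
    core = str(version).split("+", 1)[0]
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", core)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_confirmed_torch_version(version):
    """True if the version lies in the range stated above: >1.7.0 and <=1.11.0."""
    return (1, 7, 0) < parse_version(version) <= (1, 11, 0)
```

With PyTorch installed, the check could be called as `is_confirmed_torch_version(torch.__version__)` after `import torch`.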
This project uses the Hydra library for storing and changing the system's metadata. The entry point to the list of arguments used when running the scripts is the config/config.yaml file:
defaults:
- model: detie-cut
- opt: adam
- benchmark: carb
Here, model points to the config/model/... subdirectory; please see detie-cut.yaml for the parameter descriptions. The files opt/adam.yaml and benchmark/carb.yaml are example configurations for the optimizer and the benchmark used.
If you want to change some of the parameters (e.g. max_epochs) without modifying the *.yaml files, pass them as command-line overrides, e.g.
PYTHONPATH=. python some_..._script.py model.max_epochs=2
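To illustrate what a dotted override such as model.max_epochs=2 means, the sketch below applies one to a plain nested dictionary. It is not Hydra's implementation: `apply_override` is a hypothetical helper, the value parsing is simplified, and the starting value of `max_epochs` is a placeholder rather than the repository default.

```python
import ast
import copy


def apply_override(config, override):
    """Return a copy of config with one 'dotted.key=value' override applied."""
    dotted_key, raw_value = override.split("=", 1)

    # Simplified value parsing: Python literals (2, 0.1, True) are converted,
    # anything else (e.g. adam) is kept as a string.
    try:
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        value = raw_value

    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Walk down the nesting, creating intermediate levels if needed.
        node = node.setdefault(name, {})
    node[leaf] = value
    return updated


config = {"model": {"max_epochs": 10}}  # placeholder value, not the real default
print(apply_override(config, "model.max_epochs=2"))
# prints {'model': {'max_epochs': 2}}
```

The first component of the key (model) selects the configuration group, and the rest selects the parameter inside it; the *.yaml files on disk stay unchanged.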
PYTHONPATH=. python3 modules/model/train.py
PYTHONPATH=. python3 modules/model/test.py model.best_version=243
This yields the time in seconds taken to run inference against
modules/model/evaluation/oie-benchmark-stanovsky/raw_sentences/all.txt
using a batch size of 32.
The expected throughput is 708.6 sentences/sec on an NVIDIA Tesla V100 GPU.
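Because the script reports elapsed seconds while the reference figure is given in sentences per second, the two have to be converted before comparing. A minimal sketch, assuming the input file holds one sentence per non-empty line (both helper names are hypothetical):

```python
from pathlib import Path


def count_sentences(path):
    """Count non-empty lines, assuming one sentence per line."""
    with Path(path).open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def sentences_per_second(num_sentences, elapsed_seconds):
    """Convert a sentence count and a wall-clock time into throughput."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_sentences / elapsed_seconds
```

Passing the sentence count of all.txt and the reported time to `sentences_per_second` gives a number that can be compared directly with the 708.6 sentences/sec figure.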
To apply the model to CaRB sentences, run
PYTHONPATH=. python3 modules/model/evaluation/carb-openie6/detie_predict.py
head -5 modules/model/evaluation/carb-openie6/systems_output/detie243_output.txt
This will save the predictions into the modules/model/evaluation/carb-openie6/systems_output/ directory. The same should be done with modules/model/evaluation/carb-openie6/detie_conj_predictions.py.
To reproduce the DetIE numbers from Table 3 in the paper, run
(cd modules/model/evaluation/carb-openie6/; ./eval.sh)
- detie243 is a codename for DetIE_{LSOIE}
- detie243conj is a codename for DetIE_{LSOIE} + IGL-CA
- detie263 is a codename for DetIE_{IMoJIE}
- detie263conj is a codename for DetIE_{IMoJIE} + IGL-CA
To reproduce the BenchIE predictions you can run the command below.
PYTHONPATH=. python3 modules/model/evaluation/carb-openie6/detie_benchie_predictions.py
Note: the Wikidata triplet download does not seem to work at the moment. After only a few requests, we get rate-limited (HTTP 429). If required, this code could probably be run offline on a Wikidata data dump instead.
To generate sentences using Wikidata's triplets, one can run the scripts
PYTHONPATH=. python3 modules/scripts/data/generate_sentences_from_triplets.py wikidata.lang=<lang>
PYTHONPATH=. python3 modules/scripts/data/download_wikidata_triplets.py wikidata.lang=<lang>
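One common way to cope with HTTP 429 responses like the ones mentioned above is to retry with exponential backoff. The sketch below is not taken from the download script; `fetch_with_backoff` is a hypothetical helper, and the `fetch` callable stands in for whatever request the script makes.

```python
import time
import urllib.error


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff while the server answers 429."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            # Give up on other errors, or once the attempts are used up.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Prefer the server's Retry-After hint (in seconds) when present.
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
```

A request could then be wrapped as `fetch_with_backoff(lambda: urllib.request.urlopen(url).read())`, where `url` is a placeholder. Whether this is enough to stay under the rate limit has not been verified here.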
Please cite the original paper if you use this code.
@inproceedings{Vasilkovsky2022detie,
    author = {Michael Vasilkovsky and Anton Alekseev and Valentin Malykh and Ilya Shenbin and Elena Tutubalina and Dmitriy Salikhov and Mikhail Stepnov and Andrei Chertok and Sergey Nikolenko},
    title = {{DetIE: Multilingual Open Information Extraction Inspired by Object Detection}},
    booktitle = {{Proceedings of the 36th {AAAI} Conference on Artificial Intelligence}},
    year = {2022}
}
Michael Vasilkovsky waytobehigh (at) gmail (dot) com