This is the code for the paper "Dynamic Representations of Global Crises: A Temporal Knowledge Graph For Conflicts, Trade and Value Networks".
Please cite our paper: Julia Gastinger, Timo Sztyler, Nils Steinert, Sabine Gruender-Fahrer, Michael Martin, Anett Schuelke, Heiner Stuckenschmidt. Dynamic Representations of Global Crises: A Temporal Knowledge Graph For Conflicts, Trade and Value Networks. Proceedings of the Third Learning on Graphs Conference (LoG 2024), PMLR 269, Virtual Event, November 26–29, 2024.
Authors: Julia Gastinger (julia.gastinger (at) uni-mannheim.de), Timo Sztyler (timo.sztyler (at) neclab.eu), Nils Steinert, Sabine Gruender-Fahrer, Michael Martin, Anett Schuelke, Heiner Stuckenschmidt
In the following, we describe the steps needed to reproduce our results. The description is split into two parts: 1. Dataset Preprocessing and 2. TKG Forecasting.
It is not required to re-run the Dataset Preprocessing steps. We provide the output of Dataset Preprocessing in `/data/crisis2023`. These files can be used for TKG Forecasting.
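As a rough illustration of how the provided files might be read before handing them to a forecasting model, here is a minimal Python sketch. It assumes the five-column layout described for `train.txt`, `valid.txt` and `test.txt` in the preprocessing output (subject id, relation id, object id, timestamp, original dataset id). The field separator is not stated in this README, so the parser accepts tabs, spaces or commas; it also assumes the keys in `id_to_node.json` and `id_to_rel.json` are numeric strings. The function names are hypothetical and not part of this repository.

```python
import json
import re


def parse_quadruple_line(line):
    """Parse one line into (subject_id, relation_id, object_id, timestamp, dataset_id)."""
    # Assumption: fields are separated by tabs, spaces, or commas.
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    return tuple(int(f) for f in fields)


def load_quadruples(lines):
    """Parse an iterable of lines (e.g. an open file handle), skipping blank lines."""
    return [parse_quadruple_line(line) for line in lines if line.strip()]


def decode_id_map(raw):
    """Turn a JSON-decoded dict with string keys into an int-keyed dict.

    Assumption: the keys are numeric strings such as "0", "1", ...
    """
    return {int(key): value for key, value in raw.items()}


# Hypothetical usage; adjust the paths to your checkout:
#
#   with open("data/crisis2023/train.txt", encoding="utf-8") as handle:
#       train = load_quadruples(handle)
#   with open("data/crisis2023/id_to_node.json", encoding="utf-8") as handle:
#       id_to_node = decode_id_map(json.load(handle))
```

If the files turn out to use a different layout or separator, only `parse_quadruple_line` needs to change.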
- Please see the `README.md` in the folder `queries`.
- Run `python3 ./data_preprocessing/ts_assignment_gta_star.py` and `python3 ./data_preprocessing/ts_assignment_acled.py` to read the `.nt` files and assign timesteps.
  - This requires the rdflib package, which can be downloaded from https://github.com/XuguangSong98/rdflib and placed into the `data_preprocessing` folder. Processing data with this package is very slow and can take hours to days.
- The outputs are CSV files that can be found in `/data/acled` and `/data/gta`, respectively.
- Run `python3 ./data_preprocessing/merke-tkg-from-gta-acled.py` to merge both subsets and create `train.txt`, `valid.txt` and `test.txt`.
  - What it does:
    - Specify the time range of interest. In our case this is 2023-01-01 to 2023-12-31.
    - Split the dataset based on timesteps. Specify the train/valid/test split. In our case it is 80/10/10.
    - Automatically store the resulting files in `/data/crisis2023`.
- It produces various files:
  - `train.txt`, `valid.txt`, `test.txt`: one line per quadruple, quadruples as `subject_id, relation_id, object_id, timestamp` (from 0 to num_timesteps), `original_dataset_id` (0: gta, 1: acled)
  - `train_names.txt`, `valid_names.txt`, `test_names.txt`: one line per quadruple with a string description for each node and relation; `subject_string, relation_string, object_string, original_dataset_id, timestamp` (from 0 to num_timesteps)
  - `id_to_node.json` and `id_to_rel.json`: contain dicts with mappings from `"node_id"` to `node_string`, and from `"relation_id"` to `relation string`.
  - `node_to_id.json` and `rel_to_id.json`: contain dicts with mappings from `node_string` to `"node_id"`, and from `relation string` to `"relation_id"`.
  - `stat.txt`: two entries, number of nodes, number of distinct relations
All models for TKG Forecasting are in the folder `models`. Follow the instructions in the respective `README.md`.
The code for evaluating the results for TKG Forecasting is in the folder `result_evaluation.py`. Follow the instructions in the respective `README.md`.