This repository contains the official code for the paper UNDER REVIEW that presents LP-DIXIT, a framework to evaluate explanations for Link Prediction (LP) tasks on Knowledge Graphs (KGs) in RDF. It labels each explanation as bad, neutral, or good. The label is the Forward Simulatability Variation (FSV), which measures the variation in the predictability of an inference (an LP prediction) caused by an explanation; an inference is predictable if a verifier can hypothesize its output given the same input, without replicating the same process. In the paper we focus on LP methods based on Knowledge Graph Embeddings (KGE) and on post-hoc methods to eXplain LP (LP-X).
LP-DIXIT employs a Large Language Model (LLM) as a verifier to automate the evaluation! It builds prompts by filling out our prompt template (sections separated by blank lines and variable parts in curly braces):
An RDF triple is a statement (subject, predicate, object). The subject and the object are entities, and the predicate is a relation between the subject and the object. Perform a Link Prediction (LP) task, specifically, given an incomplete RDF triple (subject, predicate, ?), predict the missing object that completes the triple and makes it a true statement.
Strict requirement: output solely the name of a single object entity, discard any explanation or other text.
Correct format: Italy
Incorrect format: The object entity is Italy.
({s}, {p}, ?)
{Explanation}
In the paper we propose four variants of LP-DIXIT: LP-DIXIT, LP-DIXIT-O, LP-DIXIT-D, and LP-DIXIT-OD. LP-DIXIT-O includes in the prompt a set of entities computed through the KGE model and a natural language instruction stating that the LLM must pick its response from such a set; LP-DIXIT-D includes in the prompt a set of examples (or demonstrations) of solved LP queries; LP-DIXIT-OD combines both LP-DIXIT-O and LP-DIXIT-D.
Read our paper for a detailed formalization of the approach! Make sure to also read the Report on Additional Experiments!
# Clone the repository
git clone
# Navigate to the repository directory
cd lp-dixit
# Install the required dependencies
pip install -r requirements.txt
LP-DIXIT currently comes with the following datasets:
- FB15k-237
- WN18RR
- YAGO3-10
- DB50K
- DB100K
- YAGO4-20
- FRUNI
- FR200K
FRUNI and FR200K are benchmark datasets that contain ground truth explanations for each prediction.
Download the included datasets from https://figshare.com/s/cfe5eced5cf283afa016
LP-DIXIT currently provides implementations of the following KGE models:
- TransE
- ComplEx
- ConvE
We pre-trained these models on all the included KGs.
Download the pre-trained models from https://figshare.com/s/cfe5eced5cf283afa016
python -m src.evaluation.llm_forward_simulatability [OPTIONS] FILE
It evaluates the explanations in FILE, which is a valid JSON file structured as follows:
{
"pred": ["<subject>", "<predicate>", "<object>"],
"explanation": "<explanation>"
}
<explanation> is any string that can be framed in a prompt; for instance, it can be a set of RDF triples.
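For instance, assuming FILE holds a list of such records, a hypothetical input file (say my_explanations.json) could look as follows (the entity and relation names are purely illustrative; in practice they are identifiers from the chosen dataset):
[
  {
    "pred": ["Rome", "capital_of", "Italy"],
    "explanation": "(Rome, located_in, Italy)"
  }
]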
- --dataset <dataset>
- --model <model>
- --llm <llm>: use the LLM <llm> as a verifier; <llm> is Llama-3.1 or Mixtral
- --parameters <N>: use the specified LLM in the version with <N> billion parameters; <N> is 8 or 70 when <llm> is Llama-3.1, and 7 or 22 when <llm> is Mixtral
- --include-ranking: execute LP-DIXIT-O instead of plain LP-DIXIT; this flag can also be combined with --include-examples
- --include-examples: execute LP-DIXIT-D instead of plain LP-DIXIT; this flag can also be combined with --include-ranking (combining both flags executes LP-DIXIT-OD)
- --help
This command writes two files:
- fs_details/{FILE}_{include-ranking}_{include-examples}_{llm}_{parameters}B.json: label and intermediate results for each explanation
- fs_metrics/{FILE}_{include-ranking}_{include-examples}_{llm}_{parameters}B.json: aggregate metrics
Note that the command removes .json from FILE when creating the output files.
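For example, to evaluate the hypothetical my_explanations.json from above using FB15k-237, ComplEx, and the 8-billion-parameter Llama-3.1 as a verifier (all values are illustrative):
python -m src.evaluation.llm_forward_simulatability --dataset FB15k-237 --model ComplEx --llm Llama-3.1 --parameters 8 my_explanations.json
Adding --include-ranking and/or --include-examples to the same invocation switches to LP-DIXIT-O, LP-DIXIT-D, or LP-DIXIT-OD.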
python -m src.link_prediction.tune --dataset <dataset> --model <model>
It saves the best hyper-parameters found in lp_config/{model}_{dataset}.json.
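For instance, with an illustrative dataset and model:
python -m src.link_prediction.tune --dataset FB15k-237 --model ComplEx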
python -m src.link_prediction.train --dataset <dataset> --model <model> --valid <validation_epochs>
<validation_epochs> is the frequency (in epochs) at which the model is evaluated on the validation set to determine whether to apply early stopping.
The command reads the model configuration from lp_config/{model}_{dataset}.json; generate it through hyper-parameter tuning (previous command) or create it manually.
It saves the model parameters in kge_models/{model}_{dataset}.pt.
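For instance, to train ComplEx on FB15k-237 and evaluate it on the validation set every 5 epochs (illustrative values):
python -m src.link_prediction.train --dataset FB15k-237 --model ComplEx --valid 5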
python -m src.link_prediction.test --dataset <dataset> --model <model>
It reads the model parameters from kge_models/{model}_{dataset}.pt and saves the predictions in preds/{model}_{dataset}.csv.
python -m src.select_preds --dataset <dataset> --model <model>
It reads the predictions from preds/{model}_{dataset}.csv, saves the correct ones in selected_preds/{model}_{dataset}, and then saves a sample of the correct predictions in sampled_selected_preds/{model}_{dataset}.
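For instance, to produce and then filter the predictions of ComplEx on FB15k-237 (illustrative values):
python -m src.link_prediction.test --dataset FB15k-237 --model ComplEx
python -m src.select_preds --dataset FB15k-237 --model ComplEx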
python -m src.explain [OPTIONS]
- --dataset <dataset>
- --model <model>
- --method <method>: use the LP-X method <method>; it is criage, data-poisoning, kelpie, kelpie++, or GEnI
- --mode <mode>: run <method> in mode <mode>; it is necessary or sufficient; this parameter is ignored by GEnI as it runs in a single mode
- --summarization <summarization>: execute the method with <summarization> as the Graph Summarization strategy; it is simulation or bisimulation; this parameter is needed only when <method> is kelpie++
It explains the predictions in sampled_selected_preds/{model}_{dataset} and saves the explanations in explanations/{method}_{mode}_{model}_{dataset}_{summarization}.json.
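For instance, to explain the sampled predictions of ComplEx on FB15k-237 with kelpie++ in necessary mode and bisimulation as the summarization strategy (illustrative values):
python -m src.explain --dataset FB15k-237 --model ComplEx --method kelpie++ --mode necessary --summarization bisimulation
The resulting file in the explanations folder can then be passed as FILE to the evaluation command above.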
Generate the predictions with the previous commands or upload your own predictions!
We used this code to perform the experiments in our paper, specifically:
- we measured the alignment of LP-DIXIT with human judgement;
- we adopted LP-DIXIT to compare state-of-the-art (SOTA) LP-X methods on several well-known KGs.
We used this code for all the phases of the experiments, from hyper-parameter tuning to the evaluation of explanations. Everything is reproducible: only the hyper-parameter ranges are needed to start! Even the execution logs are available!
The ground truth explanations of the benchmark datasets are in the explanations folder, in the files having benchmark as the value of the method field in the file name.
Download the experiment results, configurations, and pre-trained models from https://figshare.com/s/cfe5eced5cf283afa016.
We executed the experiments on the Leonardo@Cineca cluster with Python 3.11.6.
TODO IF ACCEPTED