This repository is a reference implementation of the global graph counterfactual explainer as described in the paper:
Global Counterfactual Explainer for Graph Neural Networks.
Mert Kosan*, Zexi Huang*, Sourav Medya, Sayan Ranu, Ambuj Singh.
ACM International Conference on Web Search and Data Mining, 2023.
https://dl.acm.org/doi/10.1145/3539597.3570376
The linked page hosts the manuscript and, as supplementary material, the presentation video.
The easiest way to install the dependencies is via conda. Once you have conda installed, run this command:
conda env create -f environment.yml
If you prefer to install the dependencies manually, we tested our code with Python 3.8.0 and the following main dependencies:
- PyTorch v1.8.0
- PyTorch Geometric v1.7.2
- NetworkX v3.0
- NumPy v1.23.5
- tqdm v4.65.0
All our experiments were run on a machine with 2 NVIDIA GeForce RTX 2080 GPUs (8GB of RAM) and 32 Intel Xeon CPUs (2.10GHz and 128GB of RAM).
We provide pretrained GNN and NeuroSED base models. To run our method on your own dataset, you first need to train your own GNN and NeuroSED base models.
- For GNN base models, you can use our gnn.py module.
- For NeuroSED base models, please follow the NeuroSED repository.
If a NeuroSED model is hard to train for your data, you will have to update our importance function to use your own graph edit distance function instead.
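As a rough illustration of such a replacement distance, the sketch below computes a graph edit distance with NetworkX and normalizes it by graph size. It is not part of this repository: the function names `networkx_ged` and `normalized_ged`, the `"label"` node attribute, and the normalization by the total number of nodes and edges are all assumptions made for the example, and the signature expected by the importance function has to be checked in the code itself.

```python
def networkx_ged(g1, g2, timeout=10.0):
    """Graph edit distance between two NetworkX graphs.

    Assumption: node types are stored in a "label" node attribute.
    """
    # Imported lazily so that normalized_ged below can also be used
    # with any other distance function, without requiring NetworkX.
    import networkx as nx

    distance = nx.graph_edit_distance(
        g1,
        g2,
        # Two nodes match only if they carry the same label.
        node_match=lambda a, b: a.get("label") == b.get("label"),
        # After `timeout` seconds the best distance found so far is returned.
        timeout=timeout,
    )
    if distance is None:
        raise RuntimeError("no edit path was found before the timeout")
    return distance


def normalized_ged(g1, g2, ged_fn=networkx_ged):
    """Edit distance divided by the total number of nodes and edges.

    Assumption: this normalization is chosen for illustration; use
    whatever scale your importance function expects.
    """
    size = (
        g1.number_of_nodes() + g1.number_of_edges()
        + g2.number_of_nodes() + g2.number_of_edges()
    )
    return ged_fn(g1, g2) / size if size else 0.0
```

Exact graph edit distance is expensive to compute, so a function like this is only practical for small graphs. With a timeout, the value returned by NetworkX is the best distance found so far rather than a guaranteed optimum.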
To generate counterfactual candidates for the AIDS dataset with the default hyperparameters, run this command:
python vrrw.py --dataset aids
The counterfactual candidates and meta information are saved under results/{dataset}/runs/. You can check other available training options with:
python vrrw.py --help
To generate a counterfactual summary set for the AIDS dataset from the candidates with the default hyperparameters, run this command:
python summary.py --dataset aids
The coverage and cost performance for different summary sizes will be printed on screen. You can check other available summary options with:
python summary.py --help
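To make the idea of building a summary from candidates concrete, here is a minimal sketch of a greedy selection that repeatedly adds the candidate covering the most not-yet-covered input graphs. It is an illustration under stated assumptions, not necessarily the procedure in summary.py: the function name `greedy_summary`, the `distances` matrix layout, and the threshold argument `theta` are hypothetical.

```python
def greedy_summary(distances, k, theta=0.1):
    """Pick up to k candidates by greedy maximum coverage.

    distances[i][j] is the distance from input graph i to candidate j
    (hypothetical layout). An input graph counts as covered by a
    candidate when that distance is at most theta.
    """
    if not distances:
        raise ValueError("distances must contain at least one input graph")

    n_candidates = len(distances[0])
    # For every candidate, the set of input graphs it covers.
    covered_by = [
        {i for i, row in enumerate(distances) if row[j] <= theta}
        for j in range(n_candidates)
    ]

    selected, covered = [], set()
    for _ in range(min(k, n_candidates)):
        remaining = [j for j in range(n_candidates) if j not in selected]
        # Candidate with the largest marginal gain in coverage;
        # ties go to the lowest candidate index.
        best = max(remaining, key=lambda j: len(covered_by[j] - covered))
        selected.append(best)
        covered |= covered_by[best]

    return selected, len(covered) / len(distances)
```

For example, with four input graphs and three candidates where candidate 2 covers graphs 0, 1 and 2 and candidate 0 covers graphs 0 and 3, calling `greedy_summary(distances, k=2)` picks candidate 2 first and candidate 0 second, reaching full coverage.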
The following table shows recourse coverage (𝜃 = 0.1) and median recourse cost comparison between GCFExplainer and baselines for a 10-graph global explanation. GCFExplainer consistently and significantly outperforms all baselines across different datasets.
To reproduce the results for GCFExplainer in the table, run the following script for each dataset and collect the performance corresponding to the top-10 explanations:
python summary.py --dataset {dataset}
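For readers who want to recompute the two reported metrics from their own distances, the sketch below shows one way to do it. It assumes that coverage is the fraction of input graphs whose closest counterfactual lies within the threshold 𝜃, and that cost is the median distance to the closest counterfactual. The function name `recourse_coverage_and_cost` and the `distances` layout are hypothetical and not taken from this repository.

```python
from statistics import median


def recourse_coverage_and_cost(distances, theta=0.1):
    """Coverage and median cost of a set of counterfactual graphs.

    distances[i][j] is the distance from input graph i to
    counterfactual j in the summary (hypothetical layout).
    """
    if not distances:
        raise ValueError("distances must contain at least one input graph")

    # Distance from each input graph to its closest counterfactual.
    closest = [min(row) for row in distances]

    # Fraction of input graphs with a counterfactual within theta.
    coverage = sum(d <= theta for d in closest) / len(closest)

    # Median distance to the closest counterfactual.
    cost = median(closest)
    return coverage, cost


# Three input graphs, two counterfactuals: the closest distances are
# 0.05, 0.08 and 0.4, so two of three graphs are covered at theta = 0.1
# and the median cost is 0.08.
coverage, cost = recourse_coverage_and_cost(
    [[0.05, 0.3], [0.2, 0.08], [0.4, 0.5]]
)
```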
The following figure illustrates global and local counterfactual explanations for the AIDS dataset. The global counterfactual graph (c) presents a high-level recourse rule—changing ketones and ethers into aldehydes (shown in blue)—to combat HIV, while the edge removals (shown in red) recommended by local counterfactual examples from baselines (b) are hard to generalize.
If you find our framework useful, please consider citing the following paper:
@inproceedings{gcfexplainer2023,
author = {Kosan, Mert and Huang, Zexi and Medya, Sourav and Ranu, Sayan and Singh, Ambuj},
title = {Global Counterfactual Explainer for Graph Neural Networks},
booktitle = {WSDM},
year = {2023}
}