Genetic Algorithm for Finding Subgraph
@article{GAVulExplainer,
title = {Graph-based explainable vulnerability prediction},
journal = {Information and Software Technology},
pages = {107566},
year = {2024},
issn = {0950-5849},
doi = {https://doi.org/10.1016/j.infsof.2024.107566},
url = {https://www.sciencedirect.com/science/article/pii/S095058492400171X},
author = {Hong Quy Nguyen and Thong Hoang and Hoa Khanh Dam and Aditya Ghose},
keywords = {Graph neural network, Explanation, Vulnerability}
}
1a. (CPU) create env with command `conda env create -f binder/environment.yml`
1b. (GPU) create env with command `conda env create -f binder/environment-cu11.3.yml`
2. activate env with `conda activate ga_subgraph`
3. Download Reveal dataset: https://bit.ly/3bX30ai
4. We followed and used the Joern setup provided with the Reveal paper at https://github.com/VulDetProject/ReVeal/blob/master/code-slicer/joern/README.md. Use Joern to parse the data.
If you prefer to install the dependencies yourself, these are the major libraries we used:
1. PyTorch
2. PyTorch Geometric
3. PyTorch Lightning
4. networkx
5. DEAP
6. nltk
7. gensim
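For a manual install, a quick importability check can catch a missing package before a long run fails. The sketch below assumes the usual import names for the libraries listed above; the helper `missing_libs` is a hypothetical name and is not part of this repository.

```python
from importlib.util import find_spec

# Import names assumed for the libraries listed above.
REQUIRED = [
    "torch",              # PyTorch
    "torch_geometric",    # PyTorch Geometric
    "pytorch_lightning",  # PyTorch Lightning
    "networkx",
    "deap",
    "nltk",
    "gensim",
]


def missing_libs(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libs()
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only looks the packages up and does not import them, so it stays fast and does not verify versions. For exact versions, the locked files under `binder` remain the reference.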
If you want to use our prepared example (`example.py`), download `data.zip` from https://drive.google.com/file/d/1eQBfx3OAOZLJrmX2wby5S_Z_HiWW0BT9/view?usp=sharing, then unzip `data.zip` and `weights.zip` at the project level.
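If you would rather unpack the archives from Python than from a shell, something like the following should work. It is a small sketch that assumes both archives already sit in the project root; `unzip_here` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def unzip_here(archive, dest="."):
    """Extract `archive` into `dest` and return the archive's member names."""
    archive = Path(archive)
    if not archive.is_file():
        raise FileNotFoundError(f"{archive} not found; download it first")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()


if __name__ == "__main__":
    # Run from the project root, where both archives are expected to sit.
    for name in ("data.zip", "weights.zip"):
        try:
            members = unzip_here(name)
            print(f"{name}: extracted {len(members)} entries")
        except FileNotFoundError as err:
            print(err)
```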
To use GAVulExplainer for other tasks, follow the instructions below.
from ga_subgraph.explainer import GASubX
from ga_subgraph.fitness import classifier
from ga_subgraph.individual import Individual
k_node = 5 # explanation size
# foo_sample is a PyTorch Geometric Data object
ga_explainer = GASubX(saved_model, classifier, device, Individual, ...)  # remaining arguments are documented below
ga_subgraph, _ = ga_explainer.explain(foo_sample, k_node, verbose=False)
Documentation of GASubX
:param blackbox: PyTorch model
:param classifier: function that returns probabilities from the model, for example `ga_subgraph.fitness.classifier`
:param device: cuda or cpu
:param IndividualCls: class that stores the individual representation
:param n_gen: number of generations to run
:param CXPB: crossover probability
:param MUTPB: mutation probability
:param tournsize: factor that controls the selection function
:param subgraph_building_method: function used to construct the subgraph
:param max_population: maximum number of individuals in each generation
:param offspring_population: number of offspring individuals
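To make the roles of `n_gen`, `CXPB`, `MUTPB`, `tournsize`, `max_population`, and `offspring_population` more concrete, here is a minimal standard-library sketch of a genetic search for a `k`-node subset. It is not the GASubX implementation. The toy fitness function, the crossover and mutation operators, the elitist survival step, the default values, and all helper names are assumptions made for illustration. In the real explainer the fitness comes from the black-box classifier, and subgraph construction (`subgraph_building_method`) is left out here.

```python
import random


def toy_fitness(nodes, weights):
    """Stand-in for the classifier score of a candidate subgraph (assumption)."""
    return sum(weights[n] for n in nodes)


def crossover(a, b, k, rng):
    """Child draws k distinct nodes from the union of both parents."""
    pool = sorted(set(a) | set(b))
    return tuple(sorted(rng.sample(pool, k)))


def mutate(ind, n_nodes, rng):
    """Swap one selected node for a node that is not currently selected."""
    outside = [n for n in range(n_nodes) if n not in ind]
    if not outside:
        return ind
    kept = list(ind)
    kept[rng.randrange(len(kept))] = rng.choice(outside)
    return tuple(sorted(kept))


def tournament(pop, fit, tournsize, rng):
    """Return the fittest of `tournsize` randomly drawn individuals."""
    return max(rng.sample(pop, tournsize), key=fit)


def ga_select(weights, k, n_gen=30, CXPB=0.5, MUTPB=0.2, tournsize=3,
              max_population=20, offspring_population=40, seed=0):
    """Search for a k-node subset with high toy fitness."""
    rng = random.Random(seed)
    n = len(weights)

    def fit(ind):
        return toy_fitness(ind, weights)

    pop = [tuple(sorted(rng.sample(range(n), k))) for _ in range(max_population)]
    for _ in range(n_gen):
        offspring = []
        for _ in range(offspring_population):
            child = tournament(pop, fit, tournsize, rng)
            if rng.random() < CXPB:
                mate = tournament(pop, fit, tournsize, rng)
                child = crossover(child, mate, k, rng)
            if rng.random() < MUTPB:
                child = mutate(child, n, rng)
            offspring.append(child)
        # Elitist survival: keep the best of parents plus offspring.
        pop = sorted(pop + offspring, key=fit, reverse=True)[:max_population]
    return max(pop, key=fit)
```

Calling `ga_select(weights, k=3)` on a list of per-node scores returns a sorted tuple of three node indices. Because survivors are drawn from parents and offspring together, the best fitness in the population never decreases from one generation to the next.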
- Unzip `data.zip` and `weights.zip`.
- Run `python do_statistic 4 cuda`, where 4 is the explanation size and cuda is the device.
- The script uses multiple processors to run the explanations in parallel.
- At `do_statistic` line 106: configure the dataset.
- At `do_statistic` line 52: configure the pretrained model.
- We share raw results for undirected graphs in the `statistics` folder and for directed graphs in `statistics_undirected`.
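The run described above takes an explanation size and a device, then spreads the work over several processes. A rough sketch of that pattern with the standard library follows. `parse_cli`, `explain_one`, `run_all`, and the integer sample ids are hypothetical stand-ins and do not reflect the actual contents of `do_statistic`.

```python
from multiprocessing import Pool


def parse_cli(argv):
    """Read `<explanation size> <device>` from a list of CLI arguments."""
    if len(argv) != 2:
        raise SystemExit("usage: <script> <k_node> <device>")
    return int(argv[0]), argv[1]


def explain_one(task):
    """Placeholder worker: a real one would run the explainer on one sample."""
    sample_id, k_node, device = task
    return sample_id, f"explained with k={k_node} on {device}"


def run_all(sample_ids, k_node, device, processes=2):
    """Process every sample in a pool of worker processes."""
    tasks = [(sid, k_node, device) for sid in sample_ids]
    with Pool(processes=processes) as pool:
        return pool.map(explain_one, tasks)


if __name__ == "__main__":
    # Same arguments as in the command shown above.
    k, dev = parse_cli(["4", "cuda"])
    for sid, msg in run_all(range(4), k, dev):
        print(sid, msg)
```

The worker is a top-level function so that it can be pickled and sent to the pool, and the `__main__` guard keeps child processes from re-running the driver code on platforms that spawn rather than fork.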
- weights: stores the pretrained classifier
- data: stores the data and the word2vec model
- binder: locked library versions for this project
- ga_subgraph: our implementation of GAVulExplainer
- visualization: helpers for visualization
- vulexp: helpers for training the vulnerability predictor, data processing, and SubgraphX, as demonstrated in `example.py`
- `vulexp/reveal_data.py`: class that handles the Reveal dataset