xAI4toptagger

This repository contains the code for, and documents the results of, a study of several explainable AI (xAI) methods applied to established top tagger models. We study the TopoDNN model, a model based on Multi-Body N-Subjettiness (MBNS) data, and the Particle Flow Network (PFN) model. The architecture and implementation of these models (and a few more that we will investigate and add here in the near future) are reviewed and compared in the paper "The Machine Learning Landscape of Top Taggers" by Kasieczka et al. (DOI: http://dx.doi.org/10.21468/SciPostPhys.7.1.014).

Preparing the repository

To train or retrain models and run the notebooks provided with this repository, one needs to create the right environment and install certain dependencies. First, cd to the project's top directory and run:

export PROJPATH=$PWD

Download Data

The dataset used in these studies was made publicly available by Butter et al. in "Deep-learned Top Tagging with a Lorentz Layer" (DOI: http://dx.doi.org/10.21468/SciPostPhys.5.3.028) and was used in the review paper by Kasieczka et al. mentioned earlier. It is available at https://desycloud.desy.de/index.php/s/llbX3zpLhazgPJ6. Run the following commands to download the data and store it in a location for processing:

cd $PROJPATH
mkdir -p datasets
cd datasets
wget https://desycloud.desy.de/index.php/s/llbX3zpLhazgPJ6/download
unzip download
mv v0/* .
rm download 
cd $PROJPATH
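
After the download, it can help to confirm that the three files named in the preprocessing section (train.h5, val.h5, test.h5) ended up in datasets/. The following is a minimal Python sketch, assuming those are the file names produced by unpacking the archive. The helper missing_dataset_files is illustrative and is not part of this repository.

```python
from pathlib import Path

# File names taken from the preprocessing section of this README.
EXPECTED_FILES = ("train.h5", "val.h5", "test.h5")


def missing_dataset_files(datasets_dir):
    """Return the expected dataset files that are absent from datasets_dir."""
    datasets_dir = Path(datasets_dir)
    return [name for name in EXPECTED_FILES if not (datasets_dir / name).is_file()]
```

Calling missing_dataset_files("datasets") from the project's top directory returns an empty list when all three files are in place.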

Make necessary directories

Run the following commands to create the necessary directories:

cd $PROJPATH
mkdir -p datasets/topoprocessed datasets/n-subjettiness_data 
mkdir -p evaluation/TopoDNN/figures evaluation/Multi-Body/figures evaluation/PFN/figures
mkdir -p models/TopoDNN/trained_models models/TopoDNN/trained_model_dicts
mkdir -p models/Multi-Body/trained_models models/Multi-Body/trained_model_dicts
mkdir -p models/PFN/trained_models models/PFN/trained_model_dicts
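
For platforms where the shell commands above are inconvenient, the same directory layout can be created from Python. This is a sketch that mirrors the mkdir -p commands one to one. The function names required_directories and make_directories are hypothetical and do not exist in the repository.

```python
from pathlib import Path

# Model types used throughout this README.
MODEL_TYPES = ("TopoDNN", "Multi-Body", "PFN")


def required_directories(root):
    """List the directories that the mkdir -p commands above create."""
    root = Path(root)
    dirs = [
        root / "datasets" / "topoprocessed",
        root / "datasets" / "n-subjettiness_data",
    ]
    for model in MODEL_TYPES:
        dirs.append(root / "evaluation" / model / "figures")
        dirs.append(root / "models" / model / "trained_models")
        dirs.append(root / "models" / model / "trained_model_dicts")
    return dirs


def make_directories(root):
    """Create every required directory; existing ones are left untouched."""
    created = required_directories(root)
    for path in created:
        path.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    return created
```

Passing the value of PROJPATH as root, for example make_directories(os.environ["PROJPATH"]) after importing os, reproduces the layout expected by the later steps.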

Set up the necessary environment

This step requires an Anaconda installation. Please visit https://anaconda.org/ and follow the instructions there to install and set up Anaconda. Then run the following commands:

cd $PROJPATH
conda create --name toptagger_env --file requirements.txt
conda activate toptagger_env
pip install tensorflow

Note: The environment toptagger_env sets up the dependencies needed for PyTorch. We found that asking conda to install tensorflow in the same environment can be troublesome; hence, the simpler solution of installing it with pip has been adopted.

Data pre-processing

To preprocess the data for the TopoDNN and MBNS models, the necessary scripts are provided in the datasets directory. For the TopoDNN preprocessing, run the following commands:

cd $PROJPATH/datasets
python topodnnpreprocessing.py <datasetname>

where <datasetname> can be train.h5, val.h5, or test.h5. The preprocessed data will be stored in the topoprocessed subdirectory. For the MBNS model, you will first need to install the FastJet package:

cd $PROJPATH
curl -O http://fastjet.fr/repo/fastjet-3.4.0.tar.gz
tar zxvf fastjet-3.4.0.tar.gz
cd fastjet-3.4.0/
./configure --prefix=$PROJPATH/fastjet-install --enable-pyext
make
make check
make install
cd $PROJPATH

PYPATH=`ls -d $PROJPATH/fastjet-install/lib/python3*`
export PYTHONPATH=$PYTHONPATH:$PYPATH/site-packages

Then run:

cd $PROJPATH/datasets
python mb8spreprocessing.py

The preprocessed data will be stored in the n-subjettiness_data subdirectory.
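
If the MBNS preprocessing fails at import time, a likely cause is that the FastJet Python bindings are not on the module search path. The sketch below locates the same lib/python3*/site-packages directories that the PYTHONPATH commands above target and appends them to sys.path. The helper names are hypothetical, and the assumption that the bindings are importable as fastjet after a build with --enable-pyext should be checked against your installation.

```python
import glob
import os
import sys


def fastjet_site_packages(install_prefix):
    """Return site-packages directories under <prefix>/lib/python3*."""
    pattern = os.path.join(install_prefix, "lib", "python3*", "site-packages")
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


def add_fastjet_to_path(install_prefix):
    """Append any FastJet site-packages directories to sys.path and return them."""
    found = fastjet_site_packages(install_prefix)
    for path in found:
        if path not in sys.path:
            sys.path.append(path)
    return found
```

An empty return value from add_fastjet_to_path(os.path.join(os.environ["PROJPATH"], "fastjet-install")) suggests that make install did not place the Python bindings under the expected prefix.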

Training your own models

For each model architecture, we have trained a number of alternate variants, which are hosted in the models/<model-type>/trained_models directories, where <model-type> can be TopoDNN, Multi-Body, or PFN. The necessary metadata for each model is given as json files in the models/<model-type>/trained_model_dicts directories. If you are interested in training your own models, please follow the instructions in the README file given within each <model-type> subdirectory of the models directory. Activating the conda environment with the command conda activate toptagger_env should set up the right environment. Training some of the models may require the FastJet package for the data loader, so it is a good idea to add its path to the PYTHONPATH environment variable:

export PROJPATH=$PWD
PYPATH=`ls -d $PROJPATH/fastjet-install/lib/python3*`
export PYTHONPATH=$PYTHONPATH:$PYPATH/site-packages
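
To see which trained variants are available for a model type, one option is to read the metadata files in the corresponding trained_model_dicts directory. The following is a minimal sketch, assuming the metadata files carry a .json extension. The function load_model_metadata is illustrative only, and the keys stored in each file are not specified here.

```python
import json
from pathlib import Path

# Model types used throughout this README.
MODEL_TYPES = ("TopoDNN", "Multi-Body", "PFN")


def load_model_metadata(root, model_type):
    """Map each JSON file name in trained_model_dicts to its parsed content."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type}")
    dict_dir = Path(root) / "models" / model_type / "trained_model_dicts"
    if not dict_dir.is_dir():
        return {}  # nothing to load if the directory has not been created
    metadata = {}
    for path in sorted(dict_dir.glob("*.json")):
        with path.open() as handle:
            metadata[path.name] = json.load(handle)
    return metadata
```

For example, sorted(load_model_metadata(".", "PFN")) lists the metadata file names for the PFN variants when run from the project's top directory.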

Reproducing xAI results

The studies associated with the explainability of each model are recorded in notebooks hosted in the evaluation/<model-type> subdirectories. Each notebook is self-contained, but all of them rely on the availability of the pretrained models and the datasets as set up in the previous sections. The content of each notebook is explained in the README file provided in the evaluation/ directory.

Reference

The studies in this repository are compiled and explained in this paper: A Detailed Study of Interpretability of Deep Neural Network based Top Taggers

To cite this work, please use:

A Khot, MS Neubauer, A Roy. A Detailed Study of Interpretability of Deep Neural Network based Top Taggers. arXiv preprint arXiv:2210.04371.

or use the following BibTeX entry:

@article{khot2022detailed,
  title={A Detailed Study of Interpretability of Deep Neural Network based Top Taggers},
  author={Khot, Ayush and Neubauer, Mark S and Roy, Avik},
  journal={arXiv preprint arXiv:2210.04371},
  year={2022}
}

Source Code

Existing resources from publicly available repositories have been adapted to implement the data preprocessing and training code. We are very grateful to the authors of the following works for making these repositories and resources publicly available.

Preprocessing and training for TopoDNN have been largely adapted from: https://github.com/jpearkes/topo_dnn/blob/master/topo_dnn.ipynb
Network implementation for PFN has been obtained from: https://github.com/jet-universe/particle_transformer/blob/main/networks/example_PFN.py
Implementation of Layerwise Relevance Propagation (LRP) is inspired by: https://git.tu-berlin.de/gmontavon/lrp-tutorial
Part of the preprocessing code for MBNS is obtained from: https://github.com/SebastianMacaluso/TreeNiN/blob/master/code/top_reference_dataset/ReadData.py

Contact:

For comments, feedback, and suggestions: Avik Roy (avroy@illinois.edu) and Ayush Khot (akhot2@illinois.edu)
