This repository contains code for performing distributed training of Bayesian Neural Network (BNN) models at scale on High Performance Computing clusters such as ALCF's Theta. Its main purpose is to serve as a tutorial for getting started with distributed training of BNNs on High Performance Computing clusters. The BNNs are also used in one of my works on Gravitational Wave parameter estimation (link), where Neural Networks were combined with Bayesian Neural Network layers. The dataset and code are available on Theta and restricted to mmadsp users only. For further details about the ADSP, contact Argonne ALCF support.
The BNN models are implemented using the TensorFlow Probability library. Data-distributed training is performed using Horovod.
Bayesian Neural Networks are one of the approaches used to capture network uncertainty. The uncertainties in Bayesian modeling can be classified into two categories:
- Aleatoric uncertainty
- Epistemic uncertainty
Aleatoric uncertainty captures the noise inherent in the observations/data, for example sensor measurement noise. Epistemic uncertainty is associated with the model parameters and can be reduced by adding more data. Aleatoric uncertainty is further divided into homoscedastic and heteroscedastic uncertainty.
- Homoscedastic uncertainty: uncertainty that stays constant for different inputs.
- Heteroscedastic uncertainty: uncertainty that depends on the inputs to the model, with some inputs potentially having noisier outputs than others. Modelling it is particularly important for avoiding over-confident model predictions.
Epistemic uncertainty is modelled by placing a prior distribution over the model parameters/weights and computing how these weights vary and converge, which is what Bayesian Neural Networks do. Aleatoric uncertainty is modelled by placing a distribution on the output of the model. Further details about Bayesian Networks and variational inference for training can be found in the Jupyter notebook.
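The sketch below (not the repository's exact model; layer sizes are illustrative) shows how both kinds of uncertainty can be expressed with TensorFlow Probability: Flipout layers place distributions over the weights (epistemic), and a `DistributionLambda` head places a Normal distribution over the output (aleatoric).

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(1,)),
    # Epistemic: the Flipout layers learn a posterior distribution over their weights.
    tfp.layers.DenseFlipout(16, activation="relu"),
    tfp.layers.DenseFlipout(2),
    # Aleatoric: the network outputs a Normal distribution (mean and scale) rather
    # than a point estimate; the scale is input-dependent (heteroscedastic).
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(t[..., 1:]))),
])
# Train by maximising the likelihood of the observed targets:
# model.compile(optimizer="adam", loss=lambda y, dist: -dist.log_prob(y))
```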
- Python >= 3.5
- Packages listed in requirements.txt
-
MNIST: hand-written digit dataset; sample images are shown below.
-
CIFAR-10: the dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
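Both datasets can be loaded with the standard tf.keras loaders, for example as below (the repository's own data pipeline may differ):

```python
import tensorflow as tf

# MNIST: 60000 training / 10000 test grey-scale 28x28 images.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# CIFAR-10: 50000 training / 10000 test colour 32x32 images.
# (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Scale pixel values to [0, 1].
x_train, x_test = x_train / 255.0, x_test / 255.0
```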
- Bayesian Neural Network with Flipout Convolutional Layers ('BNN_conv_flip')
- Bayesian Neural Network with Non-Flipout Convolutional Layers ('BNN_conv_nonflip')
- Bayesian Neural Network with Flipout Fully Connected Layers ('BNN_FC_flip')
- Bayesian Neural Network with Non-Flipout Fully Connected Layers ('BNN_FC_nonflip')
- Bayesian Neural Network with Flipout Convolutional Layers (3 VGG blocks) for CIFAR-10 data ('CIFAR10_BNN_model')
- Convolutional Neural Network ('CNN_Conv')
- Fully Connected Neural Network ('CNN_FC')
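As an illustration, a minimal sketch of what the flipout convolutional variant ('BNN_conv_flip') could look like is shown below; the layer sizes are placeholders, and the actual architectures are defined in CNN_BNN_Model.py.

```python
import tensorflow as tf
import tensorflow_probability as tfp

def build_bnn_conv_flip(num_classes=10):
    """Illustrative flipout convolutional BNN for 28x28 grey-scale inputs."""
    return tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(28, 28, 1)),
        # Flipout layers keep a variational posterior over the kernels.
        tfp.layers.Convolution2DFlipout(32, kernel_size=3, activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        tfp.layers.Convolution2DFlipout(64, kernel_size=3, activation="relu"),
        tf.keras.layers.Flatten(),
        tfp.layers.DenseFlipout(num_classes),   # logits over the classes
    ])
```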
-
Running on a local machine:
horovodrun -n 2 -H localhost:2 python CNN_BNN_Model.py --flagfile=config_file.cfg
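Inside the training script, `horovodrun` expects the usual Horovod pattern, roughly as sketched below with the Keras API (the repository's CNN_BNN_Model.py may differ in detail):

```python
import horovod.tensorflow.keras as hvd
import tensorflow as tf

hvd.init()                                          # one process per MPI rank

# Scale the learning rate with the number of workers and average gradients
# across ranks with Horovod's distributed optimizer.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(1e-3 * hvd.size()))

# Broadcast the initial weights from rank 0 so every worker starts identically.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

# model.compile(optimizer=opt, loss=..., metrics=[...])
# model.fit(dataset, callbacks=callbacks, verbose=1 if hvd.rank() == 0 else 0)
```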
-
Running on the ALCF High Performance Computing cluster Theta:
```bash
PPN=1            # 32, 16, 8 MPI ranks per node (processes per node)
NUM_THDS=128
aprun -n $((${COBALT_PARTSIZE} * ${PPN})) -N ${PPN} -cc depth -j 2 -d ${NUM_THDS} \
  -e OMP_NUM_THREADS=${NUM_THDS} -b python <path to the code>/CNN_BNN_Model.py \
  --flagfile=config_file.cfg
```
-
The submission script is provided in the repository.
-
Running the job with Balsam (Theta):
- See the detailed README.md.
-
For more information about the configuration options for running the code, use the help flag as follows:
python CNN_BNN_Model.py --help
-
Example Results:
-
A comparison of the BNN and CNN training times with an increasing number of nodes is shown in Fig-1 below.
-
The speed-up of the BNN and CNN, computed from the training times, is compared in Fig-2 below.
-
Training the Bayesian Network amounts to finding the optimal distributions over the trainable parameters, which is done using the technique of Variational Inference (VI). As the training iterations progress, the weight posteriors converge. An example with the weights initialised from a Gaussian prior is shown in Fig-3 below.
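In practice, the VI objective is the negative ELBO: the data negative log-likelihood plus the KL divergence between the weight posteriors and their priors. A sketch of how this loss could be assembled (assuming TFP flipout layers, whose KL terms appear in `model.losses`) is shown below:

```python
import tensorflow as tf

def negative_elbo(model, images, labels, num_train_examples):
    """Negative ELBO = negative log-likelihood + scaled KL divergence."""
    logits = model(images)
    nll = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    # TFP's flipout layers add their KL(posterior || prior) terms to model.losses;
    # scaling by the dataset size gives the per-example ELBO.
    kl = tf.reduce_sum(model.losses) / num_train_examples
    return nll + kl
```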
-
Once the model is trained and the weight posteriors have converged, it is used for inference. Inference is performed by running the model repeatedly (Monte-Carlo iterations). The output of the model is a prediction distribution, shown below for 300 MC iterations with a BNN Fully Connected model.
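A minimal sketch of such Monte-Carlo inference is shown below; `mc_predict` is a hypothetical helper, not a function from the repository:

```python
import numpy as np
import tensorflow as tf

def mc_predict(model, images, num_mc=300):
    """Run the BNN num_mc times; each forward pass samples fresh weights."""
    probs = np.stack([tf.nn.softmax(model(images), axis=-1).numpy()
                      for _ in range(num_mc)])       # shape: (num_mc, batch, classes)
    # Mean gives the prediction, spread across samples gives the uncertainty.
    return probs.mean(axis=0), probs.std(axis=0)
```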
-
Research Articles:
Papers related to Bayesian Neural Networks:
- Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks
- Weight Uncertainty in Neural Networks
- Practical Variational Inference for Neural Networks
- A Comprehensive Guide to Bayesian Convolutional Neural Network with Variational Inference
- Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches
Papers related to the Gravitational Wave Bayesian model:
-
Additional Resources:
-
Contact
- Himanshu Sharma (himanshu90sharma@gmail.com)
- Elise Jennings
-
Acknowledgement
This research was funded in part by, and used resources of, the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This work describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the work do not necessarily represent the views of the U.S. DOE or the United States Government. Declaration of Interests: None.