Distributed Training of Bayesian Neural Networks at Scale

Himscipy/bnn_hvd

Bayesian Neural Network (BNN) Distributed Training

License: MIT

This repository contains code for performing distributed training of Bayesian Neural Network (BNN) models at scale on High Performance Computing (HPC) clusters such as ALCF Theta. The main purpose of the code is to serve as a tutorial for getting started with distributed training of BNNs on HPC clusters. The BNNs are also used in one of my works on gravitational wave parameters (link), where neural networks were combined with Bayesian Neural Network layers. The dataset and the code are available on Theta and restricted to mmadsp users only. For further details about ADSP, contact Argonne ALCF support.

The BNN models are implemented using the TensorFlow Probability library. Data-distributed training is performed using Horovod.
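The idea behind data-distributed training can be illustrated without Horovod itself. The sketch below is a plain-Python simulation, not the Horovod API and not the code in this repository: each simulated worker computes the gradient of a mean-squared-error loss on its own shard of the data, and the local gradients are then averaged, which is the role an allreduce step plays. The function names, the toy one-weight linear model, and the data values are all assumptions made for illustration.

```python
# Plain-Python simulation of data-parallel gradient averaging.
# This does NOT use Horovod; it only mimics what an allreduce-average does.


def mse_gradient(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def shard(data, num_workers):
    """Split data into num_workers contiguous, equally sized shards."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def allreduce_average(values):
    """Stand-in for an allreduce: every worker ends up with the mean."""
    return sum(values) / len(values)


def distributed_step(w, xs, ys, num_workers, lr):
    """One SGD step where each simulated worker sees only its own shard."""
    x_shards = shard(xs, num_workers)
    y_shards = shard(ys, num_workers)
    local_grads = [mse_gradient(w, xs_k, ys_k)
                   for xs_k, ys_k in zip(x_shards, y_shards)]
    return w - lr * allreduce_average(local_grads)


# Toy data generated from y = 3x (hypothetical example values).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.0 * x for x in xs]

w = 0.0
for _ in range(200):
    w = distributed_step(w, xs, ys, num_workers=4, lr=0.01)

print("learned weight:", w)
```

Because the shards have equal size, the average of the per-worker gradients equals the gradient over the full batch, so the simulated workers follow the same trajectory a single process would.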

Brief Background on BNN:

Bayesian Neural Networks are one of the approaches used to capture network uncertainty. The uncertainties in Bayesian modeling can be classified into two categories:

  1. Aleatoric uncertainty
  2. Epistemic uncertainty

Aleatoric uncertainty captures the noise inherent in the observations/data, such as sensor measurement noise. Epistemic uncertainty is associated with the model parameters and can be reduced by collecting more data. Aleatoric uncertainty is further divided into homoscedastic and heteroscedastic uncertainty.

  • Homoscedastic uncertainty: uncertainty which stays constant for different inputs.
  • Heteroscedastic uncertainty: uncertainty which depends on the inputs to the model, with some inputs potentially having noisier outputs than others. This is particularly important for avoiding over-confident model predictions.
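The difference between the two kinds of aleatoric uncertainty can be made concrete with a Gaussian negative log-likelihood. The following is a minimal sketch, not code from this repository: the two noise functions and their constants are hypothetical, chosen only to show a noise level that is constant versus one that grows with the input.

```python
import math


def gaussian_nll(y, mean, sigma):
    """Negative log-likelihood of y under Normal(mean, sigma^2)."""
    return (0.5 * math.log(2.0 * math.pi * sigma ** 2)
            + (y - mean) ** 2 / (2.0 * sigma ** 2))


def homoscedastic_sigma(x):
    """Hypothetical noise model: the same noise level for every input."""
    return 0.5


def heteroscedastic_sigma(x):
    """Hypothetical noise model: noise grows with the magnitude of x."""
    return 0.1 + 0.5 * abs(x)


# The same residual (prediction off by 1.0) at a "quiet" and a "noisy" input.
residual = 1.0
for x in (0.0, 4.0):
    homo = gaussian_nll(residual, 0.0, homoscedastic_sigma(x))
    hetero = gaussian_nll(residual, 0.0, heteroscedastic_sigma(x))
    print(f"x={x}: homoscedastic NLL={homo:.3f}, heteroscedastic NLL={hetero:.3f}")
```

Under the constant-noise model the same error costs the same at every input. Under the input-dependent model the error is penalized heavily where the model claims low noise and only mildly where it claims high noise, which is how a heteroscedastic output discourages over-confident predictions.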

Epistemic uncertainty is modelled by placing a prior distribution over the model parameters/weights and computing how these weights vary and converge, which is what Bayesian Neural Networks do. Aleatoric uncertainty, in contrast, is modelled by placing a distribution on the output of the model. Further details about Bayesian Neural Networks and variational inference for training can be found in the Jupyter notebook.
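As a rough illustration of epistemic uncertainty, the sketch below treats a single weight as a Gaussian random variable and draws Monte Carlo samples of the prediction. It is not the TensorFlow Probability layers used in this repository: the one-weight model, the Gaussian form of the weight distribution, and all numeric values are assumptions made for illustration.

```python
import random
import statistics


def predictive_samples(x, mu, sigma, num_samples, seed=0):
    """Sample predictions w*x with w drawn from Normal(mu, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.gauss(mu, sigma) * x for _ in range(num_samples)]


x = 2.0

# A wide weight distribution (little data seen) versus a narrow one
# (more data seen); both are hypothetical values.
wide = predictive_samples(x, mu=1.5, sigma=0.5, num_samples=2000)
narrow = predictive_samples(x, mu=1.5, sigma=0.05, num_samples=2000)

print("wide:   mean", statistics.mean(wide), "stdev", statistics.stdev(wide))
print("narrow: mean", statistics.mean(narrow), "stdev", statistics.stdev(narrow))
```

The spread of the sampled predictions is the epistemic part of the uncertainty: it shrinks as the distribution over the weight narrows, which matches the statement that this uncertainty can be reduced with more data.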

Code Dependencies:

Dataset:

  • MNIST: the hand-written digit dataset.

  • CIFAR-10: the dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

Models:

  • Bayesian Neural Network with Flipout Fully Connected Layer ('BNN_FC_flip')
  • Bayesian Neural Network with Non-Flipout Fully Connected Layer ('BNN_FC_nonflip')
  • Bayesian Neural Network with Flipout Convolutional Layers ('BNN_conv_flip')
  • Bayesian Neural Network with Non-Flipout Convolutional Layers ('BNN_conv_nonflip')
  • Bayesian Neural Network with Flipout Convolutional Layers (3 VGG blocks) for CIFAR-10 data ('CIFAR10_BNN_model')
  • Convolutional Neural Network ('CNN_Conv')
  • Fully Connected Neural Network ('CNN_FC')

How to run the code: