Welcome! To learn more about implementing Normalization and Regularization using PyTorch, read on.
- Write a single model.py file that implements Group Normalization, Layer Normalization, and Batch Normalization, and takes an argument to decide which normalization to include.
- Write a single notebook file that uses model.py to run all three models above for 20 epochs each.
- Create these graphs:
Graph 1: Test/Validation Loss for all 3 models together.
Graph 2: Test/Validation Accuracy for 3 models together.
Graphs must have proper annotations.
- Find 10 misclassified images for each of the three models, and show them as a 5x2 image matrix in three separately annotated figures (see the plotting sketch below).
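As a rough sketch of that last step: assuming the misclassified samples for a model have been collected as `(image, predicted, actual)` tuples (the function and variable names here are illustrative, not from the actual notebook), the 5x2 grid can be drawn with matplotlib:

```python
import matplotlib.pyplot as plt

def plot_misclassified(samples, title):
    """Plot 10 misclassified MNIST images as a 5x2 grid.

    `samples` is assumed to be a list of 10 (image, predicted, actual)
    tuples, where `image` is a 1x28x28 tensor collected during evaluation.
    """
    fig, axes = plt.subplots(5, 2, figsize=(6, 12))
    fig.suptitle(title)
    for ax, (img, pred, actual) in zip(axes.flat, samples):
        ax.imshow(img.squeeze().cpu(), cmap="gray")
        ax.set_title(f"Predicted: {pred}  Actual: {actual}")
        ax.axis("off")
    plt.tight_layout()
    plt.show()

# e.g. plot_misclassified(gn_errors, "Misclassified - Group Normalization")
```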
Let's begin!
Let's first understand a bit about the three normalizations we have used: Batch Normalization, Layer Normalization, and Group Normalization.
Consider the following setup: we have two layers with a batch size of 4, meaning 4 images in each batch. Each of the four 2x2 matrices under a layer represents a channel.
For Batch Normalization, the mean and variance are calculated per channel, across all images in the batch; this is highlighted in the image above with blocks of the same colour. We get 4 means and 4 variances because there are 4 channels, and the calculation is done once per channel.
For Layer Normalization, we calculate the mean and variance across all the channels of an image; this is highlighted by the red block that spans horizontally across all channels. We again have 4 means and 4 variances, since there are 4 images and the statistics are computed once per image.
In Group Normalization, the channels are divided into groups, and the mean and variance are calculated per group, as highlighted by the dotted rectangles. In our case the channels are grouped in twos, so each image has 2 groups and we end up with 8 groups in all, and hence 8 means and 8 variances.
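To make the three schemes concrete, here is a small sketch (not from the notebook) that computes these statistics for a random tensor shaped like the example above; the counts come out to 4, 4, and 8 as described:

```python
import torch

# Toy tensor matching the example above: a batch of 4 images with
# 4 channels each, every channel a 2x2 matrix -> shape (N=4, C=4, 2, 2).
x = torch.randn(4, 4, 2, 2)

# Batch Norm: one mean/variance per channel, over (N, H, W) -> 4 values.
print(x.mean(dim=(0, 2, 3)).shape)  # torch.Size([4])

# Layer Norm: one mean/variance per image, over (C, H, W) -> 4 values.
print(x.mean(dim=(1, 2, 3)).shape)  # torch.Size([4])

# Group Norm, channels grouped in twos: one mean/variance per
# (image, group) pair -> 4 images x 2 groups = 8 values.
groups = x.view(4, 2, 2, 2, 2)  # (N, groups, channels_per_group, H, W)
print(groups.mean(dim=(2, 3, 4)).shape)  # torch.Size([4, 2]) -> 8 values
```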
If you are interested, you can check out the complete implementation of what's explained above in an Excel sheet HERE
Let's now move on to the implementation part.
We have used the MNIST dataset to implement the normalizations.
Our PyTorch implementation of the experiment is split across two scripts:
- The models with all three normalizations are implemented in model.py.
- A Jupyter notebook with the complete end-to-end implementation of the three experiments, which calls model.py for the network. Click HERE to view the code.
Number of training samples: 60000
Number of test samples: 10000
- Random Rotations
- Color Jitter
- Image Normalization
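A sketch of how these transforms might be composed and applied when loading MNIST with torchvision (the exact rotation and jitter parameters here are placeholders, not necessarily the values from our notebook):

```python
from torchvision import datasets, transforms

# Augmentation for training: small random rotations and colour jitter,
# followed by tensor conversion and normalization with the standard
# MNIST mean/std.
train_transforms = transforms.Compose([
    transforms.RandomRotation(degrees=7),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])
test_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

train_set = datasets.MNIST(".", train=True, download=True, transform=train_transforms)
test_set = datasets.MNIST(".", train=False, download=True, transform=test_transforms)
print(len(train_set), len(test_set))  # 60000 10000
```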
- Batch Normalization
- Group Normalization
- Layer Normalization
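In model.py, the normalization is selected by an argument. A minimal sketch of such a selector (the function name and argument values are illustrative; the real model.py may spell them differently):

```python
import torch.nn as nn

def get_norm_layer(norm_type, num_channels, num_groups=2):
    """Return the requested normalization layer for a 4D activation
    with `num_channels` channels."""
    if norm_type == "BN":
        return nn.BatchNorm2d(num_channels)
    if norm_type == "LN":
        # Layer Norm expressed as Group Norm with a single group
        # (see the note on this equivalence further below).
        return nn.GroupNorm(1, num_channels)
    if norm_type == "GN":
        return nn.GroupNorm(num_groups, num_channels)
    raise ValueError(f"Unknown normalization type: {norm_type}")
```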
- L1 Regularization
We used a regularization factor of 0.0001, applied only to the model with Batch Normalization.
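A sketch of how the L1 penalty can be added to the loss in the training step. It assumes the model returns log-probabilities (a log_softmax output, as is common for MNIST classifiers); l1_factor=0.0001 matches the value quoted above, while the rest of the structure is illustrative:

```python
import torch.nn.functional as F

def train_step(model, data, target, optimizer, l1_factor=0.0001):
    """One training step with an optional L1 penalty added to the loss."""
    optimizer.zero_grad()
    output = model(data)
    loss = F.nll_loss(output, target)
    if l1_factor > 0:
        # Sum of absolute values of every trainable parameter.
        l1_penalty = sum(p.abs().sum() for p in model.parameters())
        loss = loss + l1_factor * l1_penalty
    loss.backward()
    optimizer.step()
    return loss.item()
```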
| Model | Normalization | Train Accuracy (%) | Test Accuracy (%) |
| --- | --- | --- | --- |
| 1 | Group Normalization | 99.60 | 99.54 |
| 2 | Layer Normalization | 99.61 | 99.48 |
| 3 | Batch Normalization + L1 | 99.46 | 99.47 |
- The best test accuracy was achieved with Group Normalization.
- The smallest gap between train and test accuracy was achieved by Batch Normalization with L1 Regularization; the added regularization clearly helped reduce the (already minor) overfitting.
- The most overfitted of the three models was the one with Layer Normalization, although not by much.
- Layer Normalization is a special case of Group Normalization in which the group count is set to 1; as a result, all the channels in the layer are normalized at once.
Here, we use nn.GroupNorm(1, num_channels) after each Conv2d layer to implement Layer Normalization.
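A quick check of this equivalence: at default initialization (affine weights all ones, biases all zeros), nn.GroupNorm with one group matches nn.LayerNorm over (C, H, W):

```python
import torch
import torch.nn as nn

x = torch.randn(4, 8, 5, 5)  # (N, C, H, W), arbitrary sizes

# Group Norm with a single group normalizes all channels of each sample
# together, which is exactly Layer Norm over (C, H, W).
gn = nn.GroupNorm(1, 8)
ln = nn.LayerNorm([8, 5, 5])
print(torch.allclose(gn(x), ln(x), atol=1e-6))  # True
```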
- Group Normalization
- Layer Normalization
- Batch Normalization + L1
Abhiram Gurijala
Arijit Ganguly
Rohin Sequeira