Trained all models on Kaggle.
SRCNN notebook - https://www.kaggle.com/vedantvijaydalimkar/srcnn-comparison/edit/run/88250590
SRResNet notebook - https://www.kaggle.com/vedantvijaydalimkar/srresnet-comparison
SRResNet w/ perceptual loss - https://www.kaggle.com/vedantdalimkar/srresnet-perceptual-loss
Implemented the SRCNN model based on the original super-resolution paper. Architecture of the model:
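A minimal sketch of the standard SRCNN architecture from the paper (9-1-5 kernel sizes with 64 and 32 feature maps); the exact layer settings used in the notebook are an assumption here, not confirmed.

```python
import torch.nn as nn

class SRCNN(nn.Module):
    def __init__(self, num_channels: int = 3):
        super().__init__()
        # Patch extraction and representation: 9x9 conv, 64 feature maps
        self.conv1 = nn.Conv2d(num_channels, 64, kernel_size=9, padding=4)
        # Non-linear mapping: 1x1 conv, 32 feature maps
        self.conv2 = nn.Conv2d(64, 32, kernel_size=1)
        # Reconstruction: 5x5 conv back to the image channels
        self.conv3 = nn.Conv2d(32, num_channels, kernel_size=5, padding=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        return self.conv3(x)
```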
Used PyTorch Lightning's built-in learning rate finder to find the optimal learning rate, and the largest batch size that fit in memory, which was 16. Implemented a custom dataset class whose __getitem__ method applies two transformations to a single image and returns two images: one high-resolution image and one low-resolution image. All images had to be resized to a common size since they had varying sizes.
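A hedged sketch of such a dataset class: __getitem__ loads one image and produces an (LR, HR) pair via two different resize transforms. The file extension, target size, and scale factor below are assumptions, not values from the notebook.

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class SRDataset(Dataset):
    def __init__(self, image_dir, hr_size=256, scale=2):
        self.paths = sorted(Path(image_dir).glob("*.jpg"))
        # All images are resized to a common HR size since they vary in size.
        self.hr_transform = transforms.Compose([
            transforms.Resize((hr_size, hr_size)),
            transforms.ToTensor(),
        ])
        # LR version: downscale, then upscale back so shapes match the HR target.
        self.lr_transform = transforms.Compose([
            transforms.Resize((hr_size // scale, hr_size // scale)),
            transforms.Resize((hr_size, hr_size)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert("RGB")
        return self.lr_transform(img), self.hr_transform(img)
```

The learning rate finder itself is presumably invoked through Lightning's tuner (e.g. `auto_lr_find=True` plus `trainer.tune(model)` in Lightning 1.x); the exact call depends on the Lightning version used in the notebook.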
The compare() function compares the model's performance across the different interpolation methods used when resizing images during preprocessing.
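A hypothetical sketch of what compare() could look like: evaluating test-set MSE once per interpolation mode. The signature and the way the loaders are built are assumptions; the notebook's actual implementation may differ.

```python
import torch
import torch.nn.functional as F

def compare(model, test_loaders):
    """test_loaders maps an interpolation name to a DataLoader whose LR images
    were resized with that interpolation, e.g.
    {"bilinear": ..., "bicubic": ..., "nearest": ...}."""
    model.eval()
    results = {}
    with torch.no_grad():
        for name, loader in test_loaders.items():
            total, count = 0.0, 0
            for lr_img, hr_img in loader:
                sr_img = model(lr_img)
                total += F.mse_loss(sr_img, hr_img, reduction="sum").item()
                count += hr_img.numel()
            results[name] = total / count  # mean squared error per pixel
    return results
```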
Results: Using Bilinear interpolation gives the best results compared to Bicubic/Nearest Neighbour interpolations.
Comparison:
MSE Loss on test set:
Bilinear - 0.002147814491763711
Nearest Neighbour - 0.005143341142684221
Bicubic - 0.002530161291360855
Followed the SRGAN paper for the implementation and used a learning rate of 1e-4, as specified in the paper. In the paper, B denotes the number of residual blocks in the network and is set to 16. Ran the model for different B values (B = 8, 12, 16) and got the best results with B = 12. Architecture of the model (k stands for the kernel size, n for the number of feature maps, and s for the stride used):
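A minimal sketch of one SRResNet residual block as defined in the SRGAN paper (k3n64s1 conv, batch norm, PReLU, conv, batch norm, skip connection); the full generator stacks B of these between the input convolution and the upsampling layers.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # Element-wise skip connection: output = input + residual
        return x + self.block(x)
```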
Also implemented SRResNet with a perceptual loss function, where a pretrained CNN is used as the loss network. The loss minimises the difference between the feature representations of the output and target images as extracted by that loss network. Used a pretrained VGG16 model as the loss network. Perceptual loss can be summarised as follows (quoting the perceptual loss paper by J. Johnson et al.):
"Rather than encouraging the pixels of the output image to exactly match the pixels of the target image y, we instead encourage them to have similar feature representations as computed by the loss network."
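A hedged sketch of such a VGG16-based perceptual loss: MSE between feature maps of the super-resolved output and the HR target, extracted from a frozen pretrained VGG16. The choice of feature layer (relu2_2 here) and the lack of ImageNet normalisation are assumptions, not details from the notebook.

```python
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    def __init__(self, feature_layer: int = 8):
        super().__init__()
        # Layers up to and including index 8 end at relu2_2.
        # On newer torchvision, pretrained=True may need to be replaced
        # with weights=VGG16_Weights.DEFAULT.
        vgg = vgg16(pretrained=True).features[:feature_layer + 1]
        for p in vgg.parameters():
            p.requires_grad = False  # the loss network stays frozen
        self.loss_network = vgg.eval()
        self.mse = nn.MSELoss()

    def forward(self, sr, hr):
        # Compare feature representations rather than raw pixels.
        return self.mse(self.loss_network(sr), self.loss_network(hr))
```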
Also trained SRResNet with other loss functions, such as L1 loss and a PSNR-based objective (trained for 10 epochs).
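A sketch of how a PSNR-based objective could be used for training (an assumption about the notebook's approach): since PSNR should be maximised, the negative PSNR is minimised.

```python
import torch

def psnr_loss(sr, hr, max_val: float = 1.0, eps: float = 1e-8):
    mse = torch.mean((sr - hr) ** 2)
    psnr = 10.0 * torch.log10(max_val ** 2 / (mse + eps))
    return -psnr  # minimising -PSNR is equivalent to maximising PSNR
```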
Comparison: