
# Image Super Resolution

Trained all models on Kaggle.

SRCNN notebook - https://www.kaggle.com/vedantvijaydalimkar/srcnn-comparison/edit/run/88250590

SRResNet notebook - https://www.kaggle.com/vedantvijaydalimkar/srresnet-comparison

SRResNet w/ perceptual loss - https://www.kaggle.com/vedantdalimkar/srresnet-perceptual-loss

## SRCNN

Implemented the model following the original SRCNN super-resolution paper. Architecture of the model:

image
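A minimal sketch of the three-layer SRCNN, assuming the 9-1-5 kernel sizes and 64/32 feature maps from the base configuration in the paper (the notebook may use a slightly different configuration):

```python
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-layer SRCNN: patch extraction, non-linear mapping, reconstruction."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),  # patch extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),                   # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),  # reconstruction
        )

    def forward(self, x):
        # x is the low-resolution image already upscaled to the target size
        return self.net(x)
```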

Used PyTorch Lightning's built-in learning rate finder to find the optimal learning rate, and the largest batch size that fit in memory (16). Implemented a custom dataset class whose `__getitem__` method applies two transformations to a single image and returns two images: one high-resolution image and one low-resolution image. All images had to be resized to a common size, as the originals varied in size.
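A rough sketch of such a dataset class, assuming hypothetical names (`SRDataset`, `image_paths`), an example 256×256 target size and a 4× scale factor; the actual notebook may differ:

```python
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class SRDataset(Dataset):
    def __init__(self, image_paths, hr_size=256, scale=4,
                 interpolation=transforms.InterpolationMode.BILINEAR):
        self.image_paths = image_paths
        # High-resolution target: resize every image to a common size
        self.hr_transform = transforms.Compose([
            transforms.Resize((hr_size, hr_size), interpolation=interpolation),
            transforms.ToTensor(),
        ])
        # Low-resolution input: downscale, then upscale back to the HR size
        self.lr_transform = transforms.Compose([
            transforms.Resize((hr_size // scale, hr_size // scale), interpolation=interpolation),
            transforms.Resize((hr_size, hr_size), interpolation=interpolation),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img = Image.open(self.image_paths[idx]).convert("RGB")
        # Two transforms on the same image -> (low-resolution, high-resolution) pair
        return self.lr_transform(img), self.hr_transform(img)
```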

The `compare()` function compares the model's performance for the different types of image interpolation used while preprocessing (resizing) the images.
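A hedged sketch of what such a comparison could look like, reusing the hypothetical `SRDataset` from the sketch above and a plain MSE evaluation loop (not the notebook's exact implementation):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import transforms

@torch.no_grad()
def compare(model, image_paths, device="cpu"):
    """Evaluate test MSE for each interpolation mode used during resizing."""
    modes = {
        "bilinear": transforms.InterpolationMode.BILINEAR,
        "nearest": transforms.InterpolationMode.NEAREST,
        "bicubic": transforms.InterpolationMode.BICUBIC,
    }
    model.eval().to(device)
    results = {}
    for name, mode in modes.items():
        loader = DataLoader(SRDataset(image_paths, interpolation=mode), batch_size=16)
        total, count = 0.0, 0
        for lr, hr in loader:
            sr = model(lr.to(device))
            total += F.mse_loss(sr, hr.to(device)).item() * lr.size(0)
            count += lr.size(0)
        results[name] = total / count
    return results
```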

Results: Using Bilinear interpolation gives the best results compared to Bicubic/Nearest Neighbour interpolations.

Comparison:

High Resolution images

image

Model output for Bilinear interpolation

image

Model output for Nearest Neighbour interpolation

image

Model output for Bicubic interpolation

image

MSE loss on the test set:

| Interpolation | Test MSE |
| --- | --- |
| Bilinear | 0.002147814491763711 |
| Nearest Neighbour | 0.005143341142684221 |
| Bicubic | 0.002530161291360855 |

## SRResNet

Followed the SRGAN paper for the implementation and used a learning rate of 1e-4, as in the paper. The paper defines B as the number of residual blocks in the network and uses B = 16; running the model for different values (B = 8, 12, 16) gave the best results for B = 12. Architecture of the model:

image

(k stands for kernel size, n for the number of feature maps, and s for the stride used)
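A minimal sketch of one SRResNet residual block as described in the SRGAN paper (k3n64s1 conv, batch norm, PReLU, k3n64s1 conv, batch norm, identity skip):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """k3n64s1 conv -> BN -> PReLU -> k3n64s1 conv -> BN, with an identity skip."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)

# The full network stacks B of these blocks (B = 12 gave the best results here)
# between the input conv layer and the pixel-shuffle upsampling layers.
```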

Also implemented SRResNet with a perceptual loss function, where a pretrained CNN is used as the loss network. The objective of the loss is to minimise the difference between the feature representations extracted from the pretrained loss network. Used a pretrained VGG-16 model as the loss network. Perceptual loss can be explained simply as (quoting the perceptual loss paper by J. Johnson):

> Rather than encouraging the pixels of the output image to exactly match the pixels of the target image y, we instead encourage them to have similar feature representations as computed by the loss network.
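A hedged sketch of a VGG-16 perceptual loss, assuming features are compared at one intermediate layer (the exact layer and any input normalisation used in the notebook may differ):

```python
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    def __init__(self, feature_layer: int = 16):
        super().__init__()
        # Frozen pretrained VGG-16 acts as the loss network
        vgg = vgg16(weights="IMAGENET1K_V1").features[:feature_layer].eval()
        for p in vgg.parameters():
            p.requires_grad = False
        self.vgg = vgg
        self.criterion = nn.MSELoss()

    def forward(self, sr, hr):
        # Compare feature representations rather than raw pixels
        # (in practice the inputs are usually normalised with ImageNet statistics first)
        return self.criterion(self.vgg(sr), self.vgg(hr))
```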

Also trained SRResNet with other loss functions, such as L1 (MAE) loss and a loss based on peak signal-to-noise ratio (each trained for 10 epochs).
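Since PSNR is a quality metric (higher is better), a PSNR-based objective typically minimises its negative. A small sketch, assuming images scaled to [0, 1]:

```python
import torch

def psnr_loss(sr, hr, max_val: float = 1.0, eps: float = 1e-8):
    """Negative PSNR, so minimising it maximises PSNR. Assumes images in [0, max_val]."""
    mse = torch.mean((sr - hr) ** 2)
    psnr = 10.0 * torch.log10(max_val ** 2 / (mse + eps))
    return -psnr
```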

Comparison:

High Resolution images

image

Model output for SRResNet trained using perceptual loss function

image

Model output for SRResNet trained using Peak signal-to-noise ratio loss function

image

Model output for SRResNet trained using MAE loss function

image