- Title: Perceptual Losses for Real-Time Style Transfer and Super-Resolution
- Authors: Justin Johnson, Alexandre Alahi, Li Fei-Fei
- Link: https://arxiv.org/pdf/1603.08155.pdf
- Tags: Super-Resolution, Content Loss, Neural Network, Style Transfer
- Year: 2016
- Using MSE as the loss function in super-resolution and style transfer algorithms results in overly smooth images lacking high-frequency detail, as the per-pixel loss does not capture perceptual differences between output and ground-truth.
- High-quality images can instead be generated using perceptual loss functions based on differences between high-level feature representations extracted from pre-trained CNNs.
- This paper combines the benefits of feed-forward image transformation networks and perceptual loss functions for improved qualitative results in style transfer and single image super-resolution (SISR) tasks.
- Single-Image Super Resolution
- Traditional metrics used to evaluate super-resolution performance are PSNR and SSIM, both of which have been shown to correlate poorly with human assessment of "image quality".
- Since PSNR is a function of per-pixel MSE, a model trained to minimize per-pixel loss will always outperform one trained to minimize feature reconstruction loss when measured by PSNR; the method is therefore intended to showcase qualitative improvements rather than metric gains.
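To make the PSNR/MSE relationship concrete, here is a minimal sketch (assuming images scaled to [0, 1]); PSNR is a monotonically decreasing function of MSE, so minimizing per-pixel loss directly maximizes PSNR:

```python
import numpy as np

def psnr(output: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """PSNR = 10 * log10(MAX^2 / MSE): a monotone function of per-pixel MSE."""
    mse = np.mean((output - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```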
- Pre-trained image classification CNNs help address the shortcomings of per-pixel losses, as they have already learned to encode the perceptual and semantic information we would like to measure in our loss.
- The pre-trained loss network is used to define loss functions that measure the perceptual difference between output images and ground truth examples.
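A minimal sketch of a feature reconstruction loss in PyTorch; the use of a pre-trained VGG-16 matches the paper, but the layer index for relu2_2 and the weights string are assumptions about torchvision's layout and API (torchvision ≥ 0.13):

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class FeatureReconstructionLoss(nn.Module):
    """Feature reconstruction loss: mean squared distance between VGG-16
    activations of the output image and the ground-truth image."""
    def __init__(self):
        super().__init__()
        # Layers up to and including relu2_2; index 8 assumes torchvision's
        # VGG-16 `features` layout.
        self.slice = vgg16(weights="IMAGENET1K_V1").features[:9].eval()
        for p in self.slice.parameters():
            p.requires_grad = False  # the loss network stays frozen

    def forward(self, output: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Inputs are assumed already normalized to VGG's expected statistics.
        # Mean over B*C*H*W matches the paper's 1/(C*H*W) normalization.
        return torch.mean((self.slice(output) - self.slice(target)) ** 2)
```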
- Super-resolution network trained with feature reconstruction loss to allow transfer of semantic knowledge from the loss network
- Paper focuses on ×4 and ×8 super-resolution, since larger scale factors require more semantic reasoning about the input (a rough sketch of a ×4 transformation network follows)
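The sketch below is a plausible rendering of the feed-forward image transformation network for ×4 SR, following the paper's overall recipe (9×9 entry/exit convolutions, 3×3 residual blocks on the low-resolution grid, and log2(f) stride-1/2 upsampling layers, realized here as transposed convolutions); the block count and channel widths are assumptions:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """3x3 conv -> BN -> ReLU -> 3x3 conv -> BN, with identity skip."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return x + self.body(x)

class SRNet(nn.Module):
    """Feed-forward x4 SR network: residual blocks on the low-res grid,
    then log2(4) = 2 upsampling stages of x2 each."""
    def __init__(self, n_blocks: int = 4):
        super().__init__()
        layers = [nn.Conv2d(3, 64, 9, padding=4), nn.ReLU()]
        layers += [ResidualBlock() for _ in range(n_blocks)]
        for _ in range(2):  # two x2 upsampling stages for x4 total
            layers += [nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1),
                       nn.ReLU()]
        layers += [nn.Conv2d(64, 3, 9, padding=4)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```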
- Compared to the state-of-the-art SRCNN, the model trained with feature reconstruction loss does a better job of reconstructing sharp, fine details, such as the eyelashes in the paper's baby image.
- Feature reconstruction loss results in a slight cross-hatch pattern visible under magnification, which hurts its PSNR and SSIM relative to SRCNN.
- The same network trained with per-pixel loss produces fewer artifacts and higher PSNR, but the feature reconstruction loss yields better fine detail and more pleasing qualitative results.
- Performance for ×4 super-resolution evaluated on the Set5 dataset:

| Method | PSNR | SSIM |
| --- | --- | --- |
| Bicubic | 28.43 | 0.8114 |
| SRCNN | 30.48 | 0.8628 |
| Per-Pixel Loss | 28.40 | 0.8205 |
| Perceptual Loss | 27.09 | 0.7680 |
- SISR Model Architecture and Training
- Dataset: 10k images from MS-COCO, cropped to 288×288 patches
- Pre-processing: blur with a Gaussian kernel of width σ = 1.0, then downsample with bicubic interpolation
- Batch size of 4 for 200k iterations using Adam with a learning rate of 1×10⁻³, with no weight decay or dropout
- Minimize feature reconstruction loss at layer relu2_2 from the VGG-16 loss network
- Post-process by performing histogram matching between the network output and the low-resolution input (a condensed training sketch follows this list)
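A condensed training-loop sketch tying these details together, reusing the `FeatureReconstructionLoss` and `SRNet` sketches above; the `loader` and the blur kernel size are assumptions standing in for the paper's pipeline:

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

# `loader` is an assumed DataLoader yielding batches of four 288x288
# high-resolution patches cropped from MS-COCO, shape (4, 3, 288, 288).
model = SRNet()
loss_fn = FeatureReconstructionLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # no weight decay

for step, hr in enumerate(loader):
    # Pre-processing: Gaussian blur (sigma = 1.0), then bicubic x4 downsample.
    blurred = TF.gaussian_blur(hr, kernel_size=5, sigma=1.0)
    lr = F.interpolate(blurred, scale_factor=0.25, mode="bicubic",
                       align_corners=False)
    sr = model(lr)
    loss = loss_fn(sr, hr)  # feature reconstruction loss at relu2_2
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step >= 200_000:  # 200k iterations
        break
```

For the histogram-matching post-processing step, `skimage.exposure.match_histograms` is one off-the-shelf option.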