- Title: Perceptual Losses for Real-Time Style Transfer and Super-Resolution
- Authors: Justin Johnson, Alexandre Alahi, Li Fei-Fei
- Link: https://arxiv.org/pdf/1603.08155.pdf
- Tags: Super-Resolution, Content Loss, Neural Network, Style Transfer
- Year: 2016
- Using MSE as the loss function in super-resolution and style transfer algorithms results in overly smooth images lacking high-frequency detail, as the per-pixel loss does not capture perceptual differences between output and ground-truth.
- High-quality images can instead be generated using perceptual loss functions based on differences between high-level feature representations extracted from pre-trained CNNs.
- This paper combines the benefits of feed-forward image transformation networks and perceptual loss functions for improved qualitative results in style transfer and single image super-resolution (SISR) tasks.
- Single-Image Super Resolution
- Traditional metrics used to evaluate super-resolution performance are PSNR and SSIM, both of which have been shown to correlate poorly with human assessment of "image quality".
- Since PSNR is a function of per-pixel MSE, a model trained to minimize per-pixel loss will always outperform one trained to minimize feature reconstruction loss when measured by PSNR; the method is therefore intended to showcase qualitative improvements rather than metric gains.
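To make the PSNR/MSE relationship concrete, here is a minimal sketch (assuming images scaled to [0, 1]); PSNR is a monotonically decreasing function of MSE, so minimizing per-pixel loss directly maximizes PSNR:

```python
import numpy as np

def psnr(output: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """PSNR = 10 * log10(MAX^2 / MSE): a monotone function of per-pixel MSE."""
    mse = np.mean((output - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```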
- Pre-trained image classification CNNs help address the shortcomings of per-pixel losses, as they have already learned to encode the perceptual and semantic information we would like to measure in our loss.
- The pre-trained loss network is used to define loss functions that measure the perceptual difference between output images and ground truth examples.
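A minimal sketch of a feature reconstruction loss in PyTorch; the use of a pre-trained VGG-16 matches the paper, but the layer index for relu2_2 and the weights string are assumptions about torchvision's layout and API (torchvision ≥ 0.13):

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class FeatureReconstructionLoss(nn.Module):
    """Feature reconstruction loss: mean squared distance between VGG-16
    activations of the output image and the ground-truth image."""
    def __init__(self):
        super().__init__()
        # Layers up to and including relu2_2; index 8 assumes torchvision's
        # VGG-16 `features` layout.
        self.slice = vgg16(weights="IMAGENET1K_V1").features[:9].eval()
        for p in self.slice.parameters():
            p.requires_grad = False  # the loss network stays frozen

    def forward(self, output: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Inputs are assumed already normalized to VGG's expected statistics.
        # Mean over B*C*H*W matches the paper's 1/(C*H*W) normalization.
        return torch.mean((self.slice(output) - self.slice(target)) ** 2)
```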
- Super-resolution network trained with feature reconstruction loss to allow transfer of semantic knowledge from the loss network
- Paper focuses on ×4 and ×8 super-resolution, since larger scale factors require more semantic reasoning about the input (a rough sketch of a ×4 transformation network follows)
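The sketch below is a plausible rendering of the feed-forward image transformation network for ×4 SR, following the paper's overall recipe (9×9 entry/exit convolutions, 3×3 residual blocks on the low-resolution grid, and log2(f) stride-1/2 upsampling layers, realized here as transposed convolutions); the block count and channel widths are assumptions:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """3x3 conv -> BN -> ReLU -> 3x3 conv -> BN, with identity skip."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return x + self.body(x)

class SRNet(nn.Module):
    """Feed-forward x4 SR network: residual blocks on the low-res grid,
    then log2(4) = 2 upsampling stages of x2 each."""
    def __init__(self, n_blocks: int = 4):
        super().__init__()
        layers = [nn.Conv2d(3, 64, 9, padding=4), nn.ReLU()]
        layers += [ResidualBlock() for _ in range(n_blocks)]
        for _ in range(2):  # two x2 upsampling stages for x4 total
            layers += [nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1),
                       nn.ReLU()]
        layers += [nn.Conv2d(64, 3, 9, padding=4)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```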
- Compared to the state-of-the-art SRCNN, the model trained with feature reconstruction loss does a better job of reconstructing sharp, fine details, such as the eyelashes in the paper's baby image.
- Feature reconstruction loss results in a slight cross-hatch pattern visible under magnification, which hurts its PSNR and SSIM relative to SRCNN.
- The same network trained with per-pixel loss produces fewer artifacts and higher PSNR, but the feature reconstruction loss yields better fine detail and more pleasing qualitative results.
- Performance for ×4 super-resolution evaluated on the Set5 dataset:

| Method | PSNR | SSIM |
| --- | --- | --- |
| Bicubic | 28.43 | 0.8114 |
| SRCNN | 30.48 | 0.8628 |
| Per-Pixel Loss | 28.40 | 0.8205 |
| Perceptual Loss | 27.09 | 0.7680 |
- SISR Model Architecture and Training
- Dataset: 10k images from MS-COCO, cropped to 288×288 patches
- Pre-processing: blur with a Gaussian kernel of width σ = 1.0, then downsample with bicubic interpolation
- Batch size of 4 for 200k iterations using Adam with a learning rate of 1×10⁻³, with no weight decay or dropout
- Minimize feature reconstruction loss at layer relu2_2 from the VGG-16 loss network
- Post-process by performing histogram matching between the network output and the low-resolution input (a condensed training sketch follows this list)
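A condensed training-loop sketch tying these details together, reusing the `FeatureReconstructionLoss` and `SRNet` sketches above; the `loader` and the blur kernel size are assumptions standing in for the paper's pipeline:

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

# `loader` is an assumed DataLoader yielding batches of four 288x288
# high-resolution patches cropped from MS-COCO, shape (4, 3, 288, 288).
model = SRNet()
loss_fn = FeatureReconstructionLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # no weight decay

for step, hr in enumerate(loader):
    # Pre-processing: Gaussian blur (sigma = 1.0), then bicubic x4 downsample.
    blurred = TF.gaussian_blur(hr, kernel_size=5, sigma=1.0)
    lr = F.interpolate(blurred, scale_factor=0.25, mode="bicubic",
                       align_corners=False)
    sr = model(lr)
    loss = loss_fn(sr, hr)  # feature reconstruction loss at relu2_2
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step >= 200_000:  # 200k iterations
        break
```

For the histogram-matching post-processing step, `skimage.exposure.match_histograms` is one off-the-shelf option.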