- Title: Unsupervised Image-to-Image Translation Networks
- Authors: Ming-Yu Liu, Thomas Breuel, Jan Kautz
- Link: https://arxiv.org/abs/1703.00848
- Tags: Neural Network, GAN
- Year: 2017
-
What
- They present a method to learn mapping functions that translate images from one style to another (e.g. photos from daytime to nighttime).
- Their method only requires unpaired example images of both styles (i.e. just a class label per image, no matching image pairs).
-
How
- Architecture
- Their method is based on VAEs (variational autoencoders) and GANs.
- Their architecture is roughly similar to an autoencoder.
- For an image of style `A`, an encoder `E` first transforms the image into a latent representation `z`. A generator `G` then transforms `z` back into an image.
- There are two encoders (`E1`, `E2`), one per image style (`A`, `B`).
- There are two generators (`G1`, `G2`), one per image style (`A`, `B`).
- There are two discriminators (`D1`, `D2`), one per generator (and therefore per style).
- An image can be translated from style `A` to style `B` via `G2(E1(I_A))`.
- The weights of the encoders are mostly tied/shared. Only the first layers are not shared.
- The weights of the generators are mostly tied/shared. Only the last layers are not shared.
- They use 3 convs + 4 residual blocks for the encoders and 4 residual blocks + 3 transposed convs for the generators. They use normal convs for the discriminators. Nonlinearities are LeakyReLUs.
- The encoders are VAEs and are trained with the usual VAE losses (i.e. optimizing the evidence lower bound). However, they only predict a mean per component of `z`, not a variance. The variances are all fixed to 1.
- Visualization of the architecture:
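- A minimal PyTorch sketch of this shared-latent architecture (layer widths, kernel sizes, the number of shared residual blocks and all names are illustrative assumptions, not the authors' exact implementation):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block with two 3x3 convolutions."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class Encoder(nn.Module):
    """3 convs + 4 residual blocks; the last blocks are tied across styles."""
    def __init__(self, shared):
        super().__init__()
        self.private = nn.Sequential(  # style-specific first layers
            nn.Conv2d(3, 64, 7, padding=3), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            ResBlock(256), ResBlock(256),
        )
        self.shared = shared  # same module instance as the other encoder

    def forward(self, x):
        mu = self.shared(self.private(x))
        # Reparameterization with the variance fixed to 1: z = mu + eps.
        return mu, mu + torch.randn_like(mu)

class Generator(nn.Module):
    """4 residual blocks + 3 transposed convs; the first blocks are tied."""
    def __init__(self, shared):
        super().__init__()
        self.shared = shared  # same module instance as the other generator
        self.private = nn.Sequential(  # style-specific last layers
            ResBlock(256), ResBlock(256),
            nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(64, 3, 7, padding=3), nn.Tanh(),
        )

    def forward(self, z):
        return self.private(self.shared(z))

# The tied trunks: one shared by both encoders, one shared by both generators.
enc_shared = nn.Sequential(ResBlock(256), ResBlock(256))
gen_shared = nn.Sequential(ResBlock(256), ResBlock(256))
E1, E2 = Encoder(enc_shared), Encoder(enc_shared)
G1, G2 = Generator(gen_shared), Generator(gen_shared)

I_A = torch.randn(1, 3, 128, 128)  # a (dummy) image of style A
_, z = E1(I_A)
I_AB = G2(z)  # style A -> style B, i.e. G2(E1(I_A))
```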
- Loss
- Their loss consists of three components:
- VAE-loss: Reconstruction loss (L1/absolute distance) and a KL term on `z` (to keep it close to the standard normal distribution). Most weight is put on the reconstruction loss.
- GAN-loss: Standard as in other GANs, i.e. cross-entropy.
- Cycle-Consistency-loss: An image `I_A` is expected to look the same after translating back and forth between the styles, i.e. `I_A ≈ G1(E2(G2(E1(I_A))))` (switch from style A to B, then from B back to A). The cycle consistency loss uses a reconstruction loss and two KL terms (one for the first encoding `E1(.)`, one for the second `E2(.)`).
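- A hedged sketch of how these three terms could be combined, reusing `E1`, `E2`, `G1`, `G2` from the sketch above; the discriminator `D2` and the loss weights are placeholders, not the paper's values, and only the A-to-B direction and the encoder/generator side of the GAN loss are shown:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def kl_to_standard_normal(mu):
    # With the variance fixed to 1, KL(N(mu, I) || N(0, I)) reduces to
    # 0.5 * ||mu||^2 per sample; averaged here to keep the scale stable.
    return 0.5 * mu.pow(2).mean()

# Placeholder discriminator for style B: strided convs -> real/fake patch logits.
D2 = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, 4, padding=1),
)

def unit_loss_a_to_b(I_A, w_rec=10.0, w_kl=0.1):
    # (1) VAE loss: reconstruct I_A through its own encoder/generator pair.
    mu_a, z_a = E1(I_A)
    loss_vae = w_rec * F.l1_loss(G1(z_a), I_A) + w_kl * kl_to_standard_normal(mu_a)

    # (2) GAN loss (generator side): the translation G2(E1(I_A)) should fool D2.
    I_AB = G2(z_a)
    logits = D2(I_AB)
    loss_gan = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    # (3) Cycle consistency: A -> B -> A should reproduce I_A,
    #     with one KL term per encoding (the first E1(.), the second E2(.)).
    mu_ab, z_ab = E2(I_AB)
    loss_cyc = (w_rec * F.l1_loss(G1(z_ab), I_A)
                + w_kl * (kl_to_standard_normal(mu_a) + kl_to_standard_normal(mu_ab)))

    return loss_vae + loss_gan + loss_cyc

loss = unit_loss_a_to_b(I_A)  # I_A, E1, E2, G1, G2 from the sketch above
```
- The symmetric B-to-A terms and the discriminators' own real/fake updates would be added analogously.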
-
Results
- When testing on the (satellite) map dataset:
- Weight sharing between encoders and between generators improved accuracy.
- The cycle consistency loss improved accuracy.
- Using 4-6 layers (as opposed to just 3) in the discriminator improved accuracy.
- Translations that added details (e.g. night to day) were harder for the model.
- After training, the features from each discriminator seem to be quite good for the respective dataset (i.e. unsupervised learned features; a rough sketch of extracting them follows below).
- Example translations:
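- A rough sketch of that feature-extraction idea (the discriminator layout and the cut-off point are assumptions; any intermediate layer could serve as the feature map):

```python
import torch
import torch.nn as nn

# A hypothetical trained discriminator: a stack of strided convs ending in logits.
D1 = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 1, 4, padding=1),  # real/fake patch logits
)

# Drop the final logit layer and use the remaining activations as
# unsupervised features for images of style A (e.g. for a linear classifier).
feature_extractor = nn.Sequential(*list(D1.children())[:-1])
with torch.no_grad():
    x = torch.randn(8, 3, 128, 128)             # batch of (dummy) images
    features = feature_extractor(x).flatten(1)  # shape: (8, 256 * 16 * 16)
```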