This repository presents a TensorFlow v2 implementation of the NICE and RealNVP models, described respectively in the 2014 paper by Laurent Dinh, David Krueger, and Yoshua Bengio and in the 2016 paper by Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. NICE laid the groundwork for the normalizing flow models that followed.
The main idea behind the model is to:
- Transform the initial, unknown data distribution into a latent space with a known density via an invertible function.
- Train the model by maximizing the likelihood of the data, computed from the known latent density with the change-of-variables formula.
- Sample from the known density and map the samples back to the data space through the inverse function.
Mathematically, the process works as follows:
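- Define the known distribution of the latent (hidden) space as a product of independent univariate Logistic or Gaussian densities:
  $$p_H(h) = \prod_{i=1}^{D} p_{H_i}(h_i)$$
  where $\log p_{H_i}(h_i) = -\log\left(1 + e^{h_i}\right) - \log\left(1 + e^{-h_i}\right)$ for the Logistic prior and $\log p_{H_i}(h_i) = -\frac{1}{2}h_i^{2} - \frac{1}{2}\log(2\pi)$ for the Gaussian prior.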
- Map the data space to the latent space with an invertible function $f_\theta$, parametrized by $\theta$:
  $$f_\theta : \mathbb{R}^D \to \mathbb{R}^D$$
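- Compute the latent representation $h$ of a data point $x$ and its density under the prior:
  $$h = f_\theta(x), \qquad p_H(h) = p_H\big(f_\theta(x)\big)$$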
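- Compute the likelihood of the data by applying the change-of-variables formula to the latent variables $h$:
  $$p_X(x) = p_H\big(f_\theta(x)\big)\left|\det\frac{\partial f_\theta(x)}{\partial x}\right|, \qquad \log p_X(x) = \sum_{i=1}^{D}\log p_{H_i}\big(f_\theta(x)_i\big) + \log\left|\det\frac{\partial f_\theta(x)}{\partial x}\right|$$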
- Generate samples from the data distribution by sampling the latent distribution and inverting the samples:
  $$h \sim p_H(h), \qquad x = f_\theta^{-1}(h)$$
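The steps above can be sketched in plain Python, using lists instead of TensorFlow tensors (an assumption made only to keep the example dependency-free). The element-wise affine map `f(x) = a * x + b` is a hypothetical stand-in for the learned network $f_\theta$, chosen because its log-Jacobian determinant is simply $\sum_i \log|a_i|$; for brevity, `sample` draws from the Gaussian prior.

```python
import math
import random


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def logistic_log_prior(h):
    # Factorized standard Logistic prior: sum_i [-log(1 + e^h_i) - log(1 + e^-h_i)].
    return sum(-softplus(v) - softplus(-v) for v in h)


def gaussian_log_prior(h):
    # Factorized standard Gaussian prior: sum_i [-h_i^2 / 2 - log(2 pi) / 2].
    return sum(-0.5 * v * v - 0.5 * math.log(2.0 * math.pi) for v in h)


# Hypothetical invertible map f(x) = a * x + b (element-wise, all a_i != 0),
# standing in for the learned network f_theta. Its Jacobian is diag(a).
def f(x, a, b):
    return [ai * xi + bi for xi, ai, bi in zip(x, a, b)]


def f_inverse(h, a, b):
    return [(hi - bi) / ai for hi, ai, bi in zip(h, a, b)]


def log_likelihood(x, a, b, log_prior):
    # Change of variables: log p_X(x) = log p_H(f(x)) + log|det df/dx|.
    log_det = sum(math.log(abs(ai)) for ai in a)
    return log_prior(f(x, a, b)) + log_det


def sample(n, a, b, rng):
    # Draw h ~ N(0, I) in latent space, then map back with x = f^{-1}(h).
    return [f_inverse([rng.gauss(0.0, 1.0) for _ in a], a, b) for _ in range(n)]


print(log_likelihood([0.3], [2.0], [1.0], gaussian_log_prior))
print(sample(3, [2.0, -0.5], [1.0, 0.0], random.Random(0)))
```

As a sanity check, with a one-dimensional Gaussian prior and `f(x) = 2x + 1`, the resulting $p_X$ is a Gaussian with mean $-0.5$ and standard deviation $0.5$, and `log_likelihood` matches that density exactly.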
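Since $f_\theta$ must be invertible and the determinant of its Jacobian must be cheap to compute, NICE builds it by stacking additive coupling layers. Each coupling layer works as follows:
- Partition the input $x \in \mathbb{R}^D$ into two parts, $x_{a}\in\mathbb{R}^{d}$ and $x_{b}\in\mathbb{R}^{D-d}$.
- Apply a transformation $g$ (any function, e.g. a neural network; it does not need to be invertible) to one partition and use its output to shift the other:
  $$y_{a} = x_{a}, \qquad y_{b} = x_{b} + g(x_{a})$$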
The inverse of this coupling function is:
$$x_{a} = y_{a}, \qquad x_{b} = y_{b} - g(y_{a})$$
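The forward and inverse passes of an additive coupling layer can be sketched as follows. The function `g` is a hypothetical stand-in for a neural network, the split index `d` is passed explicitly, and plain lists replace TensorFlow tensors; none of these names come from this repository.

```python
import math


def g(x_a):
    # Hypothetical coupling function standing in for a neural network.
    # It maps R^d -> R^(D-d) (here d = D - d) and is deliberately not invertible.
    return [math.sin(3.0 * v) + v * v for v in x_a]


def additive_coupling_forward(x, d):
    # y_a = x_a, y_b = x_b + g(x_a)
    x_a, x_b = x[:d], x[d:]
    return x_a + [xb + m for xb, m in zip(x_b, g(x_a))]


def additive_coupling_inverse(y, d):
    # x_a = y_a, x_b = y_b - g(y_a); only forward evaluations of g are needed.
    y_a, y_b = y[:d], y[d:]
    return y_a + [yb - m for yb, m in zip(y_b, g(y_a))]


x = [0.1, -0.4, 1.3, 2.0]
y = additive_coupling_forward(x, d=2)
print(y)
print(additive_coupling_inverse(y, d=2))  # recovers x up to floating-point error
```

Because `g` is only ever evaluated in the forward direction, it can be an arbitrary network. In NICE, successive coupling layers swap the roles of the two partitions so that every dimension is eventually transformed.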
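The Jacobian of this function is lower triangular with ones on the diagonal, where each $I$ is an identity block of the matching size:
$$\frac{\partial y}{\partial x} = \begin{bmatrix} I & 0 \\ \dfrac{\partial g(x_{a})}{\partial x_{a}} & I \end{bmatrix}$$
so the resulting determinant is:
$$\det\frac{\partial y}{\partial x} = 1$$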
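The unit determinant can be checked numerically. The sketch below estimates the Jacobian of an additive coupling layer by central finite differences and computes its determinant with Gaussian elimination; the function `g`, the step size `eps`, and the test point are illustrative assumptions.

```python
import math


def g(x_a):
    # Hypothetical (non-invertible) coupling function standing in for a network.
    return [math.sin(3.0 * v) + v * v for v in x_a]


def additive_coupling(x, d=2):
    x_a, x_b = x[:d], x[d:]
    return x_a + [xb + m for xb, m in zip(x_b, g(x_a))]


def numerical_jacobian(func, x, eps=1e-6):
    # Central differences: J[i][j] approximates d func_i / d x_j.
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = func(xp), func(xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    return J


def determinant(m):
    # Gaussian elimination with partial pivoting on a copy of the matrix.
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


x = [0.1, -0.4, 1.3, 2.0]
J = numerical_jacobian(additive_coupling, x)
print(determinant(J))  # close to 1.0, whatever g is
```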
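Because every additive coupling layer has unit Jacobian determinant, any stack of them is volume-preserving. To make the model more flexible, the authors multiply the output $y$ of the final coupling layer element-wise by a diagonal scaling matrix $S$:
$$h_i = S_{ii}\, y_i$$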
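The Jacobian of this scaling is diagonal, so its determinant is the product of the diagonal scaling factors $S_{ii}$:
$$\det\frac{\partial h}{\partial y} = \prod_{i=1}^{D} S_{ii}, \qquad \log\left|\det\frac{\partial h}{\partial y}\right| = \sum_{i=1}^{D}\log\left|S_{ii}\right|$$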
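Putting the pieces together, a minimal NICE-style model stacks additive coupling layers that alternate which half they update, followed by the diagonal scaling. This sketch assumes $S_{ii} = \exp(s_i)$ (a common parametrization that keeps $S$ invertible, not necessarily the one used in this repository), a Logistic prior, and hypothetical coupling functions in `gs` standing in for the networks. Since the coupling layers contribute a log-determinant of zero, the log-likelihood reduces to $\log p_H(h) + \sum_i s_i$.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def logistic_log_prior(h):
    return sum(-softplus(v) - softplus(-v) for v in h)


def coupling(x, g, flip, inverse=False):
    # Additive coupling on an even-length vector. With flip=True the halves swap
    # roles, so a stack of layers eventually updates every dimension.
    d = len(x) // 2
    cond, moved = (x[d:], x[:d]) if flip else (x[:d], x[d:])
    sign = -1.0 if inverse else 1.0
    moved = [v + sign * m for v, m in zip(moved, g(cond))]
    return moved + cond if flip else cond + moved


def nice_forward(x, gs, s):
    # x -> h: coupling layers, then diagonal scaling h_i = exp(s_i) * y_i.
    y = x
    for i, g in enumerate(gs):
        y = coupling(y, g, flip=(i % 2 == 1))
    return [math.exp(si) * v for si, v in zip(s, y)]


def nice_inverse(h, gs, s):
    # h -> x: undo the scaling, then undo the couplings in reverse order.
    x = [v * math.exp(-si) for si, v in zip(s, h)]
    for i in reversed(range(len(gs))):
        x = coupling(x, gs[i], flip=(i % 2 == 1), inverse=True)
    return x


def nice_log_likelihood(x, gs, s):
    # Coupling layers have log|det| = 0; the scaling adds sum_i log S_ii = sum_i s_i.
    return logistic_log_prior(nice_forward(x, gs, s)) + sum(s)


# Hypothetical coupling functions (each maps R^2 -> R^2) and scaling parameters.
gs = [
    lambda v: [math.tanh(t) for t in v],
    lambda v: [t * t for t in v],
    lambda v: [math.sin(t) for t in v],
    lambda v: [0.5 * t for t in v],
]
s = [0.1, -0.3, 0.2, 0.0]

x = [0.5, -1.0, 0.25, 2.0]
h = nice_forward(x, gs, s)
print(nice_log_likelihood(x, gs, s))
print(nice_inverse(h, gs, s))  # recovers x up to floating-point error
```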
In RealNVP, the authors combine the additive and scaling couplings so that the model jointly learns to translate and scale the base density space, with input-dependent translation and scaling parameters. The coupling takes the following form:
$$y_{a} = x_{a}, \qquad y_{b} = x_{b} \odot \exp\big(s(x_{a})\big) + t(x_{a})$$
The inverse coupling function is:
$$x_{a} = y_{a}, \qquad x_{b} = \big(y_{b} - t(y_{a})\big) \odot \exp\big(-s(y_{a})\big)$$
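where $s$ (scale) and $t$ (translation) are arbitrary functions, such as neural networks, that map $x_{a}$ to vectors with the same dimension as $x_{b}$, and $\odot$ denotes the element-wise product. Inverting the coupling never requires inverting $s$ or $t$.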
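An affine coupling layer can be sketched in the same style as the additive one. The functions `s_fn` and `t_fn` are hypothetical stand-ins for the scale and translation networks, and the equal split `d = D - d` is an assumption of the example.

```python
import math


# Hypothetical scale and translation functions standing in for neural networks;
# both map R^d -> R^(D-d) (here d = D - d) and neither needs to be invertible.
def s_fn(x_a):
    return [0.5 * math.tanh(v) for v in x_a]


def t_fn(x_a):
    return [v * v - 1.0 for v in x_a]


def affine_coupling_forward(x, d):
    # y_a = x_a, y_b = x_b * exp(s(x_a)) + t(x_a)
    x_a, x_b = x[:d], x[d:]
    s, t = s_fn(x_a), t_fn(x_a)
    return x_a + [xb * math.exp(si) + ti for xb, si, ti in zip(x_b, s, t)]


def affine_coupling_inverse(y, d):
    # x_a = y_a, x_b = (y_b - t(y_a)) * exp(-s(y_a))
    y_a, y_b = y[:d], y[d:]
    s, t = s_fn(y_a), t_fn(y_a)
    return y_a + [(yb - ti) * math.exp(-si) for yb, si, ti in zip(y_b, s, t)]


x = [0.1, -0.4, 1.3, 2.0]
y = affine_coupling_forward(x, d=2)
print(y)
print(affine_coupling_inverse(y, d=2))  # recovers x up to floating-point error
```

If `s_fn` returned all zeros, this layer would reduce to the additive coupling of NICE with `t_fn` playing the role of $g$.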
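The Jacobian of this function is still lower triangular, but unlike the additive coupling its diagonal is not all ones ($I$ is an identity block of the matching size):
$$\frac{\partial y}{\partial x} = \begin{bmatrix} I & 0 \\ \dfrac{\partial y_{b}}{\partial x_{a}} & \operatorname{diag}\big(\exp(s(x_{a}))\big) \end{bmatrix}$$
so its determinant is the product of the diagonal entries and remains cheap to compute:
$$\det\frac{\partial y}{\partial x} = \exp\Big(\sum_{j} s(x_{a})_{j}\Big), \qquad \log\left|\det\frac{\partial y}{\partial x}\right| = \sum_{j} s(x_{a})_{j}$$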
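The triangular structure and the closed-form log-determinant can be verified numerically. In this sketch, `s_fn` and `t_fn` are hypothetical networks mapping $\mathbb{R}^1 \to \mathbb{R}^2$ (so $D = 3$, $d = 1$), and the Jacobian is estimated with central finite differences; the test point is arbitrary.

```python
import math


def s_fn(x_a):
    # Hypothetical scale network: R^1 -> R^2.
    return [0.5 * math.tanh(x_a[0]), -0.3 * x_a[0]]


def t_fn(x_a):
    # Hypothetical translation network: R^1 -> R^2.
    return [x_a[0] ** 2, math.sin(x_a[0])]


def affine_coupling(x, d=1):
    x_a, x_b = x[:d], x[d:]
    s, t = s_fn(x_a), t_fn(x_a)
    return x_a + [xb * math.exp(si) + ti for xb, si, ti in zip(x_b, s, t)]


def log_det_analytic(x, d=1):
    # log|det J| = sum_j s(x_a)_j: no Jacobian of s or t is needed.
    return sum(s_fn(x[:d]))


def numerical_jacobian(func, x, eps=1e-6):
    # Central differences: J[i][j] approximates d func_i / d x_j.
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = func(xp), func(xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    return J


x = [0.8, -1.5, 0.3]
J = numerical_jacobian(affine_coupling, x)
# J is lower triangular, so log|det J| is the sum of log|J_ii|.
log_det_numeric = sum(math.log(abs(J[i][i])) for i in range(3))
print(log_det_numeric, log_det_analytic(x))
```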
For further details, please refer to the NICE and RealNVP papers.