Paper: https://arxiv.org/abs/2011.07248
Blog Post: http://keller.org/research/2020-10-21-self-normalizing-flows/
conda env create -f conda_environment.yml
Install the snf package locally for development. This allows you to run experiments with the snf
command. At the root of the project directory run:
pip install -e .
Note that this package requires Ninja to build the C++ extensions required to efficiently compute the Self-Normalizing Flow gradient, however this should be installed automatically with pytorch.
This repository uses Weight & Biases for experiment tracking. By deafult this is set to off. However, if you would like to use this (highly recommended!) functionality, all you have to do is install weight & biases as follows, and then set 'wandb': True
, 'wandb_project': YOUR_PROJECT_NAME
, and 'wandb_entity': YOUR_ENTITY_NAME
in the default experiment config at the top of snf/train/experiment.py.
To install Weights & Biases follow the quickstart guide here, or, simply run pip install wandb
(with the pip associated to your conda environment), followed by: wandb login
Make sure that you have first created a weights and biases account, and filled in the correct wandb_project
and wandb_entity
in the experiment config.
To rerun the experiments from table 1, you can run the following commands:
snf --name 'selfnorm_fc_mnist'
snf --name 'exact_fc_mnist'
snf --name 'selfnorm_cnn_mnist'
snf --name 'exact_cnn_mnist'
snf --name 'emerging_cnn_mnist'
snf --name 'exponential_cnn_mnist'
snf --name 'conv1x1_glow_mnist'
snf --name 'selfnorm_glow_mnist'
To rerun the experiments from table 2:
snf --name 'conv1x1_glow_cifar'
snf --name 'selfnorm_glow_cifar'
snf --name 'conv1x1_glow_imagenet'
snf --name 'selfnorm_glow_imagenet'
To recreate the results of figure 4, run:
snf --name 'snf_timescaling'
To recreate the results of Table A.5 (using improved constrained optimization), run:
snf --name 'geco_selfnorm_glow_mnist
- All models are built using the
FlowSequential
module (see snf/layers/flowsequential.py)- This module iterates through a list of
FlowLayer
orModifiedGradFlowLayer
modules, repeatedly transforming the input, while simultaneously accumulating the log-determinant of the jacobian of each transformation along the way. - Ultimately, this layer returns the total normalized log-probability of input by summing the log-probability of the transformed input under the base distribuiton, and the accumulated sum of log jacobian determinants (i.e. using the change of variables rule).
- This module iterates through a list of
- The
Experiment
class (see snf/train/experiment.py) handles running the training iterations, evaluation, sampling, and model saving & loading. - All experiments can be found in
snf/experiments/
, and require the specification of a model, optimizer, dataset, and config dictionary. See below for the currently implemented options for the config. - All layer implementations can be found in
snf/layers/
including the self-normalizing layer found at snf/layers/selfnorm.py
The configuration dictionary is mainly used to modify the training procedure specified in the Experiment class, but it can also be used to modify the model architecture if desired. A non-exhaustive list of config options and descriptions are given below, note that config options which modify model architecture may not have any effect if they are not explicitly incorporated in the create_model()
function of the experiments. The default configuration options are specified at the top of the experiment file. The configuration dictionary passed to the Experiment class initalizer then overwrites these defaults if the key is present.
-
'modified_grad'
: bool, if True, use the self-normalized gradient in place of the true exact gradient. This also causes self-normlizing flow layers to return 0 for their log-jacobian-determinant during training. During evaluation, this option has no effect, and log-likelihoods are computed exactly. -
'add_recon_grad'
: bool, if True, for all layers which inherit fromSelfNormConv
(includesSelfNormFC
), compute the layer-wise reconstruction loss and add it to the standard gradient. -
'lr'
: float, learning rate -
'epochs'
: int, total training epochs -
'eval_epochs'
: int, number of epochs between computing true log-likelihood on validation set. -
'eval_train'
: bool, if True, also compute true-log-likelihood on train set during eval phase. -
'max_eval_ex'
: int, maximum number of examples to run evaluation on (defaultinf
). Useful for models with extremely computationally expensive inference procedures. -
'warmup_epochs'
: int, number of epochs over which to linearly increase learning rate from$0$ tolr
. -
'batch_size'
: int, number of samples per batch -
'recon_loss_weight'
: float, Value of$\lambda$ weight on reconstruction gradient. -
'sym_recon_grad'
: Bool, if True, andadd_recon_grad
is true, use a symmetric version of the reconstruction loss for increased stability. -
'grad_clip_norm'
: float, maximum magnitude which to scale gradients to if greater than.
'activation'
: str: Name of activation function to use for FC and CNN models (one ofSLR, LLR, Spline, SELU
).'num_blocks'
: int, Number of blocks for glow-like models'block_size'
: int, Number of setps-of-flow per block in glow-like models'num_layers'
: int, Number of layers for CNN and FC models'actnorm'
: bool, if True, use ActNorm in glow-like models'split_prior'
: bool, if True, use Split Priors between each block in glow-like models
'wandb'
: bool, if True, use weights & biases logging'wandb_project'
: str, Name of weights & biases project to log to.'wandb_entity'
: str, username or team name for weights and biases.'name'
: str, experiment name file-saving, and for weights & biases'notes'
: str, experiment notes for weights & biases'sample_epochs'
: int, epochs between generating samples.'n_samples'
: int, number of samples to generate'sample_dir'
: str, directory to save samples'checkpoint_path'
: str, path of saved checkpoints (at best validation likelihood). If unspecified, and using weights and biases, this defaults tocheckpoint.tar
in the WandB run directory. If not using weights and biases, this defaults tof"./{str(self.config['name']).replace(' ', '_')}_checkpoint.tar"
.'log_interval'
: int, number of batches between printing training loss.'verbose'
: bool, if True, log the log-jacobian-determinant and reconstruction loss per layer separately to weights and biases.'sample_true_inv'
: bool, if True, generate samples from the true inverse of a self-normalizing flow model, in addition to samples from the approximate (fast) inverse.'plot_recon'
: bool, if True, plot reconstruction of training images.'log_timing'
: bool, if True, compute mean and std. of time per batch and time per sample. Print to screen and save as summary statistic of experiment.
The Robert Bosch GmbH is acknowledged for financial support.