This repository contains code for reproducing the papers "MoGlow: Probabilistic and controllable motion synthesis using normalising flows" and "Style-controllable speech-driven gesture synthesis using normalising flows". Parts of the code are based on this Glow implementation by GitHub user chaiyujin.
Please watch the following videos for an introduction to the papers:
- MoGlow: https://youtu.be/pe-YTvavbtA
- Style Gestures: https://youtu.be/egf3tjbWBQE
There is also a separate MoGlow project page.
The conda environment moglow
defined in 'environment.yml' contains the required dependencies.
Our preprocessed version of the human locomotion data is available here. Download it to the 'data/locomotion' folder. The data is pooled from the Edinburgh Locomotion MOCAP Database, CMU Motion Capture Database, and HDM05 datasets. Please see the included README file for licenses and citations.
We additionally trained MoGlow on locomotion data with joint angles parameterised using exponential maps. This allows synthesising motion for skinned characters. Here, we pooled the LaFAN1 and Kinematica datasets, retargeted the motion to a uniform skeleton using Motion Builder, and then converted the data to BVH format. Unfortunately we cannot provide the processed bvh files due to the dataset licences. To reproduce these models, please redo these steps and place the bvh files in a data/locomotion_rot/source/bvh/<some_subset>
folder. Then run python prepare_locomotion_datasets.py
from the 'data_processing' folder. Here, <some_subset>
should be replaced by the motion subset of interest; for example, we put all locomotion-related sessions in a 'loco_only' folder and all motion except wall-climbing in an 'all' folder.
We used part I of the Trinity Speech-Gesture Dataset to train our StyleGesture models. Trinity College Dublin requires interested parties to sign a license agreement and receive approval before gaining access to that material, so we cannot host it here. Once you have agreed to the data license available here, our processed version of the data, with motion data converted to BVH format (joint angles) and synchronised to the audio, can be downloaded here. This is the same data that was used for the GENEA Challenge 2020, and comes with a readme file with additional information and stipulations. Place the data in the 'data/GENEA/source' folder and then run python prepare_gesture_datasets.py
from the 'data_processing' folder.
Edit the 'hparams/xxx.json' file to modify network and training parameters. Start training by running the following command:
python train_moglow.py <hparams> <dataset>
Example 1: For training a locomotion model (joint positions):
python train_moglow.py 'hparams/preferred/locomotion.json' locomotion
Example 2: For training a locomotion model (joint angles):
python train_moglow.py 'hparams/preferred/locomotion_rot.json' locomotion_rot
Example 3: For training a gesture model:
python train_moglow.py 'hparams/preferred/style_gestures.json' trinity
Important note: Although the code allows multi-GPU training, this is not supported as it leads to incorrect results.
Output samples are generated at specified intervals during training. Sampling from a pre-trained model is done by specifying the path in the 'hparams/xxx.json' file and then running python train_moglow.py <hparams> <dataset>
.
The code also includes an option to change the latent distribution from the standard Gaussian to a Student's t distribution. We found that this change, which we term "Studentising flows", provides more robust training (gradient clipping can be removed) and gives better likelihood on held-out data. Please see our INNF+ 2020 paper "Robust model training and generalisation with Studentising flows" to read more.
If you use our code or build on our method, please credit our publications:
@article{henter2020moglow,
author = {Henter, Gustav Eje and Alexanderson, Simon and Beskow, Jonas},
doi = {10.1145/3414685.3417836},
journal = {ACM Transactions on Graphics},
number = {4},
pages = {236:1--236:14},
publisher = {ACM},
title = {{M}o{G}low: {P}robabilistic and controllable motion synthesis using normalising flows},
volume = {39},
year = {2020}
}
@article{alexanderson2020style,
title = {Style-controllable speech-driven gesture synthesis using normalising flows},
author = {Alexanderson, Simon and Henter, Gustav Eje and Kucherenko, Taras and Beskow, Jonas},
journal = {Computer Graphics Forum},
volume = {39},
number = {2},
pages = {487--496},
year = {2020},
url = {https://diglib.eg.org/handle/10.1111/cgf13946},
doi = {10.1111/cgf.13946},
publisher = {John Wiley \& Sons}
}
If you use Studentising flows, please cite the following paper:
@inproceedings{alexanderson2020robust,
author = {Alexanderson, Simon and Henter, Gustav Eje},
booktitle = {Proceedings of the ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models},
series = {INNF+'20},
pages = {25:1--25:9},
title = {Robust model training and generalisation with {S}tudentising flows},
url = {https://arxiv.org/abs/2006.06599},
volume = {2},
year = {2020}
}