
AISLe

This is the official repository for the paper No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures

Chaitanya Ahuja, Dong Won Lee, Ryo Ishii, Louis-Philippe Morency - EMNLP Findings 2020

License: MIT

Links: Paper, Dataset Website [1]

Bibtex:

@inproceedings{ahuja2020no,
  title={No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures},
  author={Ahuja, Chaitanya and Lee, Dong Won and Ishii, Ryo and Morency, Louis-Philippe},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings},
  pages={1884--1895},
  year={2020}
}

Overview


This repo contains the training code and pre-trained models.

For the dataset, we refer you to the Dataset Website [1].

For the purposes of this repository, we assume that the dataset is downloaded to ../data/
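
If the dataset was downloaded to a different location, a symlink is enough to keep the default ../data path working. A minimal sketch, assuming the commands are run from src/ and using /path/to/dataset as a placeholder for wherever the data actually lives,

ln -s /path/to/dataset ../data  # make the default -path2data ../data resolve to the real download
ls ../data                      # sanity check that the data is visible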

This repo is divided into the following sections:

  • Clone
  • Set up Environment
  • Training
  • Inference
  • Pre-trained models
  • Rendering

This is followed by additional informational sections:

  • Experiment Files
  • References
  • Other cool stuff
  • Issues

Clone

Clone only the master branch,

git clone -b master --single-branch https://github.com/chahuja/aisle.git

Set up Environment

  • pycasper
mkdir ../pycasper
git clone https://github.com/chahuja/pycasper ../pycasper
ln -s ../pycasper/pycasper .
  • Create an anaconda or a virtual environment and activate it (see the sketch after this list)
pip install -r requirements.txt
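
If you use anaconda, the environment step might look like the sketch below; the environment name aisle and the Python version are illustrative, not prescribed by the repo,

conda create -n aisle python=3.7   # hypothetical environment name and Python version
conda activate aisle
pip install -r requirements.txt    # install the pinned dependencies into the new environment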

Training

To train a model from scratch, run the following script after changing directory to src. Here and in the commands below, the inline ## comments document each argument and should be removed before executing the command (a comment-free example follows this block),

python train.py \
 -weighted 400 \ ## argument to run AISLe for adaptive reweighting; to be used with `gan=1`; the number refers to the number of iterations per epoch
 -cpk JointLateClusterSoftTransformer12_G \ ## checkpoint name which is a part of experiment file PREFIX
 -exp 1 \ ## creates a unique experiment number
 -path2data ../data \ ## path to data files
 -speaker '["oliver"]' \ ## Speaker
 -model JointLateClusterSoftTransformer12_G \ ## Name of the model
 -note aisle \ ## unique identifier for the model to group results
 -save_dir save/aisle \ ## save directory
 -modalities '["pose/normalize", "text/tokens", "audio/log_mel_400"]' \ ## all modalities as a list. output modality first, then input modalities
 -repeat_text 0 \ ## tokens are not repeated to match the audio frame rate
 -fs_new '15' \ ## frame rate of each modality
 -input_modalities '["text/tokens", "audio/log_mel_400"]' \ ## List of input modalities
 -output_modalities '["pose/normalize"]' \ ## List of output modalities
 -gan 1 \ ## Flag to train with a discriminator on the output
 -loss L1Loss \ ## Choice of loss function. Any loss function torch.nn.* will work here
 -window_hop 5 \ ## Hop size of the window for the dataloader
 -render 0 \ ## flag to render. Default 0
 -batch_size 32 \ ## batch size
 -num_epochs 100 \ ## total number of epochs
 -min_epochs 50 \ ## early stopping can occur after these many epochs occur
 -overfit 0 \ ## flag to overfit (for debugging)
 -early_stopping 0 \ ## flag to perform early stopping 
 -dev_key dev_spatialNorm \ ## metric used to choose the best model
 -num_clusters 8 \ ## number of clusters in the Conditional Mix-GAN
 -feats '["pose", "velocity", "speed"]' \ ## Features used to make the clusters
 -optim AdamW \ ## AdamW optimizer
 -lr 0.0001 \ ## Learning Rate
 -optim_separate 0.00003 ## Use a separate recommended optimizer and learning rate schedule for the language encoder BERT
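
For copy-pasting, here is the same command with the explanatory comments stripped. As a sketch it is shown for a different speaker ('lec_cosmic', one of the speakers with a released pretrained model) and a different experiment number; all other values match the annotated block above,

python train.py \
 -weighted 400 \
 -cpk JointLateClusterSoftTransformer12_G \
 -exp 2 \
 -path2data ../data \
 -speaker '["lec_cosmic"]' \
 -model JointLateClusterSoftTransformer12_G \
 -note aisle \
 -save_dir save/aisle \
 -modalities '["pose/normalize", "text/tokens", "audio/log_mel_400"]' \
 -repeat_text 0 \
 -fs_new '15' \
 -input_modalities '["text/tokens", "audio/log_mel_400"]' \
 -output_modalities '["pose/normalize"]' \
 -gan 1 \
 -loss L1Loss \
 -window_hop 5 \
 -render 0 \
 -batch_size 32 \
 -num_epochs 100 \
 -min_epochs 50 \
 -overfit 0 \
 -early_stopping 0 \
 -dev_key dev_spatialNorm \
 -num_clusters 8 \
 -feats '["pose", "velocity", "speed"]' \
 -optim AdamW \
 -lr 0.0001 \
 -optim_separate 0.00003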

Example scripts for training the models in the paper can be found in the repository.

Inference

Inference for quantitative evaluation

python sample.py \
-load <path2weights> \ ## path to PREFIX_weights.p file
-path2data ../data ## path to data
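
To evaluate several trained runs in one go, a small loop over the saved weight files works; this sketch assumes the save/aisle directory from the training example and the PREFIX_weights.p naming convention described under Experiment Files,

for w in save/aisle/*_weights.p; do   # every experiment's best-model weights
  python sample.py -load "$w" -path2data ../data
done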

Pre-trained models (UPDATE: March 17, 2021)

Download pretrained models and unzip them in the src folder,

cd aisle/src
wget -O pretrained.zip.part-aa https://cmu.box.com/shared/static/4c2a7fax036sniupxajf7mt35osabrc7.part-aa
wget -O pretrained.zip.part-ab https://cmu.box.com/shared/static/lpnhd91xf228bx6wugf034a13cltv162.part-ab
cat pretrained.zip.part-* > pretrained.zip
unzip pretrained.zip

Once you unzip the file, all the pretrained models can be found in save/pretrained_models/.
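
To double-check the extraction, list the unzipped checkpoints; the aisle/lec_cosmic subfolder below is taken from the example that follows, and other speakers' folders may be named differently,

ls save/pretrained_models/                     # top-level folders of the released models
ls save/pretrained_models/aisle/lec_cosmic/    # weights for one speaker, as used below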

An example of sampling gesture animations from a pretrained model:

python sample.py \
-load save/pretrained_models/aisle/lec_cosmic/exp_3233_cpk_mmsbert_lfiw_no_update3_speaker_\[\'lec_cosmic\'\]_model_JointLateClusterSoftTransformer12_G_note_mmsbert_lfiw_no_update3_weights.p \
-path2data ../data

We also release emnlp2020-results.ipynb, a notebook that extracts the reported results from the pretrained models; it requires the latest version of pycasper.
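
To reproduce that extraction, update the pycasper checkout from the Set up Environment step and open the notebook; this sketch assumes jupyter is installed in the active environment and uses <path-to-pycasper-clone> as a placeholder,

git -C <path-to-pycasper-clone> pull     # pycasper checkout created in "Set up Environment"
jupyter notebook emnlp2020-results.ipynb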

Rendering

python render.py \
-render 20 \ ## number of intervals to render
-load <path2weights> \ ## path to PREFIX_weights.p file
-render_text 1 \ ## if 1, render text on the video as well
-path2data ../data ## path to data
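
As with sampling, -load can also point at one of the released checkpoints. For example, reusing the pretrained path from above with the inline comments stripped,

python render.py \
 -render 20 \
 -render_text 1 \
 -path2data ../data \
 -load save/pretrained_models/aisle/lec_cosmic/exp_3233_cpk_mmsbert_lfiw_no_update3_speaker_\[\'lec_cosmic\'\]_model_JointLateClusterSoftTransformer12_G_note_mmsbert_lfiw_no_update3_weights.p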

Experiment Files

Every experiment generates multiple files sharing the same PREFIX (a quick way to inspect them is sketched after the lists below):

Training files

  • PREFIX_args.args - arguments stored as a dictionary
  • PREFIX_res.json - results for every epoch
  • PREFIX_weights.p - weights of the best model
  • PREFIX_log.log - log file
  • PREFIX_name.name - name file to restore value of PREFIX

Inference files

  • PREFIX/ - directory containing sampled h5 files and eventually renders
  • PREFIX_cummMetrics.json - metrics estimated at inference which are reported in the paper
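
A few quick ways to inspect these files from the shell; PREFIX and the save/aisle directory follow the training example above, and python -m json.tool is used only as a pretty-printer,

ls save/aisle/                                          # everything sharing the run's PREFIX
python -m json.tool save/aisle/PREFIX_cummMetrics.json  # inference metrics reported in the paper
python -m json.tool save/aisle/PREFIX_res.json          # per-epoch training/validation results
ls save/aisle/PREFIX/                                   # sampled h5 files and renders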

References

[1] Ahuja, Chaitanya, et al. "Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional Mixture Approach." ECCV 2020.
[2] Kucherenko, Taras, et al. "Gesticulator: A framework for semantically-aware speech-driven gesture generation." ICMI 2020.
[3] Ginosar, Shiry, et al. "Learning individual styles of conversational gesture." CVPR 2019.

Other cool stuff

If you enjoyed this work, I would recommend other projects that study different axes of nonverbal grounding, such as Mix-StAGE [1].

Issues

All research code is, by nature, a work in progress. If you find any issues with this code, feel free to open an issue or, even better, a pull request, and I will get to it as soon as humanly possible.
