- Unsupervised: no action supervision is required.
- Generative: we use a generative model.
- Disentangled: we extract only the action embedding, which is object-agnostic.
Our framework can be used in several modes. In motion transfer mode, a static image is animated using a driving video. In image-to-video translation mode, the framework predicts future frames from a single static image.
We support Python 3. To install the dependencies, run:

```
pip3 install -r requirements.txt
```
There are several configuration files (`config/dataset_name.yaml`), one for each dataset. See `config/actions.yaml` for a description of each parameter.
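A quick way to inspect the available parameters is to load a config and print it. A minimal sketch, assuming the configs are plain YAML with no custom tags and that PyYAML is installed; `config/actions.yaml` is used here only as an example:

```python
# Minimal sketch: print the parameters defined in a dataset config.
# Assumes the configs are standard YAML parsable with PyYAML.
import yaml

with open("config/actions.yaml") as f:
    config = yaml.safe_load(f)

# Top-level parameter groups and their values.
for key, value in config.items():
    print(key, ":", value)
```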
To train a model on a specific dataset, run:

```
CUDA_VISIBLE_DEVICES=0 python3 run.py --config config/dataset_name.yaml
```
The code will create a folder in the log directory (each run creates a new time-stamped directory).
Checkpoints will be saved to this folder.
To check the loss values during training, see `log.txt`. You can also check reconstructions of the training data in the `train-vis` subfolder.
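Since each run writes `log.txt` into its own time-stamped folder, a small helper can show the latest loss values. A minimal sketch, assuming the log directory is `log/` (adjust the path to whatever log directory you use):

```python
# Minimal sketch: print the last lines of log.txt from the most recent run.
# Assumes runs live in time-stamped subfolders of ./log; adjust the path as needed.
import glob
import os

run_dirs = [d for d in glob.glob("log/*") if os.path.isdir(d)]
latest_run = max(run_dirs, key=os.path.getmtime)  # most recently modified run folder

with open(os.path.join(latest_run, "log.txt")) as f:
    lines = f.readlines()

print("".join(lines[-10:]))  # last 10 logged entries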
To perform image-to-video translation, run:

```
CUDA_VISIBLE_DEVICES=0 python3 run.py --config config/dataset_name.yaml --mode prediction --checkpoint path/to/checkpoint
```
- Shapes. This dataset is included in the repository. Training takes about 1 hour.
- Actions. This dataset is also included in the repository. Training takes about 4 hours.
[1] Animating Arbitrary Objects via Deep Motion Transfer, by Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci and Nicu Sebe.