
CoMo: Controllable Motion Generation through Language Guided Pose Code Editing

ECCV 2024

ToDos

  • Release motion editing demo.

  • Release fine-grained keyword data.

  • Release training/evaluation code.

Installation

To get started, clone this project, then set up the required dependencies using the following commands:

conda env create -f environment.yml
conda activate como
bash dataset/prepare/download_glove.sh
bash dataset/prepare/download_extractor.sh

The code was tested on Ubuntu 22.04.4 LTS.
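
As a quick sanity check that the environment resolved correctly, you can run the minimal sketch below; it only assumes that environment.yml installs PyTorch and NumPy, which the training scripts below require:

import torch  # provided by environment.yml
import numpy as np

print("torch:", torch.__version__)
print("numpy:", np.__version__)
print("CUDA available:", torch.cuda.is_available())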

Data

Motion-Language Data

For the HumanML3D and KIT-ML datasets, please find the instructions for downloading and preprocessing [here].

The resulting file directory should look like this:

./dataset/[dataset_name]/
├── new_joint_vecs/
├── new_joints/
├── texts/
├── Mean.npy 
├── Std.npy 
├── train.txt
├── val.txt
├── test.txt
├── train_val.txt
└── all.txt
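
For reference, Mean.npy and Std.npy hold the feature-wise statistics used to normalize the motion features in new_joint_vecs/. A minimal sketch of loading and normalizing one sequence ([dataset_name] and [file_id] are placeholders, as above):

import numpy as np

root = "./dataset/[dataset_name]"
mean = np.load(f"{root}/Mean.npy")  # per-feature mean
std = np.load(f"{root}/Std.npy")    # per-feature standard deviation

# Load one motion feature sequence and z-normalize it, as data
# loaders for HumanML3D/KIT-ML conventionally do before training.
motion = np.load(f"{root}/new_joint_vecs/[file_id].npy")
motion = (motion - mean) / std
print(motion.shape)  # (num_frames, feature_dim)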

Fine-grained Descriptions

We prompt GPT-4 to obtain fine-grained keywords that describe the motion of different body parts. The collected keywords and corresponding CLIP embeddings can be downloaded using the following commands:

bash dataset/prepare/download_keywords.sh

The keywords and keyword embeddings will be stored in the ./keywords and ./keyword_embeddings sub-folders, respectively, under each dataset directory ./dataset/[dataset_name]/. The training/evaluation code loads the keyword embeddings directly. The original keyword text is stored in dictionaries and can be read as follows:

import numpy as np

text = np.load("./dataset/[dataset_name]/keywords/[file_id].npy", allow_pickle=True).item()
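
The exact key layout of these dictionaries is not documented here; a quick way to inspect one after loading it with the snippet above:

# Print each entry of the keyword dictionary; whether keys correspond
# to body parts or something else is an assumption worth verifying.
for key, value in text.items():
    print(key, "->", value)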

Pose Codes

We adapt [PoseScript] to parse poses into pose codes. Running the following command stores the parsed codes in the ./codes sub-folder of each dataset directory ./dataset/[dataset_name]/:

bash dataset/prepare/parse_motion.sh

Although our framework obtains pose codes through heuristic skeleton parsing, it is also possible to train an encoder module that encodes motion sequences into pose code sequences, using the parsed pose codes as latent supervision. We include the checkpoint and training details for this encoder in the sections below.
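
If you want to inspect the parsed codes directly, here is a minimal sketch, assuming parse_motion.sh writes one .npy file per motion mirroring the keywords layout (verify the naming against the script's output):

import numpy as np

# Hypothetical path following the ./codes sub-folder convention above.
codes = np.load("./dataset/[dataset_name]/codes/[file_id].npy", allow_pickle=True)
print(type(codes), getattr(codes, "shape", None))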

Pre-trained Models

The pretrained model checkpoints will be stored in the ./pretrained folder:

bash dataset/prepare/download_model.sh
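
To confirm a download succeeded, you can open a checkpoint with torch.load (a sketch; the top-level keys inside the checkpoint files are not documented here):

import torch

# Load the decoder checkpoint on CPU just to inspect its contents.
ckpt = torch.load("./pretrained/t2m/Dec/model.pth", map_location="cpu")
print(list(ckpt.keys()))  # layout unverified; e.g., weights and training state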

Training

Motion Decoder

python train_dec.py \
--batch-size 256 \
--lr 1e-4 \
--total-iter 300000 \
--lr-scheduler 200000 \
--nb-code 392 \
--down-t 2 \
--depth 3 \
--dilation-growth-rate 3 \
--out-dir output \
--dataname t2m \
--vq-act relu \
--loss-vel 0.5 \
--recons-loss l1_smooth \
--exp-name Dec \
--output-emb-width 392

[Optional] Motion Encoder

python train_enc.py \
--batch-size 256 \
--lr 1e-4 \
--total-iter 300000 \
--lr-scheduler 200000 \
--nb-code 392 \
--down-t 2 \
--depth 3 \
--dilation-growth-rate 3 \
--out-dir output \
--dataname t2m \
--vq-act relu \
--loss-vel 0.5 \
--recons-loss l1_smooth \
--exp-name Enc \
--output-emb-width 392 \
--resume-pth ./pretrained/t2m/Dec/model.pth

Motion Generator

python train_t2m.py \
--exp-name Trans \
--batch-size 64 \
--num-layers 9 \
--nb-code 392 \
--n-head-gpt 16 \
--block-size 62 \
--ff-rate 4 \
--out-dir output \
--total-iter 300000 \
--lr-scheduler 150000 \
--lr 0.0001 \
--dataname t2m \
--down-t 2 \
--depth 3 \
--eval-iter 10000 \
--pkeep 0.5 \
--dilation-growth-rate 3 \
--output-emb-width 392 \
--resume-pth ./pretrained/t2m/Dec/model.pth 

Evaluation

Motion Decoder

python eval_dec.py \
--batch-size 256 \
--lr 2e-4 \
--total-iter 300000 \
--lr-scheduler 200000 \
--nb-code 392 \
--down-t 2 \
--depth 3 \
--dilation-growth-rate 3 \
--out-dir output \
--dataname t2m \
--vq-act relu \
--loss-vel 0.5 \
--recons-loss l1_smooth \
--exp-name TEST_Dec \
--resume-pth ./pretrained/t2m/Dec/model.pth \
--output-emb-width 392

Motion Generator

python eval_t2m.py  \
--exp-name TEST_Trans \
--batch-size 256 \
--num-layers 9 \
--embed-dim-gpt 1024 \
--nb-code 392 \
--n-head-gpt 16 \
--block-size 62 \
--ff-rate 4 \
--drop-out-rate 0.1 \
--resume-pth ./pretrained/t2m/Dec/model.pth \
--vq-name VQVAE \
--out-dir output \
--total-iter 300000 \
--lr-scheduler 150000 \
--lr 0.0001 \
--dataname t2m \
--down-t 2 \
--depth 3 \
--eval-iter 10000 \
--pkeep 0.5 \
--dilation-growth-rate 3 \
--vq-act relu \
--output-emb-width 392 \
--resume-trans ./pretrained/t2m/Trans/model.pth

BibTeX

If you find our work helpful or use our code, please consider citing:

@misc{huang2024como,
      title={CoMo: Controllable Motion Generation through Language Guided Pose Code Editing}, 
      author={Yiming Huang and Weilin Wan and Yue Yang and Chris Callison-Burch and Mark Yatskar and Lingjie Liu},
      year={2024},
      eprint={2403.13900},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

We would like to thank the following projects, whose amazing work our code builds on:

text-to-motion, MDM, T2M-GPT, PoseScript
