Bo Peng, Xinyuan Chen, Yaohui Wang, Chaochao Lu, Yu Qiao
This is the official PyTorch implementation of the paper "ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation".

Our model generates realistic dynamic videos from random noise or from a given scene video, guided by a conditioning signal. Currently, we support OpenPose keypoint, canny edge, depth, and segmentation conditions.
| canny | segment | depth |
|---|---|---|
| A dog, comicbook style | A red jellyfish, pastel colours. | A horse under a blue sky. |
| pose | customized pose |
|---|---|
| The Astronaut, brown background | Ironman in the sea |
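To give a concrete sense of what a per-frame condition map is, here is a minimal, self-contained sketch that computes a binary edge map from a grayscale frame with a Sobel gradient. This is a simplified stand-in for the canny condition (the real pipeline would typically use `cv2.Canny`, OpenPose, a depth estimator, or a segmentation model, none of which are shown here):

```python
import numpy as np

def edge_condition(frame: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Binary edge map from a grayscale frame via Sobel gradient magnitude.

    Simplified stand-in for a canny-style condition map; `edge_condition`
    is an illustrative helper, not part of the ConditionVideo codebase.
    """
    # Sobel kernels for horizontal and vertical gradients
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = kx.T
    # edge-pad so the output matches the input size
    padded = np.pad(frame.astype(np.float32), 1, mode="edge")
    h, w = frame.shape
    gx = np.zeros((h, w), dtype=np.float32)
    gy = np.zeros((h, w), dtype=np.float32)
    for i in range(3):          # correlate with both 3x3 kernels
        for j in range(3):
            patch = padded[i:i + h, j:j + w]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    mag = np.sqrt(gx ** 2 + gy ** 2)
    mag /= mag.max() + 1e-8     # normalize to [0, 1]
    return (mag > threshold).astype(np.uint8)

# Toy frame: dark left half, bright right half -> vertical edge in the middle
frame = np.zeros((8, 8), dtype=np.float32)
frame[:, 4:] = 1.0
cond = edge_condition(frame)
```

Running this per frame yields the stack of condition maps that guides generation; the other condition types (pose, depth, segment) play the same role with different extractors.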
To set up the environment, run:

```
conda create -n tune-control python=3.10
# Check your CUDA version, then install the matching PyTorch build (pytorch==2.0.0 is required)
pip install -r requirements.txt
conda install xformers -c xformers
```
You may also need to download the model checkpoints manually from Hugging Face.
To generate videos, run:

```
accelerate launch --num_processes 1 conditionvideo.py --config="configs/config.yaml"
```

Adjust the settings in `config.yaml` for different generation setups.
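For orientation, a config for this kind of condition-guided pipeline usually collects the prompt, condition type, and sampling settings. The fragment below is purely illustrative; the key names and values are hypothetical, so consult the shipped `configs/config.yaml` for the actual schema:

```yaml
# Hypothetical sketch -- key names are illustrative, not the repo's actual schema
pretrained_model_path: "checkpoints/stable-diffusion-v1-5"  # hypothetical checkpoint path
prompt: "A horse under a blue sky."
condition: "depth"        # one of: openpose, canny, depth, segment
video_length: 24          # number of frames (illustrative)
width: 512
height: 512
num_inference_steps: 50
guidance_scale: 12.5
seed: 42
```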
```bibtex
@misc{peng2023conditionvideo,
      title={ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation},
      author={Bo Peng and Xinyuan Chen and Yaohui Wang and Chaochao Lu and Yu Qiao},
      year={2023},
      eprint={2310.07697},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```