ExVideo is a post-tuning technique that enhances the capability of video generation models. We have extended Stable Video Diffusion to generate long videos of up to 128 frames.
- Project Page
- Technical report
- Extended models
Generate a video using a text-to-image model and our image-to-video model. See `ExVideo_svd_test.py`.
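A minimal sketch of this two-stage flow, assuming the DiffSynth-style API (`ModelManager`, `SVDVideoPipeline`, `save_video`) used in this repo; the parameter names here are assumptions, so treat `ExVideo_svd_test.py` as the authoritative version:

```python
import torch
from PIL import Image
from diffsynth import ModelManager, SVDVideoPipeline, save_video

# Load the base SVD weights (plus the ExVideo extension, path omitted here).
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
model_manager.load_models(["models/stable_video_diffusion/svd_xt.safetensors"])
pipe = SVDVideoPipeline.from_model_manager(model_manager)

# Animate a first frame produced by any text-to-image model.
# Parameter names are assumptions; see ExVideo_svd_test.py for the real call.
image = Image.open("first_frame.png").convert("RGB").resize((512, 512))
video = pipe(input_image=image, num_frames=128, height=512, width=512)
save_video(video, "video.mp4", fps=30)
```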
- Step 1: Install additional packages.

```shell
pip install lightning deepspeed
```

- Step 2: Download the base model (from HuggingFace or ModelScope) to `models/stable_video_diffusion/svd_xt.safetensors`.
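For example, the HuggingFace copy can be fetched with `huggingface_hub` (the repo id and filename below match the public SVD-XT release; you may need to accept the model license on its HuggingFace page first):

```python
from huggingface_hub import hf_hub_download

# Download svd_xt.safetensors into models/stable_video_diffusion/.
hf_hub_download(
    repo_id="stabilityai/stable-video-diffusion-img2vid-xt",
    filename="svd_xt.safetensors",
    local_dir="models/stable_video_diffusion",
)
```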
- Step 3: Prepare datasets in the following structure:

```
path/to/your/dataset
├── metadata.json
└── videos
    ├── video_1.mp4
    ├── video_2.mp4
    └── video_3.mp4
```
where `metadata.json` is:

```json
[
    {
        "path": "videos/video_1.mp4"
    },
    {
        "path": "videos/video_2.mp4"
    },
    {
        "path": "videos/video_3.mp4"
    }
]
```
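If your videos are already in place, a `metadata.json` in this format can be generated with a short stdlib script (a sketch; the dataset path is the placeholder from above):

```python
import json
from pathlib import Path

dataset_root = Path("path/to/your/dataset")

# List every .mp4 under videos/ and record its path relative to the root,
# matching the metadata format shown above.
metadata = [
    {"path": p.relative_to(dataset_root).as_posix()}
    for p in sorted((dataset_root / "videos").glob("*.mp4"))
]
(dataset_root / "metadata.json").write_text(json.dumps(metadata, indent=4))
```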
- Step 4: Run the training script.

```shell
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python -u ExVideo_svd_train.py \
  --pretrained_path "models/stable_video_diffusion/svd_xt.safetensors" \
  --dataset_path "path/to/your/dataset" \
  --output_path "path/to/save/models" \
  --steps_per_epoch 8000 \
  --num_frames 128 \
  --height 512 \
  --width 512 \
  --dataloader_num_workers 2 \
  --learning_rate 1e-5 \
  --max_epochs 100
```
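Adjust `CUDA_VISIBLE_DEVICES` to the GPUs available on your machine. With the settings above, each epoch runs 8000 steps, and Lightning writes checkpoints under `--output_path` in `lightning_logs/version_xx/checkpoints/`, which the next step post-processes.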
- Step 5: Post-process checkpoints.

Calculate the Exponential Moving Average (EMA) of the trained weights and package it using `safetensors`:

```shell
python ExVideo_ema.py --output_path "path/to/save/models/lightning_logs/version_xx" --gamma 0.9
```
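For reference, this is the standard EMA recurrence over successive checkpoints, assuming `ExVideo_ema.py` follows the usual formulation with `--gamma` as the decay:

```python
def ema_update(ema_state, new_state, gamma=0.9):
    # Standard EMA recurrence: ema <- gamma * ema + (1 - gamma) * new,
    # applied tensor-by-tensor over matching checkpoint keys.
    return {
        name: gamma * ema_state[name] + (1.0 - gamma) * new_state[name]
        for name in ema_state
    }
```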
- Step 6: Enjoy your model.

The EMA model is at `path/to/save/models/lightning_logs/version_xx/checkpoints/epoch=xx-step=yyy-ema.safetensors`. Load it in `ExVideo_svd_test.py` and then enjoy your model.
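To sanity-check the packaged file before loading it in `ExVideo_svd_test.py`, it can be opened directly with `safetensors` (the path keeps the placeholders from above):

```python
from safetensors.torch import load_file

# Load the EMA weights and print a few tensor names and shapes.
state_dict = load_file(
    "path/to/save/models/lightning_logs/version_xx/checkpoints/"
    "epoch=xx-step=yyy-ema.safetensors"
)
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```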