[NeurIPS 2024 D&B Spotlight🔥] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation

PKU-YuanGroup/ChronoMagic-Bench

If you like our project, please give us a star ⭐ on GitHub for the latest update.

This repository is the official implementation of ChronoMagic-Bench, a benchmark for metamorphic evaluation of text-to-time-lapse video generation. The key insight is to evaluate the capabilities of Text-to-Video Generation Models in physics, biology, and chemistry by enabling the generation of time-lapse videos, which are characterized by rich physics priors, through a free-form text prompt.

💡 We also have other video generation projects that may interest you ✨.

Open-Sora-Plan
PKU-Yuan Lab and Tuzhan AI etc.

MagicTime
Shenghai Yuan, Jinfa Huang and Yujun Shi etc.

📣 News

  • ⏳⏳⏳ Evaluate more Text-to-Video Generation Models via ChronoMagic-Bench.
  • [2024.09.30] 🔥 We have updated the calculation of the CHScore, making it more robust to temporally coherent disappearance of points. You can click here for detailed implementation.
  • [2024.09.26] ✨ Our paper was accepted to the NeurIPS 2024 D&B track as a spotlight presentation.
  • [2024.08.13] 🔥 We further evaluate EasyAnimate-V3 and CogVideoX-2B. The results are available here.
  • [2024.06.30] 🔥 We release the code of the "Multi-Aspect Data Preprocessing", which is used to process the ChronoMagic-Pro dataset. Please click here and here to see more details.
  • [2024.06.29] 🔥 Support evaluating customized Text-to-Video models. The code and instructions are available in this repo.
  • [2024.06.28] 🔥 We release the ChronoMagic-Pro and ChronoMagic-ProH datasets. The datasets include 460K and 150K time-lapse video-text pairs respectively and can be downloaded at HF-Dataset-Pro and HF-Dataset-ProH.
  • [2024.06.27] 🔥 We release the arXiv paper and Leaderboard for ChronoMagic-Bench, and you can click here to read the paper and here to see the leaderboard.
  • [2024.06.26] 🔥 We release the testing prompts, reference videos and generated results by different models in ChronoMagic-Bench, and you can click here to see more details.
  • [2024.06.25] 🔥 All codes & datasets are coming soon! Stay tuned 👀!

😮 Highlights

ChronoMagic-Bench measures how well Text-to-Video Generation Models capture physical priors.

Resources

  • ChronoMagic-Bench: including 1649 time-lapse video-text pairs. (captioned by GPT-4o)
  • ChronoMagic-Bench-150: including 150 time-lapse video-text pairs. (captioned by GPT-4o)
  • ChronoMagic: including 2265 time-lapse video-text pairs. (captioned by GPT-4V)
  • ChronoMagic-Pro: including 460K time-lapse video-text pairs. (captioned by ShareGPT4Video)
  • ChronoMagic-ProH: including 150K time-lapse video-text pairs. (captioned by ShareGPT4Video)

📣 Overview

In contrast to existing benchmarks, ChronoMagic-Bench emphasizes generating videos with high persistence and strong variation, i.e., metamorphic time-lapse videos with high physical prior content.

Backbone Type Visual Quality Text Relevance Metamorphic Amplitude Temporal Coherence
UCF-101 General ✔️ ✔️
Make-a-Video-Eval General ✔️ ✔️
MSR-VTT General ✔️ ✔️
FETV General ✔️ ✔️ ✔️
VBench General ✔️ ✔️ ✔️
T2VScore General ✔️ ✔️
ChronoMagic-Bench Time-lapse ✔️ ✔️ ✔️ ✔️

We specifically design four major categories for time-lapse videos (as shown below), including biological, human-created, meteorological, and physical videos, and extend these to 75 subcategories. Based on this, we construct ChronoMagic-Bench, comprising 1,649 prompts and their corresponding reference time-lapse videos.

Example prompts for the four major categories:

  • Biological: "Time-lapse of microgreens germinating and growing ...", "Time-lapse of a butterfly metamorphosis from ..."
  • Human Created: "Time-lapse of a modern house being constructed in ...", "Time-lapse of a 3D printing process: starting with ...", "Time-lapse of a busy nighttime city intersection ..."
  • Meteorological: "Time-lapse of a beach sunset capturing the sun's ...", "Time-lapse of a solar eclipse showing the moon's ...", "Time-lapse of a landscape transitioning from a ..."
  • Physical: "Time-lapse of an ice cube melting on a solid ...", "Time-lapse of a cake baking in an oven, depicting ...", "Time-lapse of a strawberry rotting: starting with ..."

🎓 Evaluation Results

We visualize the evaluation results of various open-source and closed-source T2V generation models across ChronoMagic-Bench.

🏆 Leaderboard

See numeric values at our Leaderboard 🥇🥈🥉

or you can run it locally:

cd LeadBoard
python app.py

⚙️ Requirements and Installation

We recommend the following setup.

Environment

git clone --depth=1 https://github.com/PKU-YuanGroup/ChronoMagic-Bench.git
cd ChronoMagic-Bench
conda create -n chronomagic python=3.10
conda activate chronomagic

# install base packages
pip install -r requirements.txt

# install flash-attn
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention/csrc/layer_norm && pip install .
cd ../../../
rm -r flash-attention

Download Checkpoints

huggingface-cli download --repo-type model \
BestWishYsh/ChronoMagic-Bench \
--local-dir BestWishYsh/ChronoMagic-Bench

📑 Benchmark Prompts

We provide the evaluation prompt lists of ChronoMagic-Bench here or here. You can use them to sample videos from your model for evaluation. We also provide the reference videos for the corresponding evaluation prompts here.

🔨 Usage

Use ChronoMagic-Bench to evaluate videos and video generative models.

Prepare Videos for Evaluation

The generated videos should be named after the corresponding prompt ID in ChronoMagic-Bench and placed in the evaluation folder, which is structured as follows. We also provide input examples in the 'toy_video' folder.

# for open-source models
`-- input_video_folder
    |-- model_name_a
    |   |-- 1
    |   |   |-- 3d_printing_08.mp4
    |   |   `-- ...
    |   |-- 2
    |   |   |-- 3d_printing_08.mp4
    |   |   `-- ...
    |   `-- 3
    |       |-- 3d_printing_08.mp4
    |       `-- ...
    `-- model_name_b
        |-- 1
        |   |-- 3d_printing_08.mp4
        |   `-- ...
        |-- 2
        |   |-- 3d_printing_08.mp4
        |   `-- ...
        `-- 3
            |-- 3d_printing_08.mp4
            `-- ...
            
# for closed-source models
`-- input_video_folder
    |-- model_name_a
    |   |-- 3d_printing_08.mp4
    |   |-- animal_04.mp4
    |   `-- ...
    |-- model_name_b
    |   |-- 3d_printing_08.mp4
    |   `-- ...
    `-- ...

The filenames of all videos to be evaluated should be "videoid.mp4". For example, if the videoid is 3d_printing_08, the video filename should be "3d_printing_08.mp4". If this naming convention is not followed, the text relevance cannot be evaluated.
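
Before running the evaluation, it can help to check a model folder against this naming convention. The following is a minimal sketch, not part of the repository: the function name `find_misnamed_videos` and its `valid_ids` argument are hypothetical, and it assumes you have already loaded the set of prompt IDs from the benchmark prompt list yourself.

```python
from pathlib import Path


def find_misnamed_videos(model_dir, valid_ids, open_source=True):
    """Return relative paths of files that break the "<videoid>.mp4" convention.

    valid_ids: set of prompt IDs (hypothetical input, load it from the prompt list).
    open_source: if True, videos are expected one level down, in sub-folders
    such as 1/, 2/, 3/; otherwise directly inside model_dir.
    """
    model_dir = Path(model_dir)
    pattern = "*/*" if open_source else "*"
    problems = []
    for path in sorted(model_dir.glob(pattern)):
        if not path.is_file():
            continue  # skip sub-folders
        # A file is valid only if it ends in .mp4 and its stem is a known prompt ID.
        if path.suffix != ".mp4" or path.stem not in valid_ids:
            problems.append(path.relative_to(model_dir).as_posix())
    return problems
```

An empty list means every file found at the expected depth matches a known prompt ID. The check does not report prompt IDs that have no video at all.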

Get MTScore, CHScore and GPT4o-MTScore

We provide output examples in the 'results' folder. You can run the following command as a test, then modify the relevant parameters (such as model_names, input_folder, model_pth_CHScore, model_pth_MTScore and openai_api) to suit the text-to-video (T2V) generation model you want to evaluate.

# to evaluate more than one model, pass several names: --model_names name1 name2
python evaluate.py \
  --eval_type "open" \
  --model_names test \
  --input_folder toy_video \
  --output_folder results \
  --video_frames_folder video_frames_folder_temp \
  --model_pth_CHScore cotracker2.pth \
  --model_pth_MTScore InternVideo2-stage2_1b-224p-f4.pt \
  --num_workers 8 \
  --openai_api "sk-UybXXX"
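
If you launch the evaluation from a Python script, the same command can be assembled as an argument list. This is a minimal sketch: the helper `build_evaluate_command` is hypothetical and not part of the repository, the flags and default values are copied from the command above, and only the "open" value of `--eval_type` is taken from this README.

```python
import sys


def build_evaluate_command(model_names, input_folder, output_folder, openai_api,
                           eval_type="open",
                           frames_folder="video_frames_folder_temp",
                           chscore_ckpt="cotracker2.pth",
                           mtscore_ckpt="InternVideo2-stage2_1b-224p-f4.pt",
                           num_workers=8):
    """Build the argv list for evaluate.py (hypothetical helper)."""
    return [
        sys.executable, "evaluate.py",
        "--eval_type", eval_type,
        # one or more model names follow the flag as separate arguments
        "--model_names", *model_names,
        "--input_folder", str(input_folder),
        "--output_folder", str(output_folder),
        "--video_frames_folder", str(frames_folder),
        "--model_pth_CHScore", str(chscore_ckpt),
        "--model_pth_MTScore", str(mtscore_ckpt),
        "--num_workers", str(num_workers),
        "--openai_api", openai_api,
    ]

# Usage, from the repository root:
#   import subprocess
#   cmd = build_evaluate_command(["name1", "name2"], "toy_video", "results", "sk-...")
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems and keeps the API key out of any shell-interpreted string.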

If you only want to compute one of the metrics instead of all of them, follow the steps below. Before running, please modify the parameters in 'xxx.sh' as needed. (If you want to obtain the JSON to submit to the leaderboard, organize the output files of MTScore / CHScore / GPT4o-MTScore in the same way as 'results' and then proceed with the following steps.)

# for MTScore
cd MTScore
bash get_mtscore.sh

# for CHScore
cd CHScore
bash get_chscore.sh

# for GPT4o-MTScore
cd GPT4o_MTScore
bash get_gp4omtscore.sh

Get UMT-FVD and UMTScore

Please refer to the folder UMT for how to compute UMT-FVD and UMTScore.

Get File and Submit to Leaderboard

python get_uploaded_json.py \
  --input_path results/all \
  --output_path results

After completing the above steps, you will obtain ChronoMagic-Bench-Input.json, and then you need to manually fill the JSON with UMT-FVD and UMTScore (as we calculate them separately). Finally, you can submit the JSON to HuggingFace.
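
The manual step can also be scripted. The sketch below is an illustration only: the helper `fill_metrics` is hypothetical, and it assumes that the generated file is a flat JSON object and that the two metrics go in top-level keys. Inspect ChronoMagic-Bench-Input.json first and adapt the key names and nesting to what you find there.

```python
import json
from pathlib import Path


def fill_metrics(json_path, extra_metrics):
    """Merge separately computed metrics into a JSON file (hypothetical helper).

    Assumes the file holds a JSON object; the keys in extra_metrics are
    written at the top level, overwriting existing values with the same key.
    """
    path = Path(json_path)
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    data.update(extra_metrics)
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return data

# Usage (key names and values are placeholders, not real results):
#   fill_metrics("results/ChronoMagic-Bench-Input.json",
#                {"UMT-FVD": your_umt_fvd, "UMTScore": your_umtscore})
```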

🏄 Sampled Videos

To facilitate future research and to ensure full transparency, we release all the videos we sampled and used for the ChronoMagic-Bench evaluation. You can download them on Hugging Face. We also provide detailed explanations of the sampled videos and the detailed settings of the models under evaluation here.

🐳 ChronoMagicPro Dataset

ChronoMagic-Pro contains 460K time-lapse videos, each accompanied by a detailed caption. We also release ChronoMagic-ProH, a higher-quality subset of 150K videos. Both datasets can be downloaded here and here, or with the following command. Some samples can be found on our Project Page.

# replace BestWishYsh/ChronoMagic-Pro with BestWishYsh/ChronoMagic-ProH to download the higher-quality subset
huggingface-cli download --repo-type dataset \
--resume-download BestWishYsh/ChronoMagic-Pro \
--local-dir BestWishYsh/ChronoMagic-Pro \
--local-dir-use-symlinks False

Please refer to the folder Multi-Aspect_Preprocessing for how the ChronoMagic-Pro data is processed.

🔒 License

  • The majority of this project is released under the Apache 2.0 license as found in the LICENSE file.
  • The service is a research preview. Please contact us if you find any potential violations. (shyuan-cs@hotmail.com)

✏️ Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝.

@article{yuan2024chronomagic,
  title={ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation},
  author={Yuan, Shenghai and Huang, Jinfa and Xu, Yongqi and Liu, Yaoyang and Zhang, Shaofeng and Shi, Yujun and Zhu, Ruijie and Cheng, Xinhua and Luo, Jiebo and Yuan, Li},
  journal={arXiv preprint arXiv:2406.18522},
  year={2024}
}
