🦜 VideoChat [paper/demo]

In this study, we initiate an exploration into video understanding by introducing VideoChat, an end-to-end chat-centric video understanding system. It integrates video foundation models and large language models via a learnable neural interface, excelling in spatiotemporal reasoning, event localization, and causal relationship inference. To instruction-tune this system, we propose a video-centric instruction dataset composed of thousands of videos matched with detailed descriptions and conversations. This dataset emphasizes spatiotemporal reasoning and causal relationships, providing a valuable asset for training chat-centric video understanding systems. Preliminary qualitative experiments reveal our system's potential across a broad spectrum of video applications and set a baseline for future research.

🔥 Updates

2023/05/11: Released 🦜VideoChat V1, which can handle both image and video understanding!
- 🎊 Model and Data.
- 🤗 Online Demo
- 🧑‍🔧 Tuning scripts are being cleaned up.

⏳ Schedule

Small-scale video instruction data and tuning
Instruction tuning on BLIP+UniFormerV2+Vicuna
Large-scale and complex video instruction data
Instruction tuning on strong video foundation model
User-friendly interactions with longer videos
...

💬 Example Online🦜

Comparison with ChatGPT, MiniGPT-4, LLaVA and mPLUG-Owl.
Our VideoChat can handle both image and video understanding well!

[Video] Why is the video funny?

[Video] Spatial perception

[Video] Temporal perception

[Video] Multi-turn conversation

Image understanding

🏃 Usage

Prepare the environment.
```
pip install -r requirements.txt
```
Download BLIP2 model:
- ViT: wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/eva_vit_g.pth
- QFormer: wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained_flant5xxl.pth
- Change the vit_model_path and q_former_model_path in config.json.
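The path changes above can also be scripted instead of edited by hand. The following is a minimal sketch, not part of the repository: it assumes `config.json` is plain JSON and that the keys named above (`vit_model_path`, `q_former_model_path`) may sit at any nesting level, so it searches for them recursively. The helper names `set_key` and `update_config` are hypothetical.

```python
import json
from pathlib import Path


def set_key(node, key, value):
    """Set `key` to `value` wherever it appears in nested JSON data; return the hit count."""
    hits = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                hits += 1
            else:
                hits += set_key(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            hits += set_key(item, key, value)
    return hits


def update_config(config_path, updates):
    """Apply {key: new_path} updates to a JSON config file and write it back."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    # Collect keys that were not found anywhere, so a typo is reported
    # instead of being silently ignored. Nothing is written in that case.
    missing = [k for k, v in updates.items() if set_key(config, k, v) == 0]
    if missing:
        raise KeyError(f"keys not found in {path}: {missing}")
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `update_config("config.json", {"vit_model_path": "/path/to/eva_vit_g.pth", "q_former_model_path": "/path/to/blip2_pretrained_flant5xxl.pth"})` would rewrite both entries; the location of `config.json` and the download paths are placeholders to adjust for your setup.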

Download StableVicuna model:

LLaMA: Download it from the original repo or Hugging Face.
If you download LLaMA from the original repo, please process it with the following command:

```
# convert_llama_weights_to_hf is copied from transformers
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
  --input_dir /path/to/downloaded/llama/weights \
  --model_size 7B --output_dir /output/path
```

Download StableVicuna-13b-delta and process it:

```
# fastchat v0.1.10
python3 apply_delta.py \
  --base /path/to/model_weights/llama-13b \
  --target stable-vicuna-13b \
  --delta CarperAI/stable-vicuna-13b-delta
```

Change the llama_model_path in config.json.

Download VideoChat model:
- Change the videochat_model_path in config.json.
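Before launching the demo, it can help to confirm that every path set in `config.json` actually exists. The sketch below is an illustration rather than part of the repository: it assumes `config.json` is plain JSON, looks up the four keys mentioned in this section at any nesting level, and reports those that are unset or point to a missing file or directory. The helper names `find_key` and `check_paths` are hypothetical.

```python
import json
from pathlib import Path

# Keys named in the Usage steps of this README.
PATH_KEYS = (
    "vit_model_path",
    "q_former_model_path",
    "llama_model_path",
    "videochat_model_path",
)


def find_key(node, key):
    """Return the first value stored under `key` in nested JSON data, else None."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def check_paths(config_path, keys=PATH_KEYS):
    """Return {key: problem} for each key that is unset or points to a missing path."""
    config = json.loads(Path(config_path).read_text())
    problems = {}
    for key in keys:
        value = find_key(config, key)
        if not isinstance(value, str) or not value:
            problems[key] = "not set"
        elif not Path(value).exists():
            problems[key] = f"path does not exist: {value}"
    return problems
```

An empty result from `check_paths("config.json")` means all four paths resolve on disk; it does not verify that the weights themselves are valid.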
Run the demo with Gradio:
```
python demo.py
```
Another demo, as a Jupyter Notebook, can be found in demo.ipynb.

📄 Citation

If you find this project useful in your research, please consider citing:

@article{2023videochat,
  title={VideoChat: Chat-Centric Video Understanding},
  author={Li, KunChang and He, Yinan and Wang, Yi and Li, Yizhuo and Wang, Wenhai and Luo, Ping and Wang, Yali and Wang, Limin and Qiao, Yu},
  journal={arXiv preprint arXiv:2305.06355},
  year={2023}
}

👍 Acknowledgement

Thanks to the following open-source projects:

InternVideo, UniFormerV2, MiniGPT-4, LLaVA, BLIP2, StableLM.
