
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving

Bo Jiang1, Shaoyu Chen1, Bencheng Liao1, Xingyu Zhang2, Wei Yin2, Qian Zhang2, Chang Huang2, Wenyu Liu1, Xinggang Wang1,📧

1 Huazhong University of Science and Technology, 2 Horizon Robotics, 📧 corresponding author

arXiv paper: https://arxiv.org/abs/2410.22313 · 🤗 Hugging Face models

[Demo video: senna_demo.mp4]

News

[2024-12-08]: We have released the code and weights of Senna-VLM, along with the training and evaluation scripts.

[2024-10-29]: Senna arXiv paper released. Code/Models are coming soon. Please stay tuned! ☕️

Highlights

  • Senna is an autonomous driving system that integrates a Large Vision-Language Model with an end-to-end model to improve planning safety, robustness and generalization.

  • Senna achieves SOTA planning performance and demonstrates strong cross-scenario generalization and transferability.

Getting Started

Installation

git clone git@github.com:hustvl/Senna.git
conda create -n senna python=3.10 -y
conda activate senna
pip install -r requirements.txt

Data Preparation

We provide a script for generating the QA data required for Senna training. It uses LLaVA-v1.6-34b to generate scene descriptions and planning explanations. Run it as follows:

sh data_tools/senna_nusc_converter.sh
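To make the output concrete, here is a minimal sketch of what one generated QA record might look like. The field names and the per-line JSON serialization are assumptions for illustration; the actual schema produced by `data_tools/senna_nusc_converter.sh` may differ.

```python
import json

# Hypothetical example of a single QA record produced by the converter:
# six surround-view images paired with a question and a generated answer.
qa_record = {
    "image_paths": [
        f"samples/CAM_{view}"
        for view in ("FRONT", "FRONT_LEFT", "FRONT_RIGHT",
                     "BACK", "BACK_LEFT", "BACK_RIGHT")
    ],
    "question": "Describe the driving scene and explain the planned maneuver.",
    "answer": "The ego vehicle keeps its lane at moderate speed; ...",
}

# Records of this kind are commonly stored one JSON object per line.
line = json.dumps(qa_record)
restored = json.loads(line)
print(restored["question"])
```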

Weights

Method  Model Size  Base LLM        Input View  Tokens per Image  Download
Senna   7B          vicuna-7b-v1.5  6 views     128               Hugging Face

Training

For Stage-1 Mix Pre-training:

sh train_tools/pretrain_senna_llava.sh

For Stage-2 Driving Fine-tuning and Stage-3 Planning Fine-tuning (full-parameter fine-tuning):

sh train_tools/train_senna_llava.sh

For Stage-2 Driving Fine-tuning and Stage-3 Planning Fine-tuning (LoRA fine-tuning):

sh train_tools/train_senna_llava_lora.sh

In our experiments, we observed that full-parameter fine-tuning outperforms LoRA fine-tuning. Therefore, we recommend using full-parameter fine-tuning. However, if your machine has limited GPU memory (e.g., only 24GB), you may consider using LoRA fine-tuning as an alternative.
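The memory gap between the two regimes comes from how many parameters each one trains. A back-of-the-envelope comparison for a single linear layer (dimensions are illustrative of a 7B-class LLM, not Senna's actual configuration):

```python
# Trainable parameters for one weight matrix W (d_in x d_out):
# full fine-tuning updates W itself; LoRA freezes W and trains
# two low-rank factors A (d_in x r) and B (r x d_out).
d_in, d_out = 4096, 4096   # a typical attention projection size in a 7B model
rank = 16                  # a common LoRA rank (illustrative)

full_params = d_in * d_out            # parameters updated by full fine-tuning
lora_params = rank * (d_in + d_out)   # parameters updated by LoRA

print(full_params, lora_params, full_params // lora_params)
# At rank 16, full fine-tuning trains 128x more parameters for this layer.
```

This is why LoRA fits on a 24 GB GPU where full-parameter fine-tuning does not: optimizer states and gradients are only kept for the small factors, at some cost in final quality.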

Evaluation

You can evaluate the accuracy of Senna meta-action planning using the script below.

sh eval_tools/senna_plan_cmd_eval_multi_img.sh
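As a rough mental model, meta-action planning accuracy is the fraction of samples whose predicted meta-action matches the ground truth. The sketch below is an assumption about the metric's basic form; the repo's evaluation script may apply a finer-grained protocol.

```python
def meta_action_accuracy(predictions, ground_truth):
    """Fraction of samples where the predicted meta-action matches the label."""
    assert len(predictions) == len(ground_truth)
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)

# Hypothetical meta-action labels for illustration.
preds = ["keep_lane", "turn_left", "accelerate", "keep_lane"]
gts   = ["keep_lane", "turn_left", "decelerate", "keep_lane"]
print(meta_action_accuracy(preds, gts))  # 0.75
```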

Visualization

By running the visualization script below, you can overlay the predicted meta-actions and front-view scene descriptions onto the front-view image and save the results to the specified path.

sh eval_tools/senna_plan_visualization.sh

Qualitative Results

Acknowledgments

Senna is built upon LLaVA; we sincerely thank its contributors for their great work!

Citation

If you find Senna useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

@article{jiang2024senna,
      title={Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving}, 
      author={Bo Jiang and Shaoyu Chen and Bencheng Liao and Xingyu Zhang and Wei Yin and Qian Zhang and Chang Huang and Wenyu Liu and Xinggang Wang},
      year={2024},
      eprint={2410.22313},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.22313}, 
}

Related Projects

VAD & VADv2, MapTR
