Baorui Ma*, Huachen Gao*, Haoge Deng*, Zhengxiong Luo, Tiejun Huang, Lulu Tang†, Xinlong Wang†
Beijing Academy of Artificial Intelligence, BAAI
* Equal Contribution, † Corresponding Author
Benefiting from the proposed web-scale dataset WebVi3D, See3D enables both object- and scene-level 3D creation, including sparse-view-to-3D, (text-) image-to-3D, and 3D editing. It can also be used for Gaussian Splatting to extract meshes or render images.
- We present See3D, a scalable visual-conditional MVD model for open-world 3D creation, which can be trained on web-scale video collections without pose annotations.
- We curate WebVi3D, a multi-view image dataset containing static scenes with sufficient multi-view observations, and establish an automated pipeline for video data curation to train the MVD model.
- We introduce a novel warping-based 3D generation framework with See3D, which supports long-sequence generation with complex camera trajectories.
- We achieve state-of-the-art results in single- and sparse-view reconstruction, demonstrating remarkable zero-shot and open-world generation capability and offering a novel perspective on scalable 3D generation.
[12/13/2024] We have released the pretrained models and example test data on Hugging Face 🤗.
[12/10/2024] We have released the pretrained models and inference code. You can download models and example test data here.
git clone https://github.com/baaivision/See3D.git
cd See3D
pip install -r requirements.txt
We provide inference code for multi-view generation based on single-view and sparse-view inputs. Please add or remove the --super_resolution parameter according to your needs. The multi-view super-resolution model will upscale the default 512 resolution to a consistent 1024 resolution across multiple views, which requires more inference time and GPU memory. Please download the example test data here and put it in the dataset folder.
bash single_infer.sh
bash sparse_infer.sh
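Because both scripts expect the example test data to already be in the dataset folder, a small pre-flight check can catch a missing or empty folder before a long inference run starts. The following is a minimal sketch, not part of the See3D repository: the function name check_inputs and its messages are hypothetical, and it assumes the commands are run from the repository root with the data placed in a folder named dataset.

```shell
# Hypothetical helper (not part of See3D): verify that the data folder
# exists and is not empty before launching inference.
check_inputs() {
  data_dir="${1:-dataset}"   # folder to check; defaults to "dataset"
  if [ ! -d "$data_dir" ]; then
    echo "missing: $data_dir"
    return 1
  fi
  # ls -A lists every entry except . and .., so empty output means an empty folder
  if [ -z "$(ls -A "$data_dir")" ]; then
    echo "empty: $data_dir"
    return 1
  fi
  echo "ok: $data_dir"
}
```

With the function defined in the current shell, `check_inputs dataset && bash single_infer.sh` would only start inference when the folder contains files. The check does not validate the contents of the folder, only its presence.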
- Release pretrained models.
- Release inference code.
- Release training scripts.
- Release data curation pipeline from Internet Video.
- Release 3D generation framework utilizing the warping-based pipeline.
- Release the evaluation code.
See3D is built using the awesome open-source projects Stable Diffusion, MVDream, ViewCrafter, and FrozenRecon.
Thanks to the maintainers of these projects for their contribution to the community!
If you find See3D helpful, please consider citing:
@inproceedings{Ma2024See3D,
  title = {You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale},
  author = {Baorui Ma and Huachen Gao and Haoge Deng and Zhengxiong Luo and Tiejun Huang and Lulu Tang and Xinlong Wang},
  journal = {arXiv preprint arXiv:2412.06699},
  year = {2024}
}