ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding (CVPR2023)
Official implementation of ULIP-2: Towards Scalable Multimodal Pre-training For 3D Understanding
Official implementation of ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding
[06/17/2024] "ULIP-2: Towards Scalable Multimodal Pre-training For 3D Understanding the cvpr2024 version (latest) of the ensembled pre-trained model with 10k xyzrgb points is uploaded here" [02/26/2024] "ULIP-2: Towards Scalable Multimodal Pre-training For 3D Understanding is accepted to CVPR2024!"
[06/09/2023] "PointBERT ULIP-2 pretrained model released, please find it in the here".
[06/09/2023] A smaller version of "ULIP - ShapeNet Triplets" has been released here; it is around 420GB. If you only need the rendered images actually used by ULIP, download the "only_rgb_depth_images" image folder instead of the full "rendered_images" folder (more than 1TB).
[05/22/2023] "ULIP - Objaverse Triplets" and "ULIP - ShapeNet Triplets" have been uploaded here.
[05/14/2023] ULIP-2 has been released!
[02/28/2023] ULIP has been accepted by CVPR 2023! 🔥🔥🔥
ULIP is a model-agnostic multimodal pre-training framework that leverages information from other modalities (images, language) to improve the understanding of 3D data without introducing any extra latency.
ULIP is a highly extensible multimodal pre-training framework, and it is model-architecture agnostic: you can easily plug in any 3D backbone model and pre-train it with our framework to get a jump-start for various downstream tasks!
We pre-train ULIP on 8 Nvidia A100 GPUs; the code is tested with CUDA==11.0 and pytorch==1.10.1.
conda create -n ulip python=3.7.15
conda activate ulip
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
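After installing the requirements, an optional sanity check (not part of the official setup) can confirm that PyTorch and CUDA are wired up before launching multi-GPU pre-training:

```python
# Optional sanity check: confirm the environment roughly matches what the code
# was tested with (pytorch==1.10.1) and that GPUs are visible.
import torch

print("torch version:", torch.__version__)      # expected: 1.10.1
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())  # pre-training uses 8 A100s by default
```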
[optional]
If you want to pre-train PointNeXt, we embed a modified PointNeXt codebase inside ./models/pointnext; please run the following to install it:
cd ./models/pointnext/PointNeXt
bash update.sh
bash install.sh
Download the datasets and initialization models from here. For now, you ONLY need to download "initialize_models", "modelnet40_normal_resampled", and "shapenet-55". You might need a Gmail account to access it.
After you download the datasets and initialization models, you can choose one of the following options:
(1) Put them in, or soft-link them to, the data folder; by default the data folder should have the following structure (a small layout check is sketched after the tree):
./data
-- ModelNet40.yaml
-- ShapeNet-55.yaml
-- dataset_3d.py
-- dataset_catalog.json
-- initialize_models
-- labels.json
-- modelnet40_normal_resampled
-- shapenet-55
-- templates.json
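To catch layout mistakes early, a small check like the one below can confirm the structure above is in place; this is just an optional sketch, and the listed entries simply mirror the tree shown here:

```python
# Optional sketch: verify the ./data layout listed above is in place.
from pathlib import Path

expected = [
    "data/ModelNet40.yaml", "data/ShapeNet-55.yaml", "data/dataset_3d.py",
    "data/dataset_catalog.json", "data/initialize_models", "data/labels.json",
    "data/modelnet40_normal_resampled", "data/shapenet-55", "data/templates.json",
]
missing = [p for p in expected if not Path(p).exists()]
print("data folder looks complete" if not missing else f"missing entries: {missing}")
```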
(2) Change the paths accordingly (optional; only needed if you don't want to put/link the downloaded files in the data folder):
# Change the "DATA_PATH", "PC_PATH", "IMAGE_PATH"
./data/ShapeNet-55.yaml
# Change the "DATA_PATH"
./data/ModelNet40.yaml
# Change the initialize_models path
./models/ULIP_models.py
Modify this line "pretrain_slip_model = torch.load('./data/initialize_models/slip_base_100ep.pt', map_location=torch.device('cpu'))"
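For example, if the downloaded checkpoint lives outside ./data, the modified line would look like this (the path below is hypothetical; substitute your own location):

```python
# Hypothetical custom location for the downloaded SLIP checkpoint; replace with your own path.
pretrain_slip_model = torch.load(
    '/my/storage/initialize_models/slip_base_100ep.pt',
    map_location=torch.device('cpu'),
)
```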
Our framework is model-architecture agnostic; currently four 3D backbones are supported:
Pointnet2(ssg)
PointBERT
PointMLP
PointNeXt
Please change the script to accommodate your system accordingly; this script pre-trains on 8 GPUs by default. You can also modify the desired output folder in the script.
# the scripts are named after their corresponding 3D backbone.
bash ./scripts/(choose your pre-train script)
You may also change the output path in the scripts.
bash ./scripts/(choose your test script) /path/to/your/checkpoint.pt
You may also change the output path in the scripts.
Change the npoints argument in the scripts; by default it is 8192.
Note: currently we use FPS to subsample to 8192 points, which might slow down the training speed. If you'd like, you can cache or save the pre-processed datasets with different numbers of points to speed up your pre-training (a rough caching sketch follows).
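One possible way to do that caching is sketched below with a plain NumPy farthest point sampling; the file names are hypothetical and this is not the exact sampling code used in the repo:

```python
import numpy as np

def farthest_point_sample(points: np.ndarray, npoints: int) -> np.ndarray:
    """Greedy FPS over an (N, C) array with xyz in the first 3 columns."""
    n = points.shape[0]
    xyz = points[:, :3]
    selected = np.zeros(npoints, dtype=np.int64)
    distances = np.full(n, np.inf)
    farthest = np.random.randint(n)
    for i in range(npoints):
        selected[i] = farthest
        dist = np.sum((xyz - xyz[farthest]) ** 2, axis=1)
        distances = np.minimum(distances, dist)
        farthest = int(np.argmax(distances))
    return points[selected]

# Cache the subsampled clouds once, then load the cached files during pre-training.
# "raw_cloud.npy" and "cached_8192.npy" are hypothetical file names.
cloud = np.load("raw_cloud.npy")  # (N, 3) xyz or (N, 6) xyz+rgb
np.save("cached_8192.npy", farthest_point_sample(cloud, 8192))
```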
There are only two things you need to change to pre-train your own customized 3D backbones:
(1) Define your own 3D backbone in ./models folder.
We put a template "customized_backbone" here; you can refer to the comments to see the expected input and output shapes. You can also refer to how pointnet2 is defined here.
(2) Use or modify this "ULIP_CUSTOMIZED" class in ./models/ULIP_models.py.
Please refer to the comments in the "ULIP_CUSTOMIZED" class; it should be straightforward to follow. Be sure to change "pc_feat_dims" accordingly, since we are agnostic to the point cloud output feature dimensions of your customized 3D backbones (see the sketch below).
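As a rough illustration only (the authoritative input/output shapes are documented in the "customized_backbone" template), a minimal backbone could look like the sketch below, assuming (B, N, 3) point clouds in and one global feature vector per cloud out; "MyCustomBackbone" and its layer sizes are made up for the example:

```python
import torch
import torch.nn as nn

class MyCustomBackbone(nn.Module):
    """Minimal sketch of a 3D encoder for the customized-backbone slot.
    Assumed interface: (B, N, 3) point clouds in, (B, feat_dim) global features out;
    check the comments in ./models/customized_backbone for the shapes ULIP actually expects."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, feat_dim),
        )

    def forward(self, pc: torch.Tensor) -> torch.Tensor:
        per_point = self.point_mlp(pc)        # (B, N, feat_dim)
        return per_point.max(dim=1).values    # global max pooling -> (B, feat_dim)
```

Whatever feature dimension your backbone outputs, set "pc_feat_dims" in the "ULIP_CUSTOMIZED" class to the same value so the projection into the shared embedding space matches.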
ULIP-1 models, zero-shot classification on ModelNet40 (8k points for pre-training and testing, best checkpoint):
model | top-1 (%) | top-5 (%) |
---|---|---|
Pointnet2(ssg) | 57.7 | 78.9 |
PointMLP | 60.0 | 79.4 |
PointBERT | 60.3 | 84.0 |
PointNeXt | 56.2 | 77.0 |
ULIP-2 models, pre-trained with 10k xyzrgb point clouds on the ensembled ULIP-Objaverse + ULIP-ShapeNet data; check the Google Drive.
Note that the ULIP-2 models are improved in the CVPR 2024 version compared to the initial arXiv release; please refer to the new ones to match the CVPR 2024 numbers.
model | Objaverse top-1 (%) | Objaverse top-5 (%) | ModelNet40 top-1 (%) | ModelNet40 top-5 (%) |
---|---|---|---|---|
openshape-pointbert-10kxyzrgb-ensembled-objavserse-shapenet-abo-3d_future | 46.8 | 77.0 | 84.4 | 98.0 |
ULIP2-PointBERT-10kxyzrgb-ensembled-objaverse-shapenet | 50.6 | 79.1 | 84.7 | 97.1 |
The code is released under the license at https://github.com/salesforce/ULIP/blob/main/LICENSE.txt.
The released "ULIP - Objaverse Triplets" is under https://opendatacommons.org/licenses/by/1-0/, consistent with Objaverse's license.
The released "ULIP - ShapeNet Triplets" is under the terms of use from https://shapenet.org/terms, consistent with ShapeNet's terms of use.
@inproceedings{xue2023ulip,
title={Ulip: Learning a unified representation of language, images, and point clouds for 3d understanding},
author={Xue, Le and Gao, Mingfei and Xing, Chen and Mart{\'\i}n-Mart{\'\i}n, Roberto and Wu, Jiajun and Xiong, Caiming and Xu, Ran and Niebles, Juan Carlos and Savarese, Silvio},
booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
pages={1179--1189},
year={2023}
}
@inproceedings{xue2024ulip,
title={Ulip-2: Towards scalable multimodal pre-training for 3d understanding},
author={Xue, Le and Yu, Ning and Zhang, Shu and Panagopoulou, Artemis and Li, Junnan and Mart{\'\i}n-Mart{\'\i}n, Roberto and Wu, Jiajun and Xiong, Caiming and Xu, Ran and Niebles, Juan Carlos and others},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={27091--27101},
year={2024}
}
# Contact
If you have any questions about this project, please contact [lxue@salesforce.com](mailto:lxue@salesforce.com).