OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning
Haiyang Ying1, Yixuan Yin1, Jinzhi Zhang1, Fan Wang2, Tao Yu1, Ruqi Huang1, Lu Fang1
1Tsinghua University 2Alibaba Group
OmniSeg3D is a framework for multi-object, category-agnostic, and hierarchical segmentation in 3D. The original implementation is based on InstantNGP; however, OmniSeg3D is not restricted to a specific 3D representation. In this repo, we present a Gaussian-Splatting-based OmniSeg3D, which supports interactive 3D segmentation in real time. Segmented objects can be saved in .ply format for further visualization and manipulation.
We follow the original environment setting of 3D Gaussian Splatting (SIGGRAPH 2023).
conda create -n gaussian_grouping python=3.8 -y
conda activate gaussian_grouping
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install plyfile==0.8.1
pip install tqdm scipy wandb opencv-python scikit-learn lpips
pip install submodules/diff-gaussian-rasterization
pip install submodules/simple-knn
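As an optional sanity check, the short Python sketch below verifies that PyTorch sees a CUDA device and that the two CUDA extensions compiled correctly. It assumes the module names shipped with the standard 3D Gaussian Splatting submodules (diff_gaussian_rasterization and simple_knn).

# check_env.py -- optional sanity check for the training environment
import torch

print("CUDA available:", torch.cuda.is_available())

# These imports fail if the CUDA extensions above did not build correctly.
from diff_gaussian_rasterization import GaussianRasterizationSettings, GaussianRasterizer  # noqa: F401
from simple_knn._C import distCUDA2  # noqa: F401

print("diff-gaussian-rasterization and simple-knn imported successfully.")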
Install SAM for 2D segmentation:
git clone https://github.com/facebookresearch/segment-anything.git
cd segment-anything
pip install -e .
mkdir sam_ckpt; cd sam_ckpt
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
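To confirm the checkpoint downloaded correctly, here is a minimal loading sketch using the standard segment-anything API; the checkpoint path assumes you run it from the segment-anything folder created above, so adjust it to your setup.

# check_sam.py -- minimal load test for the SAM checkpoint
import torch
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Adjust the path if you stored the checkpoint elsewhere.
sam = sam_model_registry["vit_h"](checkpoint="sam_ckpt/sam_vit_h_4b8939.pth")
sam.to("cuda" if torch.cuda.is_available() else "cpu")

# Automatic mask generation with default parameters, as used for the 2D masks.
mask_generator = SamAutomaticMaskGenerator(sam)
print("SAM ViT-H checkpoint loaded.")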
We support data prepared in the COLMAP format. For more details, please refer to the guidance in our NeRF-based implementation of OmniSeg3D.
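For reference, a typical COLMAP-style scene folder looks roughly like the sketch below; the downsampled image folders such as images_4 are optional and dataset-dependent.

<data_path>/
    images/        # input RGB images
    images_4/      # optional: 4x downsampled images (used by --images images_4 below)
    sparse/0/      # COLMAP sparse reconstruction
        cameras.bin
        images.bin
        points3D.bin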
Run the SAM model to generate the hierarchical representation files:
python run_sam.py --ckpt_path {SAM_CKPT_PATH} --file_path {IMAGE_FOLDER}
After running, you will get three folders: sam, masks, and patches.
- sam: stores the hierarchical representation as ".npz" files.
- masks and patches: used for visualization or mask quality evaluation; not needed during training.

Ideally, masks should include object-level masks and patches should contain part-level masks. We basically use the default parameter settings for SAM, but you can tune the parameters for customized datasets.
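If you want to sanity-check the SAM output before training, the sketch below opens one of the generated ".npz" files and lists the arrays it contains; the array names are whatever run_sam.py writes, so the snippet only prints them rather than assuming a schema, and the folder path is a placeholder.

# inspect_sam_output.py -- peek into the hierarchical representation files
import glob
import numpy as np

# Point this at the "sam" folder produced by run_sam.py (placeholder path).
npz_files = sorted(glob.glob("path/to/sam/*.npz"))
print(f"found {len(npz_files)} hierarchical representation files")

data = np.load(npz_files[0])
for key in data.files:
    print(key, data[key].shape, data[key].dtype)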
We train our models on a single NVIDIA RTX 3090 Ti GPU (24 GB); smaller scenes may require less memory, and inference typically requires less than 8 GB. We use a two-stage training strategy. See script/train_omni_360.sh as an example.
dataname=counter
gpu=1
data_path=root_path/to/the/data/folder/of/counter
# --- Training Gaussian (Color and Density) --- #
CUDA_VISIBLE_DEVICES=${gpu} python train.py \
-s ${data_path} \
--images images_4 \
-r 1 -m output/360_${dataname}_omni_1/rgb \
--config_file config/gaussian_dataset/train_rgb.json \
--object_path sam \
--ip 127.0.0.2
# --- Training Semantic Feature Field --- #
CUDA_VISIBLE_DEVICES=${gpu} python train.py \
-s ${data_path} \
--images images_4 \
-r 1 \
-m output/360_${dataname}_omni_1/sem_hi \
--config_file config/gaussian_dataset/train_sem.json \
--object_path sam \
--start_checkpoint output/360_${dataname}_omni_1/rgb/chkpnt10000.pth \
--ip 127.0.0.2
# --- Render Views for Visualization --- #
CUDA_VISIBLE_DEVICES=${gpu} python render_omni.py \
-m output/360_${dataname}_omni_1/sem_hi \
--num_classes 256 \
--images images_4
After filling in your custom paths, run the script from the root folder:
bash script/train_omni_360.sh
Modify the path of the trained point cloud, then run render_omni_gui.py.
- mode option: switch among RGB, score map, and semantic map (you can visualize the consistent global semantic feature).
- click mode: select an object of interest.
- multi-click mode: select multiple points or objects.
- binary threshold: show binarized 2D images with the current threshold.
- segment3d: segment the scene with the current threshold; the saved .ply file can be found at the root dir (see the loading sketch after this list).
- reload: reload the whole scene.
- file selector: load another scene (point cloud).

Mouse controls:
- left drag: rotate.
- mid drag: pan.
- right click: choose points/objects.
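As a minimal sketch for working with the exported point cloud, the snippet below reads a segment3d .ply with plyfile (installed above) and prints its vertex properties; the file name is a placeholder for whatever the GUI saves at the root dir.

# load_segmented_ply.py -- inspect a segmented point cloud exported by the GUI
import numpy as np
from plyfile import PlyData

ply = PlyData.read("segmented_scene.ply")  # placeholder name; use the file saved by segment3d
vertex = ply["vertex"]
print(f"{vertex.count} points")
print("properties:", [p.name for p in vertex.properties])

# Gaussian-splatting exports typically store positions as x/y/z properties.
xyz = np.stack([vertex["x"], vertex["y"], vertex["z"]], axis=-1)
print("xyz shape:", xyz.shape)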
Thanks to the following projects for their valuable contributions:
If you find this project helpful for your research, please consider citing the paper and giving this repo a ⭐.
@article{ying2023omniseg3d,
title={OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning},
author={Ying, Haiyang and Yin, Yixuan and Zhang, Jinzhi and Wang, Fan and Yu, Tao and Huang, Ruqi and Fang, Lu},
journal={arXiv preprint arXiv:2311.11666},
year={2023}
}