This is the official implementation for our CVPR'24 highlight paper "GraCo: Granularity-Controllable Interactive Segmentation".
Current IS pipelines fall into two categories: single-granularity output and multi-granularity output. The latter aims to alleviate the spatial ambiguity present in the former. However, the multi-granularity output pipeline suffers from limited interaction flexibility and produces redundant results. We introduce Granularity-Controllable Interactive Segmentation (GraCo), a novel approach that allows precise control of prediction granularity by introducing additional parameters to the input. This enhances the customization of the interactive system and eliminates redundancy while resolving ambiguity. Nevertheless, the exorbitant cost of annotating multi-granularity masks and the lack of available datasets with granularity annotations make it difficult for models to acquire the necessary guidance to control output granularity. To address this problem, we design an any-granularity mask generator that exploits the semantic property of the pre-trained IS model to automatically generate abundant mask-granularity pairs without requiring additional manual annotation. Based on these pairs, we propose a granularity-controllable learning strategy that efficiently imparts granularity controllability to the IS model.
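As a rough, purely illustrative sketch (hypothetical function and argument names, not the predictor API of this repository), granularity control amounts to conditioning each interactive prediction on one extra scalar input:

# Conceptual sketch only: a hypothetical interface illustrating granularity-conditioned
# interactive segmentation; the actual model and predictor in this repo differ.
import torch

def predict_with_granularity(model, image, clicks, granularity: float):
    """One interactive step conditioned on a scalar granularity value (e.g. in (0, 1]);
    smaller values are meant to give part-level masks, larger values object-level masks."""
    gra = torch.tensor([granularity], dtype=torch.float32)  # the extra control input
    logits = model(image, clicks, gra)                       # granularity-conditioned forward pass
    return logits.sigmoid() > 0.5                            # binary mask prediction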
- Install torch
# Install torch (match your own CUDA version; CUDA 11.8 is used as an example)
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
- Install other dependencies
# Install other dependencies
pip install -r requirements.txt
- Run the demo
# Run on CPU
python demo.py --checkpoint path/to/weights/sbd_vit_base.pth --lora_checkpoint path/to/checkpoints/last_checkpoint.pth --cpu
# Run on GPU
python demo.py --checkpoint path/to/weights/sbd_vit_base.pth --lora_checkpoint path/to/checkpoints/last_checkpoint.pth --gpu 0
- Generate any-granularity mask proposals
python any_granularity_generator.py --checkpoint weights/simpleclick/sbd_vit_base.pth \
--save-path part_output --save-name proposals.pkl --dataset-path /path/to/datasets/SBD/dataset
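The generator writes the mask-granularity proposals to a pickle file. A quick way to inspect it (the structure of the stored entries is an assumption here, so adapt to what the generator actually saves):

# Inspect the generated proposals file (entry structure is assumed, not specified here)
import pickle

with open("part_output/proposals.pkl", "rb") as f:
    proposals = pickle.load(f)

print(type(proposals))
print(len(proposals))  # number of images or records, depending on how proposals are stored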
- Download pre-trained weights (the commands below expect them at weights/simpleclick/sbd_vit_base.pth and weights/sam/sam_vit_b_01ec64.pth)
- Train
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py models/plainvit_base448_graco.py --load_gra \
--part_path part_output/proposals.pkl --enable_lora \
--weights weights/simpleclick/sbd_vit_base.pth \
--gpus 0,1,2,3
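Training saves the LoRA weights to a checkpoint (e.g. last_checkpoint.pth, the file referenced by the evaluation commands below). A minimal sanity check, assuming the file is an ordinary PyTorch checkpoint dict:

# Verify the produced LoRA checkpoint loads and inspect its top-level keys
# (the key names depend on the training code; inspect your own run).
import torch

ckpt = torch.load("path/to/checkpoints/last_checkpoint.pth", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])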
- Evaluation on object-level benchmarks
python evaluate.py NoBRS --datasets GrabCut,Berkeley,DAVIS,SBD \
--checkpoint weights/simpleclick/sbd_vit_base.pth \
--lora_checkpoint path/to/checkpoints/last_checkpoint.pth
- Evaluation on PartImageNet, SA-1B
python evaluate.py NoBRS --datasets PartImageNet,SA-1B \
--checkpoint weights/simpleclick/sbd_vit_base.pth \
--lora_checkpoint path/to/checkpoints/last_checkpoint.pth
- Evaluation on PascalPart (six categories, evaluated one category at a time)
for c in "sheep" "cat" "dog" "cow" "aeroplane" "bus";
do
python evaluate.py NoBRS --datasets PascalPart \
--checkpoint weights/simpleclick/sbd_vit_base.pth \
--lora_checkpoint path/to/checkpoints/last_checkpoint.pth --class-name $c;
done
- Download SAM
- Evaluation of SAM on object- and part-level benchmarks (oracle mode)
python evaluate.py NoBRS --datasets GrabCut,Berkeley,DAVIS,SBD,PartImageNet,SA-1B \
--checkpoint weights/sam/sam_vit_b_01ec64.pth \
--sam-model vit_b --sam-type SAM --oracle
- Evaluation of SAM on PascalPart (per category, oracle mode)
for c in "sheep" "cat" "dog" "cow" "aeroplane" "bus";
do
python evaluate.py NoBRS --datasets PascalPart \
--checkpoint weights/sam/sam_vit_b_01ec64.pth \
--sam-model vit_b --sam-type SAM --oracle --class-name $c;
done
This repository is built upon SimpleClick. The project page is built using the Nerfies template. We thank the authors of these open-source repositories for their efforts, and we thank the ACs and reviewers for their work in handling our paper.
If you find this repository helpful, please consider citing our paper:
@inproceedings{zhao2024graco,
  title={GraCo: Granularity-Controllable Interactive Segmentation},
  author={Zhao, Yian and Li, Kehan and Cheng, Zesen and Qiao, Pengchong and Zheng, Xiawu and Ji, Rongrong and Liu, Chang and Yuan, Li and Chen, Jie},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3501--3510},
  year={2024}
}