Mingxuan Liu · Tyler L. Hayes · Elisa Ricci · Gabriela Csurka · Riccardo Volpi
Paper | ArXiv | Code | Poster (coming soon)
Requirements:
- Linux or macOS with Python ≥ 3.8
- PyTorch ≥ 1.8.2. Install it from pytorch.org, and check that the installed PyTorch version matches the one required by Detectron2.
- Detectron2: follow Detectron2 installation instructions.
- OpenAI API key (optional; only needed if you want to construct hierarchies using LLMs)
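The version requirements above can be checked before installing anything else. The following is a minimal sketch, not part of the repository: the function names `version_ge` and `check_requirements` are hypothetical, and it assumes that `python` on your PATH is the interpreter you intend to use and that version strings are plain dotted numbers (a local suffix such as `+cu118` is stripped).

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with SHiNe.

# version_ge A B: succeed if dotted numeric version A >= B (e.g. 3.10 >= 3.8).
version_ge() {
    local IFS=.
    # Strip a local build suffix such as "+cu118", then split on dots.
    local -a a=(${1%%+*}) b=(${2%%+*})
    local i x y
    for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
        x=${a[i]:-0}
        y=${b[i]:-0}
        if ((10#$x > 10#$y)); then return 0; fi
        if ((10#$x < 10#$y)); then return 1; fi
    done
    return 0
}

# check_requirements: compare the installed Python and PyTorch versions
# against the minimums listed in the requirements.
check_requirements() {
    local py torch
    py=$(python -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])') || return 1
    version_ge "$py" 3.8 || { echo "Python $py is older than 3.8" >&2; return 1; }
    torch=$(python -c 'import torch; print(torch.__version__)') || return 1
    version_ge "$torch" 1.8.2 || { echo "PyTorch $torch is older than 1.8.2" >&2; return 1; }
    echo "Python $py, PyTorch $torch"
}
```

After sourcing the snippet inside the activated environment, run `check_requirements`. It does not verify compatibility with Detectron2, which still needs to be checked against the Detectron2 installation instructions.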
Setup environment
# Clone this project repository under your workspace folder
git clone https://github.com/naver/shine.git --recurse-submodules
cd shine
# Create conda environment and install the dependencies
conda env create -n shine -f shine.yml
# Activate the working environment
conda activate shine
# Install Detectron2 under your workspace folder
# (Please follow Detectron2 official instructions)
cd ..
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
pip install -e .
Our project uses two submodules, CenterNet2 and Deformable-DETR. If you forgot to add --recurse-submodules when cloning, run git submodule init and then git submodule update.
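If you are unsure whether the submodules were fetched, a quick check can help. The sketch below is an illustration only: `fetch_submodules` and `empty_submodules` are hypothetical helper names, and the check simply reads the `path = ...` entries from `.gitmodules` rather than assuming where the submodules live. `git submodule update --init --recursive` is the standard one-step equivalent of the two commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with SHiNe.

# fetch_submodules: initialize and fetch all submodules in one step.
# Run from the root of the cloned repository.
fetch_submodules() {
    git submodule update --init --recursive
}

# empty_submodules [REPO_ROOT]: print every submodule path listed in
# .gitmodules whose directory is missing or empty.
empty_submodules() {
    local root=${1:-.} path
    [ -f "$root/.gitmodules" ] || return 0
    sed -n 's/^[[:space:]]*path[[:space:]]*=[[:space:]]*//p' "$root/.gitmodules" |
    while IFS= read -r path; do
        if [ -z "$(ls -A "$root/$path" 2>/dev/null)" ]; then
            echo "$path"
        fi
    done
}
```

Running `empty_submodules` in the repository root prints nothing when every submodule directory is populated; otherwise run `fetch_submodules`.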
Set your OpenAI API key as an environment variable (optional; only needed if you want to generate hierarchies):
export OPENAI_API_KEY=YOUR_OpenAI_Key
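Since the key is only needed for hierarchy generation, it can be convenient to fail early when it is missing. This is a small sketch under that assumption; `require_openai_key` is a hypothetical helper, not something the repository provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with SHiNe.

# require_openai_key: succeed only if OPENAI_API_KEY is set to something
# other than the empty string or the placeholder from the export line above.
require_openai_key() {
    if [ -z "${OPENAI_API_KEY:-}" ] || [ "$OPENAI_API_KEY" = "YOUR_OpenAI_Key" ]; then
        echo "OPENAI_API_KEY is not set; LLM hierarchy generation needs it." >&2
        return 1
    fi
}
```

A typical use is `require_openai_key && bash <some LLM-based script>`, so that the script is skipped when no key is configured.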
SHiNe is training-free, so we only need to download off-the-shelf OvOD models and apply SHiNe on top of them. Download the models and put them (or symlink them via the ln -s command) under the models folder in this repository as:
SHiNe
└── models
    ├── codet
    │   ├── CoDet_OVLVIS_R5021k_4x_ft4x.pth
    │   └── CoDet_OVLVIS_SwinB_4x_ft4x.pth
    ├── detic
    │   ├── coco_ovod
    │   │   ├── BoxSup_OVCOCO_CLIP_R50_1x.pth
    │   │   ├── Detic_OVCOCO_CLIP_R50_1x_caption.pth
    │   │   ├── Detic_OVCOCO_CLIP_R50_1x_max-size.pth
    │   │   └── Detic_OVCOCO_CLIP_R50_1x_max-size_caption.pth
    │   ├── cross_eval
    │   │   ├── BoxSup-C2_L_CLIP_SwinB_896b32_4x.pth
    │   │   ├── BoxSup-C2_LCOCO_CLIP_SwinB_896b32_4x.pth
    │   │   ├── Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
    │   │   ├── Detic_LI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
    │   │   └── Detic_LI_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
    │   ├── lvis_ovod
    │   │   ├── BoxSup-C2_Lbase_CLIP_R5021k_640b64_4x.pth
    │   │   ├── BoxSup-C2_Lbase_CLIP_SwinB_896b32_4x.pth
    │   │   ├── Detic_LbaseCCcapimg_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
    │   │   ├── Detic_LbaseCCimg_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
    │   │   ├── Detic_LbaseI_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
    │   │   └── Detic_LbaseI_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
    │   └── lvis_std
    │       ├── BoxSup-C2_L_CLIP_R5021k_640b64_4x.pth
    │       ├── BoxSup-DeformDETR_L_R50_4x.pth
    │       ├── Detic_DeformDETR_LI_R50_4x_ft4x.pth
    │       └── Detic_LI_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
    └── vldet
        ├── lvis_base.pth
        ├── lvis_base_swinB.pth
        ├── lvis_vldet.pth
        └── lvis_vldet_swinB.pth
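If the checkpoints live outside the repository, symlinking avoids copying large files. The following is a minimal sketch: `link_into` is a hypothetical helper, and it assumes the source directory already contains the codet, detic, and vldet folders arranged as in the tree above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with SHiNe.

# link_into SRC_DIR DEST_DIR: create one symlink in DEST_DIR for every
# top-level entry of SRC_DIR (existing links of the same name are replaced).
link_into() {
    local src=$1 dest=$2 entry
    mkdir -p "$dest" || return 1
    # Resolve to an absolute path so the links stay valid from any directory.
    src=$(cd "$src" && pwd) || return 1
    for entry in "$src"/*; do
        [ -e "$entry" ] || continue
        ln -sfn "$entry" "$dest/$(basename "$entry")"
    done
}
```

For example, `link_into /path/to/downloaded/checkpoints models`, run from the repository root, would expose the three model folders under models without moving them; the source path here is a placeholder.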
You can download the datasets:
- iNat: 17.8 GB
- FSOD: 14.7 GB
- ImageNet-1k Val: 6.2 GB
- COCO: 37.2 GB
- LVIS: 1.8 GB
and put them (or symlink them via the ln -s command) under the datasets folder in this repository as:
SHiNe
└── datasets
    ├── inat
    ├── fsod
    ├── imagenet2012
    ├── coco
    └── lvis
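Before launching an experiment, it may be worth confirming that the dataset folders are where the scripts expect them. This sketch only checks that the five top-level folders from the tree above exist (symlinks count); `missing_datasets` is a hypothetical helper and it does not validate the contents of each dataset.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with SHiNe.

# missing_datasets [DATASETS_DIR]: print the name of every expected dataset
# folder that is absent. Prints nothing when all five are present.
missing_datasets() {
    local root=${1:-datasets} name
    for name in inat fsod imagenet2012 coco lvis; do
        [ -d "$root/$name" ] || echo "$name"
    done
}
```

Run `missing_datasets` from the repository root; since not every experiment needs every dataset, a non-empty output is only a problem for the datasets you plan to use.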
Example of applying SHiNe on Detic for OvOD task using iNat dataset:
# Vanilla OvOD (baseline)
bash scripts_local/Detic/inat/swin/baseline/inat_detic_SwinB_LVIS-IN-21K-COCO_baseline.sh
# SHiNe using dataset-provided hierarchy
bash scripts_local/Detic/inat/swin/shine_gt/inat_detic_SwinB_LVIS-IN-21K-COCO_shine_gt.sh
# SHiNe using LLM-generated synthetic hierarchy
bash scripts_local/Detic/inat/swin/shine_llm/inat_detic_SwinB_LVIS-IN-21K-COCO_shine_llm.sh
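To compare the three settings side by side, the commands above can be run in sequence with their output saved. This is only a convenience sketch: `run_inat_variants` and the `logs` directory (overridable through a `LOG_DIR` variable) are assumptions for illustration, while the script paths follow the naming pattern of the three commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with SHiNe.

# run_inat_variants: run the baseline, shine_gt, and shine_llm scripts for
# Detic (Swin-B) on iNat one after another, saving each output to a log file.
# Run from the repository root.
run_inat_variants() {
    local log_dir=${LOG_DIR:-logs} variant script
    mkdir -p "$log_dir" || return 1
    for variant in baseline shine_gt shine_llm; do
        script="scripts_local/Detic/inat/swin/$variant/inat_detic_SwinB_LVIS-IN-21K-COCO_$variant.sh"
        echo "Running $script" >&2
        # Show the output live and keep a copy for later comparison.
        bash "$script" 2>&1 | tee "$log_dir/inat_$variant.log"
    done
}
```

The shine_llm variant relies on an LLM-generated hierarchy, so it assumes that hierarchy is already available.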
Example of applying SHiNe on CLIP zero-shot transfer task using ImageNet-1k dataset:
# Vanilla CLIP Zero-shot transfer (baseline)
bash scripts_local/Classification/imagenet1k/baseline/imagenet1k_vitL14_baseline.sh
# SHiNe using WordNet hierarchy
bash scripts_local/Classification/imagenet1k/shine_wordnet/imagenet1k_vitL14_shine_wordnet.sh
# SHiNe using LLM-generated synthetic hierarchy
bash scripts_local/Classification/imagenet1k/shine_llm/imagenet1k_vitL14_shine_llm.sh
Example of constructing SHiNe classifier for OvOD task using iNat dataset:
# SHiNe using dataset-provided hierarchy
bash scripts_build_nexus/inat/build_inat_nexus_gt.sh
# SHiNe using LLM-generated synthetic hierarchy
bash scripts_build_nexus/inat/build_inat_nexus_llm.sh
Example of building hierarchy trees using either dataset-provided or LLM-generated hierarchy entities:
# Build hierarchy tree for iNat using dataset-provided hierarchy
bash scripts_plant_hrchy/inat/plant_inat_tree_gt.sh
# Build hierarchy tree for ImageNet-1k using WordNet hierarchy
bash scripts_plant_hrchy/imagenet1k/plant_imagenet1k_tree_wordnet.sh
# Build hierarchy tree for iNat using LLM-generated synthetic hierarchy
bash scripts_plant_hrchy/inat/plant_inat_tree_llm.sh
# Build hierarchy tree for ImageNet-1k using LLM-generated synthetic hierarchy
bash scripts_plant_hrchy/imagenet1k/plant_imagenet1k_tree_llm.sh
This project is licensed under the terms given in the LICENSE file.
If you find our work useful for your research, please cite our paper using the following BibTeX entry:
@inproceedings{liu2024shine,
title={{SH}i{N}e: Semantic Hierarchy Nexus for Open-vocabulary Object Detection},
author={Liu, Mingxuan and Hayes, Tyler L. and Ricci, Elisa and Csurka, Gabriela and Volpi, Riccardo},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024},
}
SHiNe is built upon the awesome works iNat, FSOD, BREEDS, Hierarchy-CLIP, Detic, VLDet, and CoDet. We sincerely thank them for their work and contributions.