Simple Image-level Classification Improves Open-vocabulary Object Detection (arXiv 2312.10439)

Installation

Our environment: Unbuntu18.04, cuda10.2, python3.8, pytorch1.12.0, detectron2.0.6

conda create -n cads python=3.8
conda activate cads
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=10.2 -c pytorch

git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
pip install -e . 
cd .. 
git clone https://github.com/openai/CLIP.git
cd CLIP 
pip install -e .
cd .. 
pip install -r requirements.txt

Dataset preparation

Please follow prepare datasets

Pretrained Models

Download ImageNet-21K pretrained ResNet-50 from MIIL

convert the pretrained models into d2 style:

python tools/convert-thirdparty-pretrained-model-to-d2.py --path models/resnet50_miil_21k.pth

Download the pretrained BoxSup\ Detic for OV-LVIS and Detic for OV-COCO

Train our MLR module

Train on LVIS-Base

python train_net.py --resume --dist-url auto --num-gpus 4 \
    --config configs/ovmlr_lvis_base.yaml OUTPUT_DIR training_dir/ovmlr_lvis_base

Train on LVIS-Base + ImageNet-21K

python train_net.py --resume --dist-url auto --num-gpus 4 \
    --config configs/ovmlr_lvis_base_in21k.yaml OUTPUT_DIR training_dir/ovmlr_lvis_base_in21k

Train on COCO-Base + COCO Caption

python train_net.py --resume --dist-url auto --num-gpus 4 \
    --config configs/ovmlr_coco_base_caption.yaml OUTPUT_DIR training_dir/ovmlr_coco_base_caption

CADS for BoxSup and Detic

Merge the weight of MLR model and pretrained OVOD models on OV-LVIS

# for Boxsup
python tools/merge_weights.py --weight_path_mlr training_dir/ovmlr_lvis_base/model_final.pth \
    --weight_path_ovod models/BoxSup-C2_Lbase_CLIP_R5021k_640b64_4x.pth \
    --output_path models/BoxSup-C2_Lbase_CLIP_R5021k_640b64_4x_mlr.pth 

# for Detic
python tools/merge_weights.py --weight_path_mlr training_dir/ovmlr_lvis_base_in21k/model_final.pth  \
    --weight_path_ovod models/Detic_LbaseI_CLIP_R5021k_640b64_4x_ft4x_max-size.pth \
    --output_path models/Detic_LbaseI_CLIP_R5021k_640b64_4x_ft4x_max-size_mlr.pth

Evaluation on OV-LVIS

python train_net.py --resume --eval-only --dist-url auto --num-gpus 4 \
    --config configs/boxsup_cads_ovlvis.yaml OUTPUT_DIR training_dir/boxsup_lvis_base_cads \
    MODEL.WEIGHTS models/BoxSup-C2_Lbase_CLIP_R5021k_640b64_4x_mlr.pth 

python train_net.py --resume --eval-only --dist-url auto --num-gpus 4 \
    --config configs/boxsup_cads_ovlvis.yaml OUTPUT_DIR training_dir/detic_lvis_cads \
    MODEL.WEIGHTS models/Detic_LbaseI_CLIP_R5021k_640b64_4x_ft4x_max-size_mlr.pth

Merge the weight of MLR model and pretrained OVOD models on OV-COCO

python tools/merge_weights.py --weight_path_mlr training_dir/ovmlr_coco_base_caption/model_final.pth \
    --weight_path_ovod models/Detic_OVCOCO_CLIP_R50_1x_max-size_caption.pth \
    --output_path models/Detic_OVCOCO_CLIP_R50_1x_max-size_caption_mlr.pth

Evaluation on OV-COCO

python train_net.py --resume --eval-only --dist-url auto --num-gpus 4 \
    --config configs/detic_cads_ovcoco.yaml OUTPUT_DIR training_dir/detic_ovcoco_cads \
    MODEL.WEIGHTS models/Detic_OVCOCO_CLIP_R50_1x_max-size_caption_mlr.pth

Acknowledgment

Our code is mainly based on Detic and Detectron2.

Citation

@inproceedings{fang2023simple,
  title={Simple Image-level Classification Improves Open-vocabulary Object Detection},
  author={Fang, Ruohuan and Pang, Guansong and Bai, Xiao},
  booktitle={The 38th Annual AAAI Conference on Artificial Intelligence},
  year={2024}
}

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
cads		cads
configs		configs
datasets		datasets
resources		resources
third_party		third_party
tools		tools
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
train_net.py		train_net.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Simple Image-level Classification Improves Open-vocabulary Object Detection (arXiv 2312.10439)

Installation

Dataset preparation

Pretrained Models

Train our MLR module

CADS for BoxSup and Detic

Acknowledgment

Citation

About

Releases

Packages

Contributors 2

Languages

mala-lab/SIC-CADS

Folders and files

Latest commit

History

Repository files navigation

Simple Image-level Classification Improves Open-vocabulary Object Detection (arXiv 2312.10439)

Installation

Dataset preparation

Pretrained Models

Train our MLR module

CADS for BoxSup and Detic

Acknowledgment

Citation

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages