FsDet contains the official few-shot object detection implementation of the ICML 2020 paper Frustratingly Simple Few-Shot Object Detection.
In addition to the benchmarks used by previous works, we introduce new benchmarks on three datasets: PASCAL VOC, COCO, and LVIS. We sample multiple groups of few-shot training examples for multiple runs of the experiments and report evaluation results on both the base classes and the novel classes. These are described in more detail in Data Preparation.
We also provide benchmark results and pre-trained models for our two-stage fine-tuning approach (TFA). In TFA, we first train the entire object detector on the data-abundant base classes, and then only fine-tune the last layers of the detector on a small balanced training set. See Models for our provided models and Getting Started for instructions on training and evaluation.
FsDet is well-modularized so you can easily add your own datasets and models. The goal of this repository is to provide a general framework for few-shot object detection that can be used for future research.
If you find this repository useful for your publications, please consider citing our paper.
@article{wang2020few,
title={Frustratingly Simple Few-Shot Object Detection},
author={Wang, Xin and Huang, Thomas E. and Darrell, Trevor and Gonzalez, Joseph E and Yu, Fisher}
booktitle = {International Conference on Machine Learning (ICML)},
month = {July},
year = {2020}
}
FsDet is built on Detectron2.
Note that you don't need to build detectron2 seperately as this codebase is self-contained. You can follow the instructions
below to install the dependencies and build FsDet
.
Requirements
- Linux with Python >= 3.6
- PyTorch >= 1.3
- torchvision that matches the PyTorch installation
- Dependencies:
pip install -r requirements.txt
- pycocotools:
pip install cython; pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
- fvcore:
pip install 'git+https://github.com/facebookresearch/fvcore'
- OpenCV, optional, needed by demo and visualization
pip install opencv-python
- GCC >= 4.9
Build FsDet
python setup.py build develop
Note: you may need to rebuild FsDet after reinstalling a different build of PyTorch.
- configs: Configuration files
- datasets: Dataset files (see Data Preparation for more details)
- fsdet
- checkpoint: Checkpoint code.
- config: Configuration code and default configurations.
- data: Dataset code.
- engine: Contains training and evaluation loops and hooks.
- evaluation: Evaluation code for different datasets.
- layers: Implementations of different layers used in models.
- modeling: Code for models, including backbones, proposal networks, and prediction heads.
- solver: Scheduler and optimizer code.
- structures: Data types, such as bounding boxes and image lists.
- utils: Utility functions.
- tools
- train_net.py: Training script.
- test_net.py: Testing script.
- ckpt_surgery.py: Surgery on checkpoints.
- run_experiments.py: Running experiments across many seeds.
- aggregate_seeds.py: Aggregating results from many seeds.
We evaluate our models on three datasets:
- PASCAL VOC: We use the train/val sets of PASCAL VOC 2007+2012 for training and the test set of PASCAL VOC 2007 for evaluation. We randomly split the 20 object classes into 15 base classes and 5 novel classes, and we consider 3 random splits. The splits can be found in fsdet/data/datasets/builtin_meta.py.
- COCO: We use COCO 2014 and extract 5k images from the val set for evaluation and use the rest for training. We use the 20 object classes that are the same with PASCAL VOC as novel classes and use the rest as base classes.
- LVIS: We treat the frequent and common classes as the base classes and the rare categories as the novel classes.
See datasets/README.md for more details.
We provide a set of benchmark results and pre-trained models available for download in MODEL_ZOO.md.
- Pick a model and its config file from
model zoo,
for example,
COCO-detection/faster_rcnn_R_101_FPN_ft_all_1shot.yaml
. - We provide
demo.py
that is able to run builtin standard models. Run it with:
python demo/demo.py --config-file configs/COCO-detection/faster_rcnn_R_101_FPN_ft_all_1shot.yaml \
--input input1.jpg input2.jpg \
[--other-options]
--opts MODEL.WEIGHTS fsdet://coco/tfa_cos_1shot/model_final.pth
The configs are made for training, therefore we need to specify MODEL.WEIGHTS
to a model from model zoo for evaluation.
This command will run the inference and show visualizations in an OpenCV window.
For details of the command line arguments, see demo.py -h
or look at its source code
to understand its behavior. Some common arguments are:
- To run on your webcam, replace
--input files
with--webcam
. - To run on a video, replace
--input files
with--video-input video.mp4
. - To run on cpu, add
MODEL.DEVICE cpu
after--opts
. - To save outputs to a directory (for images) or a file (for webcam or video), use
--output
.
To train a model, run
python tools/train_net.py --num-gpus 8 \
--config-file configs/PascalVOC-detection/split1/faster_rcnn_R_101_base1.yaml
To evaluate the trained models, run
python tools/test_net.py --num-gpus 8 \
--config-file configs/PascalVOC-detection/split1/faster_rcnn_R_101_FPN_ft_all1_1shot.yaml \
--eval-only
For more detailed instructions on the training procedure of TFA, see TRAIN_INST.md.
For ease of training and evaluation over multiple runs, we provided several helpful scripts in tools/
.
You can use tools/run_experiments.py
to do the training and evaluation. For example, to experiment on 30 seeds of the first split of PascalVOC on all shots, run
python tools/run_experiments.py --num-gpus 8 \
--shots 1 2 3 5 10 --seeds 0 30 --split 1
After training and evaluation, you can use tools/aggregate_seeds.py
to aggregate the results over all the seeds to obtain one set of numbers. To aggregate the 3-shot results of the above command, run
python tools/aggregate_seeds.py --shots 3 --seeds 30 --split 1 \
--print --plot