Research done on object detection and segmentation during my PhD. Enjoy!
Results on the 2017 COCO validation set. The inference FLOPs and FPS are measured on the first 100 images of the 2017 COCO validation set using an NVIDIA GeForce RTX 3060 Ti GPU.
Backbone | Head | Epochs | AP | Params | GFLOPs | FPS | Script | Log | Checkpoint |
---|---|---|---|---|---|---|---|---|---|
R50+FPN | FQDet | 12 | 43.3 | 33.9 M | 99.0 | 20.9 | script | log | cp |
R50+TPN | FQDet | 12 | 45.5 | 42.2 M | 107.8 | 13.6 | script | log | cp |
R50+DefEnc-P3 | FQDet | 12 | 47.2 | 44.1 M | 234.8 | 9.7 | script | log | cp |

Backbone | Head | Epochs | AP | Params | GFLOPs | FPS | Script | Log | Checkpoint |
---|---|---|---|---|---|---|---|---|---|
R50+FPN | FQDetV2 | 12 | 47.0 | 37.9 M | 117.4 | 17.7 | script | log | cp |
R50+DefEnc-P3 | FQDetV2 | 12 | 50.8 | 48.1 M | 256.1 | 15.5 | script | log | cp |
R50+DefEnc-P2 | FQDetV2 | 12 | 51.7 | 48.4 M | 747.0 | 6.8 | script | log | cp |
SwL+DefEnc-P3 | FQDetV2 | 12 | 58.2 | 218.7 M | 875.4 | 5.8 | script | log | cp |

Backbone | Head | Epochs | AP | Params | GFLOPs | FPS | Script | Log | Checkpoint |
---|---|---|---|---|---|---|---|---|---|
R50+FPN | Mask R-CNN++ | 12 | 41.3 | 40.5 M | 226.7 | 10.4 | script | log | cp |
R50+FPN | PointRend++ | 12 | 42.0 | 40.8 M | 296.2 | 6.6 | script | log | cp |
R50+FPN | RefineMask++ | 12 | 42.7 | 44.2 M | 455.1 | 6.3 | script | log | cp |
R50+FPN | EffSeg (ours) | 12 | 42.4 | 41.8 M | 262.6 | 7.5 | script | log | cp |

Backbone | Head | Epochs | PQ | Params | GFLOPs | FPS | Script | Log | Checkpoint |
---|---|---|---|---|---|---|---|---|---|
R50+FPN | Mask R-CNN++ | 12 | 45.8 | 40.6 M | 218.6 | 9.9 | script | log | cp |
R50+FPN | PointRend++ | 12 | 47.0 | 40.9 M | 289.7 | 6.3 | script | log | cp |
R50+FPN | RefineMask++ | 12 | 47.2 | 44.2 M | 433.2 | 6.3 | script | log | cp |
R50+FPN | EffSeg (ours) | 12 | 47.0 | 41.8 M | 262.6 | 6.7 | script | log | cp |
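
The FPS values in the tables above are measured on the first 100 images of the validation set. As a rough illustration of that kind of measurement, here is a minimal sketch. It is not the benchmarking code of this repository: `measure_fps` and `run_inference` are hypothetical names, and the warm-up handling is an assumption.

```python
import time
from itertools import islice


def measure_fps(run_inference, images, num_images=100, warmup=0):
    """Time `run_inference` on the first `num_images` items and return FPS.

    `run_inference` is a hypothetical callable taking one image.
    `warmup` untimed calls can be made first (an assumption, not
    something the README specifies).
    """
    subset = list(islice(images, num_images))
    if not subset:
        raise ValueError("need at least one image to measure FPS")

    # Optional untimed warm-up passes.
    for image in subset[:warmup]:
        run_inference(image)

    # When timing GPU code, asynchronous kernels must finish before the
    # timer is read (e.g. torch.cuda.synchronize() in PyTorch).
    start = time.perf_counter()
    for image in subset:
        run_inference(image)
    elapsed = time.perf_counter() - start

    return len(subset) / elapsed


if __name__ == "__main__":
    # Dummy workload standing in for a real model.
    fps = measure_fps(lambda image: sum(range(1000)), range(500))
    print(f"{fps:.1f} FPS over the first 100 items")
```

Measured values depend on the hardware, so they are only comparable to the tables when obtained on the same GPU.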
- Trident Pyramid Networks for Object Detection by Cédric Picron and Tinne Tuytelaars.
- FQDet: Fast-converging Query-based Detector by Cédric Picron, Punarjay Chakravarty, and Tinne Tuytelaars.
- EffSeg: Efficient Fine-Grained Instance Segmentation using Structure-Preserving Sparsity by Cédric Picron and Tinne Tuytelaars.
- Designing High-Performing Networks for Multi-Scale Computer Vision by Cédric Picron (PhD thesis).
- Environment:
  - Install the `conda` package and environment management system if not already done.
  - Execute `source setup_env.sh`.
- Data preparation:
  - Download the desired datasets.
  - Modify the paths in `setup_data.sh` to point to your installation directories.
  - Execute `source setup_data.sh`.
- Training: Execute `python main.py` with the desired command-line arguments. Some example training scripts, which were used to obtain the results above, are found in the `scripts` directory.
- Evaluation: Execute `python main.py --eval --eval_task $TASK` with the desired command-line arguments, with `$TASK` chosen from:
  - `analysis`: Analyze the computational cost of the given model.
  - `comparison`: Compare the results from two different models.
  - `performance`: Compute the model performance on the desired benchmark.
  - `profile`: Profile the given model.
  - `tide`: Perform TIDE analysis of the given model.
  - `visualize`: Visualize the model predictions.
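
When running several evaluation tasks from a script, it can help to validate the task name before launching `main.py`. The sketch below assembles the evaluation command from the task list above. `build_eval_command` and `EVAL_TASKS` are hypothetical helpers, not part of this repository, and any model-specific arguments are left to the caller through `extra_args`.

```python
# Task names as listed in the evaluation instructions.
EVAL_TASKS = (
    "analysis",
    "comparison",
    "performance",
    "profile",
    "tide",
    "visualize",
)


def build_eval_command(task, extra_args=()):
    """Return the argv list for `python main.py --eval --eval_task <task>`.

    `extra_args` holds any further command-line arguments; they are passed
    through unchanged, since the model-specific flags are not listed here.
    """
    if task not in EVAL_TASKS:
        raise ValueError(
            f"unknown eval task {task!r}; choose from {', '.join(EVAL_TASKS)}"
        )
    return ["python", "main.py", "--eval", "--eval_task", task, *extra_args]


if __name__ == "__main__":
    # Print the command for each task; pass a list to
    # subprocess.run(cmd, check=True) to actually execute it.
    for task in EVAL_TASKS:
        print(" ".join(build_eval_command(task)))
```

Returning a list rather than a string lets the command be handed directly to `subprocess.run` without shell quoting concerns.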