This is a list of awesome articles about object detection from video.
- Site: http://image-net.org/challenges/LSVRC/2017/#vid
- Kaggle: https://www.kaggle.com/account/login?returnUrl=%2Fc%2Fimagenet-object-detection-from-video-challenge
- Site: http://aiskyeye.com/
- Date: Feb 2016
- Motivation: Smoothing the final bounding box predictions across time.
- Summary: Constructing a temporal graph from overlapping bounding box detections across the adjacent frames, and using dynamic programming to select bounding box sequences with the highest overall detection score.
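The sequence-selection step described in this summary can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `iou` and `best_sequence`, the `(box, score)` tuple layout, and the 0.5 IoU threshold are assumptions made for the example, and the rescoring and suppression steps that would follow selection are left out.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_sequence(
    frames: List[List[Detection]], iou_thresh: float = 0.5
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total score, [(frame index, detection index), ...]) for the
    highest-scoring chain of detections linked across adjacent frames.

    Assumes non-negative scores, so a chain may start at any frame.
    """
    # best[t][i]: best cumulative score of a chain ending at detection i of frame t
    best = [[score for _, score in dets] for dets in frames]
    # prev[t][i]: index of the linked detection in frame t - 1, or -1 if the chain starts here
    prev = [[-1] * len(dets) for dets in frames]

    for t in range(1, len(frames)):
        for i, (box_i, score_i) in enumerate(frames[t]):
            for j, (box_j, _) in enumerate(frames[t - 1]):
                # An edge of the temporal graph exists only for overlapping boxes.
                if iou(box_i, box_j) >= iou_thresh:
                    candidate = best[t - 1][j] + score_i
                    if candidate > best[t][i]:
                        best[t][i] = candidate
                        prev[t][i] = j

    # Pick the chain end with the highest cumulative score.
    end, top = None, float("-inf")
    for t, row in enumerate(best):
        for i, value in enumerate(row):
            if value > top:
                end, top = (t, i), value
    if end is None:
        return 0.0, []

    # Walk the back-pointers to recover the chain.
    path = []
    t, i = end
    while i != -1:
        path.append((t, i))
        i = prev[t][i]
        t -= 1
    path.reverse()
    return top, path
```

For example, with three frames in which one object moves slightly and a second object disappears after two frames, the function returns the three-frame chain, since its summed score is higher.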
- Date: Apr 2016
- Summary: Using a video object detection pipeline that involves predicting optical flow first, then propagating image level predictions according to the flow, and finally using a tracking algorithm to select temporally consistent high confidence detections.
- Performance: 73.8% mAP on ImageNet VID validation.
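As a rough illustration of the flow-based propagation step in the summary above, the sketch below shifts a box from one frame to the next by the mean optical flow inside it. It is a simplification under stated assumptions rather than the paper's pipeline: the function name `propagate_box`, the nested-list flow layout `flow[y][x] = (dx, dy)`, and the use of a plain mean are all choices made for this example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
Flow = List[List[Tuple[float, float]]]    # flow[y][x] = (dx, dy) from frame t to t + 1


def propagate_box(box: Box, flow: Flow) -> Box:
    """Shift a box by the mean flow vector of the pixels it covers."""
    x1, y1, x2, y2 = box
    height, width = len(flow), len(flow[0])

    # Clip the box to the flow field before sampling.
    xs = range(max(0, int(x1)), min(width, int(x2)))
    ys = range(max(0, int(y1)), min(height, int(y2)))
    count = len(xs) * len(ys)
    if count == 0:
        return box  # box lies outside the flow field; leave it unchanged

    dx = sum(flow[y][x][0] for y in ys for x in xs) / count
    dy = sum(flow[y][x][1] for y in ys for x in xs) / count
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
```

A propagated box would typically keep the score of its source detection (possibly decayed), so that a tracker or a linking step can later pick temporally consistent, high-confidence detections.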
- Date: Apr 2016
- Date: Nov 2016
- Performance: 73.0% mAP on ImageNet VID validation at 29 fps on a Titan X GPU.
- Date: Feb 2017
- Date: Mar 2017
- Motivation: Producing powerful spatiotemporal features.
- Performance: 76.3% mAP at 1.4 fps or 78.4% (combined with Seq-NMS) at 1.1 fps on ImageNet VID validation on a Titan X GPU.
- Date: Oct 2017
- Motivation: Smoothing the final bounding box predictions across time.
- Summary: Proposing a ConvNet architecture that solves detection and tracking problems jointly and applying a Viterbi algorithm to link the detections across time.
- Performance: 79.8% mAP on ImageNet VID validation.
- Date: Nov 2017
- Motivation: Producing powerful spatiotemporal features.
- Performance: 78.6% mAP on ImageNet VID validation at 13 fps on a Titan X GPU.
- Date: Dec 2017
- Motivation: Producing powerful spatiotemporal features.
- Performance: 80.5% mAP on ImageNet VID validation.
- Date: Jan 2018
- Date: Apr 2018
- Motivation: Producing powerful spatiotemporal features.
- Performance: 60.2% mAP on ImageNet VID validation at 25.6 fps on mobiles.
- Date: Apr 2018
- Performance: 79.6% mAP at 20 fps or 79.0% at 62 fps on ImageNet VID validation on a Titan X GPU.
- Date: Mar 2018
- Motivation: Producing powerful spatiotemporal features.
- Performance: 78.9% mAP or 80.4% (combined with Seq-NMS) on ImageNet VID validation.
- Date: Sep 2018
- Motivation: Producing powerful spatiotemporal features.
- Performance: 78.1% mAP or 80.3% (combined with Seq-NMS) on ImageNet VID validation.
- Date: Nov 2018
- Motivation: Smoothing the final bounding box predictions across time.
- Performance: 83.5% mAP with FGFA and Deformable ConvNets v2 on ImageNet VID validation.
- Date: Feb 2019
- Motivation: Adaptively rescaling the input image resolution to improve both accuracy and speed for video object detection.
- Performance: 75.5% mAP on ImageNet VID validation with multi-scale training at four scales (600, 480, 360, 240).
- Date: Feb 2019
- Motivation: Smoothing the final bounding box predictions across time (box-level method).
- Performance: 80.9% mAP (offline detection) and 78.2% mAP (online detection), both at 38 fps on a Titan X GPU.
Paper | Date | Base detector | Backbone | Tracking? | Optical flow? | Online? | mAP (%) | FPS (Titan X) |
---|---|---|---|---|---|---|---|---|
Seq-NMS | Feb 2016 | R-FCN | ResNet101 | no | no | no | 76.8 | 2.3 |
T-CNN | Apr 2016 | RCNN | DeepIDNet+CRAFT | yes | no | no | 73.8 | - |
DFF | Nov 2016 | R-FCN | ResNet101 | no | yes | yes | 73.0 | 29 |
TPN | Feb 2017 | TPN | GoogLeNet | yes | no | no | 68.4 | - |
FGFA | Mar 2017 | R-FCN | ResNet101 | no | yes | yes | 76.3 | 1.4 |
FGFA + Seq-NMS | Mar 2017 | R-FCN | ResNet101 | no | yes | no | 78.4 | 1.14 |
D&T | Oct 2017 | R-FCN (15 anchors) | ResNet101 | yes | no | no | 79.8 | 7.09 |
STMN | Dec 2017 | R-FCN | ResNet101 | no | no | no | 80.5 | - |
Scale-time-lattice | Apr 2018 | Faster RCNN (15 anchors) | ResNet101 | no | no | no | 79.6 | 20 |
Scale-time-lattice | Apr 2018 | Faster RCNN (15 anchors) | ResNet101 | no | no | no | 79.0 | 62 |
SSN (per-frame baseline for STSN) | Mar 2018 | R-FCN | Deformable ResNet101 | no | no | yes | 76.0 | - |
STSN | Mar 2018 | R-FCN | Deformable ResNet101 | no | no | yes | 78.9 | - |
STSN+Seq-NMS | Mar 2018 | R-FCN | Deformable ResNet101 | no | no | no | 80.4 | - |
MANet | Sep 2018 | R-FCN | ResNet101 | no | yes | yes | 78.1 | 5 |
MANet+Seq-NMS | Sep 2018 | R-FCN | ResNet101 | no | yes | no | 80.3 | - |
Tracklet-Conditioned Detection | Nov 2018 | R-FCN | ResNet101 | yes | no | yes | 78.1 | - |
Tracklet-Conditioned Detection+DCNv2 | Nov 2018 | R-FCN | ResNet101 | yes | no | yes | 82.0 | - |
Tracklet-Conditioned Detection+DCNv2+FGFA | Nov 2018 | R-FCN | ResNet101 | yes | no | yes | 83.5 | - |
Seq-Bbox Matching | Feb 2019 | YOLOv3 | darknet53 | no | no | no | 80.9 | 38 |
Seq-Bbox Matching | Feb 2019 | YOLOv3 | darknet53 | no | no | yes | 78.2 | 38 |