
# Evaluation

## Results and Models

Our experimental results are summarized below.



### UCF101-24

| Model | Pretrain | FrameAP@0.5 | VideoAP@0.2 | VideoAP@0.5 | VideoAP@0.75 | VideoAP@0.5:0.95 | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ucf_dla34_K7_rgb_coco.pth | COCO | 73.14 | 78.81 | 51.02 | 27.05 | 26.51 | model |
| ucf_dla34_K7_flow_coco.pth | COCO | 68.06 | 76.59 | 46.57 | 18.96 | 21.35 | model |
| K7 RGB + FLOW (COCO) | COCO | 78.01 | 82.81 | 53.83 | 29.59 | 28.33 | |
| ucf_dla34_K7_rgb_imagenet.pth | ImageNet | 70.69 | 75.37 | 50.47 | 25.61 | 25.96 | model |
| ucf_dla34_K7_flow_imagenet.pth | ImageNet | 68.90 | 77.30 | 47.94 | 19.41 | 21.98 | model |
| K7 RGB + FLOW (ImageNet) | ImageNet | 76.92 | 81.26 | 54.43 | 29.49 | 28.42 | |

Model naming convention: `task_(split)_backbone_K?_rgb/flow_pretrain.pth`
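For example, `ucf_dla34_K7_rgb_coco.pth` denotes the UCF101-24 task, a DLA-34 backbone, tubelet length K = 7, the RGB stream, and COCO pre-training.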


### JHMDB (models for 3 splits, averaged results)

| Model | Pretrain | FrameAP@0.5 | VideoAP@0.2 | VideoAP@0.5 | VideoAP@0.75 | VideoAP@0.5:0.95 | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| K7 RGB + FLOW (COCO) | COCO | 70.79 | 77.33 | 77.19 | 71.69 | 59.08 | models |
| K7 RGB + FLOW (ImageNet) | ImageNet | 67.95 | 76.23 | 75.41 | 68.46 | 53.98 | models |
| K7 RGB + FLOW (UCF) | UCF | 73.52 | 81.05 | 80.92 | 75.10 | 60.65 | models |

The UCF-pretrained results are reported in our Supplementary Material (this reproduction is slightly different from the original paper).

All these models are available on our Google Drive.

Copy the models to `${MOC_ROOT}/experiment/result_model` (the directory used by the commands below).
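A minimal sketch of this step, assuming the checkpoints were downloaded to `~/Downloads` (the download location is a placeholder; adjust it to your setup):

```bash
# hypothetical download location; adjust to wherever you saved the checkpoints
mkdir -p ${MOC_ROOT}/experiment/result_model
cp ~/Downloads/ucf_dla34_K7_rgb_coco.pth ${MOC_ROOT}/experiment/result_model/
cp ~/Downloads/ucf_dla34_K7_flow_coco.pth ${MOC_ROOT}/experiment/result_model/
```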


## Inference Step

First, we obtain detection results using the models above.

Please run:

```bash
python3 det.py --task normal --K 7 --gpus 0,1,2,3,4,5,6,7 --batch_size 94 --master_batch 10 --num_workers 8 --rgb_model ../experiment/result_model/$PATH_TO_RGB_MODEL --flow_model ../experiment/result_model/$PATH_TO_FLOW_MODEL --inference_dir $INFERENCE_DIR --flip_test --ninput 5

# handle the remaining chunk (when the dataset size is not divisible by the batch size)
python3 det.py --task normal --K 7 --gpus 0 --batch_size 1 --master_batch 1 --num_workers 2 --rgb_model ../experiment/result_model/$PATH_TO_RGB_MODEL --flow_model ../experiment/result_model/$PATH_TO_FLOW_MODEL --inference_dir $INFERENCE_DIR --flip_test --ninput 5
```

```bash
# ==============Args==============
#
# --task           inference mode: "normal", "stream", or "speed_test"; "normal" by default
# --K              input tubelet length, 7 by default
# --gpus           GPU list; in our experiments we use 8 NVIDIA TITAN Xp GPUs
# --batch_size     total batch size
# --master_batch   batch size on the first GPU
# --num_workers    total number of workers
# --rgb_model      path to the RGB model
# --flow_model     path to the flow model
# --inference_dir  path to save inference results; used in the mAP step
# --flip_test      flip test during inference; slightly improves performance but slows down inference
# --ninput         number of stacked input frames: 1 for RGB, 5 for optical flow

# additional flags for JHMDB
# --dataset hmdb
# --split 1        there are 3 splits
# --hm_fusion_rgb  0.4 for JHMDB, 0.5 for UCF; 0.5 by default
```

`$PATH_TO_RGB_MODEL` is the downloaded RGB model.

`$PATH_TO_FLOW_MODEL` is the downloaded flow model.

`$INFERENCE_DIR` is the path where inference results are saved.

More details about `--flip_test` can be found in Tips.md #1.

[Attention] Using `--N 10` and removing `--flip_test` will increase the inference speed at the cost of some performance (see the example below). More details are in Tips.md #4.
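For illustration, a speed-oriented variant of the normal inference command, with `--flip_test` removed and `--N 10` added (see Tips.md #4 for the exact semantics of `--N`):

```bash
# same as the normal inference command, but without --flip_test and with --N 10
python3 det.py --task normal --K 7 --gpus 0,1,2,3,4,5,6,7 --batch_size 94 --master_batch 10 --num_workers 8 --rgb_model ../experiment/result_model/$PATH_TO_RGB_MODEL --flow_model ../experiment/result_model/$PATH_TO_FLOW_MODEL --inference_dir $INFERENCE_DIR --ninput 5 --N 10
```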


If you want to run on the JHMDB dataset, add `--dataset hmdb --split 1 --hm_fusion_rgb 0.4` for split 1, as in the sketch below.
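A sketch of the full JHMDB command on a single GPU (the batch sizes here are illustrative, not tuned):

```bash
python3 det.py --task normal --K 7 --gpus 0 --batch_size 16 --master_batch 16 --num_workers 4 --rgb_model ../experiment/result_model/$PATH_TO_RGB_MODEL --flow_model ../experiment/result_model/$PATH_TO_FLOW_MODEL --inference_dir $INFERENCE_DIR --flip_test --ninput 5 --dataset hmdb --split 1 --hm_fusion_rgb 0.4
```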

After inference, you will find the detection results in `$INFERENCE_DIR`.


## Evaluate mAP

We use the evaluation code from ACT.

The evaluation time depends on your CPU.

If you want a faster evaluation, choose a small `--N` during the inference step.


1. For frame mAP, please run:

```bash
python3 ACT.py --task frameAP --K 7 --th 0.5 --inference_dir $INFERENCE_DIR
```

Choosing a small `--N` during inference will speed up this step. More details can be found in Tips.md #2.


2. For video mAP, please build tubes first:

```bash
python3 ACT.py --task BuildTubes --K 7 --inference_dir $INFERENCE_DIR
```

Then, compute video mAP:

```bash
# change --th for other IoU thresholds
python3 ACT.py --task videoAP --K 7 --th 0.2 --inference_dir $INFERENCE_DIR

# 0.5:0.95
python3 ACT.py --task videoAP_all --K 7 --inference_dir $INFERENCE_DIR
```
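To cover all the thresholds reported in the tables above, the per-threshold runs can be looped; a minimal sketch:

```bash
# evaluate video mAP at each reported threshold
for th in 0.2 0.5 0.75; do
    python3 ACT.py --task videoAP --K 7 --th $th --inference_dir $INFERENCE_DIR
done
```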

3. [Optional] More scripts:

```bash
# for the JHMDB dataset, please add '--dataset hmdb --split 1' for split 1
# add '--exp_id XXX' to save the mAP results in XXX
# add '--model_name YYY' to distinguish different models within XXX
# for safety, tube results are saved in $INFERENCE_DIR; add '--redo' to ignore previous tube results
```
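A hedged example combining these optional flags (`jhmdb_split1` and `K7_rgb_flow_coco` are placeholder names):

```bash
python3 ACT.py --task frameAP --K 7 --th 0.5 --inference_dir $INFERENCE_DIR --dataset hmdb --split 1 --exp_id jhmdb_split1 --model_name K7_rgb_flow_coco --redo
```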


## Online Spatio-Temporal Action Detection

MOC can also be applied to a real-time video stream after some engineering modifications. For a video stream, MOC uses only the previous K-1 frames and the current frame.

Since each backbone feature needs to be extracted only once, we save the previous K-1 frames' features in a buffer. When a new frame arrives, MOC's backbone first extracts its feature and combines it with the previous K-1 frames' features in the buffer. Then, the K frames' features are fed into MOC's three branches to generate tubelet detections. Finally, the linking algorithm immediately extends the video-level detection results with these new tubelets. After that, the buffer is updated with the current frame's feature for subsequent stream detection. A minimal sketch of this loop is given below.
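The following Python sketch illustrates only the buffering scheme; `backbone`, `moc_branches`, and `link` are placeholder stubs standing in for MOC's real modules, not the repository's actual API.

```python
from collections import deque

K = 7  # tubelet length

def backbone(frame):        # placeholder: per-frame feature extraction
    return frame

def moc_branches(feats):    # placeholder: MOC's three branches on K features
    return ["tubelet"]

def link(tubelets):         # placeholder: online tubelet linking
    pass

feature_buffer = deque(maxlen=K - 1)   # features of the previous K-1 frames

def on_new_frame(frame):
    feat = backbone(frame)             # each feature is computed exactly once
    if len(feature_buffer) == K - 1:   # a full K-frame window is available
        tubelets = moc_branches(list(feature_buffer) + [feat])
        link(tubelets)                 # extend video-level detections at once
    feature_buffer.append(feat)        # keep the feature for later frames
```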

### FPS


We now provide a new inference method called 'stream_inference', which can handle a real-time video stream online.

Each backbone feature is computed only once and saved in a buffer, which avoids redundant computation.

Please run:

```bash
python3 det.py --task stream --K 7 --gpus 0 --batch_size 1 --master_batch 1 --num_workers 0 --rgb_model ../experiment/result_model/$PATH_TO_RGB_MODEL --inference_dir $INFERENCE_DIR --dataset hmdb --split 1 --flip_test
```

We use JHMDB because its validation set is small.

You may notice some lag during stream_inference; please see Tips.md #3.


This method can be modified for a real-time video stream.


We provide code for testing our online detection FPS (strictly speaking, tubelets per second).

```bash
python3 det.py --task speed_test --K 7 --gpus 0 --batch_size 1 --master_batch 1 --num_workers 0 --rgb_model ../experiment/result_model/$PATH_TO_RGB_MODEL --inference_dir $INFERENCE_DIR
```

This code uses fake image data to eliminate lag, and we do not recommend adding `--flip_test` in the online setting (see Tips.md #1).


On a single NVIDIA TITAN Xp with batch size = 1, our online detection results on the UCF dataset with RGB as input are as follows:

*(Figure: online detection FPS on UCF101-24, RGB input, single TITAN Xp, batch size 1.)*

## Bash File

We also provide bash files for evaluation. Please refer to `ucf_normal_inference.sh` and `jhmdb_stream_inference.sh`.

## Other Backbones

We currently support DLA-34 and ResNet-18. Please refer to Backbone.