VTTI/driver-secondary-action-recognition

General overview

MMAction2 is an open-source toolbox for video understanding based on PyTorch. It is part of the OpenMMLab project. In this repo we provide a working Dockerfile and Python scripts to process videos for action recognition using the Action Recognition Models and the Spatio-Temporal Action Detection Models. We have performed experiments on two datasets: PoseML (RGB videos of drivers) and SHRP2 (low-quality videos of drivers).

The files required to test an MMAction2 model are: a checkpoint (.pth), a config file (.py), and a classes file (.txt).

For details about the methods and quantitative results, please check the MMAction2 documentation at https://mmaction2.readthedocs.io/en/latest/

How to test

Use the pre-built Docker image

Sign in to the GitHub Container Registry at ghcr.io

docker pull ghcr.io/akashsonth/action-recognition:latest

docker run -it --rm --runtime=nvidia -v {{dataPath}}:/data ghcr.io/akashsonth/action-recognition:latest /bin/bash

Build from scratch

NOTE: this has been tested on an Ubuntu 18.04.6 machine with a Tesla V100-SXM2-16GB GPU, with Docker and nvidia-docker installed, along with all relevant drivers.

The Dockerfile uses nvidia/cuda:11.3.0-cudnn8-devel-ubuntu20.04 as the base image, and we recommend using the same.

git clone https://github.com/VTTI/driver-secondary-action-recognition.git

cd driver-secondary-action-recognition

In the file poseml_long_video.yaml, replace the values of the parameters configFile, checkpoint, and label with those of your chosen model. We provide three trained models; instructions for each are given below. You can also use other models from https://mmaction2.readthedocs.io/en/latest/recognition_models.html

Create a checkpoints folder and download the chosen model checkpoint (options and instructions provided below) into this location.
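
Before building the image, it can help to verify that the paths in the YAML actually exist. The following is a minimal sketch, assuming configFile, checkpoint, and label are top-level keys in poseml_long_video.yaml (adjust the key names if your copy nests them differently):

# check_config.py - sanity-check the paths referenced in poseml_long_video.yaml.
from pathlib import Path
import yaml

def check_yaml(path="poseml_long_video.yaml"):
    cfg = yaml.safe_load(Path(path).read_text())
    for key in ("configFile", "checkpoint", "label"):
        value = cfg.get(key)
        if value is None:
            print(f"{key}: missing from {path}")
        elif not Path(value).exists():
            print(f"{key}: file not found -> {value}")
        else:
            print(f"{key}: OK -> {value}")

if __name__ == "__main__":
    check_yaml()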

docker build . -t action-recognition

docker run -it --rm --runtime=nvidia -v {{dataPath}}:/data action-recognition /bin/bash

(Replace {{dataPath}} with the local folder on your computer that contains [input folder] and where the output is expected to be stored.)

python demo_long_video.py --input INPUT_VIDEO_PATH --config poseml_long_video.yaml --output OUTPUT_VIDEO_PATH

Ex: python demo_long_video.py --input ./sample/input/input.mp4 --config poseml_long_video.yaml --device cuda:0 --output ./sample/output/long_video.mp4

The first few frames are used to instantiate the model, so no predictions are produced until then. The per-frame predictions look like this:

frame_no  detection  label         confidence  x_min  y_min  x_max  y_max
40        0          texting       0.56
40        1          driving car   0.23
40        1          changing oil  0.07
...       ...        ...           ...
Currently this repo supports three Action Recognition models:

This is the MMAction2 implementation of "Temporal Segment Networks: Towards Good Practices for Deep Action Recognition" (TSN).

The value of configFile in poseml_long_video.yaml for this model is configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py

The model pre-trained on Kinetics-400 is available at https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth. Use this checkpoint only if training from scratch.

To generate predictions for your videos with our model trained on PoseML, download https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tsn_PoseML_epoch20.pth and move it to ./checkpoints. Then set the value of checkpoint in poseml_long_video.yaml to this file, and set label to "label_poseml.txt".

To generate predictions for your videos with our model trained on SHRP2, download https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tanet_SHRP2_epoch10.pth and move it to ./checkpoints. Then set the value of checkpoint in poseml_long_video.yaml to this file, and set label to "label_shrp2.txt".

Metrics on PoseML: Top-1 Accuracy 76.19%, Top-3 Accuracy 88.54%
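
The same three fields change for every model below, so switching between them can also be scripted. A minimal sketch, again assuming configFile, checkpoint, and label are top-level keys in poseml_long_video.yaml (note that safe_dump rewrites the file without comments):

# set_model.py - convenience sketch for pointing the YAML at a chosen model.
from pathlib import Path
import yaml

def set_model(config_file, checkpoint, label, yaml_path="poseml_long_video.yaml"):
    cfg = yaml.safe_load(Path(yaml_path).read_text())
    cfg["configFile"] = config_file
    cfg["checkpoint"] = checkpoint
    cfg["label"] = label
    Path(yaml_path).write_text(yaml.safe_dump(cfg))

# Example: TSN trained on PoseML, using the checkpoint downloaded above.
set_model(
    "configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py",
    "./checkpoints/tsn_PoseML_epoch20.pth",
    "label_poseml.txt",
)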

This is the MMAction2 implementation of "SlowFast Networks for Video Recognition" (SlowFast).

The value of configFile in poseml_long_video.yaml for this model is configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py

The model pre-trained on Kinetics-400 is available at https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb_20200728-145f1097.pth. Use this checkpoint only if training from scratch.

To generate predictions for your videos with our model trained on PoseML, download https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/slowfast_PoseML6sec_epoch65.pth and move it to ./checkpoints. Then set the value of checkpoint in poseml_long_video.yaml to this file, and set label to "label_poseml.txt".

To generate predictions for your videos with our model trained on SHRP2, download https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tanet_SHRP2_epoch95.pth and move it to ./checkpoints. Then set the value of checkpoint in poseml_long_video.yaml to this file, and set label to "label_shrp2.txt".

Metrics on PoseML: Top-1 Accuracy 71.48%, Top-3 Accuracy 87.97%

This is the MMAction2 implementation of "TAM: Temporal Adaptive Module for Video Recognition" (TANet).

The value of configFile in poseml_long_video.yaml for this model is configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py

The model pre-trained on Kinetics-400 is available at https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tanet_r50_dense_1x1x8_100e_kinetics400_rgb_20210219-032c8e94.pth. Use this checkpoint only if training from scratch.

To generate predictions for your videos with our model trained on PoseML, download https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tanet_PoseML6sec_epoch35.pth and move it to ./checkpoints. Then set the value of checkpoint in poseml_long_video.yaml to this file, and set label to "label_poseml.txt".

To generate predictions for your videos with our model trained on SHRP2, download https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tanet_SHRP2_epoch30.pth and move it to ./checkpoints. Then set the value of checkpoint in poseml_long_video.yaml to this file, and set label to "label_shrp2.txt".

Metrics on PoseML: Top-1 Accuracy 80.41%, Top-3 Accuracy 90.72%

Training one of the MMAction2 models

First, prepare a folder train containing all the video files to be used for training. Create an empty text file train.txt. Each line of this text file contains a video name, followed by a space, followed by its class index. Do the same for the validation dataset (a val video directory and a val.txt text file). Example:

VID00031_0001.mp4 1
VID00031_0002.mp4 8
VID00031_0003.mp4 8
        .         .
        .         .
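
Generating these annotation files can be scripted. A minimal sketch in the "<video name> <class index>" format shown above; the class_of() logic is purely illustrative and must be replaced by however your videos map to class indices:

# make_annotations.py - write train.txt / val.txt for the layout described above.
from pathlib import Path

def class_of(video_name):
    # Hypothetical mapping: look the index up from your own labels in practice.
    return 0

def write_annotations(video_dir, out_file):
    lines = []
    for video in sorted(Path(video_dir).glob("*.mp4")):
        lines.append(f"{video.name} {class_of(video.name)}")
    Path(out_file).write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    write_annotations("train", "train.txt")
    write_annotations("val", "val.txt")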

In the Docker container, execute the command python train.py CONFIG_FILE

Make the following changes in the train.py file (an illustrative sketch follows this list):

  • Set cfg.model.cls_head.num_classes to the number of classes in your dataset (it is set to 10 in the file)
  • Set cfg.work_dir to the folder where all the model weights will be saved
  • Update the paths of the train videos, val videos, and their corresponding annotation text files
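
The edits above might look roughly like the following inside train.py. This is a sketch only: the data paths are placeholders, and the exact cfg keys depend on the MMAction2 config file you chose.

# Illustrative config edits inside train.py (paths below are placeholders).
from mmcv import Config

cfg = Config.fromfile("configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py")

cfg.model.cls_head.num_classes = 10          # number of classes in your dataset
cfg.work_dir = "./work_dirs/poseml_tsn"      # where checkpoints and logs are written

# Point the dataset entries at your videos and annotation files.
cfg.data.train.ann_file = "/data/train.txt"
cfg.data.train.data_prefix = "/data/train"
cfg.data.val.ann_file = "/data/val.txt"
cfg.data.val.data_prefix = "/data/val"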
