Easily implement SOTA video understanding methods with PyTorch on multiple machines and GPUs
X-Temporal is an open source video understanding codebase from the SenseTime X-Lab group that provides state-of-the-art video classification models, including implementations of the papers "Temporal Segment Networks", "Temporal Interlacing Network", "Temporal Shift Module", "ResNet 3D", "SlowFast Networks for Video Recognition", and "Non-local Neural Networks".
This repo includes all models and code used in our 1st place solution to the ICCV19 Multi-Moments in Time Challenge.
- Support popular video understanding frameworks
- SlowFast
- R(2+1)D
- R3D
- TSN
- TIN
- TSM
- Support various datasets (Kinetics, Something-Something, Multi-Moments in Time...)
- Take raw video as input
- Take video RGB frames as input
- Take video Flow frames as input
- Support Multi-label dataset
- High-performance, modular design that supports rapid implementation and evaluation of novel video research ideas.
v0.1.0 (08/04/2020)
X-Temporal is online!
The code is built with the following libraries:
- PyTorch 1.0 or higher
- TensorboardX
- tqdm
- scikit-learn
- decord
For extracting frames from video data, you may need ffmpeg.
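As a rough illustration of frame extraction, the sketch below builds an ffmpeg command line that dumps the frames of one video into a folder. The helper name `build_ffmpeg_command` and the `img_%05d.jpg` naming pattern are assumptions made for this example, not something defined by X-Temporal; the scripts in the tools folder remain the reference for this repo.

```python
import shlex
from pathlib import Path


def build_ffmpeg_command(video_path, out_dir, pattern="img_%05d.jpg"):
    """Return an ffmpeg argument list that writes the frames of a video as JPEGs.

    The output naming pattern is an assumption for illustration; use whatever
    naming your data loading code expects.
    """
    return [
        "ffmpeg",
        "-i", str(video_path),          # input video
        "-q:v", "2",                    # JPEG quality (lower value = higher quality)
        str(Path(out_dir) / pattern),   # numbered output frames
    ]


cmd = build_ffmpeg_command(
    "abseiling/Tdd9inAW1VY_000361_000371.mkv",
    "frames/abseiling/Tdd9inAW1VY_000361_000371",
)
print(shlex.join(cmd))
# To run it: create the output folder first, then call
# subprocess.run(cmd, check=True)
```

The command is only built and printed here, so the sketch can be inspected without ffmpeg being installed.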
- Clone the repo.
git clone https://github.com/Sense-X/X-Temporal.git X-Temporal
cd X-Temporal
- Run the install script.
./easy_setup.sh
Each row in the dataset meta file represents one video and has 3 columns: the folder holding the extracted frames, the number of frames, and the category id. For example:
abseiling/Tdd9inAW1VY_000361_000371 300 0
zumba/x0KPHFRbzDo_000087_000097 300 599
You can also read the original video files directly. X-Temporal uses the decord library to extract frames from videos on the fly.
abseiling/Tdd9inAW1VY_000361_000371.mkv 300 0
zumba/x0KPHFRbzDo_000087_000097.mkv 300 599
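To make the three-column format concrete, here is a minimal parsing sketch. `VideoRecord`, `parse_meta_line`, and `is_raw_video` are hypothetical names chosen for this example and are not taken from the X-Temporal code; the list of video extensions is likewise an assumption.

```python
from typing import NamedTuple


class VideoRecord(NamedTuple):
    path: str        # frame folder or raw video file
    num_frames: int  # number of frames in the video
    label: int       # category id


def parse_meta_line(line: str) -> VideoRecord:
    """Parse one 'path num_frames label' row of a single-label meta file."""
    # Split from the right so that a path containing spaces stays intact.
    path, num_frames, label = line.strip().rsplit(maxsplit=2)
    return VideoRecord(path, int(num_frames), int(label))


def is_raw_video(record: VideoRecord, extensions=(".mkv", ".mp4", ".avi", ".webm")) -> bool:
    """Guess whether the row points at a video file rather than a frame folder."""
    return record.path.lower().endswith(extensions)


frames_row = parse_meta_line("abseiling/Tdd9inAW1VY_000361_000371 300 0")
video_row = parse_meta_line("zumba/x0KPHFRbzDo_000087_000097.mkv 300 599")
print(frames_row, is_raw_video(frames_row))
print(video_row, is_raw_video(video_row))
```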
In the tools folder, scripts for extracting frames and generating data set meta files are provided.
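For readers who want to see what such a meta file generator might look like, the following sketch walks a `<class>/<video>/` frame tree and emits rows in the format described above. It is not the script shipped in the tools folder: the function name `build_meta_rows`, the directory layout, the image extensions, and the alphabetical assignment of category ids are all assumptions for illustration.

```python
import os
import tempfile


def build_meta_rows(frames_root):
    """Build 'folder num_frames label' rows from a <class>/<video>/ frame tree."""
    classes = sorted(
        d for d in os.listdir(frames_root)
        if os.path.isdir(os.path.join(frames_root, d))
    )
    # Assumption: category ids follow the alphabetical order of class folders.
    class_to_id = {name: idx for idx, name in enumerate(classes)}
    rows = []
    for name in classes:
        class_dir = os.path.join(frames_root, name)
        for video in sorted(os.listdir(class_dir)):
            video_dir = os.path.join(class_dir, video)
            if not os.path.isdir(video_dir):
                continue
            num_frames = sum(
                1 for f in os.listdir(video_dir)
                if f.lower().endswith((".jpg", ".png"))
            )
            rows.append(f"{name}/{video} {num_frames} {class_to_id[name]}")
    return rows


# Demo on a throwaway tree with two classes and one video each.
with tempfile.TemporaryDirectory() as root:
    for cls, video, n in [("abseiling", "vid_a", 3), ("zumba", "vid_b", 2)]:
        video_dir = os.path.join(root, cls, video)
        os.makedirs(video_dir)
        for i in range(n):
            open(os.path.join(video_dir, f"img_{i:05d}.jpg"), "w").close()
    rows = build_meta_rows(root)

print("\n".join(rows))
```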
For multi-label datasets, each row of the meta file contains the video path, the number of frames, and a comma-separated list of the category ids the video belongs to:
trimming/getty-cutting-meat-cleaver-video-id163936215_13.mp4 90 144,246
exercising/meta-935267_68.mp4 92 69
cooking/yt-SSLy25MQb9g_307.mp4 91 264,311,7,188,246
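A small sketch of how such a row could be turned into a training target: the label column is split on commas and converted to a multi-hot vector. The function names and the `num_classes` value are hypothetical; 313 is used here only because it is large enough for the ids in the example rows.

```python
def parse_multilabel_line(line):
    """Parse one 'path num_frames id1,id2,...' row of a multi-label meta file."""
    path, num_frames, labels = line.strip().rsplit(maxsplit=2)
    return path, int(num_frames), [int(x) for x in labels.split(",")]


def to_multi_hot(labels, num_classes):
    """Return a list with 1.0 at every listed category id and 0.0 elsewhere."""
    target = [0.0] * num_classes
    for idx in labels:
        target[idx] = 1.0
    return target


path, num_frames, labels = parse_multilabel_line(
    "cooking/yt-SSLy25MQb9g_307.mp4 91 264,311,7,188,246"
)
target = to_multi_hot(labels, num_classes=313)  # num_classes is an assumed value
print(path, num_frames, labels, sum(target))
```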
YAML config:
trainer:
    loss_type: bce
dataset:
    multi_class: True
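Assuming `bce` here refers to binary cross-entropy applied independently to each class (as in PyTorch's `torch.nn.BCEWithLogitsLoss`), the loss for one video can be sketched in plain Python as below. This is an illustration of the technique only; the source does not confirm which exact loss implementation X-Temporal uses, and `bce_with_logits` is a name made up for this example.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over classes, computed from raw logits."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(x).
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# Three classes, the video belongs to classes 0 and 2.
targets = [1.0, 0.0, 1.0]
good = bce_with_logits([4.0, -4.0, 4.0], targets)   # confident and correct
bad = bce_with_logits([-4.0, 4.0, -4.0], targets)   # confident and wrong
print(good, bad)
```

Unlike softmax cross-entropy, each class is scored on its own, so a video can be positive for several categories at once.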
- Create a folder for the experiment.
cd /path/to/X-Temporal
mkdir -p experiments/test
- Create a new config or copy one from an existing experiment.
cp experiments/r2plus1d/default.yaml experiments/test
cp experiments/r2plus1d/run.sh experiments/test
- Set up the training script; ROOT and the cfg file may need to be changed according to your settings.
T=`date +%m%d%H%M`
ROOT=../..
cfg=default.yaml
export PYTHONPATH=$ROOT:$PYTHONPATH
python $ROOT/x_temporal/train.py --config $cfg | tee log.train.$T
- Start training.
./train.sh
- Set the resume_model path in the config.
saver: # Required.
    resume_model: checkpoints/ckpt_e13.pth # checkpoint to test
- Set the parameters under evaluate in the config, for example to use multiple spatial crops and temporal samples during testing. (It is recommended to reduce the batch size by the same proportion.)
evaluate:
    spatial_crops: 3
    temporal_samples: 10
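The sketch below illustrates why the batch size should shrink: with 3 spatial crops and 10 temporal samples, each video expands into 30 views. It also shows one common way to combine the views, averaging the per-class scores. Whether X-Temporal averages scores in exactly this way is an assumption, and all function names are hypothetical.

```python
def num_views(spatial_crops, temporal_samples):
    """Number of clips evaluated per video."""
    return spatial_crops * temporal_samples


def scaled_batch_size(train_batch_size, spatial_crops, temporal_samples):
    """Shrink the batch size in proportion to the number of views per video."""
    return max(1, train_batch_size // num_views(spatial_crops, temporal_samples))


def average_views(view_scores):
    """Average per-class scores over all views of one video."""
    n = len(view_scores)
    num_classes = len(view_scores[0])
    return [sum(v[c] for v in view_scores) / n for c in range(num_classes)]


views = num_views(spatial_crops=3, temporal_samples=10)
batch = scaled_batch_size(64, spatial_crops=3, temporal_samples=10)
fused = average_views([[1.0, 0.0], [0.0, 1.0]])
print(views, batch, fused)
```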
- Modify run.sh or create a new test.sh; the main change is replacing train.py with test.py. A sample is shown below:
T=`date +%m%d%H%M`
ROOT=../..
cfg=default.yaml
export PYTHONPATH=$ROOT:$PYTHONPATH
python $ROOT/x_temporal/test.py --config $cfg | tee log.test.$T
- Start testing.
./test.sh
X-Temporal is released under the MIT license.
Kindly cite our publications if this repo and algorithms help in your research.
@article{zhang2020top,
title={Top-1 Solution of Multi-Moments in Time Challenge 2019},
author={Zhang, Manyuan and Shao, Hao and Song, Guanglu and Liu, Yu and Yan, Junjie},
journal={arXiv preprint arXiv:2003.05837},
year={2020}
}
@article{shao2020temporal,
title={Temporal Interlacing Network},
author={Hao Shao and Shengju Qian and Yu Liu},
year={2020},
journal={AAAI},
}
X-Temporal is maintained by Hao Shao, ManYuan Zhang, and Yu Liu.