Official PyTorch Implementation of 'Background Suppression Network for Weakly-supervised Temporal Action Localization' (AAAI 2020 Spotlight)
Background Suppression Network for Weakly-supervised Temporal Action Localization
Pilhyeon Lee (Yonsei Univ.), Youngjung Uh (Clova AI, NAVER Corp.), Hyeran Byun (Yonsei Univ.)

Paper: https://arxiv.org/abs/1911.09963
Abstract: Weakly-supervised temporal action localization is a very challenging problem because frame-wise labels are not given in the training stage; the only hint is video-level labels, i.e., whether each video contains action frames of interest. Previous methods aggregate frame-level class scores to produce a video-level prediction and learn from video-level action labels. This formulation does not fully model the problem, in that background frames are forced to be misclassified as action classes to predict video-level labels accurately. In this paper, we design Background Suppression Network (BaS-Net), which introduces an auxiliary class for background and has a two-branch weight-sharing architecture with an asymmetrical training strategy. This enables BaS-Net to suppress activations from background frames to improve localization performance. Extensive experiments demonstrate the effectiveness of BaS-Net and its superiority over the state-of-the-art methods on the most popular benchmarks: THUMOS'14 and ActivityNet.
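To make the two-branch, weight-sharing idea from the abstract more concrete, here is a minimal NumPy sketch of one forward pass. It is not the repository's code: single linear layers standing in for the filtering module and the classifier, sigmoid foreground weights, and top-k mean pooling over time are simplifying assumptions for illustration, and the names `bas_forward`, `w_filter`, and `w_cls` are hypothetical. What it does show is the weight sharing: both branches use the same classifier `w_cls` over C action classes plus one background class, and only the suppression branch sees features rescaled by per-snippet foreground weights.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    z = x - x.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def topk_mean(cas, k):
    """Video-level score per class: mean of the k largest snippet activations. cas: (T, C + 1)."""
    return np.sort(cas, axis=0)[-k:].mean(axis=0)


def bas_forward(features, w_filter, w_cls, k):
    """
    features: (T, D) snippet features
    w_filter: (D,) hypothetical filtering weights producing one foreground score per snippet
    w_cls:    (D, C + 1) classifier shared by both branches; last column is the background class
    """
    fg = sigmoid(features @ w_filter)            # (T,) foreground weights in (0, 1)
    cas_base = features @ w_cls                  # base branch: unfiltered features
    cas_supp = (features * fg[:, None]) @ w_cls  # suppression branch: same weights, filtered features
    p_base = softmax(topk_mean(cas_base, k))     # (C + 1,) video-level class probabilities
    p_supp = softmax(topk_mean(cas_supp, k))
    return p_base, p_supp, fg


# Usage on random data: T = 20 snippets, D = 8 feature dims, C = 3 action classes.
rng = np.random.default_rng(0)
T, D, C = 20, 8, 3
feats = rng.normal(size=(T, D))
p_base, p_supp, fg = bas_forward(feats, rng.normal(size=D), rng.normal(size=(D, C + 1)), k=max(T // 8, 1))
print(p_base.round(3), p_supp.round(3))
```

The sort-based top-k keeps the sketch short; a framework implementation would more likely use a dedicated top-k primitive on batched tensors.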
- Python 3.5
- PyTorch 1.0
- TensorFlow 1.15 (for TensorBoard)
You can set up the environment by running `pip3 install -r requirements.txt`.
1. Prepare the THUMOS'14 dataset.
   - We excluded three test videos (270, 1292, 1496), as previous work did.
2. Extract features with two-stream I3D networks.
3. Place the features inside the `dataset` folder.
   - Please ensure the data structure is as below.
```
├── dataset
│   └── THUMOS14
│       ├── gt.json
│       ├── split_train.txt
│       ├── split_test.txt
│       └── features
│           ├── train
│           │   ├── rgb
│           │   │   ├── video_validation_0000051.npy
│           │   │   ├── video_validation_0000052.npy
│           │   │   └── ...
│           │   └── flow
│           │       ├── video_validation_0000051.npy
│           │       ├── video_validation_0000052.npy
│           │       └── ...
│           └── test
│               ├── rgb
│               │   ├── video_test_0000004.npy
│               │   ├── video_test_0000006.npy
│               │   └── ...
│               └── flow
│                   ├── video_test_0000004.npy
│                   ├── video_test_0000006.npy
│                   └── ...
```
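The data-preparation steps above can be sketched as a small loader. This is a hedged NumPy sketch, not the repository's data pipeline: it assumes each `split_*.txt` lists one video name per line, that each `.npy` file holds a `(T, 1024)` array of per-snippet features for one stream (1024 is typical for I3D but is an assumption here), and that RGB and flow features are concatenated along the feature axis. `read_split`, `load_two_stream`, and `EXCLUDED_TEST_VIDEOS` are hypothetical names; the excluded IDs are the three test videos mentioned above, written in the zero-padded naming scheme from the tree. This README does not say whether `split_test.txt` already omits them, so filtering again is simply a harmless safeguard.

```python
import os
import tempfile

import numpy as np

# The three THUMOS'14 test videos excluded in this README, in the naming scheme of the tree above.
EXCLUDED_TEST_VIDEOS = {"video_test_%07d" % i for i in (270, 1292, 1496)}


def read_split(split_file):
    """Return video names from a split file (assumed: one per line), minus the excluded videos."""
    with open(split_file) as f:
        names = [line.strip() for line in f if line.strip()]
    return [name for name in names if name not in EXCLUDED_TEST_VIDEOS]


def load_two_stream(root, subset, video):
    """Load RGB and flow features for one video and concatenate them per snippet."""
    base = os.path.join(root, "THUMOS14", "features", subset)
    rgb = np.load(os.path.join(base, "rgb", video + ".npy"))
    flow = np.load(os.path.join(base, "flow", video + ".npy"))
    t = min(len(rgb), len(flow))  # trim in case the two streams differ by a snippet or two
    return np.concatenate([rgb[:t], flow[:t]], axis=1)


# Demo on a throwaway directory that mimics the tree above (dummy arrays, not real features).
demo_root = tempfile.mkdtemp()
for stream, value in (("rgb", 0.0), ("flow", 1.0)):
    stream_dir = os.path.join(demo_root, "THUMOS14", "features", "test", stream)
    os.makedirs(stream_dir)
    np.save(os.path.join(stream_dir, "video_test_0000004.npy"), np.full((5, 1024), value, dtype=np.float32))
split_path = os.path.join(demo_root, "THUMOS14", "split_test.txt")
with open(split_path, "w") as f:
    f.write("video_test_0000004\nvideo_test_0001292\n")

videos = read_split(split_path)
feats = load_two_stream(demo_root, "test", videos[0])
print(videos, feats.shape)  # ['video_test_0000004'] (5, 2048)
```

Here `root` is the path to the `dataset` folder; the train subset works the same way with `video_validation_*` names.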
You can easily train and evaluate BaS-Net by running the script below.
If you want to try other training options, please refer to `options.py`.
```bash
$ bash run.sh
```
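This README does not spell out the training objective. As a hedged sketch of the asymmetric training strategy mentioned in the abstract, the snippet below assumes the base branch is supervised with the background class switched on (every untrimmed video is assumed to contain some background) and the suppression branch with it switched off, both via binary cross-entropy on normalized label vectors, plus an L1 penalty on the foreground weights. The function names and the default `alpha` are hypothetical; `options.py` and the training code are the place to confirm the actual settings.

```python
import numpy as np


def bas_labels(video_labels):
    """Build normalized (C + 1)-dim targets for the two branches from C-dim multi-hot labels."""
    y = np.asarray(video_labels, dtype=float)
    base = np.append(y, 1.0)  # base branch: background class is present
    supp = np.append(y, 0.0)  # suppression branch: background class is absent
    # Normalizing assumes at least one action label per training video (otherwise supp.sum() == 0).
    return base / base.sum(), supp / supp.sum()


def bce(p, y, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities p and targets y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))


def bas_loss(p_base, p_supp, fg_weights, video_labels, alpha=1e-4):
    """Hypothetical total loss: base-branch BCE + suppression-branch BCE + alpha * L1(foreground weights)."""
    y_base, y_supp = bas_labels(video_labels)
    l1 = float(np.abs(fg_weights).sum())  # L1 penalty encourages sparse foreground weights
    return bce(p_base, y_base) + bce(p_supp, y_supp) + alpha * l1


# A video containing action class 0 out of C = 2 classes.
labels = [1, 0]
fg = np.array([0.9, 0.1, 0.8])
good = bas_loss(np.array([0.5, 0.0, 0.5]), np.array([1.0, 0.0, 0.0]), fg, labels)
bad = bas_loss(np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0]), fg, labels)
print(good < bad)  # True
```

The only difference between the two targets is the last (background) entry, which is what lets a shared classifier learn to flag background in the base branch while the filtering step is pushed to remove it in the suppression branch.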
The pre-trained model can be found here. You can evaluate the model by running the command below.
```bash
$ bash run_eval.sh
```
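Temporal action localization is usually scored by mean average precision (mAP) at several temporal IoU (tIoU) thresholds, where a predicted segment counts as correct only if it overlaps a not-yet-matched ground-truth segment of the same class by at least the threshold. The sketch below shows just the tIoU and matching step for one class in one video; it is an illustrative assumption of how such an evaluator works, not the code behind `run_eval.sh`, and `temporal_iou` and `count_true_positives` are hypothetical names.

```python
def temporal_iou(a, b):
    """Temporal IoU of two segments given as (start, end), e.g. in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(preds, gts, threshold):
    """
    preds: list of (start, end, score) for one class in one video
    gts:   list of (start, end) ground-truth segments of the same class
    Predictions are visited in descending score order; each one is a true positive
    if it reaches `threshold` tIoU with a ground-truth segment not yet matched.
    """
    matched = set()
    tp = 0
    for start, end, _score in sorted(preds, key=lambda p: p[2], reverse=True):
        best_idx, best_iou = None, threshold
        for idx, gt in enumerate(gts):
            iou = temporal_iou((start, end), gt)
            if idx not in matched and iou >= best_iou:
                best_idx, best_iou = idx, iou
        if best_idx is not None:
            matched.add(best_idx)
            tp += 1
    return tp


print(temporal_iou((0, 10), (5, 15)))  # 0.333...
preds = [(0, 10, 0.9), (4, 16, 0.8), (30, 40, 0.5)]
gts = [(5, 15)]
print(count_true_positives(preds, gts, 0.5), count_true_positives(preds, gts, 0.3))  # 1 1
```

A full evaluator records a TP or FP label for every prediction of a class across all videos, ranked by score, and builds the precision-recall curve from those labels; this sketch only counts matches.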
We referenced the repos below for the code.
If you find this code useful, please cite our paper.
```
@inproceedings{lee2020BaS-Net,
  title={Background Suppression Network for Weakly-supervised Temporal Action Localization},
  author={Lee, Pilhyeon and Uh, Youngjung and Byun, Hyeran},
  booktitle={The 34th AAAI Conference on Artificial Intelligence},
  pages={11320--11327},
  year={2020}
}
```
If you have any questions or comments, please contact the first author of the paper, Pilhyeon Lee (lph1114@yonsei.ac.kr).