Paper: Two-Stream Convolutional Networks for Action Recognition
Each stream uses a ResNet-50 backbone.
Stream | Accuracy |
---|---|
RGB | - |
Optical Flow | - |
Fusion (Two Stream) | 73.53% (stacking only 4 optical flow images: 2 x-direction and 2 y-direction) |
- Ubuntu 16.04.7 LTS
- CUDA Version: 10.1
- PyTorch 1.3.1
- torchvision 0.4.2
- numpy 1.19.2
- pillow 8.0.1
- python 3.6.12
Original dataset: UCF101, or download the processed data described below.
I also wrote MATLAB code to generate the optical flow images and the RGB images.
- For the optical flow images, I call MATLAB's Horn–Schunck algorithm function to compute them. The video frame interval for computing the optical flow images is set to 2 to generate sufficient data.
- For the RGB images, I randomly sample a single frame from each video.
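For readers without MATLAB, the Horn–Schunck step can be illustrated in Python. This is a NumPy sketch of the textbook algorithm, not the MATLAB function used here, so its output will not match `calOpticalFlow.m` exactly. The function names, the smoothness weight `alpha`, and the iteration count are assumptions. If the frame interval of 2 is read as the gap between paired frames (also an assumption), `frame2` would be the frame two steps after `frame1`.

```python
import numpy as np


def _neighbour_average(f):
    """Weighted average of the 8 neighbours (classic Horn-Schunck kernel)."""
    p = np.pad(f, 1, mode="edge")
    direct = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    diagonal = p[:-2, :-2] + p[:-2, 2:] + p[2:, :-2] + p[2:, 2:]
    return direct / 6.0 + diagonal / 12.0


def horn_schunck(frame1, frame2, alpha=1.0, n_iter=100):
    """Estimate dense flow (u, v) from frame1 to frame2 (2-D grayscale arrays)."""
    f1 = frame1.astype(np.float64)
    f2 = frame2.astype(np.float64)
    # Spatial derivatives of the mean frame, temporal derivative between frames.
    mean = 0.5 * (f1 + f2)
    Iy, Ix = np.gradient(mean)
    It = f2 - f1
    u = np.zeros_like(f1)  # x-direction flow
    v = np.zeros_like(f1)  # y-direction flow
    for _ in range(n_iter):
        u_avg = _neighbour_average(u)
        v_avg = _neighbour_average(v)
        # Shared update term from the Horn-Schunck iteration.
        t = (Ix * u_avg + Iy * v_avg + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
        u = u_avg - Ix * t
        v = v_avg - Iy * t
    return u, v
```

The two returned arrays correspond to the x-direction and y-direction flow images that are stacked as input to the optical flow stream.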
Data generation code (MATLAB): calOpticalFlow.m
Download the processed data: Link (password: peyu)
After downloading the processed data, extract processedData.rar and create a directory named data, so that the project is laid out as follows:
Project
│--- data
│------ RGB
│------ OpticalFlow
│--- other files
Before training, create a directory named model to store the checkpoint files, then start training:
python3 trainTwoStreamNet.py
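The data layout and the checkpoint directory can also be checked from Python before training starts. This is a minimal sketch that assumes it is run from the project root; `prepare_project` is a hypothetical helper and not part of this repository.

```python
from pathlib import Path


def prepare_project(root="."):
    """Check the data layout and create the checkpoint directory."""
    root = Path(root)
    # These two folders come from the extracted processed data.
    missing = [
        name for name in ("RGB", "OpticalFlow")
        if not (root / "data" / name).is_dir()
    ]
    if missing:
        raise FileNotFoundError(f"missing under {root / 'data'}: {missing}")
    # Checkpoints are written here during training.
    model_dir = root / "model"
    model_dir.mkdir(exist_ok=True)
    return model_dir
```

Calling `prepare_project()` raises an error if `data/RGB` or `data/OpticalFlow` is absent, and otherwise returns the path of the `model` directory.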
The demo runs on a single video from the test set. I arbitrarily set test_video_id = 1000, and I use the checkpoint saved at iteration 9000 as the demo model.
You can change test_video_id here:
# set the test video id in testset
test_video_id = 1000
print('Video Name:', LoadUCF101Data.TestVideoNameList[test_video_id])
You can change the checkpoint file path here:
# load the checkpoint file
state = torch.load('model/checkpoint-9000.pth')
twoStreamNet.load_state_dict(state['model'])
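The snippet above expects the checkpoint to be a dictionary with a `'model'` entry. A minimal sketch of a matching save and load pair follows. The helper names and the extra `'iteration'` entry are assumptions, and the real training script may store additional fields.

```python
import torch
import torch.nn as nn


def save_checkpoint(model, path, iteration):
    # The 'model' key matches what the demo reads; 'iteration' is an assumed extra.
    torch.save({"model": model.state_dict(), "iteration": iteration}, path)


def load_checkpoint(model, path, device="cpu"):
    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine.
    state = torch.load(path, map_location=device)
    model.load_state_dict(state["model"])
    return state.get("iteration")
```

Passing `map_location` is optional when the demo runs on the same kind of device that was used for training.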
Run demo.py:
CUDA_VISIBLE_DEVICES=0 python3 demo.py
output:
Video Name: v_Drumming_g01_c05
actual class is Drumming
predicted class is Drumming , probability is 99.9534
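A predicted class and a percentage like the ones printed above can be derived from the network's class scores as sketched below. This assumes the network returns raw, unnormalized scores for one video. `predict` is a hypothetical helper and may differ from what demo.py does internally.

```python
import torch


def predict(scores, class_names):
    """Return (class name, probability in percent) for one video's class scores."""
    # Softmax turns the raw scores into probabilities that sum to 1.
    probs = torch.softmax(scores, dim=0)
    prob, idx = torch.max(probs, dim=0)
    return class_names[idx.item()], prob.item() * 100.0
```

If the model already outputs probabilities, the softmax call should be dropped, since applying it a second time would flatten the distribution.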
I recorded some problems I ran into while writing the code, together with their solutions. Sorry, these notes are only available in Chinese. Here is the Link