This is the official implementation of the paper "An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos".
If you use this code, please cite the following:
```bibtex
@inproceedings{Zhao2020AnEV,
  title={An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos},
  author={Sicheng Zhao and Yunsheng Ma and Yang Gu and Jufeng Yang and Tengfei Xing and Pengfei Xu and Runbo Hu and Hua Chai and Kurt Keutzer},
  booktitle={AAAI},
  year={2020}
}
```
- PyTorch (version 0.4 or later)
- FFmpeg
- Python 3
- Download the videos here.
- Convert the mp4 videos to jpg frames with `/tools/video2jpg.py` (see the sketch after this list).
- Add `n_frames` information with `/tools/n_frames.py`.
- Generate the annotation file in JSON format with `/tools/ve8_json.py`.
- Convert the mp4 videos to mp3 files with `/tools/video2mp3.py`.
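The conversion utilities themselves live under `tools/`; the following is only a minimal sketch of the kind of work they do, assuming `ffmpeg` is on the PATH. The function names, the frame-naming pattern, and the `n_frames` file convention are illustrative assumptions, not the repo's exact code.

```python
import subprocess
from pathlib import Path

def video_to_jpg(video_path: str, dst_dir: str) -> None:
    """Decode an mp4 into numbered jpg frames (hypothetical helper)."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video_path, str(Path(dst_dir) / "image_%05d.jpg")],
        check=True,
    )

def write_n_frames(frame_dir: str) -> None:
    """Record the number of extracted frames in an n_frames file
    (assumed convention, following common video-dataset tooling)."""
    n = len(list(Path(frame_dir).glob("*.jpg")))
    (Path(frame_dir) / "n_frames").write_text(str(n))

def video_to_mp3(video_path: str, dst_path: str) -> None:
    """Strip the audio track of an mp4 into an mp3 file (hypothetical helper)."""
    subprocess.run(["ffmpeg", "-i", video_path, "-vn", dst_path], check=True)
```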
Assume the structure of the data directories is the following:

```
~/
  VideoEmotion8--imgs
    .../ (directories of class names)
      .../ (directories of video names)
        .../ (jpg files)
  VideoEmotion8--mp3
    .../ (directories of class names)
      .../ (mp3 files)
  results
  ve8_01.json
```
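If it helps to verify the layout before training, here is a small sanity-check sketch; the directory names are the ones assumed above, and `ve8_01.json` is the annotation file generated earlier.

```python
from pathlib import Path

root = Path.home()

# Every class directory under the image tree should have a matching mp3 directory.
img_classes = {p.name for p in (root / "VideoEmotion8--imgs").iterdir() if p.is_dir()}
mp3_classes = {p.name for p in (root / "VideoEmotion8--mp3").iterdir() if p.is_dir()}
assert img_classes == mp3_classes, "image/audio class directories disagree"

assert (root / "ve8_01.json").is_file(), "annotation file missing"
print(f"{len(img_classes)} classes found")
```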
Confirm all options in `~/opts.py`, then run:

```bash
python main.py
```