Audio-Adaptive Activity Recognition Across Video Domains (CVPR 2022)

More info can be found on our project page

🌟 We won 2nd place in the UDA track of the EPIC-Kitchens Challenge @CVPR 2022. 🌟

Demo Video

Pretrained weights we used

Audio model: link
SlowFast model for RGB modality: link
Slow-Only model for optical flow modality: link

EPIC-Kitchens

- There are two streams in total: an audio-adaptive model with RGB and audio modalities, and an audio-adaptive model with optical flow and audio modalities.
- We average the predictions of the two streams at the end, for a mean accuracy of 61.0%.
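
The averaging of the two streams could look roughly like the sketch below. It assumes each stream has already produced per-class scores (for example softmax probabilities) for the same samples in the same order; the function name `average_stream_predictions` is hypothetical and is not part of this repository.

```python
def average_stream_predictions(rgb_audio_scores, flow_audio_scores):
    """Average per-class scores of two streams and pick the best class per sample.

    Both arguments are lists of rows, one row of class scores per sample.
    This is an illustrative sketch, not the repository's fusion code.
    """
    if len(rgb_audio_scores) != len(flow_audio_scores):
        raise ValueError("Both streams must score the same number of samples")

    fused = []
    for rgb_row, flow_row in zip(rgb_audio_scores, flow_audio_scores):
        if len(rgb_row) != len(flow_row):
            raise ValueError("Both streams must score the same number of classes")
        # Element-wise mean of the two streams' scores.
        fused.append([(r + f) / 2.0 for r, f in zip(rgb_row, flow_row)])

    # Predicted label = index of the highest fused score in each row.
    labels = [max(range(len(row)), key=row.__getitem__) for row in fused]
    return fused, labels


if __name__ == "__main__":
    rgb = [[0.6, 0.4], [0.2, 0.8]]
    flow = [[0.1, 0.9], [0.3, 0.7]]
    _, predicted = average_stream_predictions(rgb, flow)
    print(predicted)
```

In the first sample the RGB stream alone would pick class 0, but the fused scores favor class 1, which is the kind of disagreement the averaging resolves.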

Prepare the audio files (.wav) from the videos:

python generate_sound_files.py
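
For readers who want to see what this step involves, here is a minimal sketch of extracting mono .wav files with ffmpeg. It is not the contents of generate_sound_files.py: the function names, the `.MP4` suffix, and the 16 kHz sample rate are all assumptions made for illustration, and ffmpeg is assumed to be installed separately.

```python
from pathlib import Path


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that writes the audio track of a video as mono .wav.

    The sample rate is an assumed value, not one taken from the repository.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(video_path),     # input video
        "-vn",                     # drop the video stream
        "-ac", "1",                # one audio channel (mono)
        "-ar", str(sample_rate),   # audio sample rate in Hz
        str(wav_path),
    ]


def plan_audio_extraction(video_dir, wav_dir, video_suffix=".MP4"):
    """Return one ffmpeg command per video found under video_dir (hypothetical helper)."""
    video_dir, wav_dir = Path(video_dir), Path(wav_dir)
    commands = []
    for video_path in sorted(video_dir.rglob("*" + video_suffix)):
        wav_path = wav_dir / (video_path.stem + ".wav")
        commands.append(build_ffmpeg_command(video_path, wav_path))
    return commands
```

Each returned command can be passed to `subprocess.run(command, check=True)` once the output directory exists. Building the commands separately from running them makes it easy to inspect what would be executed first.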

Environments:

PyTorch 1.7.0
mmcv-full 1.2.7
mmaction2 0.13.0
cudatoolkit 10.1.243

The directory structure should be modified to match:

├── rgb
|   ├── train
|   |   ├── D1
|   |   |   ├── P08_01
|   |   |   |     ├── frame_0000000000.jpg
|   |   |   |     ├── ...
|   |   |   ├── P08_02
|   |   |   ├── ...
|   |   ├── D2
|   |   ├── D3
|   ├── test
|   |   ├── D1
|   |   ├── D2
|   |   ├── D3


├── flow
|   ├── train
|   |   ├── D1
|   |   |   ├── P08_01 
|   |   |   |   ├── u
|   |   |   |   |   ├── frame_0000000000.jpg
|   |   |   |   |   ├── ...
|   |   |   |   ├── v
|   |   |   ├── P08_02
|   |   |   ├── ...
|   |   ├── D2
|   |   ├── D3
|   ├── test
|   |   ├── D1
|   |   ├── D2
|   |   ├── D3
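
Before training, it may help to confirm that the data root matches the layout above. The following is a small sketch that only checks the modality/split/domain directories shown in the tree; the function name `missing_epic_dirs` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Directory names taken from the layout above.
MODALITIES = ("rgb", "flow")
SPLITS = ("train", "test")
DOMAINS = ("D1", "D2", "D3")


def missing_epic_dirs(data_root):
    """Return the expected <modality>/<split>/<domain> directories that do not exist."""
    root = Path(data_root)
    missing = []
    for modality in MODALITIES:
        for split in SPLITS:
            for domain in DOMAINS:
                path = root / modality / split / domain
                if not path.is_dir():
                    # Report paths relative to the data root, with forward slashes.
                    missing.append(path.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    import sys

    for entry in missing_epic_dirs(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("missing:", entry)
```

The check deliberately stops at the domain level and does not inspect individual frame files or the `u`/`v` flow sub-directories.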

RGB and audio

This is the demo code for training the audio-adaptive model with RGB (SlowFast backbone) and audio modalities on the EPIC-Kitchens dataset, reproducing a mean accuracy of 59.2%.

First, download the data using the code provided by an existing work: https://github.com/jonmun/MM-SADA-code
Before running the code, change the data paths in dataloader_*.py, train_*.py, test_*.py and get_*.py to your own.
Go to the sub-directory:

cd EPIC-rgb-audio

To run the code on 4 NVIDIA 1080Ti GPUs:

sh bash.sh

Optical flow and audio

This is the demo code for training the audio-adaptive model with optical flow (Slow-Only backbone) and audio modalities on the EPIC-Kitchens dataset, reproducing a mean accuracy of 53.9%.

Before running the code, change the data paths in dataloader_*.py, train_*.py, test_*.py and get_*.py to your own.

Note that the clusters and absent-pseudo labels generated from the audio are the same as those in the "RGB and audio" code.

Go to the sub-directory:

cd EPIC-flow-audio

To run the code on 4 NVIDIA 1080Ti GPUs:

sh bash.sh

CharadesEgo

This code conducts semi-supervised domain adaptation with all the source (3rd-person view) data and half of the target (1st-person view) data, based on RGB (SlowFast backbone) and audio modalities, reproducing an mAP of 26.3%.

The directory structure should be modified to match:

├── CharadesEgo
|   ├── audio
|   |   ├── 005BUEGO.wav
|   |   ├── 005BU.wav
|   |   ├── ...
|   ├── CharadesEgo_v1_rgb
|   |   ├── 005BU
|   |   |   ├── 005BU-000001.jpg
|   |   |   ├── 005BU-000002.jpg
|   |   |   ├── ...
|   |   ├── 005BUEGO
|   |   ├── ...
|   ├── Labels
|   |   ├── 005BU
|   |   |   ├── frame_0000000001_0000000174.csv
|   |   |   ├── ...
|   |   ├── 005BUEGO
|   |   ├── ...
|   ├── CharadesEgo_v1_train_only1st.csv
|   ├── CharadesEgo_v1_train_only3rd.csv
|   ├── CharadesEgo_v1_test_only1st.csv
|   ├── CharadesEgo_v1_test_only3rd.csv

Here the "Labels" directory contains the labels that we generated from the csv files provided by the CharadesEgo dataset. You can download it directly from this link or run generate_labels.py to create it yourself.
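
In the layout above, a first-person recording shares its id with the matching third-person recording plus an `EGO` suffix (for example `005BU` and `005BUEGO`). As a quick sanity check on the audio directory, the sketch below lists the ids for which both .wav files are present. The function name `paired_video_ids` is hypothetical, and the pairing rule is inferred only from the file names shown above.

```python
from pathlib import Path


def paired_video_ids(audio_dir):
    """Return sorted (third_person_id, first_person_id) pairs that both have a .wav file.

    Pairing is by file name only: "<id>.wav" is matched with "<id>EGO.wav".
    """
    stems = {path.stem for path in Path(audio_dir).glob("*.wav")}
    pairs = []
    for stem in stems:
        ego_stem = stem + "EGO"
        # Skip first-person files themselves; keep ids whose EGO counterpart exists.
        if not stem.endswith("EGO") and ego_stem in stems:
            pairs.append((stem, ego_stem))
    return sorted(pairs)


if __name__ == "__main__":
    for third_person, first_person in paired_video_ids("CharadesEgo/audio"):
        print(third_person, first_person)
```

Ids that appear in only one view are silently left out, so comparing the number of pairs with the number of .wav files gives a rough idea of how many recordings are unpaired.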

Before running the code, change the data paths in dataloader_*.py, train_*.py, test_*.py and get_*.py to your own.
Go to the sub-directory:

cd CharadesEgo

To run the code on 4 NVIDIA 1080Ti GPUs:

sh bash.sh

ActorShift Dataset

This dataset can be downloaded at https://uvaauas.figshare.com/articles/dataset/ActorShift_zip/19387046

Contact

If you have any questions, you can send an email to y.zhang9@uva.nl

Citation

If you find the code useful in your research, please cite:

@inproceedings{ZhangCVPR2022,
title = {Audio-Adaptive Activity Recognition Across Video Domains},
author = {Yunhua Zhang and Hazel Doughty and Ling Shao and Cees G M Snoek},
year = {2022},
date = {2022-06-02},
urldate = {2022-06-01},
booktitle = {CVPR},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
