This repository contains the official codebase for Learning Visual Styles from Audio-Visual Associations. We manipulate the style of an image to match a sound. After training on an unlabeled dataset of egocentric hiking videos, our model learns visual styles for a variety of ambient sounds, such as light and heavy rain, as well as physical interactions, such as footsteps. We thank Taesung and Junyan for sharing the code of CUT.
Learning Visual Styles from Audio-Visual Associations
Tingle Li, Yichen Liu, Andrew Owens, Hang Zhao
Tsinghua University, University of Michigan and Shanghai Qi Zhi Institute
In ECCV 2022
- Linux or macOS
- Python 3
- NVIDIA GPU + CUDA CuDNN
- Clone this repo:
git clone https://github.com/Tinglok/avstyle avstyle
cd avstyle
- Install PyTorch 1.7.1 and other dependencies.
For pip users, run:
pip install -r requirements.txt
For Conda users, create a new Conda environment using:
conda env create -f environment.yaml
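As a quick sanity check after installation, the following sketch reports whether the prerequisites listed above appear to be met. The function name `check_environment` and the report keys are illustrative and not part of this repository; it only assumes that PyTorch, if installed, is importable as `torch`.

```python
import importlib.util
import platform
import sys


def check_environment(expected_torch="1.7.1"):
    """Return a small report on the prerequisites (hypothetical helper)."""
    report = {
        "os": platform.system(),                  # e.g. "Linux" or "Darwin"
        "python3": sys.version_info.major == 3,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_matches": False,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch

        # Version strings may carry a build suffix such as "+cu110".
        report["torch_matches"] = torch.__version__.startswith(expected_torch)
        report["cuda_available"] = torch.cuda.is_available()
    return report
```

Calling `print(check_environment())` shows at a glance whether the PyTorch version and GPU support match what the code expects.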
We provide the YouTube IDs in dataset/Into-the-Wild/metadata.xlsx. Please see youtube-dl for downloading the videos to dataset/Into-the-Wild/youtube first.
Then process them using:
python ./dataset/Into-the-Wild/split.py
so that the videos are split into 3-second clips.
Then run the command:
python ./dataset/Into-the-Wild/video2jpg.py
to extract the corresponding images.
Finally, download trainA and trainB to dataset/Into-the-Wild.
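To make the first two preparation steps concrete, here is a minimal sketch of how a download command and the 3-second clip boundaries could be derived. It is not the code in split.py: the helper names, the output naming template, and the choice to drop a trailing segment shorter than 3 seconds are all assumptions made for illustration.

```python
CLIP_SECONDS = 3  # clip length stated in the steps above


def download_command(youtube_id, out_dir="dataset/Into-the-Wild/youtube"):
    """Build a youtube-dl command for one video ID (hypothetical helper).

    The "<id>.<ext>" output naming is an assumption, not the repo's convention.
    """
    return [
        "youtube-dl",
        "-o", f"{out_dir}/%(id)s.%(ext)s",
        f"https://www.youtube.com/watch?v={youtube_id}",
    ]


def clip_windows(duration_seconds, clip_seconds=CLIP_SECONDS):
    """Return (start, end) pairs, in seconds, for consecutive full-length clips.

    Assumption: a final segment shorter than clip_seconds is discarded.
    """
    n_clips = int(duration_seconds // clip_seconds)
    return [(i * clip_seconds, (i + 1) * clip_seconds) for i in range(n_clips)]
```

For example, `clip_windows(10)` yields three windows covering seconds 0 to 9, and the list returned by `download_command` can be passed to `subprocess.run` once a real ID from metadata.xlsx is substituted.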
Please follow the instructions from Visually Indicated Sounds to download the Greatest Hits dataset.
- Train our model on the Into the Wild dataset:
python train.py --dataroot ./datasets/Into-the-Wild --name hiking
The checkpoints will be stored at ./checkpoints/hiking/.
- Train our model on the Greatest Hits dataset:
python train.py --dataroot ./datasets/Greatest-Hits --name material
The checkpoints will be stored at ./checkpoints/material/.
- Test our model on the Into the Wild dataset:
python test.py --dataroot ./datasets/Into-the-Wild --name hiking --eval
The test results will be saved to an HTML file at ./results/hiking/latest_train/index.html.
- Test our model on the Greatest Hits dataset:
python test.py --dataroot ./datasets/Greatest-Hits --name material --eval
The test results will be saved to an HTML file at ./results/material/latest_train/index.html.
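If you prefer to drive training and testing from a script, the commands above could be wrapped roughly as follows. This is only a sketch: `RUNS`, `build_commands`, `results_page`, and `run` are hypothetical names, and the flags and paths are copied from the commands listed above rather than from the argument parser itself.

```python
import subprocess
import sys

# Run name -> dataset folder, as used in the commands above.
RUNS = {
    "hiking": "./datasets/Into-the-Wild",
    "material": "./datasets/Greatest-Hits",
}


def build_commands(name):
    """Return the (train, test) command lists for a run name."""
    dataroot = RUNS[name]
    train = [sys.executable, "train.py", "--dataroot", dataroot, "--name", name]
    test = [sys.executable, "test.py", "--dataroot", dataroot, "--name", name, "--eval"]
    return train, test


def results_page(name):
    """Path of the HTML results page reported for a run name."""
    return f"./results/{name}/latest_train/index.html"


def run(name):
    """Train, then test; stop at the first command that fails."""
    for cmd in build_commands(name):
        subprocess.run(cmd, check=True)
    return results_page(name)
```

Calling `run("hiking")` from the repository root would train on Into the Wild, evaluate, and return the path of the results page.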
Pre-trained models for the Into the Wild and Greatest Hits datasets are available at this URL.
If you use this code for your research, please consider citing our paper.
@inproceedings{li2021learning,
author={Tingle Li and Yichen Liu and Andrew Owens and Hang Zhao},
title={{Learning Visual Styles from Audio-Visual Associations}},
year=2022,
booktitle={European Conference on Computer Vision (ECCV)}
}