Learning Visual Styles from Audio-Visual Associations

Video | Website | Paper

This repository contains the official codebase for Learning Visual Styles from Audio-Visual Associations. Our method manipulates the style of an image to match a sound. After training on an unlabeled dataset of egocentric hiking videos, the model learns visual styles for a variety of ambient sounds, such as light and heavy rain, as well as physical interactions, such as footsteps. We thank Taesung and Junyan for sharing the code of CUT.

Learning Visual Styles from Audio-Visual Associations
Tingle Li, Yichen Liu, Andrew Owens, Hang Zhao
Tsinghua University, University of Michigan and Shanghai Qi Zhi Institute
In ECCV 2022

Prerequisites

Linux or macOS
Python 3
NVIDIA GPU + CUDA CuDNN

Quick Start

Clone this repo:

git clone https://github.com/Tinglok/avstyle avstyle
cd avstyle

Install PyTorch 1.7.1 and other dependencies.

For pip users, please type the command pip install -r requirements.txt.

For Conda users, you can create a new Conda environment using conda env create -f environment.yaml.

Datasets

Into the wild

We provide the YouTube IDs in dataset/Into-the-Wild/metadata.xlsx. First, use youtube-dl to download the videos to dataset/Into-the-Wild/youtube.
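As a rough illustration of the download step, the sketch below builds one youtube-dl command per video ID. The helper name `build_download_commands`, the placeholder IDs, and the output filename template are assumptions made for this example and are not part of the repository. The column layout of metadata.xlsx is not described here, so reading the spreadsheet is left out.

```python
from pathlib import Path


# Hypothetical helper for illustration; not part of this repository.
def build_download_commands(video_ids, out_dir="dataset/Into-the-Wild/youtube"):
    """Return one youtube-dl argument list per YouTube ID."""
    # youtube-dl expands %(id)s and %(ext)s in the -o output template.
    out_template = (Path(out_dir) / "%(id)s.%(ext)s").as_posix()
    commands = []
    for video_id in video_ids:
        url = f"https://www.youtube.com/watch?v={video_id}"
        commands.append(["youtube-dl", "-o", out_template, url])
    return commands


if __name__ == "__main__":
    # Placeholder IDs; the real ones come from metadata.xlsx.
    for cmd in build_download_commands(["VIDEO_ID_1", "VIDEO_ID_2"]):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once youtube-dl is installed.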

Then process them using:

python ./dataset/Into-the-Wild/split.py

so that the videos are split into 3-second clips.
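To make the splitting step concrete, here is a minimal sketch of how a video of known duration could be divided into 3-second intervals. The function `clip_intervals` is hypothetical, and dropping the leftover tail shorter than one clip is an assumption; split.py may handle the remainder differently.

```python
# Hypothetical helper for illustration; split.py may behave differently.
def clip_intervals(duration, clip_len=3.0):
    """Return (start, end) times in seconds for each full-length clip.

    Any leftover tail shorter than clip_len is dropped (an assumption).
    """
    n_clips = int(duration // clip_len)
    return [(i * clip_len, (i + 1) * clip_len) for i in range(n_clips)]


if __name__ == "__main__":
    # A 10-second video yields three full 3-second clips.
    for start, end in clip_intervals(10.0):
        print(f"{start:.1f}s to {end:.1f}s")
```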

Then run the command:

python ./dataset/Into-the-Wild/video2jpg.py

to extract the corresponding images.
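For readers who want to see what frame extraction can look like, the sketch below builds an ffmpeg command that writes JPEG frames from one clip. Using ffmpeg, the sampling rate of one frame per second, the per-clip output folder, and the helper name `build_frame_command` are all assumptions for this example; video2jpg.py may use a different tool or layout.

```python
from pathlib import Path


# Hypothetical helper for illustration; video2jpg.py may work differently.
def build_frame_command(clip_path, out_dir, fps=1):
    """Return an ffmpeg argument list that dumps frames of one clip as JPEGs."""
    clip = Path(clip_path)
    # Frames go to <out_dir>/<clip name>/00001.jpg, 00002.jpg, ...
    pattern = (Path(out_dir) / clip.stem / "%05d.jpg").as_posix()
    # The fps video filter resamples the clip to the requested frame rate.
    return ["ffmpeg", "-i", clip.as_posix(), "-vf", f"fps={fps}", pattern]


if __name__ == "__main__":
    print(" ".join(build_frame_command("clips/video_000.mp4", "frames")))
```

The output folder has to exist before ffmpeg runs, since ffmpeg does not create missing directories.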

Finally, download trainA and trainB to dataset/Into-the-Wild.

The Greatest Hits

Please follow the instructions from Visually Indicated Sounds to download this dataset.

Training and Test

Train our model on the Into the Wild dataset:

python train.py --dataroot ./datasets/Into-the-Wild --name hiking

The checkpoints will be stored at ./checkpoints/hiking/.

Train our model on the Greatest Hits dataset:

python train.py --dataroot ./datasets/Greatest-Hits --name material

The checkpoints will be stored at ./checkpoints/material/.

Test our model on the Into the Wild dataset:

python test.py --dataroot ./datasets/Into-the-Wild --name hiking --eval

The test results will be saved to an HTML file at ./results/hiking/latest_train/index.html.

Test our model on the Greatest Hits dataset:

python test.py --dataroot ./datasets/Greatest-Hits --name material --eval

The test results will be saved to an HTML file at ./results/material/latest_train/index.html.
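The four commands in this section follow one pattern, which the sketch below captures. The flags, data roots, experiment names, and output paths are taken from the commands in this section; the helper names `build_command` and `expected_outputs` and the `EXPERIMENTS` table are hypothetical and only serve the example.

```python
# Experiment name -> data root, as used by the commands in this section.
EXPERIMENTS = {
    "hiking": "./datasets/Into-the-Wild",
    "material": "./datasets/Greatest-Hits",
}


# Hypothetical helper for illustration; not part of this repository.
def build_command(script, name, evaluate=False):
    """Return the argument list for train.py or test.py on one experiment."""
    cmd = ["python", script, "--dataroot", EXPERIMENTS[name], "--name", name]
    if evaluate:
        cmd.append("--eval")  # the test commands in this section pass --eval
    return cmd


def expected_outputs(name):
    """Return where this section says checkpoints and test results are written."""
    return {
        "checkpoints": f"./checkpoints/{name}/",
        "results": f"./results/{name}/latest_train/index.html",
    }


if __name__ == "__main__":
    for name in EXPERIMENTS:
        print(" ".join(build_command("train.py", name)))
        print(" ".join(build_command("test.py", name, evaluate=True)))
        print(expected_outputs(name))
```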

Pre-trained Model

Pre-trained models for the Into-the-Wild and Greatest Hits datasets are available at this URL.

Citation

If you use this code for your research, please consider citing our paper.

@inproceedings{li2021learning,
  author={Tingle Li and Yichen Liu and Andrew Owens and Hang Zhao},
  title={{Learning Visual Styles from Audio-Visual Associations}},
  year=2022,
  booktitle={European Conference on Computer Vision (ECCV)}
}
