Mix and Localize: Localizing Sound Sources in Mixtures

Xixi Hu, Ziyang Chen, Andrew Owens
University of Michigan
CVPR 2022

This repository contains the official codebase for Mix and Localize: Localizing Sound Sources in Mixtures. [Project Page]

Cycle-consistent multi-source localization

MUSIC Dataset

Download the MUSIC dataset here: MUSIC repo

Postprocess the MUSIC dataset to extract the frames and audio clips. The dataset folder should be structured as follows.

data
  └──MUSIC 
  │    ├──data-splits
  │    ├──MUSIC_raw
  │           ├──duet
  │           ├──solo
  │                └── [class_label]
  │                         └── [ytid]
  │                               ├── audio
  │                               │      ├──audio_clips
  │                               │             ├── 00000.wav       # 1 second audio clips
  │                               │             ├── 00001.wav
  │                               │             ├── ...
  │                               └── frames
  │                                      ├── 00000.jpg              # fps = 4
  │                                      ├── ...
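The repository does not document the extraction step itself, so the following is only a minimal sketch of one way to produce the layout above. It assumes ffmpeg is used for extraction; the helper name `build_extraction_commands`, its arguments, and the example paths are hypothetical and not part of this codebase. It builds one command that samples frames at 4 fps and one that cuts the audio track into 1-second WAV clips, both numbered from 00000.

```python
from pathlib import Path


def build_extraction_commands(video_path, out_dir, fps=4, clip_seconds=1):
    """Return two ffmpeg argument lists: one for frames, one for audio clips.

    Hypothetical helper: the function name, its arguments, and the choice of
    ffmpeg are assumptions, not part of this repository.
    """
    out_dir = Path(out_dir)
    frames_dir = out_dir / "frames"
    clips_dir = out_dir / "audio" / "audio_clips"

    # Sample frames at a fixed rate and number them 00000.jpg, 00001.jpg, ...
    frames_cmd = [
        "ffmpeg", "-i", str(video_path),
        "-vf", f"fps={fps}",
        "-start_number", "0",
        str(frames_dir / "%05d.jpg"),
    ]

    # Drop the video stream and cut the audio into fixed-length WAV segments.
    audio_cmd = [
        "ffmpeg", "-i", str(video_path),
        "-vn",
        "-f", "segment", "-segment_time", str(clip_seconds),
        str(clips_dir / "%05d.wav"),
    ]
    return frames_cmd, audio_cmd


if __name__ == "__main__":
    # Placeholder class label and YouTube id, for illustration only.
    video = "downloads/example_ytid.mp4"
    target = "data/MUSIC/MUSIC_raw/solo/example_class/example_ytid"
    for cmd in build_extraction_commands(video, target):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. ffmpeg does not create missing output directories, so `frames` and `audio/audio_clips` need to exist before the commands are run. The audio sample rate expected by `train.py` is not stated here, so the sketch leaves it at the source default.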

Training on MUSIC dataset

python train.py --setting="music_multi_nodes" --exp="exp_music" --batch_size=128 --epoch=30

You can also download the pretrained model for the MUSIC dataset here.

VoxCeleb Dataset

Download the VoxCeleb2 dataset here: VoxCeleb repo

Postprocess the VoxCeleb2 dataset to extract the frames and audio clips. The dataset folder should be structured as follows.

data
  └── VoxCeleb 
  │    ├──data-splits
  │    ├──VoxCeleb2
  │            └── [idxxxxx]
  │                      └── [video_clip_name]  # 5s clip 
  │                               ├── audio
  │                               │      └── audio.wav
  │                               └── frames
  │                                      ├── frame000001.jpg              # fps = 10
  │                                      ├── ...
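Before training, it can help to confirm that the postprocessed folder really matches this layout. The sketch below is an illustration only: `index_voxceleb` is a hypothetical helper, not a function from this repository, and it assumes every clip directory sits exactly two levels below the `VoxCeleb2` folder. It pairs each `audio.wav` with its sorted frames and skips clips that are missing either one.

```python
from pathlib import Path


def index_voxceleb(root):
    """Collect (audio_path, frame_paths) pairs from the layout above.

    Hypothetical helper for sanity-checking the data; not part of this
    repository.
    """
    samples = []
    # One directory per speaker id, one subdirectory per video clip.
    for clip_dir in sorted(Path(root).glob("*/*")):
        if not clip_dir.is_dir():
            continue
        audio = clip_dir / "audio" / "audio.wav"
        frames_dir = clip_dir / "frames"
        if not frames_dir.is_dir():
            continue
        frames = sorted(frames_dir.glob("frame*.jpg"))
        # Keep only clips that have both the audio file and at least one frame.
        if audio.is_file() and frames:
            samples.append((audio, frames))
    return samples


if __name__ == "__main__":
    samples = index_voxceleb("data/VoxCeleb/VoxCeleb2")
    print(f"found {len(samples)} usable clips")
```

With 5-second clips at 10 fps, each entry would be expected to hold roughly 50 frames, so a much smaller count for a clip may point to a failed extraction.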

Training on VoxCeleb dataset

python train.py --setting="voxceleb_multi_nodes" --exp="exp_voxceleb" --batch_size=128 --lr=1e-4 --epoch=30

You can also download the pretrained model for the VoxCeleb2 dataset here.

VGGSound annotations

We filtered and annotated segmentation masks for 446 high-quality video frames in VGGSound-Instruments. The annotations can be found here.
