pix2pix

Description

This is a TensorFlow implementation of audio source separation (mixture to vocal) using pix2pix. I pre-processed the raw data (a dataset of mixture and vocal pairs) into spectrograms, which can be treated as 2-dimensional images, and then trained the model on them. See hyperparams.py for the detailed hyperparameters.
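As a rough illustration of the spectrogram step described above, here is a minimal sketch that turns a waveform into a 2-dimensional magnitude spectrogram. It uses NumPy only so that it runs without extra dependencies; the repository itself lists librosa as a requirement, and the README does not show which calls data.py makes. The function name `magnitude_spectrogram`, the FFT size, the hop length and the 16 kHz sample rate are all assumptions for this example, not values taken from hyperparams.py.

```python
import numpy as np

def magnitude_spectrogram(signal, n_fft=1024, hop_length=256):
    """Return a (n_fft // 2 + 1, n_frames) magnitude spectrogram.

    n_fft and hop_length are illustrative defaults, not the repo's values.
    """
    window = np.hanning(n_fft)
    # Zero-pad short signals so that at least one full frame exists.
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # Real FFT of each frame, then transpose to (frequency, time).
    return np.abs(np.fft.rfft(frames, axis=1)).T

# One second of a 440 Hz sine at an assumed 16 kHz sample rate.
sr = 16000
t = np.arange(sr) / sr
mixture = np.sin(2 * np.pi * 440.0 * t)
spec = magnitude_spectrogram(mixture)
print(spec.shape)  # (frequency bins, time frames)
```

The resulting array can be handled like a single-channel image, which is what makes an image-to-image model such as pix2pix applicable to the mixture and vocal pairs.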

Requirements

NumPy >= 1.11.1
TensorFlow >= 1.0.0
librosa

Data

I used the DSD100 dataset, which consists of pairs of mixture audio files and vocal audio files. The complete dataset (~14 GB) can be downloaded here.
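To make the pairing concrete, the sketch below collects (mixture, vocal) file pairs from an extracted copy of the dataset. It assumes the usual DSD100 layout, with `Mixtures/<subset>/<track>/mixture.wav` and `Sources/<subset>/<track>/vocals.wav`; the helper name `pair_tracks` is hypothetical and this is not necessarily how data.py walks the files.

```python
from pathlib import Path

def pair_tracks(root, subset="Dev"):
    """Return sorted (mixture_path, vocals_path) pairs for one subset.

    Assumes the DSD100 folder layout described in the text above.
    """
    root = Path(root)
    pairs = []
    for track_dir in sorted((root / "Mixtures" / subset).iterdir()):
        mixture = track_dir / "mixture.wav"
        vocals = root / "Sources" / subset / track_dir.name / "vocals.wav"
        # Skip tracks where either file of the pair is missing.
        if mixture.is_file() and vocals.is_file():
            pairs.append((mixture, vocals))
    return pairs
```

A call such as `pair_tracks("data/DSD100")` would then list the training pairs; the path here is only an example of where the extracted dataset might live.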

File description

hyperparams.py includes all the hyperparameters that are needed.
data.py loads the training data and preprocesses it into units of raw data sequences.
module.py contains all methods, building blocks and skip connections for the networks.
network.py builds the networks.
train.py is for training.

Training the network

STEP 1. Adjust the hyperparameters in hyperparams.py if necessary.
STEP 2. Download the DSD100 data as mentioned above, extract it into the 'data' directory, and run data.py.
STEP 3. Run train.py.
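
For orientation on what training optimizes, here is a minimal NumPy sketch of the standard pix2pix objective: an adversarial term plus a weighted L1 reconstruction term for the generator, and a real-versus-generated classification loss for the discriminator. The README does not show the loss used in train.py, so this follows the original pix2pix formulation rather than this repository's code. The function names are hypothetical, and the weight of 100 is the default from the pix2pix paper, not a value read from hyperparams.py.

```python
import numpy as np

def bce(probabilities, target):
    """Mean binary cross-entropy against a constant 0/1 target."""
    p = np.clip(probabilities, 1e-7, 1.0 - 1e-7)
    return float(np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p))))

def generator_loss(d_fake, fake_vocal, real_vocal, l1_weight=100.0):
    """Adversarial term plus weighted L1 term (l1_weight is an assumption)."""
    adversarial = bce(d_fake, 1.0)  # the generator wants D to output "real"
    l1 = float(np.mean(np.abs(real_vocal - fake_vocal)))
    return adversarial + l1_weight * l1

def discriminator_loss(d_real, d_fake):
    """D should score real pairs as 1 and generated pairs as 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)
```

Here `d_real` and `d_fake` stand for the discriminator's output probabilities on (mixture, real vocal) and (mixture, generated vocal) spectrogram pairs. The L1 term keeps the generated vocal spectrogram close to the target, while the adversarial term pushes it toward realistic detail.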

Notes

I haven't implemented the evaluation code yet, but I will add it soon.

soobinseo/pix2pix
