- This is a TensorFlow implementation of audio source separation (mixture to vocal) using pix2pix. I pre-process the raw data (a dataset of mixture and vocal pairs) into spectrograms, which can be treated as 2-dimensional images, and then train the model on them. See `hyperparams.py` for the detailed hyperparameters.
- NumPy >= 1.11.1
- TensorFlow >= 1.0.0
- librosa
I used the DSD100 dataset, which consists of pairs of mixture audio files and vocal audio files. The complete dataset (~14 GB) can be downloaded here.
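
To illustrate how a mixture/vocal pair can become a pair of 2-dimensional "images", here is a minimal NumPy-only sketch of a magnitude spectrogram. It is not the code in `data.py`: the function name `magnitude_spectrogram`, the values `n_fft=1024` and `hop_length=256`, the 16 kHz sample rate, and the synthetic sine waves standing in for real audio are all assumptions made for the example. The real settings are in `hyperparams.py`, and the repository may well use librosa for this step.

```python
import numpy as np

def magnitude_spectrogram(signal, n_fft=1024, hop_length=256):
    """Return a (n_fft // 2 + 1, n_frames) magnitude spectrogram.

    Hypothetical helper for illustration; window and hop sizes are assumed.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # Real FFT of each frame; transpose so rows are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Synthetic stand-ins for one (mixture, vocal) pair: one second at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
vocal = np.sin(2 * np.pi * 440.0 * t)
accompaniment = 0.5 * np.sin(2 * np.pi * 110.0 * t)
mixture = vocal + accompaniment

mixture_spec = magnitude_spectrogram(mixture)
vocal_spec = magnitude_spectrogram(vocal)
print(mixture_spec.shape, vocal_spec.shape)
```

Both arrays have the same shape, so the mixture spectrogram can serve as the input image and the vocal spectrogram as the target image of an image-to-image model.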
- `hyperparams.py` includes all the hyperparameters that are needed.
- `data.py` loads the training data and preprocesses it into units of raw data sequences.
- `modules.py` contains all methods, building blocks, and skip connections for the networks.
- `networks.py` builds the networks.
- `train.py` is for training.
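
As a rough illustration of what a skip connection does in a pix2pix-style U-Net generator, the sketch below concatenates an encoder feature map onto the decoder feature map of the same resolution along the channel axis. It uses plain NumPy arrays rather than TensorFlow tensors, and the function name `skip_connect` and the tensor shapes are assumptions for the example, not the contents of `modules.py`.

```python
import numpy as np

def skip_connect(decoder_features, encoder_features):
    """Concatenate encoder features onto decoder features along the channel axis.

    Hypothetical helper; arrays are laid out as (batch, height, width, channels).
    """
    if decoder_features.shape[:-1] != encoder_features.shape[:-1]:
        raise ValueError("batch and spatial dimensions must match")
    return np.concatenate([decoder_features, encoder_features], axis=-1)

# Assumed shapes: one 64x64 feature map with 128 channels on each side.
encoder_out = np.zeros((1, 64, 64, 128))
decoder_out = np.ones((1, 64, 64, 128))

merged = skip_connect(decoder_out, encoder_out)
print(merged.shape)
```

The merged tensor keeps the spatial size and doubles the channel count, which lets later decoder layers see the fine detail captured by the encoder.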
- STEP 1. Adjust hyper parameters in `hyperparams.py` if necessary.
- STEP 2. Download and extract DSD100 data as mentioned above at 'data' directory, and run `data.py`.
- STEP 3. Run `train.py`.
- I haven't implemented the evaluation code yet, but I will update soon.