This is an implementation of a source separation neural network based on the U-Net architecture. It follows the paper "Singing Voice Separation with Deep U-Net Convolutional Networks".
The encoder and the decoder of the U-Net each contain 6 convolutional blocks. The model takes the STFT magnitude spectrogram of the input signal and outputs a masked STFT magnitude spectrogram.
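The masking step described above can be sketched roughly as follows. This is an illustration and not the code from this repository: `separate_with_mask`, `mask_fn`, `n_fft`, and `hop_length` are hypothetical names and values, `mask_fn` stands in for the trained U-Net, and reusing the mixture phase to get back to a waveform is an assumption borrowed from the referenced paper.

```python
import torch


def separate_with_mask(waveform, mask_fn, n_fft=1024, hop_length=256):
    """Apply a magnitude mask in the STFT domain (illustrative sketch).

    waveform: tensor of shape (time,) or (batch, time).
    mask_fn:  hypothetical stand-in for the U-Net; maps a magnitude
              spectrogram to a mask of the same shape.
    """
    window = torch.hann_window(n_fft)
    spec = torch.stft(waveform, n_fft=n_fft, hop_length=hop_length,
                      window=window, return_complex=True)

    magnitude = spec.abs()       # model input
    phase = torch.angle(spec)    # kept aside for reconstruction

    mask = mask_fn(magnitude)
    masked_magnitude = mask * magnitude  # model output

    # Assumption: recombine the masked magnitude with the mixture phase.
    estimate_spec = torch.polar(masked_magnitude, phase)
    estimate = torch.istft(estimate_spec, n_fft=n_fft, hop_length=hop_length,
                           window=window, length=waveform.shape[-1])
    return masked_magnitude, estimate
```

Passing a mask of all ones, for example `lambda m: torch.ones_like(m)`, should return the input signal nearly unchanged, which makes a convenient sanity check before plugging in a real model.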
source_separation
+--README.md
+--mask_data
|  +--mixtures
|  |  +--train
|  |  +--val
|  |  +--test
|  +--targets
|     +--train
|     +--val
|     +--test
+--model
+--pickle_data
|  +--train
|  +--val
|  +--test
+--src_formatted
+--test_result
To run the code, Python, PyTorch, torchaudio, NumPy, and librosa are required.
- Place the data in the mask_data folder, following the structure above. Every sample in the training and validation sets must be of equal length for batch processing.
- Run serialize.py to obtain the pickle data.
- Run mask_main.py to execute the training and inference.
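One possible way to meet the equal-length requirement for training and validation samples is to crop or zero-pad each signal before it is placed in mask_data. The helper below is a minimal sketch under that assumption. `fix_length` and `target_len` are hypothetical names and are not part of serialize.py or mask_main.py.

```python
import numpy as np


def fix_length(signal, target_len):
    """Crop or zero-pad the last axis of `signal` to `target_len` samples."""
    signal = np.asarray(signal, dtype=np.float32)
    current_len = signal.shape[-1]
    if current_len >= target_len:
        # Too long: keep only the first target_len samples.
        return signal[..., :target_len]
    # Too short: append zeros at the end of the last axis only.
    pad_width = [(0, 0)] * (signal.ndim - 1) + [(0, target_len - current_len)]
    return np.pad(signal, pad_width)


# Signals of different lengths can be stacked into one batch afterwards.
short = fix_length(np.ones(5), 8)
long = fix_length(np.arange(10), 8)
batch = np.stack([short, long])
```

librosa provides `librosa.util.fix_length` for the same purpose, which may be preferable since librosa is already a dependency.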
Diep Luong