For this project, we attempt style transfer with audio. We trained a Very Deep Convolutional Neural Network on raw audio to classify pitches, and additionally trained an example implementation of SoundNet. We used the NSynth dataset from Google's Magenta team for this project. We then use these networks to perform style transfer.
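For context, style transfer typically matches second-order feature statistics (Gram matrices) of a trained network's activations. The snippet below is a minimal sketch of such a Gram-matrix style loss over 1-D convolutional feature maps; it is illustrative only, and the exact objective used in `StyleTransferProject.ipynb` may differ.

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    # features: (batch, channels, time) activations from a 1-D conv layer
    # of the classifier. The Gram matrix captures channel correlations,
    # the usual "style" statistic in neural style transfer (Gatys et al.).
    b, c, t = features.shape
    return torch.bmm(features, features.transpose(1, 2)) / (c * t)

def style_loss(generated_features, style_features):
    # Mean squared difference between the Gram matrices of the audio being
    # optimized and the style audio, at a single layer.
    return F.mse_loss(gram_matrix(generated_features),
                      gram_matrix(style_features))
```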
All of our models were trained on an NVIDIA V100 GPU on Paperspace.
We used the following libraries:
- PyTorch (for training and deep learning)
- librosa (for sound loading and STFTs in the notebook; see the sketch after this list)
- scikit-image (for weight denoising in our notebook)
- tensorboardX (for logging)
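As a minimal sketch of how librosa loads a clip and computes an STFT (the file name, FFT size, and hop length here are illustrative choices, not the notebook's exact settings):

```python
import numpy as np
import librosa

# Any WAV from the NSynth dataset works here; the path is a placeholder.
wav_path = "test_data/example.wav"

# NSynth notes are 4-second, 16 kHz monophonic clips.
y, sr = librosa.load(wav_path, sr=16000)

# Magnitude spectrogram via the short-time Fourier transform.
stft = librosa.stft(y, n_fft=1024, hop_length=256)
magnitude = np.abs(stft)
print(magnitude.shape)  # (513, num_frames)
```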
We recommend training on a GPU. Our networks are large enough that a single epoch takes nearly 15 minutes on a V100.
- Download the NSynth dataset. Please use the JSON/WAV version of the dataset.
- Extract your `train`, `valid` and `test` tar archives into the `./data/nsynth/` folder.
- The fastest way to start training is to use `train.sh`. Just run `./train.sh` to get going.
- Parameters such as batch size and number of epochs are accessible as arguments to `train.py` (see the sketch after this list).
- To run style transfer, use `StyleTransferProject.ipynb`.
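As a rough idea of how those training arguments are typically exposed, here is a minimal argparse sketch; the flag names and defaults are assumptions, not necessarily the ones `train.py` actually uses:

```python
import argparse

parser = argparse.ArgumentParser(description="Train the pitch classifier")
# Flag names below are illustrative; check train.py for the real ones.
parser.add_argument("--batch-size", type=int, default=32)
parser.add_argument("--epochs", type=int, default=10)
args = parser.parse_args()

print(f"training for {args.epochs} epochs with batch size {args.batch_size}")
```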
`train.py` is our training routine file. `utils.py` contains the code for validation, dataset loading and helpers for saving files. `VGG_pytorch.py` and `Alvin.py` are our model files. Running `alvin_big` from the models folder requires `VGG_pytorch.py`. `Alvin.py` contains both models for training.
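A saved checkpoint from the `models` folder can be inspected with `torch.load`; this is only a sketch, and the file name below is a placeholder (actual names are set by `train.py`):

```python
import torch

# Placeholder file name; checkpoints are written to ./models after each epoch.
checkpoint = torch.load("models/epoch_10.pth", map_location="cpu")

# If the file holds a state_dict, print the layer names and weight shapes.
if isinstance(checkpoint, dict):
    for name, value in checkpoint.items():
        if torch.is_tensor(value):
            print(name, tuple(value.shape))
```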
- The `data` folder is where you will be storing the NSynth dataset.
- The `models` folder contains the `.pth` models we prepared for submission. Additionally, it's where models are saved after every epoch.
- The `test_data` folder contains some example files you can use for style transfer. Some of the examples are from the NSynth dataset.
- The `runs` folder is where you'd be storing your TensorBoard runs (see the logging sketch after this list).
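For reference, a minimal tensorboardX logging sketch that writes into `./runs` (the tag and loss values are made up):

```python
from tensorboardX import SummaryWriter

# By default SummaryWriter creates a timestamped subfolder under ./runs;
# view it with:  tensorboard --logdir runs
writer = SummaryWriter()
for epoch, loss in enumerate([2.3, 1.7, 1.2]):  # dummy values
    writer.add_scalar("train/loss", loss, epoch)
writer.close()
```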
Ashvala Vinay, Avneesh Sarwate, Antoine de Meeus