Skip to content

jmsmdy/sound-parse

Repository files navigation

sound-shift

Automated music transcription via LSTM in Tensorflow!

Prior Work

Check out my previous project Instrument Classifier for some prior work.

Data Augmentation to Generate Training Data

Random "compositions" were created by generating piano roll notation, then using this piano roll notation to generate corresponding audio files by compositing samples from University of Iowa Electronic Music Studios Musical Instrument Sample Database

Network Architecture

Input audio is divided into overlapping 23ms segments, and the Discrete Fourier Transforms of these segments are fed into an LSTM, which makes predictions for which notes are playing.

Why do Fourier Preprocessing?

Although in principle a neural network could learn to isolate pitches without Fourier pre-processing, a spectral representation is more amenable to it, because it turns the problem into a peak-finding exercise. The following images of a (log transformed) STFT and power spectrogram show how the fundamental frequency can be more easily isolated in a spectral representation:

fourier spectrogram

Limitations

Due to concerns with accurately detecting extended notes and different instruments, we restricted to transcribing marimba, as their percussive attacks are easier to identify (as discovered by previous work), and among pitched percussion instruments we found pitch detection performed best on marimba. The following plot of the first two principal components of samples colored by instrument shows why we chose marimba as our instrument of choice to transcribe instead of opting for instrument detection:

PCA

We hope in the future to expand the features we can extract / parse from an input audio file.

Training

Training was completed over the course of a few hours on a local machine. The loss is based on "cosine similarity", which experimentally performed better than mean squared error or note-wise binary cross-entropy.

Reuslts

After training, the network was able to accurately reconstruct the original piano roll notation, with only slight errors in note onset.

results

Caveat: this should be taken with a grain of salt, since test data was constructed using the same samples as training data, so it is not clear how well this would generalize.

About

neural network-powered music transcription

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published