sound-shift

Automated music transcription via LSTM in TensorFlow!

Prior Work

Check out my previous project Instrument Classifier for some prior work.

Data Augmentation to Generate Training Data

Random "compositions" were created by generating piano roll notation, then rendering that notation to audio by compositing samples from the University of Iowa Electronic Music Studios Musical Instrument Sample Database.
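As a rough illustration of this idea, here is a minimal sketch that draws a random piano roll and mixes one sample into an audio buffer at every note onset. It is not the code from the notebooks: the sample rate, step length, pitch range, and the `placeholder_sample` function (a decaying sine standing in for a recorded Iowa sample) are all assumptions made for the example.

```python
import numpy as np

SAMPLE_RATE = 44100  # assumed; the README does not state a sample rate


def random_piano_roll(n_steps, n_pitches, note_prob, rng):
    """Boolean array of shape (n_steps, n_pitches); True marks a note onset."""
    return rng.random((n_steps, n_pitches)) < note_prob


def placeholder_sample(midi_pitch, duration=0.5):
    """Decaying sine wave standing in for a recorded instrument sample."""
    t = np.arange(int(duration * SAMPLE_RATE)) / SAMPLE_RATE
    freq = 440.0 * 2.0 ** ((midi_pitch - 69) / 12)
    return np.sin(2 * np.pi * freq * t) * np.exp(-6.0 * t)


def render(roll, lowest_pitch=48, step_seconds=0.25):
    """Mix one sample into the output buffer at every onset in the roll."""
    step = int(step_seconds * SAMPLE_RATE)
    samples = [placeholder_sample(lowest_pitch + p) for p in range(roll.shape[1])]
    longest = max(len(s) for s in samples)
    audio = np.zeros(roll.shape[0] * step + longest)
    for t, p in zip(*np.nonzero(roll)):
        start = t * step
        audio[start:start + len(samples[p])] += samples[p]
    # Normalize so that overlapping notes do not clip.
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio


rng = np.random.default_rng(0)
roll = random_piano_roll(n_steps=16, n_pitches=24, note_prob=0.1, rng=rng)
audio = render(roll)
```

The piano roll doubles as the training label: the network sees `audio` and is asked to recover `roll`.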

Network Architecture

Input audio is divided into overlapping 23ms segments, and the Discrete Fourier Transforms of these segments are fed into an LSTM, which makes predictions for which notes are playing.
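The framing step can be sketched as follows. This is an illustration under assumptions, not the preprocessing code from the notebooks: a frame of 1024 samples at 44.1 kHz is about 23.2 ms, which matches the "23ms" figure, but the actual sample rate, hop size, and window function are not stated here, so the values below (hop of 512, Hann window) are guesses.

```python
import numpy as np


def stft_magnitudes(audio, frame_len=1024, hop=512):
    """Split audio into overlapping frames and return DFT magnitudes.

    Output shape is (n_frames, frame_len // 2 + 1): one spectrum per frame.
    """
    n_frames = 1 + (len(audio) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [audio[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


# One second of a 440 Hz tone as a stand-in for real input audio.
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
features = stft_magnitudes(tone)
```

Each row of `features` is one time step of the sequence that would be fed to the LSTM.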

Why do Fourier Preprocessing?

Although in principle a neural network could learn to isolate pitches without Fourier preprocessing, a spectral representation is more amenable to it, because it turns the problem into a peak-finding exercise. The accompanying images of a (log-transformed) STFT and power spectrogram (fourier.png and spectrogram.png in this repository) show how the fundamental frequency can be more easily isolated in a spectral representation.
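To make the "peak-finding" point concrete, here is a small sketch on a synthetic tone. The signal, its harmonic amplitudes, and the frame length are made up for the example, and it assumes the fundamental is the strongest partial, which is not guaranteed for real instruments.

```python
import numpy as np

sr = 44100   # assumed sample rate
n = 4096     # analysis length in samples (assumed)
f0 = 220.0   # fundamental of the synthetic tone

t = np.arange(n) / sr
# Fundamental plus two weaker harmonics.
signal = (
    np.sin(2 * np.pi * f0 * t)
    + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
    + 0.25 * np.sin(2 * np.pi * 3 * f0 * t)
)

# In the time domain the pitch is hidden in the waveform's shape;
# in the frequency domain it is simply the location of the largest peak.
spectrum = np.abs(np.fft.rfft(signal * np.hanning(n)))
freqs = np.fft.rfftfreq(n, d=1 / sr)
estimated_f0 = freqs[np.argmax(spectrum)]
```

The estimate is only as fine as the bin spacing (`sr / n`, about 10.8 Hz here), so it lands on the bin nearest 220 Hz rather than on 220 Hz exactly.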

Limitations

Due to concerns about accurately detecting extended notes and distinguishing instruments, we restricted transcription to marimba: its percussive attack is easier to identify (as discovered in the previous work), and among pitched percussion instruments we found that pitch detection performed best on marimba. A plot of the first two principal components of the samples, colored by instrument (PCA.png in this repository), shows why we chose to transcribe marimba only instead of opting for instrument detection.
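For readers unfamiliar with that kind of plot, the projection onto the first two principal components can be sketched like this. The feature matrix below is random placeholder data, not the instrument samples, and the SVD-based `first_two_components` helper is a name invented for this example rather than code from the repository.

```python
import numpy as np


def first_two_components(features):
    """Project rows of `features` onto their first two principal axes."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Placeholder data: 50 "samples" with 8 features each.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 8))
projected = first_two_components(features)
```

Scattering `projected[:, 0]` against `projected[:, 1]` and coloring each point by its instrument label gives a plot of the kind described above.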

We hope in the future to expand the features we can extract / parse from an input audio file.

Training

Training was completed over the course of a few hours on a local machine. The loss is based on "cosine similarity", which experimentally performed better than mean squared error or note-wise binary cross-entropy.
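A cosine-similarity loss of this kind can be sketched in NumPy as below. This is only an illustration of the formula, not the training code: the exact loss definition, the sign convention, and the `eps` stabilizer are assumptions. Keras provides a comparable built-in, `tf.keras.losses.CosineSimilarity`.

```python
import numpy as np


def cosine_similarity_loss(y_true, y_pred, eps=1e-8):
    """Negative mean cosine similarity over the last axis (lower is better)."""
    dot = np.sum(y_true * y_pred, axis=-1)
    norms = np.linalg.norm(y_true, axis=-1) * np.linalg.norm(y_pred, axis=-1)
    return -np.mean(dot / (norms + eps))


# Two time steps of a four-note piano roll.
target = np.array([[0.0, 1.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0, 0.0]])
scaled_match = 0.7 * target   # right notes, lower confidence
complement = 1.0 - target     # exactly the wrong notes

loss_match = cosine_similarity_loss(target, scaled_match)
loss_wrong = cosine_similarity_loss(target, complement)
```

Here `loss_match` is close to -1 and `loss_wrong` is 0. Unlike mean squared error, cosine similarity ignores the overall magnitude of the prediction and depends only on which notes are active relative to each other.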

Results

After training, the network was able to accurately reconstruct the original piano roll notation, with only slight errors in note onset.

Caveat: this should be taken with a grain of salt. The test data was constructed from the same samples as the training data, so it is not clear how well the network would generalize to other recordings.
