whatsound is a toolkit for training and testing a neural-network audio classifier. It distinguishes four classes:
- Music
- Speech
- Ambient/Noise
- Silence
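The four classes above map naturally onto the network's output neurons. A hypothetical mapping, purely for illustration (the actual labels live in the toolkit's settings module):

```python
# Hypothetical class-to-output-index mapping; the real names and ordering
# are defined by whatsound itself, not confirmed here.
CLASSES = ["music", "speech", "ambient", "silence"]

def label_index(name):
    """Return the output-neuron index for a class name."""
    return CLASSES.index(name)
```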
The toolkit uses Essentia for audio feature extraction and PyBrain to train and test a backpropagation neural network for classification.
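For readers unfamiliar with backpropagation, here is a minimal numpy sketch of the training step that PyBrain performs internally. The layer sizes and learning rate are illustrative assumptions, not the toolkit's actual configuration:

```python
import numpy as np

# Minimal single-hidden-layer backprop sketch (assumed sizes, not whatsound's).
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 8, 4          # e.g. 4 features -> 4 classes
W1 = rng.normal(0, 0.1, (n_in, n_hidden))
W2 = rng.normal(0, 0.1, (n_hidden, n_out))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, target, lr=0.1):
    """One forward/backward pass; returns squared error before the update."""
    global W1, W2
    h = sigmoid(x @ W1)                  # hidden activations
    y = sigmoid(h @ W2)                  # network output
    err = target - y
    # Backpropagate the error through both layers
    d_out = err * y * (1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 += lr * np.outer(h, d_out)
    W1 += lr * np.outer(x, d_hid)
    return float((err ** 2).sum())
```

Repeated calls to `train_step` on the same sample drive the squared error down, which is exactly what PyBrain's `BackpropTrainer` does over a full data set.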
The classification module exposes the main classification functionality. The project is split into modules, each serving a distinct purpose; the following modules are required for audio training and classification.
Extracts audio features from an audio stream, using the Essentia library for analysis. The following features are extracted:
- MFCC
- Zero crossing rate
- Key strength
- Spectral Flux
- Pitch strength
- LPC
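To give a feel for one of these features, here is an illustrative zero-crossing-rate computation in plain numpy; the toolkit itself delegates this to Essentia's `ZeroCrossingRate` algorithm:

```python
import numpy as np

# Illustrative zero-crossing rate: the fraction of adjacent sample pairs
# whose signs differ. Not the Essentia implementation, just the idea.
def zero_crossing_rate(frame):
    signs = np.signbit(frame)
    return float(np.count_nonzero(signs[1:] != signs[:-1])) / (len(frame) - 1)

# A 440 Hz sine sampled at 44100 Hz crosses zero twice per cycle,
# so its ZCR is close to 2 * 440 / 44100.
t = np.arange(4410) / 44100.0
tone = np.sin(2 * np.pi * 440 * t)
```

A high zero-crossing rate is typical of noisy or unvoiced signals, which is why it helps separate speech and ambient noise from music.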
Utility functions
These are global parameters: settings for the neural network, training parameters, audio settings, and classifier types.
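A hedged sketch of how such a settings module might be laid out; every value here is an assumption for illustration, not whatsound's actual configuration:

```python
# Hypothetical global settings module; all values are assumed.
NN_INPUT = 24         # size of the concatenated feature vector (assumed)
NN_HIDDEN = 20        # hidden-layer neurons (assumed)
NN_OUTPUT = 4         # one output neuron per class
LEARNING_RATE = 0.05  # backprop learning rate (assumed)
SAMPLE_RATE = 44100   # audio sample rate in Hz
CLASSES = ["music", "speech", "ambient", "silence"]
```

Keeping these in one module means the extractor, trainer, and classifier all agree on vector sizes and class ordering.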
This module allows training and testing on a data set, with the following optional parameters:
weights
: the path to a PyBrain weights XML file

dataset
: the path to a directory containing audio samples, split by class

split
: the ratio with which to split the data set between training and testing
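A sketch of how these parameters could be parsed at the command line. The flag names mirror the documented options, but this is an assumed interface, not a confirmed whatsound CLI:

```python
import argparse

# Hedged sketch of a train/test entry point; flag names are assumptions
# based on the documented parameters, not whatsound's actual CLI.
def build_parser():
    p = argparse.ArgumentParser(description="Train/test the whatsound classifier")
    p.add_argument("--weights", help="path to a PyBrain weights XML file")
    p.add_argument("--dataset", help="directory of audio samples split by class")
    p.add_argument("--split", type=float, default=0.7,
                   help="ratio for the training/testing split")
    return p

args = build_parser().parse_args(["--dataset", "samples/", "--split", "0.8"])
```

With `--split 0.8`, 80% of the samples would be used for training and the remaining 20% held out for testing.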