Auditory is our repository for auditory processing code in Go (golang), focused on filtering speech wav files with mel filters. A further step using Gabor filters prepares the mel output as input to neural networks. The processing code is split into 4 packages (sound, mel, dft and agabor) that can be used independently. Example code is in examples/processspeech.
dft
- The 'dft' package performs a discrete Fourier transform on the sound samples passed in and computes the power spectrum.
mel
- The 'mel' package creates a set of mel filter banks and applies them to the power data to create a spectrogram.
agabor
- The 'agabor' package produces Gabor filters: edge detectors that respond to oriented contrast transitions between light and dark. These filters can be convolved with the output of the mel processing.
- There are 2 structs, FilterSet and Filter. You must create a FilterSet even if you are only adding one Gabor Filter.
sound
- sound.go contains code for loading a wav file into a buffer and then converting it to a floating point tensor. There are also functions for trimming and padding.
- sndenv.go is a higher-level API that processes a sound in segments, calling the sound, mel and gabor code.
- playwav.go can be called to play a wav file.
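The steps described above (convert samples to floating point, trim, pad, then process in segments) can be sketched on plain slices as follows. This is a simplified illustration: the sound package works with tensors and wav buffers, and the function names, the 16-bit sample format and the window sizes used here are all assumptions.

```go
package main

import "fmt"

// toFloat scales 16-bit PCM samples into the range [-1, 1).
func toFloat(pcm []int16) []float64 {
	out := make([]float64, len(pcm))
	for i, s := range pcm {
		out[i] = float64(s) / 32768
	}
	return out
}

// trim drops `start` samples from the front and `end` samples from the back.
func trim(samples []float64, start, end int) []float64 {
	if start < 0 {
		start = 0
	}
	if end < 0 {
		end = 0
	}
	if start+end >= len(samples) {
		return nil
	}
	return samples[start : len(samples)-end]
}

// pad returns a copy of samples with n zero samples (silence) appended.
func pad(samples []float64, n int) []float64 {
	if n < 0 {
		n = 0
	}
	out := make([]float64, len(samples)+n)
	copy(out, samples)
	return out
}

// segments splits samples into windows of `win` samples, advancing by
// `step` samples each time; a trailing partial window is dropped.
func segments(samples []float64, win, step int) [][]float64 {
	if win <= 0 || step <= 0 {
		return nil
	}
	var out [][]float64
	for start := 0; start+win <= len(samples); start += step {
		out = append(out, samples[start:start+win])
	}
	return out
}

func main() {
	pcm := []int16{0, 16384, -32768, 8192, 0, 0}
	samples := pad(trim(toFloat(pcm), 1, 2), 4)
	for i, seg := range segments(samples, 4, 2) {
		// Each segment would then go through the dft, mel and gabor steps.
		fmt.Println(i, seg)
	}
}
```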
speech
- The speech package has structs for Sequence and Unit.
- Packages for specific sound sets (corpora) include code to load the sound files with their timing information, plus lookup code.
- Package timit covers the phones of the TIMIT database. See "Speaker-Independent Phone Recognition Using Hidden Markov Models", Kai-Fu Lee and Hsiao-Wuen Hon, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol 37, 1989.
- Package grafestes contains the consonant vowel names and timing information for the sound sequences used for the research reported in "Listening Through Voices: Infant Statistical Word Segmentation Across Multiple Speakers", Katherine Graf Estes & Lew-Williams, 2015.
- Package synthcvs contains consonant vowel names and timing information for the synthesized speech generated with gnuspeech. These sounds are similar to the ones used by Saffran, Aslin & Newport, "Statistical Learning by 8-Month-Old Infants", 1996.
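To illustrate how a sequence with timing information can support lookup, here is a hypothetical sketch. The type names echo Sequence and Unit, but every field, the method `UnitAt`, the file name and the syllable timings are invented for illustration and are not taken from the speech package or from any corpus.

```go
package main

import "fmt"

// Unit is one labelled stretch of sound (hypothetical fields).
type Unit struct {
	Name  string  // label, e.g. a consonant-vowel syllable
	Start float64 // onset in milliseconds
	End   float64 // offset in milliseconds
}

// Sequence is one sound file with its ordered units (hypothetical fields).
type Sequence struct {
	File  string
	Units []Unit
}

// UnitAt returns the unit whose [Start, End) interval contains ms.
func (s Sequence) UnitAt(ms float64) (Unit, bool) {
	for _, u := range s.Units {
		if ms >= u.Start && ms < u.End {
			return u, true
		}
	}
	return Unit{}, false
}

func main() {
	// Made-up timings, for illustration only.
	seq := Sequence{
		File: "example.wav",
		Units: []Unit{
			{Name: "da", Start: 0, End: 200},
			{Name: "ro", Start: 200, End: 410},
			{Name: "pi", Start: 410, End: 600},
		},
	}
	if u, ok := seq.UnitAt(250); ok {
		fmt.Println(u.Name, u.Start, u.End)
	}
}
```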