deep-speechgen: RNN for acoustic speech generation

This project was an early attempt to generate human speech with a recurrent neural network (RNN), dating back to before WaveNet existed and to a time when I had no experience with deep learning or speech processing at all. The project report can be found here.

In hindsight, the project was probably a bit too ambitious, but I still learned an awful lot.

Technical Details

Model

I use a mixture density network (MDN) as the basic architecture, where the underlying neural network is a stack of long short-term memory (LSTM) layers. The approach is inspired by the work of Graves (2013), who applied similar techniques to generate handwriting.
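The repository's own code is not reproduced in this README, but a minimal sketch of such a mixture density RNN might look as follows. It is written in PyTorch for brevity; the hidden size, number of layers, mixture count, and the choice of diagonal Gaussians are illustrative assumptions, not the exact configuration used in this project:

```python
import math
import torch
import torch.nn as nn

class MDNRNN(nn.Module):
    """LSTM stack with a mixture density output layer (diagonal Gaussians)."""

    def __init__(self, feat_dim=40, hidden_size=256, num_layers=2, n_mix=5):
        super().__init__()
        self.feat_dim, self.n_mix = feat_dim, n_mix
        self.lstm = nn.LSTM(feat_dim, hidden_size,
                            num_layers=num_layers, batch_first=True)
        # Per frame: n_mix weight logits, plus a mean and log-std
        # vector of size feat_dim for each mixture component.
        self.head = nn.Linear(hidden_size, n_mix * (1 + 2 * feat_dim))

    def forward(self, x):                # x: (batch, time, feat_dim)
        h, _ = self.lstm(x)
        out = self.head(h)
        logits, mu, log_sigma = torch.split(
            out,
            [self.n_mix, self.n_mix * self.feat_dim, self.n_mix * self.feat_dim],
            dim=-1)
        shape = (*x.shape[:2], self.n_mix, self.feat_dim)
        return logits, mu.view(shape), log_sigma.view(shape)

def mdn_loss(logits, mu, log_sigma, target):
    """Negative log-likelihood of the next frame under the predicted mixture."""
    t = target.unsqueeze(-2)             # (batch, time, 1, feat_dim)
    log_prob = -0.5 * (((t - mu) / log_sigma.exp()) ** 2
                       + 2 * log_sigma + math.log(2 * math.pi)).sum(-1)
    log_w = torch.log_softmax(logits, dim=-1)
    return -torch.logsumexp(log_w + log_prob, dim=-1).mean()

# Teacher-forced training step: predict frame t+1 from frames up to t.
model = MDNRNN()
frames = torch.randn(8, 100, 40)         # dummy batch of mcp feature sequences
logits, mu, log_sigma = model(frames[:, :-1])
loss = mdn_loss(logits, mu, log_sigma, frames[:, 1:])
loss.backward()
```

At generation time, one would instead sample a component from the predicted mixture weights at each step, draw the next frame from that Gaussian, and feed it back as input, as in Graves's handwriting model.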

Experimental Setup

4.5 hours of English speech from the Simple4All Tundra Corpus were used as training data. The audio files were downsampled from 44.1 kHz to 16 kHz, and 40 mel-cepstral coefficients (mcp) were extracted at a frame rate of 80 fps with a window size of 0.025 s. In a first experiment, these features were used to generate novel mcp vectors, from which a spectrogram can be produced. The approach was later extended to generate speech waveforms. AhoCoder was used to encode and decode the speech signal. For more details, see the report.
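To make the numbers above concrete: at 16 kHz, a frame rate of 80 fps corresponds to a hop of 200 samples, and the 0.025 s window to 400 samples. A rough sketch of a comparable extraction using librosa's MFCCs follows; note that these are not identical to AhoCoder's mel-cepstral analysis, and the filename is hypothetical:

```python
import librosa

# Load and resample to 16 kHz.
wav, sr = librosa.load("utterance.wav", sr=16000)

# 80 fps      -> hop of 16000 / 80    = 200 samples;
# 0.025 s win -> 16000 * 0.025        = 400 samples per frame.
mcp = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=40,
                           n_fft=400, hop_length=200)
print(mcp.shape)  # (40, n_frames), i.e. 40 coefficients at ~80 fps
```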
