Influence of input type on phone recognition using a CNN

This repository contains the files for our research project: "What is the influence of input type (read vs. spontaneous) on phone recognition in a deep convolutional neural network?" The research was carried out for the master's course Advanced Research Methods at Radboud University Nijmegen. We trained a DNN model on spontaneous speech, read speech, or a combination of both, and tested each model on all three types of speech. The dataset used is the Corpus Gesproken Nederlands (CGN).
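The design above amounts to a 3x3 grid of training and test conditions. A minimal sketch of that grid is given below; the condition labels and the `train_fn` and `eval_fn` callables are hypothetical placeholders for illustration, not the actual training scripts in this repository.

```python
from itertools import product

# Hypothetical labels for the three speech conditions described above.
CONDITIONS = ("read", "spontaneous", "combined")


def evaluation_grid(conditions=CONDITIONS):
    """Return every (train_condition, test_condition) pair."""
    return list(product(conditions, repeat=2))


def run_grid(train_fn, eval_fn, conditions=CONDITIONS):
    """Train one model per condition and score it on every condition.

    train_fn(condition) -> model and eval_fn(model, condition) -> score
    are assumed callables supplied by the caller.
    """
    results = {}
    for train_cond in conditions:
        model = train_fn(train_cond)  # one model per training condition
        for test_cond in conditions:
            results[(train_cond, test_cond)] = eval_fn(model, test_cond)
    return results
```

Each model is trained once and reused for all three test conditions, so the grid needs three training runs and nine evaluations.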

Folders

CGN: contains the Corpus Gesproken Nederlands dataset (not pushed due to size)
ponyland: general information on the Ponyland servers of Radboud University (RU)
dataAnalysis: scripts to analyze, balance, and generate the datasets used

Datasets

A comparison is made between spontaneous and read speech. The dataset used is the Corpus Gesproken Nederlands, specifically the folders comp-o/nl and comp-n/nl. In addition, a third, balanced dataset, comp-x, is generated from half spontaneous and half read speech.
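As an illustration of what a half-and-half split could look like, here is a minimal sketch that balances by number of items. This is an assumption: the scripts in dataAnalysis may balance on a different criterion (for example total duration), and the `make_balanced` function and its arguments are hypothetical names, not part of this repository.

```python
import random


def make_balanced(read_items, spontaneous_items, seed=0):
    """Build a mixed set with equally many read and spontaneous items.

    Balances by item count only (an assumption for this sketch).
    Returns a shuffled list of (item, label) tuples.
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    n = min(len(read_items), len(spontaneous_items))
    balanced = [(item, "read") for item in rng.sample(list(read_items), n)]
    balanced += [
        (item, "spontaneous")
        for item in rng.sample(list(spontaneous_items), n)
    ]
    rng.shuffle(balanced)
    return balanced
```

The larger set is downsampled to the size of the smaller one, so no item is duplicated.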

Data Analysis

See the findings of the dataset analysis here.

Used model

The model we used can be found here. In addition to the data analysis and cleaning described in dataAnalysis, preprocessing is also done with Kaldi in the DNN training script.

Useful links:

Useful papers:

Siniscalchi, S. M., Yu, D., Deng, L., & Lee, H. (2012). Exploiting Deep Neural Networks for Detection-Based Speech Recognition, 106, 148–157.
Scharenborg, O. (2010). Modeling the use of durational information in human spoken-word recognition. J Acoust Soc Am, 127(6), 3758–3770.
Qian, Y. & Woodland, P. (2016). Very Deep Convolutional Neural Networks for Robust Speech Recognition.
Abdel-Hamid, O., Mohamed, A., Jiang, H. & Penn, G. (2012). Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition. 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4277–4280. doi:10.1109/ICASSP.2012.6288864.
Bengio, Y. & LeCun, Y. (1997). Convolutional Networks for Images, Speech, and Time-Series.
