
Code for data preprocessing of CGN dataset with Kaldi, and CNN for phone recognition. Work in progress


thaije/spontaneous-vs-read-phone-recognition


Influence of input type on phone recognition using a CNN

This repository contains the files for our research question: "What is the influence of input type (read vs. spontaneous) on phone recognition in a deep convolutional neural network?" The research was carried out for the master's course Advanced Research Methods at Radboud University Nijmegen. We trained a DNN model on spontaneous speech, read speech, or a combination of both, and tested each model on all three types of speech. The dataset used is the Corpus Gesproken Nederlands (CGN).
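
The design above crosses three training conditions with three test conditions, giving a 3 × 3 grid of results. The sketch below shows one way to tabulate such a grid. It assumes phone error rate (PER, edit distance between reference and recognized phone sequences divided by the number of reference phones) as the metric, which this README does not state; the `decode` callback and the condition names are hypothetical placeholders, not part of this repository.

```python
from itertools import product

# Hypothetical names for the three speech conditions (the mixed one is comp-x).
SPEECH_TYPES = ("spontaneous", "read", "mixed")


def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference phone
                curr[j - 1] + 1,         # insertion of a hypothesis phone
                prev[j - 1] + (r != h),  # substitution (0 if the phones match)
            )
        prev = curr
    return prev[-1]


def phone_error_rate(refs, hyps):
    """Total edit errors divided by the total number of reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def evaluation_grid(decode):
    """decode(train_type, test_type) -> (refs, hyps); a hypothetical callback."""
    return {
        (train, test): phone_error_rate(*decode(train, test))
        for train, test in product(SPEECH_TYPES, SPEECH_TYPES)
    }
```

For example, `phone_error_rate([["a", "b"]], [["a", "c"]])` counts one substitution over two reference phones and returns 0.5.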

Folders

  • CGN: contains the Corpus Gesproken Nederlands dataset (not pushed due to size)
  • Ponyland: general information on the Ponyland servers of Radboud University (RU).
  • dataAnalysis: Scripts to analyze, balance and generate used datasets

Datasets

A comparison is made between spontaneous and read speech. Both come from the Corpus Gesproken Nederlands, specifically the folders comp-o/nl and comp-n/nl. In addition, a third, balanced dataset, comp-x, is generated with half spontaneous and half read speech.
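
One way to build a half-and-half set like comp-x is to draw from each source until it fills half of a fixed duration budget. This is a minimal sketch under assumptions: balancing by total audio duration (rather than by utterance count) is our choice, and the `Utterance` type and function names are hypothetical, not the actual scripts in dataAnalysis.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    utt_id: str
    duration: float  # seconds


def take_up_to(utts, budget, rng):
    """Shuffle, then keep each utterance that still fits within the budget."""
    pool = list(utts)
    rng.shuffle(pool)
    chosen, total = [], 0.0
    for u in pool:
        # Skips utterances that would overshoot, but keeps scanning for smaller ones.
        if total + u.duration <= budget:
            chosen.append(u)
            total += u.duration
    return chosen


def build_comp_x(read, spontaneous, total_seconds, seed=0):
    """Hypothetical balanced set: at most half the budget from each speech type."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    half = total_seconds / 2
    return take_up_to(read, half, rng) + take_up_to(spontaneous, half, rng)
```

Seeding a private `random.Random` instance keeps the selection reproducible across runs without touching the global random state.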

Data Analysis

See the findings of the dataset analysis here.

Used model

The model we used can be found here. In addition to the data analysis and cleaning described in dataAnalysis, preprocessing is also done with Kaldi in the DNN training script.
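
Kaldi recipes usually start from a data directory containing `wav.scp` (utterance ID to audio path), `text` (utterance ID to transcript), `utt2spk` and `spk2utt`. The sketch below writes such a directory from a list of utterances. It assumes the training script consumes a standard Kaldi data directory, which this README does not confirm, and the function name `write_kaldi_data_dir` is hypothetical. Kaldi expects these files sorted (C-locale order) and recommends prefixing utterance IDs with the speaker ID so that `utt2spk` and `spk2utt` sort consistently.

```python
from collections import defaultdict
from pathlib import Path


def write_kaldi_data_dir(out_dir, utterances):
    """utterances: iterable of (utt_id, speaker_id, wav_path, transcript)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Python sorts str by code point, matching C-locale order for ASCII IDs.
    rows = sorted(utterances)
    spk2utt = defaultdict(list)
    with open(out / "wav.scp", "w") as wav, \
         open(out / "text", "w") as text, \
         open(out / "utt2spk", "w") as u2s:
        for utt_id, spk_id, wav_path, transcript in rows:
            wav.write(f"{utt_id} {wav_path}\n")
            text.write(f"{utt_id} {transcript}\n")
            u2s.write(f"{utt_id} {spk_id}\n")
            spk2utt[spk_id].append(utt_id)
    # spk2utt: one line per speaker listing all of that speaker's utterances.
    with open(out / "spk2utt", "w") as s2u:
        for spk_id in sorted(spk2utt):
            s2u.write(f"{spk_id} {' '.join(spk2utt[spk_id])}\n")
```

Kaldi itself can derive `spk2utt` from `utt2spk` with `utils/utt2spk_to_spk2utt.pl` and check the result with `utils/validate_data_dir.sh`; writing it directly here just keeps the sketch self-contained.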

Useful links:

Useful papers:

  • Siniscalchi, S. M., Yu, D., Deng, L., & Lee, H. (2012). Exploiting Deep Neural Networks for Detection-Based Speech Recognition, 106, 148–157.
  • Scharenborg, O. (2010). Modeling the use of durational information in human spoken-word recognition. J Acoust Soc Am, 127(6), 3758–3770.
  • Qian, Y. & Woodland, P. (2016). Very Deep Convolutional Neural Networks for Robust Speech Recognition.
  • Abdel-Hamid, O., Mohamed, A., Jiang, H. & Penn, G. (2012). Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2012), 4277–4280. doi:10.1109/ICASSP.2012.6288864.
  • Bengio, Y. & LeCun, Y. (1997). Convolutional Networks for Images, Speech, and Time-Series.
