This is the implementation of the Auto-Correlational Neural Network (ACNN) proposed for disfluency detection in speech transcripts, as described in the EMNLP 2018 paper "Disfluency Detection using Auto-Correlational Neural Networks".


  1. Basic Overview
  2. Task
  3. ACNN Model
  4. Requirements
  5. Data
  6. Training
  7. Citation
  8. Credits
  9. Contact

Basic Overview

[Figure: basic overview]


Task

Disfluency refers to any interruption in the normal flow of speech, including false starts, corrections, repetitions and filled pauses. The basic pattern of a disfluency has three main parts: the reparandum, the interregnum and the repair. For example, in a disfluent phrase such as "to Boston, uh, I mean, to Denver", the reparandum "to Boston" is the part of the utterance that is replaced, the interregnum "uh I mean" is an optional part of the disfluent structure, and the repair "to Denver" replaces the reparandum. The fluent version is obtained by removing the reparandum and interregnum words, although disfluency detection models mainly deal with identifying and removing reparanda. The repair (e.g. "to Denver") is frequently a "rough copy" of the reparandum (e.g. "to Boston"); that is, the two contain the same or very similar words in roughly the same order. This similarity is strong evidence of a disfluency and can help the model detect reparanda.
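The "rough copy" cue can be made concrete with a small sketch. The function below is not part of this repository; it is a hypothetical helper (`rough_copy_matches`, with an assumed `max_distance` window) that lists positions where a word is repeated a few tokens later, which is the kind of signal that links a repair back to its reparandum.

```python
def rough_copy_matches(words, max_distance=5):
    """Return (i, j) pairs with i < j where words[i] recurs at words[j]
    no more than max_distance tokens later.

    Hypothetical illustration of the "rough copy" cue; the window size
    is an assumption, not a value taken from the paper or this code base.
    """
    matches = []
    for i, word in enumerate(words):
        # Only look ahead a limited number of tokens.
        for j in range(i + 1, min(i + 1 + max_distance, len(words))):
            if words[j] == word:
                matches.append((i, j))
    return matches


example = "to boston uh i mean to denver".split()
print(rough_copy_matches(example))  # [(0, 5)]
```

Here the repeated "to" at positions 0 and 5 connects the start of the reparandum to the start of the repair. Exact string matching is only the simplest version of this cue; repairs that use similar but not identical words need a softer comparison.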

ACNN Model

CNNs and RNNs are surprisingly poor at capturing "rough copy" dependencies; as a result, their performance depends heavily on hand-crafted pattern-match features. The Auto-Correlational Neural Network (ACNN) is a neural network that generalises the CNN and can learn "rough copies" without any manual feature engineering. The ACNN model uses only whole-word inputs, yet it is competitive with many more complex models in the literature that rely on hand-crafted features, additional information sources such as partial-word features (which would not be available in a realistic ASR application), or external resources such as dependency parsers and language models.
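To give a rough intuition for how a network could pick up "rough copies" without hand-written match features, the sketch below compares every pair of word vectors inside a window. It is a simplified illustration only, not the layer implemented in this repository: the function names, the toy vectors, the window handling and the choice of cosine similarity are all assumptions, and the learned kernels that a real layer would apply to such comparisons are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_similarity(vectors, window):
    """Similarity between every pair of positions less than `window` apart.

    Entry [i][j] is high when the words at positions i and j have similar
    vectors, which is what a repair and its reparandum tend to produce.
    Pairs that are `window` or more positions apart are set to 0.0.
    """
    n = len(vectors)
    return [
        [cosine(vectors[i], vectors[j]) if abs(i - j) < window else 0.0
         for j in range(n)]
        for i in range(n)
    ]


# Toy 2-dimensional "embeddings": positions 0 and 2 hold the same word vector.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
sim = pairwise_similarity(toy, window=3)
print(sim[0][2], sim[0][1])
```

Because the comparison works on vectors rather than strings, similar but non-identical words can still produce a strong match, which exact pattern-match features cannot do.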


Requirements

  • Python 3
  • TensorFlow > 0.12
  • NumPy

$ git clone
$ cd deep-disfluency-detector


Data

We split the Switchboard corpus into training, dev and test sets as follows: the training data consists of all `sw[23]*.dff` files, the dev data consists of all `sw4[5-9]*.dff` files and the test data consists of all `sw4[0-1]*.dff` files. We lower-case all text and remove all partial words (e.g. "neu-") and punctuation from the data. Input and output files contain one sentence per line, and each word in an input sentence has a corresponding label in the output file (labels are either "F" or "E", denoting fluent and disfluent words respectively). Since the Switchboard corpus is not open-source, we cannot release the data split that we used to train the ACNN model. We instead provide some sample data in `./sample_data`.
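The split and the clean-up described above could be reproduced along the following lines. This is a minimal sketch, not the preprocessing script used for the paper: `assign_split` and `clean_sentence` are hypothetical names, the example file names are made up to fit the patterns, and two details are assumptions (a partial word is any token ending in "-", and a punctuation token is one made up entirely of ASCII punctuation characters).

```python
import fnmatch
import string

# File-name patterns taken from the description of the data split.
SPLIT_PATTERNS = [
    ("train", "sw[23]*.dff"),
    ("dev", "sw4[5-9]*.dff"),
    ("test", "sw4[0-1]*.dff"),
]


def assign_split(filename):
    """Return "train", "dev" or "test" for a .dff file name, or None if unused."""
    for split, pattern in SPLIT_PATTERNS:
        if fnmatch.fnmatchcase(filename, pattern):
            return split
    return None


def clean_sentence(words, labels):
    """Lower-case words and drop partial words and punctuation tokens.

    Labels are filtered in step with the words so that the two stay aligned.
    """
    kept_words, kept_labels = [], []
    for word, label in zip(words, labels):
        token = word.lower()
        if token.endswith("-"):
            continue  # assumed marker of a partial word, e.g. "neu-"
        if all(ch in string.punctuation for ch in token):
            continue  # token is only punctuation (or empty)
        kept_words.append(token)
        kept_labels.append(label)
    return kept_words, kept_labels


print(assign_split("sw2005.dff"))  # train
print(clean_sentence(["I", "want", "neu-", "a", "flight", "."],
                     ["F", "F", "E", "F", "F", "F"]))
```

Files that match none of the three patterns (for example names starting with `sw42`) fall outside the split and return `None`.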


Training

To train a new ACNN model from scratch:

$ python3 --data_path=/path/to/train_and_test_files --checkpoint_dir=/dir/to/save/checkpoints_and_summaries


To use the trained ACNN model to predict disfluency labels for your own data:

$ cd model/checkpoints
$ wget
$ wget
$ wget
$ cd ../..
$ python3 --input_path=/path/to/input/file --checkpoint_dir=./model --output_path=/path/to/output/file
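Once labels have been predicted, the fluent version of each sentence can be recovered by dropping the words labelled "E". The sketch below is not part of this repository; the helper names are hypothetical, and it assumes that words and labels are separated by whitespace and that line k of the label file corresponds to line k of the input file.

```python
def fluent_sentence(sentence, label_line):
    """Keep only the words whose label is "F" (fluent)."""
    words = sentence.split()
    labels = label_line.split()
    if len(words) != len(labels):
        raise ValueError("each word needs exactly one label")
    return " ".join(word for word, label in zip(words, labels) if label == "F")


def fluent_file(input_path, label_path):
    """Apply fluent_sentence to two parallel files, one sentence per line."""
    with open(input_path) as sentences, open(label_path) as label_lines:
        return [fluent_sentence(s, l) for s, l in zip(sentences, label_lines)]


print(fluent_sentence("to boston uh i mean to denver", "E E E E E F F"))  # to denver
```

The length check guards against misaligned files, which would otherwise silently pair words with the wrong labels.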


Citation

  author    = {Jamshid Lou, Paria and Anderson, Peter and Johnson, Mark},
  title     = {Disfluency Detection using Auto-Correlational Neural Networks},
  booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP2018)},
  year      = {2018},
  pages     = {4610--4619},
  address   = {Brussels, Belgium},
  publisher = {Association for Computational Linguistics},
  url       = {}


Credits

The baseline CNN code is a modified version of Denny's code.


Contact

Paria Jamshid Lou