speech-to-text

This framework provides python scripts to train neural networks for speech recognition.

Requirements

Python 3.6+

Prerequisites

Installing Dependencies

Install the required python dependencies listed in the requirements.txt:

pip install -r requirements.txt

Providing training data

To run a training, training data is required. A Training accepts a file, which contains metadata about the training data. The file itself is a JSON file consisting of an array. Each element has the following properties:

path: Absolute path to the audio file
text: Transcription of the audio file

A training data file could look like this:

[
  {
    "path": "path/to/audio/file1.wav",
    "text": "hello world"
  },
  {
    "path": "path/to/audio/file2.wav",
    "text": "goodbye world"
  }
]

You can find a downloader for the voxforge corpus at https://github.com/KevNetG/speech-to-text-voxforge. This repo also includes a generator.py file, which creates a training data file containing the required metadata for trainings.

Currently, only WAVE files are supported.

Usage

Configuring a training

The idea is, that you don't write you training configuration into the command line, but instead into a file, which you can modify and reuse for other trainings. You can find a sample configuration under examples/training.config.json and adjust it to your needs. A configuration file has the following properties:

epochs: Number of epochs to train
batchSize: Batch Size
trainingDataQuantity: Amount of training data that is taken from the provided sources
net: Name of the model. Models are specified in the models.py file
trainingData: Absolute Path to a training data file. You can specify multiple sources to simply scale your amount of available training data
alphabetPath: Path to an alphabet file

{
  "epochs": 10,
  "batchSize": 20,
  "trainingDataQuantity": 50000,
  "net": "graves",
  "trainingData": [
    "speech-to-text/training_data.json"
  ],
  "alphabetPath": "speech-to-text/examples/english.json"
}

You can use the english alphabet available under examples/english.json or create one yourself for any other language. The alphabet file is a simple JSON file consisting of an array containing the characters from the alphabet:

Running a training

To run a training execute the train.py and provide two arguments:

path: Where to store the training. You don't have to specify a file extension
plan: The path to a training configuration

like this:

python train.py "trainings/graves" "training_data.json"

Trainings are saved after each epoch.

Continuing an interrupted training

If you had to stop a training prematurely, you can continue it from the last checkpoint. Simply execute the continuetraining.py and pass the path to the training save file. You don't have to specify the file extension:

python continuetraining.py "trainings/graves.json"

Making a prediction

In order to create a transcription of an audio file, use the predict.py script. Pass the following arguments:

path: Path to a training save file
weights, Path to a weights matrix
audio: Path to the audio file which shall be transcribed

For example:

python predict.py "trainings/graves.json" "trainings/graves.weights-20-65.68075.h5" "media/audio.wav"

Displaying training statistics

If you want to display the training loss and the validation loss of a training, execute the statistics.py script:

python statistics.py "trainings/graves.statistics.json"

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

speech-to-text

Requirements

Prerequisites

Installing Dependencies

Providing training data

Usage

Configuring a training

Running a training

Continuing an interrupted training

Making a prediction

Displaying training statistics

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
examples		examples
prediction		prediction
speech		speech
training		training
utils		utils
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
continuetraining.py		continuetraining.py
logger.py		logger.py
logging.json		logging.json
models.py		models.py
predict.py		predict.py
requirements.txt		requirements.txt
statistics.py		statistics.py
train.py		train.py

License

kevobt/speech-to-text

Folders and files

Latest commit

History

Repository files navigation

speech-to-text

Requirements

Prerequisites

Installing Dependencies

Providing training data

Usage

Configuring a training

Running a training

Continuing an interrupted training

Making a prediction

Displaying training statistics

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages