speech2text.py

This is a PyTorch inference script: a port of the wav2letter model from NVIDIA's OpenSeq2Seq to PyTorch.

The pretrained model weights for English were exported from a TensorFlow checkpoint to HDF5 using a little tfcheckpoint2pytorch script that I wrote.

Limitations: not ready for production; uses float32 weights; uses a greedy decoder; does not chunk the input.
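To illustrate what the greedy decoder limitation means, here is a minimal sketch of greedy (best-path) CTC decoding. It assumes the model emits one score vector per audio frame and was trained with a CTC-style blank symbol; the function name, the toy alphabet, and the blank index are hypothetical and not taken from speech2text.py.

```python
from itertools import groupby


def greedy_ctc_decode(frame_scores, alphabet, blank_index):
    """Best-path decoding: argmax per frame, collapse repeats, drop blanks.

    frame_scores: list of frames, each a list of per-symbol scores (higher is better).
    alphabet:     sequence mapping a class index to a character.
    blank_index:  index of the blank symbol (an assumption; it depends on the model).
    """
    # 1. Pick the highest-scoring symbol index in every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # 2. Collapse consecutive runs of the same index into a single index.
    collapsed = [index for index, _ in groupby(best_path)]
    # 3. Drop blanks and map the remaining indices to characters.
    return "".join(alphabet[index] for index in collapsed if index != blank_index)


# Toy input: hypothetical 3-symbol alphabet with the blank at index 3.
alphabet = "ab "
blank = 3
frames = [
    [0.9, 0.0, 0.0, 0.1],  # a
    [0.8, 0.1, 0.0, 0.1],  # a again (repeat, collapsed)
    [0.0, 0.0, 0.1, 0.9],  # blank
    [0.7, 0.1, 0.1, 0.1],  # a (kept, because a blank separates it)
    [0.1, 0.8, 0.0, 0.1],  # b
]
print(greedy_ctc_decode(frames, alphabet, blank))  # prints: aab
```

A greedy decoder of this kind looks at each frame independently and uses no language model, which is why it is listed as a limitation.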

Dependencies: pytorch (the CPU version is OK), numpy, scipy, h5py. Optional dependencies, needed only for saving the model weights to tfjs format: tensorflow v1.13.1 (install with pip3 install tensorflow==1.13.1) and tensorflowjs (install with pip3 install tensorflowjs --no-deps; without --no-deps, pip would upgrade your TensorFlow from v1 to v2 and break everything).
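A quick way to see which of these packages are importable in the current environment is sketched below. The helper name is hypothetical; the import names (torch for pytorch, and so on) are the usual ones for these packages.

```python
import importlib.util

# Import names for the packages listed above ("pytorch" is imported as "torch").
REQUIRED = ["torch", "numpy", "scipy", "h5py"]
OPTIONAL = ["tensorflow", "tensorflowjs"]  # only needed for the tfjs export


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat a broken or half-installed package as missing.
            found = False
        if not found:
            missing.append(name)
    return missing


print("missing required:", missing_modules(REQUIRED))
print("missing optional:", missing_modules(OPTIONAL))
```

This only checks that the modules can be located; it does not verify versions, so the tensorflow v1.13.1 pin still has to be checked separately.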

Credit for the original wav2letter++ model goes to the awesome Facebook AI Research scientists.

Example

# download the pretrained model weights for English and Russian
wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/w2l_plus_large_mp.h5 # English, Wav2Letter
wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/checkpoint_0010_epoch_01_iter_62500.model.h5 # Russian
wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/jasper10x5_LibriSpeech_nvgrad_masks.h5.part_aa # English, Jasper, part1
wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/jasper10x5_LibriSpeech_nvgrad_masks.h5.part_ab # English, Jasper, part2
cat jasper10x5_LibriSpeech_nvgrad_masks.h5.part_aa jasper10x5_LibriSpeech_nvgrad_masks.h5.part_ab > jasper10x5_LibriSpeech_nvgrad_masks.h5

# download and transcribe a wav file (16 kHz)
# should print: my heart doth plead that thou in him doth lie a closet never pierced with crystal eyes but the defendant doth that plea deny and says in him thy fair appearance lies
wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/121-123852-0004.wav
python3 speech2text.py --model en_w2l --weights w2l_plus_large_mp.h5 -i 121-123852-0004.wav # use Wav2Letter model
python3 speech2text.py --model en_w2l --weights jasper10x5_LibriSpeech_nvgrad_masks.h5 -i 121-123852-0004.wav # use Jasper model

# transcribe some Russian wav file
python3 speech2text.py --model ru_w2l --weights checkpoint_0010_epoch_01_iter_62500.model.h5 -i some_test.wav

# save the model to ONNX format for inspection with https://lutzroeder.github.io/netron/
python3 speech2text.py --model en_w2l --weights w2l_plus_large_mp.h5 --onnx w2l_plus_large_mp.onnx

# save the model to TensorFlow.js format
python3 speech2text.py --model en_w2l --weights w2l_plus_large_mp.h5 --tfjs w2l_plus_large_mp.tfjs
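
The example above uses a 16 kHz wav file. Before transcribing your own recording, it can help to check its sample rate first. The following is a minimal sketch using only the standard library; the helper name is hypothetical, and the wave module only reads uncompressed PCM wav files. For a self-contained demonstration, it writes one second of silence to a temporary file and inspects that.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 16000  # 16 kHz, as in the example above


def wav_info(path):
    """Read basic header information from an uncompressed PCM wav file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_sec": wav.getnframes() / rate,
        }


# Demo: write one second of 16-bit mono silence, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
    wav.setframerate(EXPECTED_RATE)
    wav.writeframes(b"\x00\x00" * EXPECTED_RATE)

info = wav_info(demo_path)
print(info)
if info["sample_rate"] != EXPECTED_RATE:
    print("warning: expected a 16 kHz file; consider resampling first")
```

To check a real recording, call wav_info with its path, for example wav_info("121-123852-0004.wav") after downloading the test file.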

Browser demo with TensorFlow.js (work in progress)

# download and extract the exported tfjs model
wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/w2l_plus_large_mp.tfjs.tar.gz
tar -xf w2l_plus_large_mp.tfjs.tar.gz

# serve the tfjs model and demo.html file
python3 -m http.server

# open the demo at http://localhost:8000/demo.html and transcribe the test file 121-123852-0004.wav
