PyTorch speech2text inference script for the NVidia openseq2seq wav2letter model variant

vadimkantorov/inferspeech

speech2text.py

This is a PyTorch inference script for the wav2letter model variant from NVIDIA's OpenSeq2Seq.

The pretrained model weights for English were exported from a TensorFlow checkpoint to HDF5 using a little tfcheckpoint2pytorch script that I wrote.

Limitations: not ready for production; uses float32 weights; uses a greedy decoder; does not chunk the input.
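To illustrate what "greedy decoder" means here, the sketch below assumes CTC-style output (one score vector per audio frame, plus a blank symbol). It is not the code from speech2text.py: the function name `greedy_ctc_decode`, the toy alphabet, and the position of the blank index are all hypothetical and chosen only for the example.

```python
from itertools import groupby


def greedy_ctc_decode(scores, alphabet, blank_index):
    """Greedy (best-path) decoding of CTC-style scores.

    scores: list of frames, each a list of per-class scores.
    alphabet: string whose i-th character is the label for class i.
    blank_index: class index of the CTC blank symbol (assumed, not taken from the script).
    """
    # 1. Pick the highest-scoring class in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in scores]
    # 2. Collapse runs of identical consecutive classes.
    collapsed = [idx for idx, _ in groupby(best)]
    # 3. Drop the blank symbol and map the rest to characters.
    return "".join(alphabet[idx] for idx in collapsed if idx != blank_index)


# Toy example: a hypothetical 3-character alphabet with the blank as class 3.
ALPHABET = "ab "
BLANK = len(ALPHABET)


def one_hot(index, size=4):
    return [1.0 if j == index else 0.0 for j in range(size)]


frames = [one_hot(i) for i in [0, 0, BLANK, 0, 1, 1, BLANK]]
print(greedy_ctc_decode(frames, ALPHABET, BLANK))  # aab
```

The blank between the two runs of `a` is what lets a repeated letter survive the collapse step. A greedy decoder never revisits its per-frame choices, which is why it is listed as a limitation compared with beam search or language-model rescoring.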

Dependencies: pytorch (the CPU version is OK), numpy, scipy, h5py. Optional dependencies for saving the model weights to tfjs format: tensorflow v1.13.1 (install with pip3 install tensorflow==1.13.1) and tensorflowjs (install with pip3 install tensorflowjs --no-deps; otherwise it would upgrade your TensorFlow from v1 to v2 and break everything).

The credit for the original wav2letter++ model goes to the awesome Facebook AI Research scientists.

Example

# download the pretrained model weights for English and Russian
wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/w2l_plus_large_mp.h5 # English, Wav2Letter
wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/checkpoint_0010_epoch_01_iter_62500.model.h5 # Russian
wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/jasper10x5_LibriSpeech_nvgrad_masks.h5.part_aa # English, Jasper, part1
wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/jasper10x5_LibriSpeech_nvgrad_masks.h5.part_ab # English, Jasper, part2
cat jasper10x5_LibriSpeech_nvgrad_masks.h5.part_aa jasper10x5_LibriSpeech_nvgrad_masks.h5.part_ab > jasper10x5_LibriSpeech_nvgrad_masks.h5

# download and transcribe a wav file (16 kHz)
# should print: my heart doth plead that thou in him doth lie a closet never pierced with crystal eyes but the defendant doth that plea deny and says in him thy fair appearance lies
wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/121-123852-0004.wav
python3 speech2text.py --model en_w2l --weights w2l_plus_large_mp.h5 -i 121-123852-0004.wav # use Wav2Letter model
python3 speech2text.py --model en_w2l --weights jasper10x5_LibriSpeech_nvgrad_masks.h5 -i 121-123852-0004.wav # use Jasper model

# transcribe some Russian wav file
python3 speech2text.py --model ru_w2l --weights checkpoint_0010_epoch_01_iter_62500.model.h5 -i some_test.wav

# save the model to ONNX format for inspection with https://lutzroeder.github.io/netron/
python3 speech2text.py --model en_w2l --weights w2l_plus_large_mp.h5 --onnx w2l_plus_large_mp.onnx

# save the model to TensorFlow.js format
python3 speech2text.py --model en_w2l --weights w2l_plus_large_mp.h5 --tfjs w2l_plus_large_mp.tfjs
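
The example wav file above is described as 16 kHz. Assuming the script expects that sample rate for your own recordings too, a small pre-flight check such as the following can catch a mismatched file before transcription. This is only a sketch using Python's standard `wave` module (which reads PCM WAV files only); the helper names `wav_info` and `check_sample_rate` are hypothetical and are not part of speech2text.py.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, frame_count) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getnframes()


def check_sample_rate(path, expected_hz=16000):
    """Raise ValueError if the file is not sampled at expected_hz.

    Returns the duration in seconds when the rate matches.
    The 16000 default mirrors the "16 kHz" note in the example above.
    """
    rate, _channels, frames = wav_info(path)
    if rate != expected_hz:
        raise ValueError(
            f"{path}: sample rate is {rate} Hz, expected {expected_hz} Hz"
        )
    return frames / rate
```

For instance, `check_sample_rate("121-123852-0004.wav")` would return the clip length in seconds if the downloaded file is indeed 16 kHz, and raise `ValueError` otherwise.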

Browser demo with TensorFlow.js (work in progress)

# download and extract the exported tfjs model
wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/w2l_plus_large_mp.tfjs.tar.gz
tar -xf w2l_plus_large_mp.tfjs.tar.gz

# serve the tfjs model and demo.html file
python3 -m http.server

# open the demo at http://localhost:8000/demo.html and transcribe the test file 121-123852-0004.wav
