ESPnet Model Zoo

Utilities for managing the pretrained models created by ESPnet. This feature is inspired by the pretrained model feature of Asteroid.

Zenodo community: https://zenodo.org/communities/espnet/
Registered models: table.csv

Install

pip install torch
pip install espnet_model_zoo

Python API for inference

See the "Instruction for ModelDownloader" section below for how to choose model_name.

ASR

import soundfile
from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.asr_inference import Speech2Text
d = ModelDownloader()
speech2text = Speech2Text(
    **d.download_and_unpack("model_name"),
    # Decoding parameters are not included in the model file
    maxlenratio=0.0,
    minlenratio=0.0,
    beam_size=20,
    ctc_weight=0.3,
    lm_weight=0.5,
    penalty=0.0,
    nbest=1
)
# Confirm the sampling rate is equal to that of the training corpus.
# If not, you need to resample the audio data before inputting to speech2text
speech, rate = soundfile.read("speech.wav")
nbests = speech2text(speech)

text, *_ = nbests[0]
print(text)
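
The sampling-rate check mentioned in the comments above can be made explicit before calling speech2text. The following is a minimal sketch and not part of espnet_model_zoo: the helper name resampling_ratio and the 16000 Hz target are assumptions for illustration, so substitute the rate of the corpus your model was trained on. It only computes the integer up/down factors that a polyphase resampler (for example scipy.signal.resample_poly) takes as arguments; it does not resample anything itself.

```python
from math import gcd
from typing import Tuple


def resampling_ratio(source_rate: int, target_rate: int) -> Tuple[int, int]:
    """Return (up, down) such that source_rate * up / down == target_rate."""
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sampling rates must be positive")
    divisor = gcd(source_rate, target_rate)
    return target_rate // divisor, source_rate // divisor


# Assumed example: the file is 44.1 kHz and the model expects 16 kHz audio.
up, down = resampling_ratio(44100, 16000)
print(up, down)
```

If the function returns (1, 1), the audio already matches the expected rate and can be passed to speech2text unchanged.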

TTS

import soundfile
from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.tts_inference import Text2Speech
d = ModelDownloader()
text2speech = Text2Speech(**d.download_and_unpack("model_name"))

speech, *_ = text2speech("foobar")
soundfile.write("out.wav", speech.numpy(), text2speech.fs, "PCM_16")

Speech separation

import soundfile
from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.enh_inference import SeparateSpeech
d = ModelDownloader()
separate_speech = SeparateSpeech(
    **d.download_and_unpack("model_name"),
    # for segment-wise process on long speech
    segment_size=2.4,
    hop_size=0.8,
    normalize_segment_scale=False,
    show_progressbar=True,
    ref_channel=None,
    normalize_output_wav=True,
)
# Confirm the sampling rate is equal to that of the training corpus.
# If not, you need to resample the audio data before inputting to separate_speech
speech, rate = soundfile.read("long_speech.wav")
waves = separate_speech(speech[None, ...], fs=rate)

This API can process both short and long audio samples. For long audio, set segment_size and hop_size (and optionally normalize_segment_scale and show_progressbar) to perform segment-wise speech enhancement/separation on the input speech. Note that segment-wise processing is disabled by default.
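
To give a feel for how segment_size and hop_size interact, here is a rough sketch that lists the sample ranges an overlapping sliding window would cover. It is an illustration under stated assumptions, not the ESPnet implementation: the function name segment_bounds is hypothetical, both sizes are taken to be in seconds, and the real code additionally has to merge the overlapping outputs.

```python
from typing import List, Tuple


def segment_bounds(
    num_samples: int, fs: int, segment_size: float, hop_size: float
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping segments."""
    seg = round(segment_size * fs)  # segment length in samples
    hop = round(hop_size * fs)      # shift between segment starts in samples
    if seg <= 0 or hop <= 0:
        raise ValueError("segment_size and hop_size must be positive")
    if num_samples <= seg:
        # Short input: a single segment covers everything.
        return [(0, num_samples)]
    bounds = []
    start = 0
    while start + seg < num_samples:
        bounds.append((start, start + seg))
        start += hop
    # The final segment may be shorter than the others.
    bounds.append((start, num_samples))
    return bounds


# Toy numbers (fs=10 Hz) so the result is easy to read.
for start, end in segment_bounds(50, fs=10, segment_size=2.4, hop_size=0.8):
    print(start, end)
```

With hop_size smaller than segment_size, consecutive segments overlap, which is why a smaller hop increases the amount of computation for the same input.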

Instruction for ModelDownloader

from espnet_model_zoo.downloader import ModelDownloader
d = ModelDownloader("~/.cache/espnet")  # Specify cachedir
d = ModelDownloader()  # <module_dir> is used as cachedir by default

To obtain a model, you need to give a model name, which is listed in table.csv.

>>> d.download_and_unpack("kamo-naoyuki/mini_an4_asr_train_raw_bpe_valid.acc.best")
{"asr_train_config": <config path>, "asr_model_file": <model path>, ...}

Note that if the model already exists in the cache, downloading and unpacking are skipped.

You can also get a model with certain conditions.

d.download_and_unpack(task="asr", corpus="wsj")

If multiple models match the conditions, the last one is selected. You can also choose a specific one with the "version" option.

d.download_and_unpack(task="asr", corpus="wsj", version=-1)  # Get the last model
d.download_and_unpack(task="asr", corpus="wsj", version=-2)  # Get previous model
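
The negative values of version read like ordinary Python sequence indices into the list of matching models. The sketch below illustrates that reading only; select_version is a hypothetical helper and the model names are made up, so treat it as an analogy rather than the library's actual code.

```python
from typing import Sequence


def select_version(matches: Sequence[str], version: int = -1) -> str:
    """Pick one entry from the matching models; -1 means the last one."""
    if not matches:
        raise LookupError("no model matches the given conditions")
    return matches[version]


# Made-up names standing in for the rows that match task="asr", corpus="wsj".
matches = ["example/wsj_a", "example/wsj_b", "example/wsj_c"]
print(select_version(matches))      # last entry
print(select_version(matches, -2))  # the one before it
```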

You can also obtain a model directly from its URL.

d.download_and_unpack("https://zenodo.org/record/...")

If you need to use a local model file with this API, you can pass its path instead.

d.download_and_unpack("./some/where/model.zip")

In this case, the contents are also expanded in the cache directory. However, the model is identified by its file path, so if you move the file elsewhere and unpack it again, it is treated as a different model and its contents are expanded again in a separate location.
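
One way to picture why a moved file is treated as a different model is a cache whose directory names are derived from the string that identifies the model. The sketch below assumes such a scheme for illustration: cache_dir_for is a hypothetical helper, and the use of SHA-256 is an assumption, not a description of how espnet_model_zoo actually names its cache entries.

```python
import hashlib
from pathlib import Path


def cache_dir_for(source: str, cachedir: str = "~/.cache/espnet") -> Path:
    """Map a model identifier (URL or file path) to a cache subdirectory."""
    # The key depends only on the identifier string, not on the file contents.
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return Path(cachedir).expanduser() / key


original = cache_dir_for("./some/where/model.zip")
moved = cache_dir_for("./some/other/place/model.zip")
print(original != moved)
```

Because the key is computed from the path string, the same archive stored at two locations yields two cache entries, while repeated calls with the same path reuse one entry.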

Query model names

You can view the model names in our Zenodo community, https://zenodo.org/communities/espnet/, or by using query(). All of this information is recorded in table.csv.

d.query("name")

You can also filter them by specifying conditions.

d.query("name", task="asr")
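
Conceptually, a query with conditions is a filter over the rows of a table followed by picking one column. The following sketch mimics that behaviour on an in-memory CSV using only the standard library. The column names (name, task, corpus, url) are taken from the keys used in this README, but the rows are made up and the real table.csv may contain other columns; query_table is a hypothetical stand-in, not ModelDownloader.query itself.

```python
import csv
import io
from typing import List

# Made-up rows for illustration only; they are not real registered models.
SAMPLE_TABLE = """name,task,corpus,url
example/asr_wsj_v1,asr,wsj,https://example.invalid/1
example/asr_wsj_v2,asr,wsj,https://example.invalid/2
example/tts_ljspeech,tts,ljspeech,https://example.invalid/3
"""


def query_table(table_text: str, key: str = "name", **conditions: str) -> List[str]:
    """Return the `key` column of every row that satisfies all conditions."""
    rows = csv.DictReader(io.StringIO(table_text))
    return [
        row[key]
        for row in rows
        if all(row.get(column) == value for column, value in conditions.items())
    ]


print(query_table(SAMPLE_TABLE, "name", task="asr"))
print(query_table(SAMPLE_TABLE, "url", task="tts", corpus="ljspeech"))
```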

Command line tools

espnet_model_zoo_query

# Query model names
espnet_model_zoo_query task=asr corpus=wsj
# Show all model names
espnet_model_zoo_query
# Query the other key
espnet_model_zoo_query --key url task=asr corpus=wsj

espnet_model_zoo_download

espnet_model_zoo_download <model_name>  # Print the path of the downloaded file
espnet_model_zoo_download --unpack true <model_name>   # Print the path of unpacked files

espnet_model_zoo_upload

export ACCESS_TOKEN=<access_token>
espnet_model_zoo_upload \
    --file <packed_model> \
    --title <title> \
    --description <description> \
    --creator_name <your-git-account>

Use pretrained model in ESPnet recipe

# e.g. ASR WSJ task
git clone https://github.com/espnet/espnet
cd espnet
pip install -e .
cd egs2/wsj/asr1
./run.sh --skip_data_prep false --skip_train true --download_model kamo-naoyuki/wsj

Register your model

1. Upload your model to Zenodo

   You need to sign up for Zenodo and create an access token to upload models. You can freely upload your own model with the espnet_model_zoo_upload command, but we normally upload models through the recipes.

2. Create a Pull Request to modify table.csv

   Append your record as the last line.

3. (Administrator does) Increment the third version number in setup.py, e.g. 0.0.3 -> 0.0.4
4. (Administrator does) Release a new version

Update your model

If your model has a problem, please modify the record on Zenodo directly, or re-upload a corrected file as a new record using espnet_model_zoo_upload.
