PyTorch implementation of the VocBench framework.
[arXiv]
- Python >= 3.6
- Get VocBench code
$ git clone https://github.com/facebookresearch/vocoder-benchmark.git
$ cd vocoder-benchmark
- Install dependencies
$ python3 -m venv vocbench
# activate the virtualenv
$ source vocbench/bin/activate
# Upgrade pip
$ python -m pip install --upgrade pip
# Install dependencies
$ pip install -e .
- To use the VocBench CLI, set the following paths in your .bashrc or .bash_profile:
VOCODER_BENCHMARK=/path/to/vocoder-benchmark
export PATH=$VOCODER_BENCHMARK/bin:$PATH
- Make the binary executable and test your installation
$ chmod +x $VOCODER_BENCHMARK/bin/vocoder
$ vocoder --help
Usage: cli.py [OPTIONS] COMMAND [ARGS]...
Vocoder benchmarking CLI.
Options:
--help Show this message and exit.
Commands:
dataset Dataset processing.
diffwave Create, train, or use diffwave models.
parallel_wavegan Create, train, or use parallel_wavegan models.
wavegrad Create, train, or use wavegrad models.
wavenet Create, train, or use wavenet models.
wavernn Create, train, or use wavernn models.
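To run the same installation check from Python (for example, in a CI job), a minimal sketch could look like the following. It uses only the standard library. `check_install` is a hypothetical helper, not part of VocBench. It assumes the `vocoder` script is on `PATH` as configured above, and it only sees `VOCODER_BENCHMARK` if that variable was exported.

```python
import os
import shutil
import sys


def check_install():
    """Hypothetical helper (not part of VocBench): return a list of setup problems."""
    problems = []
    # VocBench requires Python >= 3.6.
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required, found %d.%d" % sys.version_info[:2])
    # shutil.which only returns executable files, so this also catches a missing chmod +x.
    if shutil.which("vocoder") is None:
        problems.append("`vocoder` not found on PATH; add $VOCODER_BENCHMARK/bin to PATH")
    # VOCODER_BENCHMARK is only visible to child processes if it was exported.
    root = os.environ.get("VOCODER_BENCHMARK")
    if root and not os.path.isdir(os.path.join(root, "config")):
        problems.append("VOCODER_BENCHMARK does not point at a checkout with a config/ directory")
    return problems


if __name__ == "__main__":
    for issue in check_install():
        print("problem:", issue)
```

An empty list means the checks passed. It does not replace `vocoder --help`, which also confirms that the Python package imports correctly.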
$ vocoder dataset --help # For more information on how to download and split datasets
# e.g. download and split LJ Speech
$ vocoder dataset download --dataset ljspeech --path ~/local/datasets/lj # Download and unzip dataset files
$ vocoder dataset split --dataset ljspeech --path ~/local/datasets/lj # Create train / validation / test splits
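The download and split steps can also be scripted from Python. The sketch below only assembles the two commands shown above, using the `--dataset` and `--path` flags from this README, and optionally runs them with `subprocess.run`. `dataset_commands` and `prepare_dataset` are hypothetical helpers, not part of VocBench.

```python
import os
import subprocess
from typing import List


def dataset_commands(dataset: str, path: str) -> List[List[str]]:
    """Hypothetical helper: build the `vocoder dataset` download and split commands."""
    path = os.path.expanduser(path)  # subprocess does not expand "~" the way a shell does
    return [
        ["vocoder", "dataset", step, "--dataset", dataset, "--path", path]
        for step in ("download", "split")  # download first, then create the splits
    ]


def prepare_dataset(dataset: str, path: str, dry_run: bool = True) -> None:
    for cmd in dataset_commands(dataset, path):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise if a step fails, so split never runs on a partial download
```

For example, `prepare_dataset("ljspeech", "~/local/datasets/lj", dry_run=False)` runs the same two steps as the shell commands above.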
$ vocoder [model-cmd] train --help
# e.g. train wavenet on LJ Speech dataset
$ vocoder wavenet train --path ~/local/models/wavenet --dataset ~/local/datasets/lj --config $VOCODER_BENCHMARK/config/wavenet_mulaw_normal.yaml
*MelGAN and Parallel WaveGAN use the same model command (parallel_wavegan). Choose the matching configuration file for each of them:
# MelGAN
$ vocoder parallel_wavegan train --path ~/local/models/melgan --dataset ~/local/datasets/lj --config $VOCODER_BENCHMARK/config/melgan.v1.yaml
# Parallel WaveGAN
$ vocoder parallel_wavegan train --path ~/local/models/parallel_wavegan --dataset ~/local/datasets/lj --config $VOCODER_BENCHMARK/config/parallel_wavegan.yaml
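Because the model command and the configuration file are chosen separately, it can help to keep them together in one table when scripting training runs. The sketch below covers only the three setups shown above, with config names taken from this README. `TRAIN_SETUPS` and `train_command` are hypothetical. The code assumes `VOCODER_BENCHMARK` is exported and falls back to the current directory if it is not.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical table: model name -> (CLI model command, config file under config/).
# MelGAN and Parallel WaveGAN share `parallel_wavegan`; only the config differs.
TRAIN_SETUPS: Dict[str, Tuple[str, str]] = {
    "wavenet": ("wavenet", "wavenet_mulaw_normal.yaml"),
    "melgan": ("parallel_wavegan", "melgan.v1.yaml"),
    "parallel_wavegan": ("parallel_wavegan", "parallel_wavegan.yaml"),
}


def train_command(model: str, model_dir: str, dataset_dir: str) -> List[str]:
    """Hypothetical helper: build a `vocoder <model-cmd> train` command."""
    cmd, config = TRAIN_SETUPS[model]
    root = os.environ.get("VOCODER_BENCHMARK", ".")  # repository checkout
    return [
        "vocoder", cmd, "train",
        "--path", model_dir,
        "--dataset", dataset_dir,
        "--config", os.path.join(root, "config", config),
    ]
```

Keying the table by model name rather than by CLI command means MelGAN cannot be started with the Parallel WaveGAN config by mistake.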
Example configuration files for each model are provided in the config directory.
$ vocoder [model-cmd] synthesize --help
Usage: cli.py [model-cmd] synthesize [OPTIONS] INPUT_FILE OUTPUT_FILE
Synthesize with the model.
Options:
--path TEXT Directory for the model [required]
--length TEXT The length of the output sample in seconds
--offset FLOAT Offset in seconds of the sample
--help Show this message and exit.
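A minimal sketch of assembling a synthesize call from Python, using only the options in the help text above. Options are placed before the two positional arguments, following the usage line. `synthesize_command` is a hypothetical helper. The README does not state the expected format of `INPUT_FILE`, so the `.wav` names in the usage note are placeholders.

```python
from typing import List, Optional


def synthesize_command(model_cmd: str, model_dir: str, input_file: str,
                       output_file: str, length: Optional[float] = None,
                       offset: Optional[float] = None) -> List[str]:
    """Hypothetical helper: build a `vocoder <model-cmd> synthesize` command."""
    cmd = ["vocoder", model_cmd, "synthesize", "--path", model_dir]
    if length is not None:
        cmd += ["--length", str(length)]  # length of the output sample in seconds
    if offset is not None:
        cmd += ["--offset", str(offset)]  # offset of the sample in seconds
    return cmd + [input_file, output_file]  # positional INPUT_FILE OUTPUT_FILE
```

For example, `synthesize_command("wavenet", "~/local/models/wavenet", "in.wav", "out.wav", length=5)` produces a list that can be passed to `subprocess.run(cmd, check=True)`.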
$ vocoder [model-cmd] evaluate --help
Usage: cli.py [model-cmd] evaluate [OPTIONS]
Evaluate a given vocoder.
Options:
--path TEXT Directory for the model [required]
--dataset TEXT Name of the dataset to use [required]
--checkpoint TEXT Checkpoint path (default: load latest checkpoint)
--help Show this message and exit.
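When comparing several trained vocoders, the evaluate calls can be generated in one place. This sketch uses only the options listed above and omits `--checkpoint` unless one is given, so the CLI falls back to the latest checkpoint. `evaluate_command` and `evaluate_sweep` are hypothetical helpers. The `--dataset` value is passed through unchanged; check `vocoder [model-cmd] evaluate --help` to confirm whether your version expects a dataset name or a path.

```python
from typing import List, Optional, Tuple


def evaluate_command(model_cmd: str, model_dir: str, dataset: str,
                     checkpoint: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build a `vocoder <model-cmd> evaluate` command."""
    cmd = ["vocoder", model_cmd, "evaluate", "--path", model_dir, "--dataset", dataset]
    if checkpoint is not None:
        # Without --checkpoint, the CLI loads the latest checkpoint.
        cmd += ["--checkpoint", checkpoint]
    return cmd


def evaluate_sweep(runs: List[Tuple[str, str]], dataset: str) -> List[List[str]]:
    """One evaluate command per (model command, model directory) pair.

    A list of pairs (not a dict) allows MelGAN and Parallel WaveGAN, which
    share the parallel_wavegan command, to appear in the same sweep.
    """
    return [evaluate_command(cmd, path, dataset) for cmd, path in runs]
```

Each returned list can be run with `subprocess.run(cmd, check=True)`.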
*Fréchet Audio Distance (FAD) is currently not implemented in VocBench. We use Google Research's open-source repository to compute FAD results.
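For reference, FAD is the Fréchet distance between two Gaussians fitted to embeddings of reference and generated audio; the Google Research implementation computes those embeddings with VGGish. The NumPy sketch below covers only the distance step and assumes the embeddings are already available as `(num_frames, dim)` arrays. It is not VocBench code and does not reproduce the embedding model, so its numbers are not interchangeable with the official tool's.

```python
import numpy as np


def _sqrtm_psd(a: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # clip tiny negative eigenvalues caused by round-off
    return (v * np.sqrt(w)) @ v.T  # V diag(sqrt(w)) V^T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).

    Uses Tr((S1 S2)^{1/2}) = Tr((S1^{1/2} S2 S1^{1/2})^{1/2}); the inner matrix
    is symmetric PSD, so only symmetric eigendecompositions are needed.
    """
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    covmean = _sqrtm_psd(s1_half @ sigma2 @ s1_half)
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean))


def fad_from_embeddings(emb_ref: np.ndarray, emb_gen: np.ndarray) -> float:
    """Fit a Gaussian (mean, covariance) to each embedding set and compare them."""
    mu_r, mu_g = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    # np.cov squeezes 1-D features to a scalar; atleast_2d keeps a matrix shape.
    sig_r = np.atleast_2d(np.cov(emb_ref, rowvar=False))
    sig_g = np.atleast_2d(np.cov(emb_gen, rowvar=False))
    return frechet_distance(mu_r, sig_r, mu_g, sig_g)
```

Lower is better. Identical embedding sets give a distance of (numerically) zero, and a pure mean shift `c` with unchanged covariance gives `||c||^2`.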
VocBench uses or builds on the following open-source projects:
- PyTorch, PyTorch.
- TorchAudio, PyTorch.
- FAD, Google Research.
- WaveNet, Ryuichi Yamamoto.
- Parallel WaveGAN, Tomoki Hayashi.
- WaveGrad, Ivan Vovk.
- DiffWave, LMNT.
- Flops counter, Vladislav Sovrasov.
The majority of VocBench is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: WaveNet, Parallel WaveGAN, and the flops counter are licensed under the MIT license; DiffWave is licensed under the Apache 2.0 license; and WaveGrad is licensed under the BSD-3 license.
Papers that use our work (feel free to add your own paper by opening a pull request):