Official implementation of the source-filter HiFiGAN vocoder

isletennos/SiFiGAN

Source-Filter HiFi-GAN (SiFi-GAN)

This repo provides the official PyTorch implementation of SiFi-GAN, a fast and pitch-controllable high-fidelity neural vocoder.
For more information, please see our DEMO.

Environment setup

$ cd SiFiGAN
$ pip install -e .

Please refer to the Parallel WaveGAN repo for more details.

Folder architecture

  • egs: The folder for projects.
  • egs/namine_ritsu: The folder of the Namine Ritsu project example.
  • sifigan: The folder containing the source code.

Dataset preparation for the Namine Ritsu database is based on NNSVS. Please refer to it for the procedure and details.

Run

In this repo, hyperparameters are managed using Hydra.
Hydra provides an easy way to dynamically create a hierarchical configuration by composition and override it through config files and the command line.
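
To make the idea of overriding a hierarchical configuration concrete, here is a minimal pure-Python sketch of applying dotted `key=value` overrides to a nested dictionary. It is only an illustration of the concept, not Hydra's implementation or API: the function `apply_overrides` and the default values are hypothetical, and Hydra additionally supports config groups (for example, `generator=sifigan` selects a config file), which this sketch does not model.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of a nested dict with dotted key=value overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the hierarchy, creating missing levels on the way.
            node = node.setdefault(name, {})
        node[leaf] = value  # values stay strings in this sketch
    return result


# Hypothetical defaults, not the repo's actual config tree.
defaults = {"out_dir": "exp/default", "train": {"batch_size": "16"}}
config = apply_overrides(defaults, ["out_dir=exp/sifigan", "train.batch_size=8"])
print(config)
```

The defaults are left untouched, so the same base configuration can be reused with different override lists.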

Dataset preparation

Prepare your dataset and create scp files that list the path to each audio file (e.g., egs/namine_ritsu/data/scp/namine_ritsu.scp).
List files that give the paths to the extracted features are created automatically in the next step (e.g., egs/namine_ritsu/data/scp/namine_ritsu.list).
Note that separate scp/list files are needed for training, validation, and evaluation.
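
As a minimal sketch of this step, the following Python snippet collects the `.wav` files under a directory and writes their paths to an scp file. The function name `write_scp` and the one-path-per-line format are assumptions made for illustration; check the example scp files under egs/namine_ritsu/data/scp for the format the tools actually expect.

```python
from pathlib import Path


def write_scp(wav_dir, scp_path):
    """Write the path of every .wav file under wav_dir to scp_path, one per line."""
    # Sort so that the file order is reproducible across runs.
    wav_paths = sorted(str(p) for p in Path(wav_dir).rglob("*.wav"))
    scp_path = Path(scp_path)
    scp_path.parent.mkdir(parents=True, exist_ok=True)
    scp_path.write_text("".join(f"{p}\n" for p in wav_paths))
    return wav_paths
```

For example, `write_scp("data/wav", "data/scp/my_dataset_all.scp")` would list every recording under a hypothetical `data/wav` directory. The training, validation, and evaluation scp files can then be produced by splitting the returned list.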

Preprocessing

# Move to the project directory
$ cd egs/namine_ritsu

# Extract acoustic features (F0, mel-cepstrum, etc.)
# You can customize parameters according to sifigan/bin/config/extract_features.yaml
$ sifigan-extract-features audio=data/scp/namine_ritsu_all.scp

# Compute statistics of training data
$ sifigan-compute-statistics feats=data/scp/namine_ritsu_train.list stats=data/stats/namine_ritsu_train.joblib

Training

# Train a model customizing the hyperparameters as you like
$ sifigan-train generator=sifigan discriminator=univnet train=sifigan data=namine_ritsu out_dir=exp/sifigan

Inference

# Decode with several F0 scaling factors
$ sifigan-decode generator=sifigan data=namine_ritsu out_dir=exp/sifigan checkpoint_steps=400000 f0_factors=[0.5,1.0,2.0]
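
To give a feel for what the F0 scaling factors mean, here is a small Python sketch that multiplies a toy F0 contour by each factor and reports the equivalent pitch shift in semitones. It assumes a plain multiplicative scaling and that unvoiced frames are marked with 0.0; the helper names `scale_f0` and `factor_to_semitones` are hypothetical and are not part of the sifigan package.

```python
import math


def scale_f0(f0, factor):
    """Scale an F0 contour in Hz; unvoiced frames (0.0) stay at 0.0."""
    return [value * factor for value in f0]


def factor_to_semitones(factor):
    """Convert an F0 scaling factor to a pitch shift in semitones."""
    return 12.0 * math.log2(factor)


# Toy contour: 0.0 marks unvoiced frames (an assumption for this sketch).
f0 = [0.0, 220.0, 230.0, 0.0]
for factor in [0.5, 1.0, 2.0]:
    print(factor, factor_to_semitones(factor), scale_f0(f0, factor))
```

Under these assumptions, a factor of 0.5 lowers the pitch by one octave, 1.0 leaves it unchanged, and 2.0 raises it by one octave.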

Analysis-Synthesis

# WORLD analysis + Neural vocoder synthesis
$ sifigan-anasyn generator=sifigan in_dir=your_own_input_wav_dir out_dir=your_own_output_wav_dir stats=pretrained_sifigan/namine_ritsu_train_no_dev.joblib checkpoint_path=pretrained_sifigan/checkpoint-400000steps.pkl f0_factor=1.0

I provide a pretrained SiFiGAN model HERE, trained on the Namine Ritsu corpus in the same manner as described in the paper. Download it and place it in a directory of your choice, then set stats and checkpoint_path in the command above to the corresponding paths, and the command should work.

However, since the Namine Ritsu corpus contains a single female Japanese singer, the model may not work well for other voices, especially male singers. I am planning to publish another pretrained model trained on a larger dataset that includes many speakers.

Monitor training progress

$ tensorboard --logdir exp

Citation

If you find the code helpful, please cite the following article.

@misc{https://doi.org/10.48550/arxiv.2210.15533,
    author = {Reo Yoneyama and Yi-Chiao Wu and Tomoki Toda},
    title = {{Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder}},
    year = {2022},
    publisher = {arXiv},
    url = {https://arxiv.org/abs/2210.15533},
    doi = {10.48550/ARXIV.2210.15533},
    copyright = {arXiv.org perpetual, non-exclusive license}
}

Authors

Development: Reo Yoneyama @ Nagoya University, Japan
E-mail: yoneyama.reo@g.sp.m.is.nagoya-u.ac.jp

Advisors:
Yi-Chiao Wu @ Meta Reality Labs Research, USA
E-mail: yichiaowu@fb.com
Tomoki Toda @ Nagoya University, Japan
E-mail: tomoki@icts.nagoya-u.ac.jp
