StarGAN-VC

This repository provides an official PyTorch implementation of StarGAN-VC.

StarGAN-VC is a nonparallel many-to-many voice conversion (VC) method using star generative adversarial networks (StarGAN). The current version performs VC in two steps. First, it modifies the mel-spectrogram of an arbitrary speaker's input speech according to a target speaker index. Then it generates a waveform from the modified mel-spectrogram using a speaker-independent neural vocoder (HiFi-GAN or Parallel WaveGAN).

Audio samples are available here.

Papers

Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, and Nobukatsu Hojo, "StarGAN-VC: Non-parallel many-to-many voice conversion using star generative adversarial networks," in Proc. 2018 IEEE Workshop on Spoken Language Technology (SLT 2018), pp. 266-273, Dec. 2018. [Paper]
Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, and Nobukatsu Hojo, "Nonparallel Voice Conversion With Augmented Classifier Star Generative Adversarial Networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2982-2995, 2020. [Paper]

Preparation

Requirements

See requirements.txt.

Dataset

Set up your training and test sets. The directory structure should look like this:

/path/to/dataset/training
├── spk_1
│   ├── utt1.wav
│   ...
├── spk_2
│   ├── utt1.wav
│   ...
└── spk_N
    ├── utt1.wav
    ...
    
/path/to/dataset/test
├── spk_1
│   ├── utt1.wav
│   ...
├── spk_2
│   ├── utt1.wav
│   ...
└── spk_N
    ├── utt1.wav
    ...
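Before running feature extraction, you may want to confirm that the two trees mirror each other. Below is a minimal sketch of a POSIX shell helper for that check. `check_dataset_layout` is a hypothetical name and is not part of this repository. The check itself is an assumption: it requires every speaker directory under the training root to also exist under the test root, and to contain at least one `.wav` file in both.

```sh
#!/bin/sh
# Hypothetical helper (not part of this repository): sanity-check that the
# training and test trees follow the spk_*/utt*.wav layout described above.
# Usage: check_dataset_layout /path/to/dataset/training /path/to/dataset/test
check_dataset_layout() {
    train_dir=$1
    test_dir=$2
    status=0
    for spk_path in "$train_dir"/*/; do
        # Skip the unexpanded pattern if the training directory is empty.
        [ -d "$spk_path" ] || continue
        spk=$(basename "$spk_path")
        for split_dir in "$train_dir" "$test_dir"; do
            if [ ! -d "$split_dir/$spk" ]; then
                echo "missing speaker directory: $split_dir/$spk" >&2
                status=1
                continue
            fi
            # Assumption: each speaker needs at least one .wav file per split.
            set -- "$split_dir/$spk"/*.wav
            if [ ! -e "$1" ]; then
                echo "no .wav files in: $split_dir/$spk" >&2
                status=1
            fi
        done
    done
    return $status
}
```

Because the function only reads the directory tree and reports problems on stderr, it is safe to run repeatedly while you assemble the dataset.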

Waveform generator

Place a copy of the directory parallel_wavegan from https://github.com/kan-bayashi/ParallelWaveGAN in pwg/.
HiFi-GAN models trained on several databases can be found here. Once these are downloaded, place them in pwg/egs/. Please contact me if you have any problems downloading.
Optionally, Parallel WaveGAN can be used for waveform generation instead. The trained models are available here. Once they are downloaded, place them in pwg/egs/ as well.
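The first setup step amounts to copying a single directory out of a ParallelWaveGAN checkout. The sketch below makes two assumptions. First, the upstream repository has already been cloned locally, for example with `git clone https://github.com/kan-bayashi/ParallelWaveGAN`. Second, the commands run from the root of this repository. `install_pwg_package` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: copy the parallel_wavegan package from a local
# ParallelWaveGAN checkout into pwg/, as the setup instructions describe.
# Usage: install_pwg_package /path/to/ParallelWaveGAN [pwg]
install_pwg_package() {
    src=$1/parallel_wavegan
    dest_dir=${2:-pwg}
    if [ ! -d "$src" ]; then
        echo "not found: $src (clone ParallelWaveGAN first)" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    # Replace any stale copy so the package matches the checkout.
    rm -rf "$dest_dir/parallel_wavegan"
    cp -R "$src" "$dest_dir/" || return 1
    # Downloaded HiFi-GAN / Parallel WaveGAN models go in pwg/egs/.
    mkdir -p "$dest_dir/egs"
}
```

Removing the old copy before calling `cp -R` prevents a second run from nesting the package inside itself as `pwg/parallel_wavegan/parallel_wavegan`.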

Main

Train

To run all stages for model training, execute:

./recipes/run_train.sh [-g gpu] [-a arch_type] [-l loss_type] [-s stage] [-e exp_name]

Options:

-g: GPU device (default: -1)
#    -1 indicates CPU
-a: Generator architecture type ("conv" or "rnn")
#    conv: 1D fully convolutional network (default)
#    rnn: Bidirectional long short-term memory network
-l: Loss type ("cgan", "wgan", or "lsgan")
#    cgan: Cross-entropy GAN
#    wgan: Wasserstein GAN with the gradient penalty loss (default)
#    lsgan: Least squares GAN
-s: Stage to start from (0 or 1)
#    Stages 0 and 1 correspond to feature extraction and model training, respectively.
-e: Experiment name (default: "conv_wgan_exp1")
#    This name will be used at test time to specify which trained model to load.

Examples:

# To run the training from scratch with the default settings:
./recipes/run_train.sh

# To skip the feature extraction stage:
./recipes/run_train.sh -s 1

# To set the GPU device to, say, 0:
./recipes/run_train.sh -g 0

# To use a generator with a recurrent architecture:
./recipes/run_train.sh -a rnn -e rnn_wgan_exp1

# To use the cross-entropy adversarial loss:
./recipes/run_train.sh -l cgan -e conv_cgan_exp1

# To use the least-squares adversarial loss:
./recipes/run_train.sh -l lsgan -e conv_lsgan_exp1
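The examples above name experiments as `<arch>_<loss>_exp<N>`. This pattern is a convention inferred from the examples and is not enforced by the scripts. Assuming it, the hypothetical helper below prints one training command per architecture/loss combination without running anything. You can review the list first and then pipe it to `sh`.

```sh
#!/bin/sh
# Hypothetical dry-run helper: print a run_train.sh command for every
# architecture/loss pair, naming each experiment <arch>_<loss>_exp<N>
# (an assumed convention taken from the examples above).
# Usage: print_train_commands [gpu] [run_index]
print_train_commands() {
    gpu=${1:--1}   # defaults to -1 (CPU), matching the -g default
    run=${2:-1}
    for arch in conv rnn; do
        for loss in cgan wgan lsgan; do
            printf './recipes/run_train.sh -g %s -a %s -l %s -e %s_%s_exp%s\n' \
                "$gpu" "$arch" "$loss" "$arch" "$loss" "$run"
        done
    done
}
```

For example, `print_train_commands 0` lists six commands for GPU 0, and `print_train_commands 0 | sh` runs them one after another.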

See the other scripts in recipes/ for examples of training on different datasets.

To monitor the training process, use TensorBoard:

tensorboard [--logdir log_path]

Test

To perform conversion, execute:

./recipes/run_test.sh [-g gpu] [-e exp_name] [-c checkpoint] [-v vocoder_type]

Options:

-g: GPU device (default: -1)
#    -1 indicates CPU
-e: Experiment name (e.g., "conv_wgan_exp1")
-c: Model checkpoint to load (default: 0)
#    0 indicates the newest model
-v: Vocoder type ("hfg" or "pwg")
#    hfg: HiFi-GAN (default)
#    pwg: Parallel WaveGAN

Examples:

# To perform conversion on GPU 0 with the default checkpoint and vocoder:
./recipes/run_test.sh -g 0 -e conv_wgan_exp1

# To use Parallel WaveGAN as an alternative for waveform generation:
./recipes/run_test.sh -g 0 -e conv_wgan_exp1 -v pwg
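Both run_train.sh and run_test.sh default to the CPU (-g -1). The hypothetical helper below chooses the -g value automatically. It assumes an NVIDIA setup in which the presence of `nvidia-smi` on PATH is a reasonable proxy for a usable GPU 0. It does not query how many GPUs there are or whether they are free.

```sh
#!/bin/sh
# Hypothetical helper: print the value to pass to -g.
# Assumption: if nvidia-smi is on PATH, GPU 0 is usable; otherwise use the CPU (-1).
pick_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        printf '%s\n' 0
    else
        printf '%s\n' -1
    fi
}
```

For example: `./recipes/run_test.sh -g "$(pick_gpu)" -e conv_wgan_exp1`.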

Citation

If you find this work useful for your research, please cite our papers.

@INPROCEEDINGS{Kameoka2018SLT_StarGAN-VC,
  author={Hirokazu Kameoka and Takuhiro Kaneko and Kou Tanaka and Nobukatsu Hojo},
  booktitle={Proc. 2018 IEEE Spoken Language Technology Workshop (SLT)}, 
  title={StarGAN-VC: Non-parallel Many-to-Many Voice Conversion Using Star Generative Adversarial Networks}, 
  year={2018},
  pages={266--273}}
@Article{Kameoka2020IEEETrans_StarGAN-VC,
  author={Hirokazu Kameoka and Takuhiro Kaneko and Kou Tanaka and Nobukatsu Hojo},
  title={Nonparallel Voice Conversion With Augmented Classifier Star Generative Adversarial Networks},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  volume={28},
  pages={2982--2995},
  year={2020}}

Author

Hirokazu Kameoka (@kamepong)

E-mail: kame.hirokazu@gmail.com
