VQMIVC Forked Repo, Adapted to Other Datasets. CHANGES HAVE NOT BEEN COMMITTED YET; DO NOT USE, AS IT WON'T WORK.
VQMIVC performs one-shot/any-to-any voice conversion, i.e., conversion across arbitrary speakers with only a single target-speaker utterance for reference. Vector quantization with contrastive predictive coding (VQCPC) is used for content encoding, and mutual information (MI) is introduced as the correlation metric during training to achieve proper disentanglement of content, speaker, and pitch representations by reducing their inter-dependencies in an unsupervised manner.
Python 3.6 is used. Optionally install apex to speed up training; the other requirements are listed in 'requirements.txt':
pip install -r requirements.txt
Don't install Parallel WaveGAN from its GitHub repository; instead:
pip install parallel_wavegan
Download the checkpoints from the VQMIVC pre-trained models.
Then, run a conversion:
python convert_example.py -s {source-wav} -r {reference-wav} -c {converted-wavs-save-path} -m {model-path}
For example:
python convert.py
The converted wavs are saved in the 'converted' directory.
- Step1. Data preparation & preprocessing.
- Put the dataset under directory: 'Dataset/'
- Training/testing speakers split & feature (mel+lf0) extraction:
Here, a new script, pre.py, was added to replace preprocess.py. Because of the dataset size, the NumPy arrays could not all be loaded into RAM, so lines 141 to 145, which compute the mean and std of the mel spectrograms used to normalize the data, were modified to work with only a portion of the wavs. The way wavs are globbed from the dataset was also changed; you may still need to adapt the glob logic to your own dataset.
python pre.py
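The subset-based mean/std computation can be sketched as follows. This is a memory-friendly illustration using Welford's online algorithm over only the first `max_utts` utterances; the function name and the exact strategy in pre.py are assumptions, not the repo's actual code:

```python
def mel_stats(mel_frames, max_utts=500):
    """Estimate corpus-level mean/std of mel values from a subset of
    utterances, without holding every array in RAM at once.

    `mel_frames` is an iterable of per-utterance value lists.
    Welford's online algorithm keeps only running statistics.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for mel in mel_frames[:max_utts]:  # only a portion of the wavs
        for x in mel:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)  # running sum of squared deviations
    std = (m2 / count) ** 0.5 if count else 0.0
    return mean, std
```

The returned statistics are then used to normalize every mel spectrogram, so a representative subset is usually good enough.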
Step2. model training:
python train.py use_CSMI=True use_CPMI=True use_PSMI=True
Training was adapted to fine-tune from the VCTK checkpoint, so download the checkpoint from the original paper and change the checkpoint path in config/convert.yaml. Full paths are also used during training, so you will need to change the paths in config/train.yaml as well.
- Two problems were found while trying to use convert.py, so the following changes were made:
- Line 70 of the original code, @hydra.main(config_path="config/convert.yaml"), was changed to @hydra.main(config_path="config", config_name='convert').
- One of the packages makes the code lose track of its own path, so full paths are used instead of relative paths.
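The path problem can be worked around by resolving every configured path to an absolute one before it is used. A minimal sketch, assuming the current working directory is still correct at startup (the helper name is illustrative, not from the repo):

```python
import os

def to_abs(path, base=None):
    """Resolve `path` against `base` (default: the startup working
    directory), so later changes of directory cannot break relative
    references to data, configs, or checkpoints."""
    base = base or os.getcwd()
    if os.path.isabs(path):
        return path
    return os.path.normpath(os.path.join(base, path))
```

Calling this once on each path read from the YAML configs has the same effect as hand-editing them to full paths.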
The vocoder used was HiFi-GAN, with adjustments and a pt_br checkpoint, from the following repository: https://github.com/freds0/hifi-gan. The script used to call the vocoder was inference_e2e.py. Changes made to the inference() function in inference_e2e.py:
- Changed os.listdir to glob
- Added a permute(1, 0) and unsqueeze(0) to match the model's expected input shape.
- Used the string .split() method instead of os.path.splitext.
If this code is used in your research, please star the repo and cite the paper:
@inproceedings{wang21n_interspeech,
author={Disong Wang and Liqun Deng and Yu Ting Yeung and Xiao Chen and Xunying Liu and Helen Meng},
title={{VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-Shot Voice Conversion}},
year=2021,
booktitle={Proc. Interspeech 2021},
pages={1344--1348},
doi={10.21437/Interspeech.2021-283}
}