
WIP: Support for Wavenet vocoder #21

Open · wants to merge 12 commits into master from wavenet-support
Conversation

r9y9 (Owner) commented Jan 6, 2018

  • Add a script to generate training data for the WaveNet vocoder
  • Train DeepVoice3 for the WaveNet vocoder
  • Train the WaveNet vocoder
  • Add an option to synthesis.py to use the WaveNet vocoder
  • Improve quality

ref #11, r9y9/wavenet_vocoder#1

@nikita-smetanin

I'm just wondering what kind of data I should pass to generate_aligned_predictions.py to produce aligned mel predictions for WaveNet. Should the audio files (and the mel-spectrograms) be preprocessed somehow?

r9y9 (Owner, Author) commented Mar 3, 2018

This is very much WIP, so it may change in the future, but for now I use the following command:

python generate_aligned_predictions.py \
    ./checkpoints_deepvoice3_wavenet/checkpoint_step000770000.pth \
    ~/Dropbox/sp/wavenet_vocoder/data/ljspeech/ \
    --preset=presets/deepvoice3_ljspeech_wavenet.json \
    ~/Dropbox/sp/wavenet_vocoder/data/ljspeech_deepvoice3

You need to pass:

  • A model checkpoint of DeepVoice3 (or a similar model)
  • Mel-spectrograms to be used to generate aligned predictions (inside ~/Dropbox/sp/wavenet_vocoder/data/ljspeech/ in my case). Raw audio is not used to generate predictions, but it is used to make sure we have the correct time resolution (see the sketch after this list):
    # Make sure we have correct lengths
    assert mel_output.shape[0] * hparams.hop_size == len(wav)
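
To make that invariant concrete, here is a minimal sketch of padding or trimming a waveform so it lines up with the predicted mel frames at a given hop size. The helper name adjust_wav_length and the pad/trim logic are illustrative assumptions, not code from this repo:

    import numpy as np

    def adjust_wav_length(wav, num_mel_frames, hop_size):
        # Force len(wav) == num_mel_frames * hop_size
        expected_len = num_mel_frames * hop_size
        if len(wav) < expected_len:
            # Zero-pad the tail if the audio is slightly short
            wav = np.pad(wav, (0, expected_len - len(wav)), mode="constant")
        else:
            # Trim trailing samples if the audio is slightly long
            wav = wav[:expected_len]
        return wav

    # Example: 80 mel frames at hop_size=256 must correspond to 20480 samples
    wav = np.random.randn(20000).astype(np.float32)
    wav = adjust_wav_length(wav, num_mel_frames=80, hop_size=256)
    assert 80 * 256 == len(wav)  # same check as the assert above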

r9y9 (Owner, Author) commented Mar 3, 2018

Okay, it's still quite alpha, but it seems to have started working.

DeepVoice3_wavenet_quite_alpha_770k_for_deepvoice3_6k_for_wavenet.zip

EDIT: Trained WaveNet for 60k steps, starting from the pre-trained model in r9y9/wavenet_vocoder#19 (comment)

@nikita-smetanin

@r9y9 Yes, thanks. I had run generate_aligned_predictions.py on the DeepVoice3 LJSpeech data, not on the WaveNet data, so I ran into some problems there. It's clear now.

BTW, do you need any help with the DeepVoice3 + WaveNet experiment? I reproduced your steps, but for now it doesn't sound as good as the Baidu or Google demos (while WaveNet itself sounds very good on ground-truth mels). So I'm wondering what the reason is and what we should try in order to improve it. Do you have any ideas?

r9y9 (Owner, Author) commented Mar 8, 2018

@nsmetanin Yes, I'd be happy if you could help. I also haven't gotten results as good as the Google demos. Currently I'm getting very coarse mel-spectrogram predictions from DeepVoice3, but I think we should be able to get sufficiently precise mel-spectrograms; otherwise we may end up with noisy speech. I want to try outputs_per_step=1 as mentioned in the Tacotron 2 paper, but I have an issue with that configuration (#24). Attention-based encoder/decoder models are tricky to train...

I am planning to try increasing kernel_size and the encoder/decoder channels of DeepVoice3 to make the model more expressive.
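
For concreteness, a minimal sketch of what such overrides might look like. The key names follow the knobs discussed in this thread (outputs_per_step, kernel_size, encoder/decoder channels), but the exact keys and values are assumptions, not a tested preset:

    # Hypothetical preset overrides; values are illustrative, not recommended.
    preset_overrides = {
        "outputs_per_step": 1,    # one frame per decoder step, as in Tacotron 2
        "kernel_size": 5,         # wider convolutions for more context
        "encoder_channels": 512,  # more expressive encoder
        "decoder_channels": 512,  # more expressive decoder
    }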

@nikita-smetanin

Also, there are parameters that must match between the DeepVoice3 output and the WaveNet input, such as the preemphasis value, rescaling, and others. The papers don't clearly state what should be used, so I just want to try some combinations.

For example, if you train WaveNet with rescaling=True and then feed it predictions from a DeepVoice3 model trained with rescaling=False, it will sound awful. Disabling preemphasis makes DeepVoice3 itself sound much worse, so that could be a problem too. I want to try enabling preemphasis on the mels for both DV3 and WaveNet, and train WaveNet to produce raw audio from mels computed with preemphasis.
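
A minimal sketch of those two preprocessing knobs, assuming typical coefficient values (0.97 for preemphasis, 0.999 for peak rescaling); these defaults are assumptions, not necessarily what either repo uses:

    import numpy as np
    from scipy.signal import lfilter

    def preemphasis(wav, coef=0.97):
        # First-order high-pass filter: y[n] = x[n] - coef * x[n-1]
        return lfilter([1.0, -coef], [1.0], wav)

    def rescale(wav, max_level=0.999):
        # Peak-normalize the waveform (the rescaling=True case)
        return wav / np.abs(wav).max() * max_level

    # The key point: the mels DeepVoice3 learns to predict and the mels
    # WaveNet is conditioned on must come from identically preprocessed
    # audio, otherwise the vocoder sees out-of-distribution inputs at
    # synthesis time.
    wav = np.random.randn(16000)
    wav = rescale(preemphasis(wav))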

r9y9 force-pushed the wavenet-support branch from fe86932 to 3226e41 on April 11, 2018
r9y9 force-pushed the wavenet-support branch from 3226e41 to 672e249 on May 1, 2018
ilyalasy (Contributor) commented May 7, 2020

Sorry, I can't quite work out what generate_aligned_predictions.py does. Can you clarify a bit?
Do I need to train WaveNet on the original mels generated by WaveNet's preprocessing?
Or can I use the mels generated by DeepVoice3's preprocessing?
If I need to use WaveNet's preprocessing, which parameters should I copy so that they match DeepVoice3's?
P.S. I'm trying to train both models on my own dataset (not English).
P.P.S. Sorry for the silly questions :D
