WIP: Support for Wavenet vocoder #21
base: master
I'm just wondering what kind of data I should pass to …
This is very much WIP, so it may change in the future, but for now I use the following command:
You need to pass:
Okay, it's still quite alpha, but it seems to have started working. DeepVoice3_wavenet_quite_alpha_770k_for_deepvoice3_6k_for_wavenet.zip

EDIT: Trained WaveNet for 60k steps, starting from the pre-trained model in r9y9/wavenet_vocoder#19 (comment)
@r9y9 Yes, thanks, I ran …

BTW, do you need any help with the DeepVoice3 + WaveNet experiment? I reproduced your steps, but for now it doesn't sound as good as the Baidu or Google demos (while WaveNet itself sounds very good on mels). So I'm wondering what the reason is and what we should try to improve it. Do you have any ideas?
@nsmetanin Yes, I'd be happy if you could help. I also haven't gotten results as good as the Google demos. Currently I'm getting very coarse mel-spectrogram predictions from DeepVoice3, but I think we need sufficiently precise mel-spectrograms; otherwise we may end up with noisy speech. I am planning to try increasing kernel_size and the encoder/decoder channels of DeepVoice3 to make the model more expressive.
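To illustrate why a larger kernel_size and more channels make the model more expressive, here is a rough Python sketch. It assumes a simple stack of 1D convolutions, where each layer widens the receptive field by (kernel_size - 1) * dilation frames, and gated (GLU) convolutions that emit 2 * channels before gating. The helper names, the 7-layer non-dilated stack, and the channel values are hypothetical and do not reproduce DeepVoice3's actual layer layout; the numbers only show the trend.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    # Each 1D conv with dilation d widens the receptive field by (kernel_size - 1) * d frames.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def conv1d_params(in_channels: int, out_channels: int, kernel_size: int,
                  bias: bool = True) -> int:
    # Weight tensor is out x in x kernel_size, plus one bias per output channel.
    return out_channels * in_channels * kernel_size + (out_channels if bias else 0)


def glu_conv_stack_params(channels: int, kernel_size: int, num_layers: int) -> int:
    # Hypothetical stack of gated (GLU) convolutions: each conv emits 2 * channels,
    # which the GLU gate halves back to `channels`. Residual/projection layers are ignored.
    return num_layers * conv1d_params(channels, 2 * channels, kernel_size)


if __name__ == "__main__":
    layers = [1] * 7  # seven non-dilated layers (illustrative, not DeepVoice3's real config)
    for k, c in [(3, 256), (5, 256), (5, 512)]:
        print(f"kernel_size={k} channels={c} "
              f"receptive_field={receptive_field(k, layers)} "
              f"params={glu_conv_stack_params(c, k, len(layers))}")
```

Increasing kernel_size grows both the receptive field and the parameter count linearly, while increasing channels grows the parameter count roughly quadratically, so the two knobs trade off context against capacity and training cost differently.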
Also, some parameters must match between the DeepVoice3 output and the WaveNet input, such as the preemphasis value, rescaling, and others. Those papers don't clearly state what to use, so I want to try some combinations. For example, if you trained WaveNet with rescaling=True and feed it predictions from a DeepVoice3 model trained with rescaling=False, it will sound awful. Disabling preemphasis makes DeepVoice3 itself sound much worse, so that could be a problem too. I want to try enabling preemphasis for the mels in both DV3 and WV, and training WV to produce raw audio from mels computed with preemphasis.
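To make the matching problem concrete, here is a minimal Python sketch. It assumes preemphasis is the usual first-order filter y[n] = x[n] - coef * x[n-1] (undone by x[n] = y[n] + coef * x[n-1]) and that rescaling=True means peak-normalizing the waveform to some rescaling_max before feature extraction. `FeatureConfig`, `mismatched_settings`, and the other helpers are hypothetical names, not the actual hparams or APIs of deepvoice3_pytorch or wavenet_vocoder.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class FeatureConfig:
    # Hypothetical bundle of preprocessing settings; field names are illustrative,
    # not the real hparams of deepvoice3_pytorch or wavenet_vocoder.
    preemphasis: float = 0.97   # 0.0 means preemphasis disabled
    rescaling: bool = False
    rescaling_max: float = 0.999
    num_mels: int = 80


def mismatched_settings(tts: FeatureConfig, vocoder: FeatureConfig) -> List[str]:
    """Return the names of settings that differ between the TTS model and the vocoder."""
    return [f.name for f in fields(FeatureConfig)
            if getattr(tts, f.name) != getattr(vocoder, f.name)]


def apply_preemphasis(x: List[float], coef: float) -> List[float]:
    # FIR high-pass filter: y[n] = x[n] - coef * x[n-1], treating x[-1] as 0.
    return [x[n] - coef * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


def remove_preemphasis(y: List[float], coef: float) -> List[float]:
    # Inverse IIR filter: x[n] = y[n] + coef * x[n-1].
    out, prev = [], 0.0
    for v in y:
        prev = v + coef * prev
        out.append(prev)
    return out


def rescale(x: List[float], rescaling_max: float) -> List[float]:
    # Peak-normalize so that max |x| == rescaling_max (left unchanged for silent input).
    peak = max((abs(v) for v in x), default=0.0)
    return [v / peak * rescaling_max for v in x] if peak > 0 else list(x)


if __name__ == "__main__":
    dv3 = FeatureConfig(preemphasis=0.97, rescaling=False)
    wavenet = FeatureConfig(preemphasis=0.0, rescaling=True)
    print(mismatched_settings(dv3, wavenet))  # ['preemphasis', 'rescaling']
```

Checking the two configs before feeding DeepVoice3 predictions into WaveNet catches the rescaling=True vs rescaling=False case described above; if both sides use preemphasis, the vocoder would be trained on targets produced by `apply_preemphasis` and its output inverted with `remove_preemphasis`.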
Sorry, I can't actually get what …
ref #11, r9y9/wavenet_vocoder#1