Add YourTTS VCTK recipe #2198
Thanks for doing this. I think adding a script that downloads and prepares VCTK, and also computes the d_vectors, would make this more useful to users.
+1 for @WeberJulian's comment. You can use the downloader from https://github.com/coqui-ai/TTS/blob/dev/TTS/utils/downloaders.py
Done. The recipe now automatically resamples the audio and computes the speaker embeddings :). In addition, I added all the parameters needed to enable multilingual training and the Speaker Consistency Loss (SCL), as in the paper. I guess after this recipe we will not have too many open issues about YourTTS anymore :).
@erogol Do you have any idea why the text unit test is broken? I did not change anything that affects this part of the code.
Great, now reproducing YourTTS is only a single line away :)
Running this code with Full log:
Also, when I run:
to test then I get
Somehow, inference cannot read the speaker embeddings. Here is my config:
It might be because of a typo on line 114 (lines 110 to 120 in 9e5a469).
Shouldn't that be speakers_file instead of speaker_file?
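For what it's worth, a misspelled name like this tends to fail silently when the value is read leniently, which would match the symptom of embeddings not being picked up. Below is a minimal sketch of that failure mode, not the Coqui TTS code: DemoArgs, lookup, strict_lookup, and the file name are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoArgs:
    """Hypothetical stand-in for a model-args object (not the Coqui TTS class)."""
    speakers_file: Optional[str] = None


def lookup(args, name):
    """Lenient read: a name that does not exist silently becomes None."""
    return getattr(args, name, None)


def strict_lookup(args, name):
    """Strict read: a name that does not exist raises AttributeError."""
    return getattr(args, name)


# Placeholder path, assumed only for this example.
args = DemoArgs(speakers_file="demo_speakers.pth")

wrong = lookup(args, "speaker_file")    # misspelled name, no error is raised
right = lookup(args, "speakers_file")   # correct name, returns the path

print("speaker_file  ->", wrong)
print("speakers_file ->", right)
```

With the lenient read, the misspelled name just yields None, so downstream code behaves as if no speakers file was configured. A strict read such as strict_lookup would surface the typo immediately instead.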
Also, after disabling
Guys @erogol @Edresson
Btw, here is my speakers file. It seems to contain the d-vectors of all speakers.
Also, when restoring from
Two hacks I made in the codebase to make it work. First, lines 110 to 120 in 9e5a469.
Second, I changed TTS/TTS/tts/models/base_tts.py, lines 415 to 426 in a9167cf.
Could there be a better way to do the second hack?
Created PR for the fix: #2234
Hi @iamkhalidbashir,
Sure, I'll create a new issue.
* Add YourTTS VCTK recipe
* Fix lint
* Add compute_embeddings and resample_files functions to be able to reuse it
* Add automatic download and speaker embedding computation for YourTTS VCTK recipe
* Add parameter for eval metadata file on compute embeddings function