-
Hi -- yes, you can. Typically when I train StyleTTS2 to synthesise another language, I keep a bit of English in the dataset, just to make sure the model retains its English-speaking capability. The other thing is that the reference you use often matters: if you use a reference from, for example, a Spanish speaker and try to synthesise English, you might end up with a Spanish accent. If none of this works, you can try the multilingual PL-BERT model I just open-sourced on Hugging Face: https://huggingface.co/papercup-ai/multilingual-pl-bert
-
Got it, thanks for your reply! I suspected that mixing the languages might be a solution. I hadn't paid attention to the reference, though. My problem is that I don't have a reference in English, so I'll have to think about how to get this to work in that case. Do you have any ideas? Also, good work on the PL-BERT -- it will definitely help me and others!
-
You can maybe use RVC (or any voice conversion model) to convert an English reference to the target speaker you are trying to synthesise, and then use the RVC output as the reference for StyleTTS2.
-
May I suggest using Hexgrad? You can feed it reference audio in any language together with English text, and since the tool runs the standard (English-only) model under the hood, you will get unaccented English audio.
-
Is it possible to train using one language and then use the model to synthesize English? I have a dataset in my language, but when I trained the model, the output had an undesirable accent.