-
Hi -- yes, you can. Typically when I train StyleTTS2 to synthesise another language, I keep a bit of English in the dataset, just to make sure the model retains its English-speaking capability. The other thing is that the reference you use often matters: if you use a reference from, for example, a Spanish speaker and try to synthesise English, you might end up with a Spanish accent. If none of this works, you can try the multilingual PL-BERT model I just open-sourced on Hugging Face: https://huggingface.co/papercup-ai/multilingual-pl-bert
-
Got it, thanks for your reply! I suspected that mixing the languages might be a solution. I hadn't paid attention to the reference, though. My problem is that I don't have a reference in English, so I'll have to think about how to get this to work in that case. Do you have any ideas? Also, good work on the PL-BERT -- it will definitely help me and others!
-
You can maybe use RVC (or any voice conversion model) to convert an English reference to the target speaker you are trying to synthesise, and then use the RVC output as the reference for StyleTTS2.
-
May I suggest using Hexgrad? You can feed it reference audio in any language together with English text, and since the tool runs the standard (English-only) model under the hood, you will get unaccented English audio.
-
Is it possible to train using one language and then use the model to synthesize English? I have a dataset in my language, but when I trained the model, the output had an undesirable accent.