How to fix the voice across generations ? #554

MankaranSingh · 2024-09-15T18:45:32Z

Self Checks

I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please make sure to submit issues in English, otherwise they will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

When I generate speech from the webui, it samples a random voice each time. How can I keep the generated voice fixed across generations? I can help with a PR.

2. Additional context or comments

No response

3. Can you help us with this feature?

I am interested in contributing to this feature.

leng-yue · 2024-09-16T01:56:39Z

You can add a reference audio to pin the timbre.

czkoko · 2024-09-16T07:16:37Z

@leng-yue Using reference audio pins the timbre well, but the speed and pauses still seem random, and lowering the temperature does not fix this.
I would like to be able to use punctuation to control the pause length between words and sentences. Sometimes the pauses between sentences are extremely short and sound unnatural.

leng-yue · 2024-09-16T08:01:01Z

Did you include proper punctuation in your reference text?

czkoko · 2024-09-16T12:16:46Z

> Did you include proper punctuation in your reference text?

Yes. The reference audio is a high-quality, natural-sounding voice synthesized with Microsoft Speech, and the reference text uses sensible punctuation.
With the same input text, the same default parameters, and the same reference audio, repeated generations share the same timbre, but their speed, prosody, and pause lengths between sentences differ.
For example, the following samples were generated from this text:
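据澎湃新闻消息，上海受台风影响迎来强风雨天气，当地两大外卖平台及生鲜电商对此表示，已经着手采取各项极端天气应对措施，并为骑手配备雨衣、防水套等装备。
(English translation: "According to The Paper, Shanghai is seeing strong wind and rain because of the typhoon. The city's two major food-delivery platforms and its fresh-grocery e-commerce companies said they have begun taking extreme-weather response measures and are equipping riders with raincoats, waterproof covers, and other gear.")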
result.zip

leng-yue · 2024-09-16T18:52:07Z

Since it's an auto-regressive model, different speed and prosody across generations is expected behavior. Does this cause any issues on your side?
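
As a rough illustration of why repeated runs differ, here is a toy sketch of temperature sampling in an auto-regressive loop. It is not this project's code: `sample_token`, `generate`, and the cosine "logits" are hypothetical stand-ins for a real model. It only shows that each sampled token feeds the next step, so runs diverge unless the random seed is fixed.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Sample one index from softmax(logits / temperature)."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(seed, steps=20, temperature=0.7):
    """Toy auto-regressive loop: each step's logits depend on the previous token."""
    rng = random.Random(seed)  # all randomness comes from this seeded generator
    token, output = 0, []
    for _ in range(steps):
        # Hypothetical stand-in for a model forward pass over 8 possible tokens.
        logits = [math.cos(token + i) for i in range(8)]
        token = sample_token(logits, temperature, rng)
        output.append(token)
    return output


if __name__ == "__main__":
    print(generate(seed=0) == generate(seed=0))  # same seed, same sequence
    print(generate(seed=1))
    print(generate(seed=2))
```

Lowering `temperature` in this sketch makes the distribution sharper but still leaves sampling random, which matches the observation above that reducing the temperature alone does not remove the variation. In a PyTorch-based pipeline the analogous step would be seeding the framework's generator (for example with `torch.manual_seed`) before each generation. Whether the webui exposes such a seed is an assumption here, not something confirmed in this thread.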

MankaranSingh added the enhancement (New feature or request) label on Sep 15, 2024
