
Add alphas to control language and speaker balancer #1216

Merged: 11 commits, Mar 10, 2022
Conversation

Edresson (Contributor) commented Feb 8, 2022

This PR fixes the issue/suggestion reported in #1185 .

It allows combining weights for speaker and language balancing. I normalize the language and speaker weights and add an independent alpha for each. The final weights are the sum of the language and speaker weights, each multiplied by its respective alpha. This way, we can control the influence of speaker and language in the batch balancer, and we can easily add a new balancer in the future if necessary.
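The scheme described above can be sketched roughly as follows. This is a minimal illustration with made-up data; the helper names and the normalization detail are assumptions for clarity, not the exact coqui-ai/TTS code (the real implementation feeds the combined per-sample weights into a PyTorch `WeightedRandomSampler`):

```python
from collections import Counter

def balancer_weights(labels):
    """Per-sample inverse-frequency weights, L2-normalized so that the
    alphas of different balancers are on a comparable scale (a sketch,
    not the exact coqui-ai/TTS implementation)."""
    counts = Counter(labels)
    w = [1.0 / counts[label] for label in labels]
    norm = sum(x * x for x in w) ** 0.5
    return [x / norm for x in w]

def combined_weights(languages, speakers, lang_alpha=1.0, spk_alpha=1.0):
    # Sum of each balancer's weights, scaled by its own alpha.
    lang_w = balancer_weights(languages)
    spk_w = balancer_weights(speakers)
    return [lang_alpha * l + spk_alpha * s for l, s in zip(lang_w, spk_w)]

# Hypothetical metadata: 4 samples of language "en", 2 of "pt".
languages = ["en", "en", "en", "en", "pt", "pt"]
speakers  = ["s1", "s1", "s2", "s3", "s4", "s5"]
weights = combined_weights(languages, speakers)
```

Samples from the rarer language and from under-represented speakers both receive higher weights, so they are drawn more often when these weights drive a weighted random sampler.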

vince62s commented Feb 8, 2022

Hi @Edresson. I am not sure I fully understand the logic.
Say language_alpha = 10 and speaker_alpha = 1: how does it behave exactly from a language-balancing standpoint, from a speaker-balancing standpoint, and from a combined language/speaker standpoint?
Thanks.

Edresson (Contributor, Author) commented Feb 8, 2022

> Hi @Edresson. I am not sure I fully understand the logic. Say language_alpha = 10 and speaker_alpha = 1: how does it behave exactly from a language-balancing standpoint, from a speaker-balancing standpoint, and from a combined language/speaker standpoint? Thanks.

This way, we can control the language and speaker influence. In this case (language_alpha = 10 and speaker_alpha = 1), language will generate the highest weights and speaker small ones. During sample selection within a language X, a speaker with few samples will be more likely to be chosen than a speaker with many samples, but the language balancing stays intact (because language has a high alpha). In theory, the alphas can be used to create levels of balancing: balancers with higher alphas take priority (in this case, language).
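This priority effect can be checked with a small numeric illustration (made-up corpus, illustrative helper; the L2 normalization is an assumption, not necessarily the library's exact formula):

```python
from collections import Counter

def normed_inverse_freq(labels):
    # Inverse-frequency weight per sample, L2-normalized (illustrative sketch).
    counts = Counter(labels)
    w = [1.0 / counts[label] for label in labels]
    norm = sum(x * x for x in w) ** 0.5
    return [x / norm for x in w]

# Hypothetical corpus: language "en" has 4 samples, "pt" has 2.
languages = ["en", "en", "en", "en", "pt", "pt"]
speakers  = ["s1", "s1", "s1", "s2", "s3", "s3"]

lang_w = normed_inverse_freq(languages)
spk_w = normed_inverse_freq(speakers)

# With language_alpha = 10 and speaker_alpha = 1, the language term
# dominates: every "pt" sample outweighs every "en" sample, regardless
# of which speaker it belongs to.
weights = [10 * l + 1 * s for l, s in zip(lang_w, spk_w)]
```

Within each language, the speaker term still nudges rare speakers above frequent ones, which is the "levels of balancing" behavior described above.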

erogol (Member) commented Feb 11, 2022

I like the idea of computing weights and then sampling 👍

erogol (Member) commented Feb 11, 2022

I am about to make the formatters return List[Dict]; I guess your changes are affected by that too. So maybe we should wait until I push those changes and then rebase this PR.

CLAassistant commented Feb 23, 2022

CLA assistant check: all committers have signed the CLA.

Edresson force-pushed the dev branch 2 times, most recently from 24c0511 to ee61689, March 7, 2022 19:35
erogol merged commit 917f417 into coqui-ai:dev, Mar 10, 2022
vince62s commented
@Edresson, sorry to revive this discussion, but 0.6.2 is about to be merged including this.

As is, let's say we have a very abnormal distribution of speakers, e.g. a few speakers with a very small number of samples. Those will be balanced with the same probability as another speaker with a huge amount of data. Am I correct?
I think there could be some edge cases where some datasets include noisy speakers. I may be wrong though.

Edresson (Contributor, Author) commented Mar 11, 2022

> @Edresson, sorry to revive this discussion, but 0.6.2 is about to be merged including this. As is, let's say we have a very abnormal distribution of speakers, e.g. a few speakers with a very small number of samples. Those will be balanced with the same probability as another speaker with a huge amount of data. Am I correct? I think there could be some edge cases where some datasets include noisy speakers. I may be wrong though.

Yeah, the objective of the speaker balancer is that all speakers appear at a similar frequency in the batch. If the user has noisy speakers, they need to remove those speakers from the dataset. Unfortunately, we can't control that :(.

vince62s commented
Well, somehow it is a change in behavior versus the previous version, I think, so it might be good to be specific about it in the docs.

Edresson (Contributor, Author) commented Mar 11, 2022

> Well, somehow it is a change in behavior versus the previous version, I think, so it might be good to be specific about it in the docs.

By default, use_speaker_weighted_sampler and use_language_weighted_sampler are disabled. So if the user does not enable them, nothing changes between the versions :). The only thing this PR changes is that now we can enable the language weighted sampler and the speaker weighted sampler together (and use multiple GPUs with these samplers).
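For readers wondering how this looks in practice, here is a hedged sketch of enabling both samplers together. The two boolean flag names appear in the discussion above; the `*_alpha` field names are assumptions made for illustration and should be checked against the actual config in TTS/config/shared_configs.py:

```python
# Sketch of a training-config fragment (plain dict for illustration;
# the real project uses dataclass-based config objects).
config = {
    "use_speaker_weighted_sampler": True,
    "speaker_weighted_sampler_alpha": 1.0,    # assumed field name
    "use_language_weighted_sampler": True,
    "language_weighted_sampler_alpha": 10.0,  # assumed field name
}
```

With both flags off (the defaults), training behaves exactly as before this PR.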
