Add normalizer type C to text cleaners #85

shavit · 2024-09-28T21:12:50Z

There is duplication across the cleaners. Should the normalizer be added inside the other cleaners, or be applied to all text?
https://github.com/idiap/coqui-ai-TTS/blob/dev/TTS/tts/utils/text/tokenizer.py#L110

Closes #63
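
For context, a minimal sketch of the kind of mismatch that normalization form C addresses. It uses only Python's standard unicodedata module; the sample strings are illustrative and are not taken from the PR or the linked issue.

```python
import unicodedata

# The same visible word written two ways (illustrative sample, not from the PR).
composed = "caf\u00e9"      # 'é' as one precomposed code point (U+00E9)
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent (U+0301)

# The strings render identically but differ code point by code point,
# so a character-level vocabulary lookup would treat them differently.
same_before = composed == decomposed
lengths = (len(composed), len(decomposed))

# NFC recomposes the sequence into the precomposed form.
same_after = unicodedata.normalize("NFC", decomposed) == composed
```

After normalization both spellings map to the same code points, so downstream character handling sees one form.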

eginhard

Thank you for the PR and tests, it looks good! I suggest renaming the function to make its name more intuitive. I'd call it at the start of every cleaner except no_cleaners().
You can run make clean && make lint to make sure your code passes the style check.
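
A rough sketch of what calling the normalizer at the start of a cleaner could look like. The body of normalize_unicode is an assumption (NFC via the standard library), example_cleaner is a hypothetical cleaner invented for illustration, and the no_cleaners body is simplified; none of these are copied from the repository.

```python
import re
import unicodedata


def normalize_unicode(text: str) -> str:
    """Normalize Unicode characters (assumed here to mean NFC)."""
    return unicodedata.normalize("NFC", text)


def example_cleaner(text: str) -> str:
    """Hypothetical cleaner: normalize first, then apply the usual steps."""
    text = normalize_unicode(text)
    text = text.lower()
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def no_cleaners(text: str) -> str:
    """Left without normalization, per the suggestion (body simplified)."""
    return text
```

Normalizing first means the later steps in each cleaner always operate on one canonical form of the input.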

eginhard · 2024-09-29T07:55:32Z

TTS/tts/utils/text/cleaners.py

+def normalize_nfc(text: str) -> str:
+    """Canonical decomposition followed by canonical composition"""


Suggested change

-def normalize_nfc(text: str) -> str:
-    """Canonical decomposition followed by canonical composition"""
+def normalize_unicode(text: str) -> str:
+    """Normalize Unicode characters."""
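
The original docstring describes normalization form C (NFC). As a small illustration of what that form does and does not change, here is a sketch using the standard unicodedata module; the sample characters are chosen for illustration and do not come from the PR's tests.

```python
import unicodedata

# NFC: canonical decomposition followed by canonical composition.
# Canonically equivalent characters collapse to a single form.
angstrom_sign = "\u212b"   # ANGSTROM SIGN
a_with_ring = "\u00c5"     # LATIN CAPITAL LETTER A WITH RING ABOVE
nfc_angstrom = unicodedata.normalize("NFC", angstrom_sign)

# Compatibility characters such as the 'fi' ligature are left alone by NFC;
# only the compatibility forms (NFKC/NFKD) would expand them.
ligature = "\ufb01"
nfc_ligature = unicodedata.normalize("NFC", ligature)
nfkc_ligature = unicodedata.normalize("NFKC", ligature)
```

In other words, NFC unifies canonically equivalent spellings without rewriting characters that are only compatibility-equivalent.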

eginhard

Thanks a lot, this is a very useful contribution!

* Add normalizer type C to text cleaners
* Linter recommendations
* Add unicode normalize to every cleaner
* Format test_text_cleaners.py

eginhard requested changes Sep 29, 2024

shavit added 2 commits September 30, 2024 10:50

Add normalizer type C to text cleaners

f521211

Linter recommendations

636ea59

shavit force-pushed the 63-normalize branch from 5412923 to 636ea59 · September 30, 2024 14:58

shavit commented Sep 30, 2024

TTS/tts/utils/text/cleaners.py (resolved)

shavit marked this pull request as ready for review September 30, 2024 15:23

shavit requested a review from eginhard September 30, 2024 15:24

shavit added 2 commits September 30, 2024 14:25

Add unicode normalize to every cleaner

41b0f4c

Format test_text_cleaners.py

8ec5d15

shavit force-pushed the 63-normalize branch from 1576006 to 8ec5d15 · September 30, 2024 18:30

eginhard approved these changes Oct 2, 2024

eginhard merged commit 36611a7 into idiap:dev Oct 2, 2024
49 checks passed

eginhard pushed a commit that referenced this pull request Oct 4, 2024

feat: normalize unicode characters in text cleaners (#85)

1d39246

* Add normalizer type C to text cleaners
* Linter recommendations
* Add unicode normalize to every cleaner
* Format test_text_cleaners.py
