This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

fix ordinals and conjunctions in tts normalizer #341

Merged
merged 2 commits into from
Sep 20, 2023

@Spycsh Spycsh commented Sep 19, 2023

Type of Change

Bug fix

Description

Fix ordinals and conjunctions in the TTS normalizer.

The normalizer now correctly handles the following:

  • conjunctions
    CVPR-15 => cee vee pee ar fifteen
  • ordinals
    1st 2nd 3rd 4th 5th 11th 12th 21st 22nd => first second third fourth fifth eleventh twelfth twenty first twenty second
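
A minimal sketch of how these two cases could be handled, for illustration only. It is not the code from this PR: the lookup tables and the `cardinal`, `ordinal`, and `normalize` helpers are hypothetical, numbers are limited to 0..99, and the letter table covers only the letters in the example above.

```python
import re

# Hypothetical lookup tables (assumptions, not taken from the PR).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
IRREGULAR_ORDINALS = {"one": "first", "two": "second", "three": "third",
                      "five": "fifth", "eight": "eighth", "nine": "ninth",
                      "twelve": "twelfth"}
# Only the letters needed for the CVPR example; a real table would cover A-Z.
LETTER_NAMES = {"C": "cee", "P": "pee", "R": "ar", "V": "vee"}


def cardinal(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def ordinal(n):
    """Spell out an ordinal by converting the last word of the cardinal."""
    words = cardinal(n).split()
    last = words[-1]
    if last in IRREGULAR_ORDINALS:
        last = IRREGULAR_ORDINALS[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"   # twenty -> twentieth
    else:
        last += "th"                # four -> fourth, eleven -> eleventh
    return " ".join(words[:-1] + [last])


def normalize(text):
    # Ordinals first, so that "21st" becomes "twenty first".
    text = re.sub(r"\b(\d{1,2})(st|nd|rd|th)\b",
                  lambda m: ordinal(int(m.group(1))), text)
    out = []
    for token in text.split():
        # Split hyphen-joined tokens such as "CVPR-15" into their parts.
        for part in token.split("-"):
            if not part:
                continue
            if re.fullmatch(r"\d{1,2}", part):
                out.append(cardinal(int(part)))
            elif part.isupper() and all(ch in LETTER_NAMES for ch in part):
                # Spell known all-caps abbreviations letter by letter.
                out.extend(LETTER_NAMES[ch] for ch in part)
            else:
                out.append(part)
    return " ".join(out)


print(normalize("CVPR-15"))
print(normalize("1st 2nd 3rd 4th 5th 11th 12th 21st 22nd"))
```

Running the ordinal substitution before the hyphen split keeps the suffix attached to its number, so "21st" is never seen as the bare cardinal "21".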

Expected Behavior & Potential Risk

Makes the normalizer more robust.

Some words, such as i7, ffmpeg, and BTW, are still not spelled out correctly and may need to be hardcoded, perhaps in a more advanced trie. Also, a candidate number is treated as a year only when it is preceded by a preposition and falls in the range (1000, 2999), which is still not an exact mapping. For example, in "Tom's 1986 report indicated that...", 1986 should be read as a year, but because no preposition precedes it, it is converted as a cardinal number.
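
The year heuristic and its limitation can be illustrated with a short sketch. This is not the PR's implementation: the `PREPOSITIONS` set and the `classify_numbers` helper are hypothetical, and the range is assumed to be exclusive at both ends, as the (1000, 2999) notation suggests.

```python
import re

# Assumed preposition list; the set actually used by the normalizer may differ.
PREPOSITIONS = {"in", "on", "since", "by", "until", "from", "before",
                "after", "during"}


def classify_numbers(text):
    """Label each numeric token as 'year' or 'cardinal'."""
    tokens = text.split()
    result = []
    for i, tok in enumerate(tokens):
        # Accept digits optionally followed by trailing punctuation.
        m = re.fullmatch(r"(\d+)[.,;:!?]*", tok)
        if not m:
            continue
        value = int(m.group(1))
        prev = tokens[i - 1].lower() if i > 0 else ""
        # Year only if in range AND directly preceded by a preposition.
        is_year = 1000 < value < 2999 and prev in PREPOSITIONS
        result.append((value, "year" if is_year else "cardinal"))
    return result


print(classify_numbers("She was born in 1986."))
print(classify_numbers("Tom's 1986 report indicated that"))
```

The second call shows the gap described above: with no preposition before it, 1986 falls back to the cardinal reading.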

How has this PR been tested?

Added two unit tests.

Dependency Change?

re is part of the Python standard library, so no new dependency is needed.

@hshen14 hshen14 merged commit 0892f8a into main Sep 20, 2023
14 checks passed
@hshen14 hshen14 deleted the spycsh/fix_normalizer branch September 20, 2023 03:15
lvliang-intel pushed a commit that referenced this pull request Sep 20, 2023
* fix ordinals and conjunctions in tts normalizer

* fix comment