This repository has been archived by the owner on Oct 25, 2024. It is now read-only.
fix ordinals and conjunctions in tts normalizer #341
Merged
Type of Change
Bug fix
Description
Fix ordinals and conjunctions in the TTS normalizer.
The following are now handled correctly:
CVPR-15 => cee vee pee ar fifteen
1st 2nd 3rd 4th 5th 11th 12th 21st 22nd => first second third fourth fifth eleventh twelfth twenty first twenty second
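The ordinal mapping above can be illustrated with a minimal sketch. This is not the PR's actual implementation; the names `cardinal`, `ordinal`, and `normalize_ordinals` are hypothetical, and the sketch only covers ordinals from 0 to 99.

```python
import re

# Hypothetical helper tables; the real normalizer may organize these differently.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}
# Cardinal words whose ordinal form is not simply "<word>th".
IRREGULAR = {"one": "first", "two": "second", "three": "third",
             "five": "fifth", "eight": "eighth", "nine": "ninth",
             "twelve": "twelfth"}

# One or two digits followed by an ordinal suffix, e.g. "1st", "22nd".
ORDINAL_RE = re.compile(r"\b(\d{1,2})(st|nd|rd|th)\b")


def cardinal(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def ordinal(n: int) -> str:
    """Spell out the ordinal form by converting only the last word."""
    words = cardinal(n).split()
    last = words[-1]
    if last in IRREGULAR:
        last = IRREGULAR[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"   # twenty -> twentieth
    else:
        last += "th"                # four -> fourth
    return " ".join(words[:-1] + [last])


def normalize_ordinals(text: str) -> str:
    """Replace every matched ordinal token with its spoken form."""
    return ORDINAL_RE.sub(lambda m: ordinal(int(m.group(1))), text)


print(normalize_ordinals("1st 2nd 3rd 4th 5th 11th 12th 21st 22nd"))
```

Converting only the last word of the cardinal keeps compound forms such as "twenty first" consistent with the expected output listed above.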
Expected Behavior & Potential Risk
Make the normalizer more robust.
Some words, such as i7, ffmpeg, and BTW, are still not spelled out correctly and should probably be hardcoded, perhaps in a more advanced Trie. Also, a potential year is only treated as a year when it is preceded by a preposition and falls in the range (1000, 2999), which is still not an exact mapping. For example, in "Tom's 1986 report indicated that...", 1986 should be a year, but because no preposition precedes it, it is converted as a cardinal number.
How has this PR been tested?
Added two unit tests.
Dependency Change?
None. re is part of the Python standard library.
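The year heuristic described under Expected Behavior & Potential Risk can be sketched with re alone. This is an illustration under stated assumptions, not the PR's code: the function name `is_year`, the `PREPOSITIONS` set, and the choice of exclusive bounds for (1000, 2999) are all hypothetical.

```python
import re

# Hypothetical preposition list; the set used by the normalizer is not shown in this PR.
PREPOSITIONS = {"in", "since", "by", "from", "until", "during", "before", "after"}


def is_year(tokens: list[str], i: int) -> bool:
    """Return True if tokens[i] looks like a year under the described heuristic."""
    tok = tokens[i]
    # Only plain four-digit ASCII numbers are candidates.
    if not re.fullmatch(r"[0-9]{4}", tok):
        return False
    # Assumption: the range (1000, 2999) is read as exclusive on both ends.
    if not 1000 < int(tok) < 2999:
        return False
    # The previous token must be a preposition.
    return i > 0 and tokens[i - 1].lower() in PREPOSITIONS


print(is_year("released in 1986".split(), 2))    # preposition precedes the number
print(is_year("Tom's 1986 report".split(), 1))   # no preposition, the known gap
```

The second call reproduces the limitation noted in the description: without a preceding preposition, 1986 falls through to the cardinal-number path.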