diff between minnan and mandarin #9

Closed
yt605155624 opened this issue Aug 11, 2022 · 2 comments · Fixed by #13

Comments

@yt605155624

yt605155624 commented Aug 11, 2022

头发

  • minnan: 头发(fa3)
  • mandarin: 头发(fa4)

拥抱

  • minnan: 拥(yong3)抱
  • mandarin: 拥(yong1)抱

There may be many more cases like these.

Although this tool handles polyphonic characters well, it gets some common Mandarin pronunciations wrong. For Mandarin users, maybe we could use pypinyin to get partial_results in prepare_data?

We can first replace the readings of the non-polyphonic characters:
[image]

But "拥" is polyphonic, so I need to find another way to handle it:
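
A minimal sketch of the idea above: fill in readings only for characters that are not polyphonic and leave the polyphonic ones open for the model. The function name `partial_results`, the toy dictionary `MANDARIN_READINGS`, and the set `POLYPHONIC` are hypothetical and exist only for this illustration; they are not taken from g2pW or pypinyin. In practice the `lookup` callable could be backed by pypinyin, but a plain dictionary is used here so the example runs without dependencies.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical toy data for illustration only; not taken from g2pW or pypinyin.
MANDARIN_READINGS: Dict[str, str] = {"头": "tou2", "抱": "bao4"}
POLYPHONIC: Set[str] = {"发", "拥"}


def partial_results(
    sentence: str,
    polyphonic: Set[str],
    lookup: Callable[[str], Optional[str]],
) -> List[Optional[str]]:
    """Return one reading per character, or None where the model must decide."""
    results: List[Optional[str]] = []
    for char in sentence:
        if char in polyphonic:
            # Polyphonic character: leave it unresolved.
            results.append(None)
        else:
            # Non-polyphonic character: take the reading from the lookup.
            results.append(lookup(char))
    return results


print(partial_results("拥抱", POLYPHONIC, MANDARIN_READINGS.get))
# [None, 'bao4']
```

As the `None` for "拥" shows, this only covers the non-polyphonic characters; the polyphonic ones still need a separate fix.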

擁 ㄩㄥ3
擁 ㄩㄥ1

Maybe we have to modify POLYPHONIC_CHARS.txt; see https://www.zhihu.com/question/31151037 for reference.
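
A rough sketch of what such a modification could look like, assuming entries have the whitespace-separated "character reading" shape shown above (the real delimiter in POLYPHONIC_CHARS.txt is not confirmed here). The helpers `group_readings` and `drop_reading` are hypothetical names for this illustration. Whether the trained model tolerates a changed candidate list is not addressed by this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_readings(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group 'char reading' lines into char -> list of readings, order kept."""
    readings: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        char, reading = parts
        if reading not in readings[char]:
            readings[char].append(reading)
    return dict(readings)


def drop_reading(
    readings: Dict[str, List[str]], char: str, unwanted: str
) -> Dict[str, List[str]]:
    """Return a copy with one reading removed for one character."""
    updated = {c: list(rs) for c, rs in readings.items()}
    updated[char] = [r for r in updated.get(char, []) if r != unwanted]
    return updated


entries = ["擁 ㄩㄥ3", "擁 ㄩㄥ1"]
grouped = group_readings(entries)
mandarin = drop_reading(grouped, "擁", "ㄩㄥ3")
print(grouped)   # {'擁': ['ㄩㄥ3', 'ㄩㄥ1']}
print(mandarin)  # {'擁': ['ㄩㄥ1']}
```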

@GitYCC
Owner

GitYCC commented Aug 11, 2022

Yes, there are some differences between Taiwan Mandarin and Chinese Mandarin.
To make g2p usable in Taiwan, we collected and annotated training data for that setting, and this model is trained on that dataset.

Some suggestions for handling this problem:

  1. For monophonic characters, you can revise the dictionary.
  2. Train g2pW on a high-quality Chinese Mandarin dataset (maybe?).

@GitYCC GitYCC mentioned this issue Aug 22, 2022
@lucasjinreal

@GitYCC Is there any plan to train a Chinese Mandarin version as well?
