diff between minnan and mandarin #9

yt605155624 · 2022-08-11T06:32:55Z

头发

minnan: 头发（fa3）
mandarin: 头发（fa4）

拥抱

minnan: 拥（yong3）抱
mandarin: 拥（yong1）抱

There may be many more cases like these.

Although this tool handles polyphonic words well, it gets some common Mandarin pronunciations wrong. For Mandarin users, maybe we could use pypinyin to fill in partial_results in prepare_data?

We can first replace the non-polyphonic characters:

But "拥" is polyphonic, so I need to find another way to handle it.
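A minimal sketch of the "replace the non-polyphonic characters first" idea, assuming a per-character lookup is available. The function name `fill_monophonic`, the toy dictionary, and the toy polyphonic set are hypothetical and only illustrate the flow; this is not g2pW's actual `prepare_data` logic.

```python
from typing import Callable, Iterable, List, Optional


def fill_monophonic(
    text: str,
    polyphonic_chars: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> List[Optional[str]]:
    """Return one reading per character; None marks a character left to the model."""
    polyphonic = set(polyphonic_chars)
    results: List[Optional[str]] = []
    for ch in text:
        if ch in polyphonic:
            # Polyphonic: a plain dictionary lookup cannot decide, so leave it open.
            results.append(None)
        else:
            # Monophonic: take the reading straight from the lookup.
            results.append(lookup(ch))
    return results


# Toy stand-ins (assumptions): a real run would use the tool's polyphonic
# character list and a full Mandarin dictionary instead.
TOY_POLYPHONIC = {"拥"}
TOY_MANDARIN = {"头": "tou2", "发": "fa4", "抱": "bao4"}

print(fill_monophonic("拥抱", TOY_POLYPHONIC, TOY_MANDARIN.get))  # [None, 'bao4']
```

With pypinyin installed, something like `lambda ch: lazy_pinyin(ch, style=Style.TONE3)[0]` could probably stand in for the toy dictionary. As the example shows, "拥" is still left unresolved, which is exactly the remaining problem.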

擁 ㄩㄥ3
擁 ㄩㄥ1

Maybe we have to modify POLYPHONIC_CHARS.txt, referring to https://www.zhihu.com/question/31151037
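A rough sketch of what such a modification could look like, assuming each entry is a character and a reading separated by whitespace, as in the two "擁" lines above. The helper `drop_readings` is hypothetical, and it works on an in-memory list rather than the real file. Whether an already trained model tolerates a changed candidate list is not checked here and may well be a problem.

```python
from typing import Iterable, List, Set, Tuple


def drop_readings(lines: Iterable[str], to_drop: Set[Tuple[str, str]]) -> List[str]:
    """Keep every entry except the (character, reading) pairs listed in to_drop."""
    kept: List[str] = []
    for line in lines:
        parts = line.split()
        # Assumed entry format: "<character> <reading>"; other lines pass through.
        if len(parts) == 2 and (parts[0], parts[1]) in to_drop:
            continue
        kept.append(line)
    return kept


entries = ["擁 ㄩㄥ3", "擁 ㄩㄥ1"]
print(drop_readings(entries, {("擁", "ㄩㄥ3")}))  # ['擁 ㄩㄥ1']
```

Once only one reading is left for "擁", it could be treated like any other monophonic character.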


GitYCC · 2022-08-11T08:17:40Z

Yes, there are some differences between Taiwan Mandarin and Chinese Mandarin.
To use g2p in Taiwan, we collected and annotated training data for that setting, and this model is trained on that dataset.

Some suggestions for handling this problem:

For monophonic characters, you can revise the dictionary.
Train g2pW on a high-quality Chinese Mandarin dataset (maybe?).

lucasjinreal · 2022-09-28T08:33:26Z

@GitYCC Is there any plan to train a Chinese Mandarin version as well?

yt605155624 mentioned this issue Aug 11, 2022

Add g2pW to Chinese frontend PaddlePaddle/PaddleSpeech#2230

Merged

GitYCC mentioned this issue Aug 22, 2022

Develop #13

Merged

GitYCC closed this as completed in #13 Aug 22, 2022
