Transliteration error #3
Comments
Thanks for your report. I'll fix it later.
@mozillazg I'm curious to see how you do it. Looking at x030.go, I see that 0x3060 is "da", so I don't understand how it becomes "ta" to begin with.
@zfLQ2qx2 I can't reproduce the issue:
Let me know if anything was missed.
@mozillazg Looks like the difference is that I'm normalizing the string to fully decomposed form (NFD) using golang.org/x/text/transform, calling transform.Chain(norm.NFD) before transliterating with go-unidecode.
Before hex: e38197e381a6e3818fe381a0e38195e38184
After hex: e38197e381a6e3818fe3819fe38299e38195e38184
So the normalization changes 0x3060 to 0x305F plus 0x3099 (the "combining katakana-hiragana voiced sound mark"), and these get transliterated to "ta" and "" respectively. Now I understand where "ta" is coming from, and it looks like the workaround is to normalize to the fully composed form instead of the decomposed form. I chose the decomposed form because I was trying to match the output of a Node.js function, but honestly several of the test cases for that are kind of dubious, so I think using the fully composed form and then updating the test cases to match is the way to go. Apologies for having bothered you with this, but it was interesting to work out.
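For anyone who wants to check the two hex dumps above, here is a minimal sketch that uses only the Go standard library to decode them and list the code points. The helper name `codePoints` is made up for this illustration and is not part of go-unidecode or golang.org/x/text.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// codePoints is a hypothetical helper for this illustration. It decodes a
// hex-encoded UTF-8 byte string and returns its code points formatted as
// "U+XXXX", separated by single spaces.
func codePoints(hexStr string) (string, error) {
	raw, err := hex.DecodeString(hexStr)
	if err != nil {
		return "", err
	}
	var parts []string
	// Ranging over a string yields one rune per UTF-8 encoded code point.
	for _, r := range string(raw) {
		parts = append(parts, fmt.Sprintf("U+%04X", r))
	}
	return strings.Join(parts, " "), nil
}

func main() {
	// Hex dumps copied from the comment above.
	before := "e38197e381a6e3818fe381a0e38195e38184"
	after := "e38197e381a6e3818fe3819fe38299e38195e38184"

	for _, h := range []string{before, after} {
		cps, err := codePoints(h)
		if err != nil {
			panic(err)
		}
		fmt.Println(cps)
	}
	// Prints:
	// U+3057 U+3066 U+304F U+3060 U+3055 U+3044
	// U+3057 U+3066 U+304F U+305F U+3099 U+3055 U+3044
}
```

The fourth code point is the only difference: U+3060 in the original string becomes U+305F followed by the combining mark U+3099 after NFD. Normalizing with NFC instead (for example `norm.NFC.String(s)` from golang.org/x/text/unicode/norm) should recompose the pair into U+3060 before the string is transliterated.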
Thank you very much for the wonderful library, I am glad it is here!
I did come across one transliteration issue - I believe してください should be shitekudasai instead of shitekutasai. I tried a number of Hiragana-Romaji converters that use different methods, and all of them chose "da" instead of "ta" for the fourth syllable.