Chinese Pīnyīn instead of Chinese characters #49
Comments
To check if I'm understanding the idea:
I should note that showing pinyin in the example sentences is already supported (there's an option in the menu accessible from the upper right), and all definitions include pinyin. I think adding an option for showing pinyin in the graph could be interesting, though, so I can look into that. |
Glad you might look into it! There are a couple of things that probably need some consideration, since we will suddenly be looking at the vernacular, spoken Chinese language. In Hanzi, the morpheme (one character) is the smallest meaningful unit, but in spoken Chinese it is the 词 (cí), the word, that forms the smallest meaningful unit. The rules of Chinese Pīnyīn are defined in GB/T 16159-2012. I'll attach a Simplified Chinese text version of this document that I made (including the corrections by Mark Swofford / pinyin.info). Concerning your bullet points, with some additional remarks:

Tone marks: According to the rules in GB/T 16159-2012, Chinese Pīnyīn is written with diacritics, not numbers.

Nodes: Yes, a node would display a syllable in Chinese Pīnyīn. How would you plan to deal with ambiguity, for example 行 (xíng / háng)? In Chinese, the majority of words have either 1 or 2 syllables, so there's a good chance that a node is a word.

Edges: Yes, edges would display 2-syllable words. I think it's viable to just put the two syllables of an edge together, without even changing the diacritics. According to the rules (GB/T 16159-2012), the diacritics are always true to the syllable; they do not change when syllables are combined. There is one exception: sometimes the diacritic on the 2nd syllable is dropped (as in 看看 kànkan), but there is no explicit rule for that. In other cases, a missing diacritic on the 2nd syllable forms a new word; for example, 东西 can mean dōngxī (East-West) or dōngxi (thing).

Examples: The examples show Pin1yin1 with numbers, but only for the node names, and there is no option to show the correct spelling with diacritics. The example sentences do not seem to be written in Chinese Pīnyīn. Unfortunately, there is currently no library that implements the rules of GB/T 16159-2012 to produce compliant Pīnyīn from Hanzi. Furthermore, even large language models make many mistakes when producing Chinese Pīnyīn.
I guess there's just not a sufficient volume of Pīnyīn text available (yet) to train models. It would indeed be very interesting to see Chinese Pīnyīn on HanziGraph, both for research and for getting a feeling for the vernacular, spoken language, as opposed to the more academic, abstract Simplified Chinese writing system. |
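To illustrate the tone mark point above, here is a minimal sketch of converting numbered syllables (as in Pin1yin1) into diacritic spelling. It is not HanziGraph's code and does not implement GB/T 16159-2012; the function names `mark_syllable` and `numbered_to_marked` are made up for this example. It assumes lowercase, space-separated syllables, uses the common placement convention (a or e takes the mark, then the o in "ou", otherwise the last vowel), and treats tone 5 or 0 as the neutral tone.

```python
# Tone-marked forms for each vowel, indexed by tone number 1-4.
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}


def mark_syllable(syllable: str) -> str:
    """Convert one numbered syllable, e.g. 'xing2', to 'xíng' (hypothetical helper)."""
    # Accept the common ASCII stand-ins for ü.
    s = syllable.lower().replace("u:", "ü").replace("v", "ü")
    if not s or s[-1] not in "012345":
        return s  # no trailing tone number: leave the syllable unchanged
    letters, tone = s[:-1], int(s[-1])
    if tone not in (1, 2, 3, 4):
        return letters  # 0 or 5 means neutral tone: no diacritic

    # Pick the vowel that carries the mark.
    if "a" in letters:
        idx = letters.index("a")
    elif "e" in letters:
        idx = letters.index("e")
    elif "ou" in letters:
        idx = letters.index("o")
    else:
        vowels = [i for i, ch in enumerate(letters) if ch in TONE_MARKS]
        if not vowels:
            return letters  # nothing to mark
        idx = vowels[-1]

    marked = TONE_MARKS[letters[idx]][tone - 1]
    return letters[:idx] + marked + letters[idx + 1:]


def numbered_to_marked(word: str) -> str:
    """Join space-separated numbered syllables into one word, keeping each syllable's own tone."""
    return "".join(mark_syllable(s) for s in word.split())


print(numbered_to_marked("pin1 yin1"))
print(numbered_to_marked("dong1 xi1"), numbered_to_marked("dong1 xi5"))
```

The sketch simply concatenates syllables, as suggested for edges. It does not insert the apostrophe that pinyin uses before a syllable starting with a, o, or e (as in Xī'ān), and it cannot decide when a second syllable should be neutral; that information has to come from the word list.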
yeah, that mockup is what I had imagined would be interesting to build. I'll try to get it prototyped in the next few weeks. |
I did a bit of work on this today and have an initial graph and wordlist in the linked branch (pinyin-graph). I'm debating the best way to display it; it might be a standalone tool like I did with component breakdowns at first. |
For me, as a language learner and user, it makes sense for this to be a standalone tool. PinyinGraph might turn out to be quite a different tool from HanziGraph. Getting deeper into it, I guess you might eventually take it in a very different direction, due to its very different nature.

One design question I'm curious to see you solve is this: whether you put Pinyin merely as an extra representation layer on top of Hanzi, or whether you treat the new tool as truly "written spoken Chinese" and group all syllables (and words) that sound alike (and are spelled alike) together.

This comment section might not be the right place, but as it concerns this design choice, and because I've spent considerable time thinking about this, I will share my own reasoning with you. Here's what we know:
What are the implications when we start to write Chinese with letters similar to those we use for English?
a) The meaning of words in sentences is defined by their "Part of Speech" (POS), such as noun, verb, adverb, adjective, etc. Within each POS, a word can have various senses.
b) However, in English this doesn't seem to be too overwhelming, especially if the broader context is at least vaguely known.
c) In Chinese character-based (Hànzì) writing, many syllables that sound the same but represent different senses have been grouped into single characters, which makes looking up written Chinese characters rather straightforward, too. For example:
d) In Chinese Pinyin writing, however, all definitions for words (made of one or more syllables) that sound alike (and are spelled alike) would be found under the same entry, just like in English.
This means that spoken Chinese relies very heavily on context. Chinese Pinyin, the written counterpart of spoken standard Mandarin Chinese (Pǔtōnghuà), relies very heavily on context, too. From my perspective, as a language learner and user, it will be very interesting to see how a dictionary or lookup / categorization tool like PinyinGraph turns out to work and look, if we have the courage to look at spoken Chinese through a writing system like Chinese Pinyin, one that stays true to the spoken, standard Mandarin Chinese language. |
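The two design options discussed above can be sketched as two different indexes over the same word list. This is only an illustration, not how HanziGraph or the pinyin-graph branch stores its data; the `ENTRIES` list and both function names are assumptions made up for the example.

```python
from collections import defaultdict

# Illustrative (hanzi, pinyin) pairs; a real tool would load these from a dictionary.
ENTRIES = [
    ("是", "shì"),
    ("事", "shì"),
    ("市", "shì"),
    ("行", "xíng"),
    ("行", "háng"),
    ("东西", "dōngxī"),
    ("东西", "dōngxi"),
]


def group_by_hanzi(entries):
    """Option 1: Pinyin as a layer on top of Hanzi (one entry per written form)."""
    groups = defaultdict(list)
    for hanzi, pinyin in entries:
        groups[hanzi].append(pinyin)
    return dict(groups)


def group_by_pinyin(entries):
    """Option 2: 'written spoken Chinese' (one entry per Pinyin spelling)."""
    groups = defaultdict(list)
    for hanzi, pinyin in entries:
        groups[pinyin].append(hanzi)
    return dict(groups)


for spelling, words in group_by_pinyin(ENTRIES).items():
    print(spelling, words)
```

Keyed by Hanzi, 行 is one entry with two readings. Keyed by Pinyin, shì is one entry that collects every word spelled that way, while dōngxī and dōngxi become separate entries because their spelling differs.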
Please provide a setting to display Chinese Pīnyīn (with tone marks) in the HanziGraph instead of Chinese characters. This would be helpful for getting a better feeling for the homonyms and thus the true character of the Chinese language. My reasoning for this feature is as follows:
My mother tongue is German and I have been studying Chinese for almost 20 years, in all settings: university courses in China, living in China with native Chinese romantic partners, private courses, tutors, online tutoring, apps, graded readers (of which I read about 20), CDs, YouTube, everything in every configuration imaginable… yet I still can't even follow a simple conversation. However, last year
thus I have made the monumental and downright revolutionary choice to disregard Chinese characters from now on, to regard them as a specialised study subject such as ancient Chinese history and herbarium science, and to continue studying with Chinese Pīnyīn only. Since then I have made REMARKABLE strides in studying Chinese in a short time, as it is suddenly as easy as studying Spanish, Italian, or any other language. It is suddenly a real joy and a lot of fun. The only drawbacks are the lack of reading materials in Chinese Pīnyīn (and the lack of "allies"), and the many writing mistakes Chinese tutors make, since most tutors don't yet know the orthography rules of Chinese Pīnyīn as set out in the Chinese government standard GB/T 16159-2012 (an update of the 1996 version, which has been available for 28+ years but is still largely unknown). Luckily ChatGPT is quite strong in Chinese Pīnyīn.
All that said to help your motivation :) Thank you for considering this feature.