km: Clean up markup & phoneme segmentation
r12a committed Dec 4, 2023
1 parent 0bc15d0 commit d597b0c
Showing 3 changed files with 128 additions and 86 deletions.
53 changes: 41 additions & 12 deletions khmr/km-details.html
@@ -930,15 +930,15 @@


'\u{17B4}': `
<p>An invisible character, originally intended (along with <span class="codepoint" translate="no"><span lang="km">&#x17B5;</span> <a href="/scripts/khmer/block#char17B5"><span class="uname">U+17B5 KHMER VOWEL INHERENT AA</span></a></span>) to <q>represent a phonetic difference not expressed by the spelling, so as to assist in phonetic sorting</q>. However, the Unicode Standard considers those characters to be <q>insufficient for that purpose</q> and <q>errors in the encoding</q>. It should not be used.<tt>u,677</tt></p>
<p>An invisible character, originally intended (along with <span class="hx img">17B5</span>) to <q>represent a phonetic difference not expressed by the spelling, so as to assist in phonetic sorting</q>. However, the Unicode Standard considers those characters to be <q>insufficient for that purpose</q> and <q>errors in the encoding</q>. It should not be used.<tt>u,677</tt></p>
`,





'\u{17B5}': `
<p>An invisible character, originally intended (along with <span class="codepoint" translate="no"><span lang="km">&#x17B4;</span> <a href="/scripts/khmer/block#char17B4"><span class="uname">U+17B4 KHMER VOWEL INHERENT AQ</span></a></span>) to <q>represent a phonetic difference not expressed by the spelling, so as to assist in phonetic sorting</q>. However, the Unicode Standard considers those characters to be <q>insufficient for that purpose</q> and <q>errors in the encoding</q>. It should not be used.<tt>u,677</tt></p>
<p>An invisible character, originally intended (along with <span class="hx img">17B4</span>) to <q>represent a phonetic difference not expressed by the spelling, so as to assist in phonetic sorting</q>. However, the Unicode Standard considers those characters to be <q>insufficient for that purpose</q> and <q>errors in the encoding</q>. It should not be used.<tt>u,677</tt></p>
`,


@@ -1406,6 +1406,7 @@


'\u{17C9}': `
<p class="insertTranscription">&#x17C9;</p>
<p>Changes the register of a consonant from <span class="ipa">ɔː</span> to <span class="ipa">ɑː</span>, affecting the inherent vowel and any other vowel following the consonant, eg.
<span class="eg" lang="km">ម៉ត់ចត់</span>
<span class="eg" lang="km">រ៉ាប់</span>
@@ -1436,6 +1437,7 @@


'\u{17CA}': `
<p class="insertTranscription">&#x17CA;</p>
<p>Changes the class of a consonant from <span class="ipa">ɔː</span> to <span class="ipa"></span>, affecting the inherent vowel and also any other vowel following the consonant, eg. compare
<span class="eg" lang="km">ក្រុមហ៊ុន</span>
<span class="eg" lang="km">ហ៊ាន</span>
@@ -1457,6 +1459,7 @@


'\u{17CB}': `
<p class="insertTranscription">&#x17CB;</p>
<p>Always placed above the final consonant in modern Khmer. Basically shortens the preceding vowel. Affects the preceding vowel sound in one of the following ways:</p>
<ul>
<li>After an inherent vowel
@@ -1499,6 +1502,7 @@


'\u{17CC}': `
<p class="insertTranscription">&#x17CC;</p>
<p>Not a very common mark. It represents a final <span class="ipa h">r</span> on the <em>previous</em> consonant (like the Devanagari <span class="name">repha</span>), although it may be pronounced as <span class="ipa"></span>, eg.
<span class="eg" lang="km">ទុគ៌ត</span>
</p>
@@ -1517,6 +1521,7 @@


'\u{17CD}': `
<p class="insertTranscription">&#x17CD;</p>
<p>Used over a consonant, particularly in loan words, to silence it and any attached vowels or subscripts, eg.
<span class="eg" lang="km">សាសន៍</span>
<span class="eg ipalist" lang="km">កេរ្តិ៍</span>
@@ -1532,6 +1537,7 @@


'\u{17CE}': `
<p class="insertTranscription">&#x17CE;</p>
<p>Very rare, but used over a consonant to convey excited emphasis, like an exclamation mark, eg.
<span class="eg" lang="km">ណែ៎</span>
<span class="eg" lang="km">នុ៎ះន៎</span>
@@ -1543,6 +1549,7 @@


'\u{17CF}': `
<p class="insertTranscription">&#x17CF;</p>
<p>Used over two consonants to indicate that they represent two specific words:
<span class="charExample" translate="no"><span class="ex" lang="km">ក៏</span> <span class="trans">k⁎</span> <span class="ipa">kɑː</span> <span class="meaning">auxiliary: also, then, therefore</span></span>
<span class="charExample" translate="no"><span class="ex" lang="km">ដ៏</span> <span class="trans">ɗ⁎</span> <span class="ipa">ɗɑː</span> <span class="meaning">pronoun which; very</span></span>
@@ -1589,6 +1596,7 @@


'\u{17D1}': `
<p class="insertTranscription">&#x17D1;</p>
<p>The Sanskrit virama, sometimes used in Sanskrit words to indicate that a final consonant has no vowel sound, eg.
<span class="eg" lang="km">អាត្មន៑</span>
</p>
@@ -1601,6 +1609,7 @@


'\u{17D2}': `
<p class="insertTranscription">&#x17D2;</p>
<p>Serves to indicate in Unicode text that the following consonant should be rendered as a subscript. The shape is arbitrary, since it is never visible in Khmer (unlike Devanagari etc.).</p>
<p><span class="ipa">cəːŋ</span> (transcribed in Unicode as COENG) is actually the name given to the subscripted consonants themselves, and this should more accurately be called a <span class="ipa">cəːŋ</span> generator. </p>
<p>This virama-based model used by Unicode is consistent with the approach to other Indic scripts. However, Cambodian people regard the subscripted consonants as different entities to the normal consonant characters, and need to be taught to use a <span class="ipa">cəːŋ</span> sign to type in Unicode.</p>
@@ -1611,6 +1620,7 @@


'\u{17D3}': `
<p class="insertTranscription">&#x17D3;</p>
<p><strong>Use discouraged</strong> in favor of the complete set of lunar date symbols.</p>
`,

@@ -1619,6 +1629,7 @@


'\u{17D4}': `
<p class="insertTranscription">&#x17D4;</p>
<p>Equivalent of a period, placed at the end of a sentence.</p>
<p>Also used in the following combination to mean <span class="meaning">et cetera</span>
<span class="charExample" translate="no"><span class="ex" lang="km">។ល។</span> <span class="trans">.ḻ.</span> <span class="ipa">lanəŋla</span></span>
@@ -1630,6 +1641,7 @@


'\u{17D5}': `
<p class="insertTranscription">&#x17D5;</p>
<p>Used at the end of a chapter or an entire text.</p>
`,

@@ -1638,6 +1650,7 @@


'\u{17D6}': `
<p class="insertTranscription">&#x17D6;</p>
<p>Used much like a colon in English. </p>
<p>It is typically used after the quotative particle
<span class="eg" lang="km">ថា</span>
@@ -1651,6 +1664,7 @@


'\u{17D7}': `
<p class="insertTranscription">&#x17D7;</p>
<p>Repetition sign. Repeats the word directly before. A common way of providing emphasis, eg.
<span class="eg" lang="km">ខ្លាំង ៗ</span>
<span class="eg" lang="km">គាត់មានផ្ទះថ្មី ៗ</span>
@@ -1669,6 +1683,7 @@


'\u{17D8}': `
<p class="insertTranscription">&#x17D8;</p>
<p>Means et cetera. Use of this character is discouraged. The preferred representation uses the individual characters, eg.
<span class="charExample" translate="no"><span class="ex" lang="km">។ល។</span> <span class="trans">.ḻ.</span></span>
</p>
@@ -1686,6 +1701,7 @@


'\u{17D9}': `
<p class="insertTranscription">&#x17D9;</p>
<p>Marks the beginning of literary and religious texts.</p>
<p>Forms a pair with <span class="ex" lang="km">៚</span>, which ends a text.</p>
<p>Means <span class="meaning">cock's eye</span>. It is said to represent the trunk of the elephant-god Ganesha.</p>
@@ -1696,6 +1712,7 @@


'\u{17DA}': `
<p class="insertTranscription">&#x17DA;</p>
<p>Marks the absolute end of a text. Usually used for poetic or religious texts. </p>
<p>Forms a pair with <span class="ex" lang="km">៙</span>, which starts a text.</p>
<p>Sometimes used in combination as <span class="ex" lang="km">។៚</span>.</p>
@@ -1707,6 +1724,7 @@


'\u{17DB}': `
<p class="insertTranscription">&#x17DB;</p>
<p>Placed after the amount, eg.
<span class="eg" lang="km">៣០០០ ៛</span>
</p>
@@ -1725,6 +1743,7 @@


'\u{17DD}': `
<p class="insertTranscription">&#x17DD;</p>
<p>A rarely used sign that indicates that the consonant retains its inherent vowel sound.</p>
`,

@@ -1733,79 +1752,89 @@


'\u{17E0}': `

<p class="insertTranscription">&#x17E0;</p>
<p><span class="h">0</span> digit.</p>
`,





'\u{17E1}': `

<p class="insertTranscription">&#x17E1;</p>
<p><span class="h">1</span> digit.</p>
`,





'\u{17E2}': `

<p class="insertTranscription">&#x17E2;</p>
<p><span class="h">2</span> digit.</p>
`,





'\u{17E3}': `

<p class="insertTranscription">&#x17E3;</p>
<p><span class="h">3</span> digit.</p>
`,





'\u{17E4}': `

<p class="insertTranscription">&#x17E4;</p>
<p><span class="h">4</span> digit.</p>
`,





'\u{17E5}': `

<p class="insertTranscription">&#x17E5;</p>
<p><span class="h">5</span> digit.</p>
`,





'\u{17E6}': `

<p class="insertTranscription">&#x17E6;</p>
<p><span class="h">6</span> digit.</p>
`,





'\u{17E7}': `

<p class="insertTranscription">&#x17E7;</p>
<p><span class="h">7</span> digit.</p>
`,





'\u{17E8}': `

<p class="insertTranscription">&#x17E8;</p>
<p><span class="h">8</span> digit.</p>
`,





'\u{17E9}': `

<p class="insertTranscription">&#x17E9;</p>
<p><span class="h">9</span> digit.</p>
`,
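
Several of the notes in this diff say that a character should not be used or that its use is discouraged (U+17B4, U+17B5, U+17D3, U+17D8). As a rough illustration only, the sketch below shows one way such characters could be flagged in a string. The names `DISCOURAGED_KHMER` and `findDiscouragedKhmer` are hypothetical and are not part of this repository or commit; the character names are the standard Unicode names.

```javascript
// Hypothetical helper, not part of the repository touched by this commit.
// The code points are the ones the notes above mark as "should not be used"
// or "use discouraged".
const DISCOURAGED_KHMER = new Map([
  ['\u{17B4}', 'KHMER VOWEL INHERENT AQ'],
  ['\u{17B5}', 'KHMER VOWEL INHERENT AA'],
  ['\u{17D3}', 'KHMER SIGN BATHAMASAT'],
  ['\u{17D8}', 'KHMER SIGN BEYYAL'],
]);

// Returns one record per discouraged character found in `text`.
// `index` is the UTF-16 offset of the character in the string.
function findDiscouragedKhmer(text) {
  const hits = [];
  let index = 0;
  for (const ch of text) {
    if (DISCOURAGED_KHMER.has(ch)) {
      hits.push({
        index,
        codePoint: 'U+' + ch.codePointAt(0).toString(16).toUpperCase(),
        name: DISCOURAGED_KHMER.get(ch),
      });
    }
    index += ch.length; // advance by the number of UTF-16 code units
  }
  return hits;
}

// Example: the invisible U+17B4 and the et cetera sign U+17D8 are reported.
console.log(findDiscouragedKhmer('ក\u{17B4}ខ\u{17D8}'));
```

For U+17D8, the note for that character gives the preferred spelling as the individual characters ។ល។, which this sketch would not flag.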

