forked from interscript/interscript-ruby
Commit
interscript#192 add bgnpcgn-uzb-Cyrl-Latn-1979 bgnpcgn-uzb-Cyrl-Latn-2000
Showing 2 changed files with 209 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,127 @@ | ||
--- | ||
authority_id: bgnpcgn | ||
id: 1979 | ||
language: iso-639-2:uzb | ||
source_script: Cyrl | ||
destination_script: Latn | ||
name: BGN/PCGN Romanization System -- Uzbek Cyrillic (1979) | ||
url: http://transliteration.eki.ee/pdf/Uzbek.pdf | ||
creation_date: 1979 | ||
|
||
notes: | ||
- Е (е) is romanized Ye (ye) at the beginning of a syllable, that is, word-initially, after a vowel, or after ъ or ь. | ||
|
||
tests: | ||
# https://ru.wikipedia.org/wiki/Узбекский_язык | ||
- source: Ўзбек ёзуви | ||
expected: Ŭzbek yozuwi | ||
- source: Ўзбек тили | ||
expected: Ŭzbek tili | ||
- source: катта | ||
expected: katta | ||
- source: куп | ||
expected: kup | ||
- source: кальта | ||
expected: kalʼta | ||
- source: Бори элға яхшилик қилғилки, мундин яхши йўқ Ким, дегайлар даҳр аро қолди фалондин яхшилик | ||
expected: Bori elgha yakhshilik qilghilki, mundin yakhshi yŭq Kim, degaylar dahr aro qoldi falondin yakhshilik | ||
- source: Бахр ул-худо | ||
expected: Bakhr ul-khudo | ||
- source: Рисале-йи маариф-и Шейбани | ||
expected: Risale-yi maarif-i Sheybani | ||
- source: Карами Хакка нихоят йукдур | ||
expected: Karami Khakka nikhoyat yukdur | ||
- source: Йахши | ||
expected: Yakhshi | ||
- source: Тутук белгись | ||
expected: Tutuk belgisʼ | ||
- source: | | ||
Барча одамлар эркин, қадр-қиммат ва ҳуқуқларда тенг бўлиб туғиладилар. | ||
Улар ақл ва виждон соҳибидирлар ва бир-бирлари ила биродарларча муомала қилишлари зарур. | ||
expected: | | ||
Barcha odamlar erkin, qadr-qimmat wa huquqlarda teng bŭlib tughiladilar. | ||
Ular aql wa wizhdon sohibidirlar wa bir-birlari ila birodarlarcha muomala qilishlari zarur. | ||
- source: ПАПАПАЧУКА Респект! | ||
expected: PAPAPACHUKA Respekt! | ||
|
||
map: | ||
rules: | ||
# note[1] | ||
- pattern: \b\u0415 | ||
result: Ye | ||
- pattern: \b\u0435 | ||
result: ye | ||
- pattern: (?<=[АаЕеЁёИиОоУуЎўЭэЮюЯяЪъЬь])\u0415 | ||
result: Ye | ||
- pattern: (?<=[АаЕеЁёИиОоУуЎўЭэЮюЯяЪъЬь])\u0435 | ||
result: ye | ||
|
||
characters: | ||
'\u0410': 'A' # А | ||
'\u0411': 'B' # Б | ||
'\u0412': 'W' # В | ||
'\u0413': 'G' # Г | ||
'\u0492': 'Gh' # Ғ | ||
'\u0414': 'D' # Д | ||
'\u0415': 'E' # Е | ||
'\u0401': 'Yo' # Ё | ||
'\u0416': 'Zh' # Ж | ||
'\u0417': 'Z' # З | ||
'\u0418': 'I' # И | ||
'\u0419': 'Y' # Й | ||
'\u041A': 'K' # К | ||
'\u049A': 'Q' # Қ | ||
'\u041B': 'L' # Л | ||
'\u041C': 'M' # М | ||
'\u041D': 'N' # Н | ||
'\u041E': 'O' # О | ||
'\u041F': 'P' # П | ||
'\u0420': 'R' # Р | ||
'\u0421': 'S' # С | ||
'\u0422': 'T' # Т | ||
'\u0423': 'U' # У | ||
'\u040E': 'Ŭ' # Ў | ||
'\u0424': 'F' # Ф | ||
'\u0425': 'Kh' # Х | ||
'\u04B2': 'H' # Ҳ | ||
'\u0426': 'Ts' # Ц | ||
'\u0427': 'Ch' # Ч | ||
'\u0428': 'Sh' # Ш | ||
'\u042a': "\u02BC" # Ъ | ||
'\u042c': "\u02BC" # Ь | ||
'\u042D': 'E' # Э | ||
'\u042E': 'Yu' # Ю | ||
'\u042F': 'Ya' # Я | ||
|
||
'\u0430': 'a' # а | ||
'\u0431': 'b' # б | ||
'\u0432': 'w' # в | ||
'\u0433': 'g' # г | ||
'\u0493': 'gh' # ғ | ||
'\u0434': 'd' # д | ||
'\u0435': 'e' # е | ||
'\u0451': 'yo' # ё | ||
'\u0436': 'zh' # ж | ||
'\u0437': 'z' # з | ||
'\u0438': 'i' # и | ||
'\u0439': 'y' # й | ||
'\u043A': 'k' # к | ||
'\u049B': 'q' # қ | ||
'\u043B': 'l' # л | ||
'\u043C': 'm' # м | ||
'\u043D': 'n' # н | ||
'\u043E': 'o' # о | ||
'\u043F': 'p' # п | ||
'\u0440': 'r' # р | ||
'\u0441': 's' # с | ||
'\u0442': 't' # т | ||
'\u0443': 'u' # у | ||
'\u045E': 'ŭ' # ў | ||
'\u0444': 'f' # ф | ||
'\u0445': 'kh' # х | ||
'\u04B3': 'h' # ҳ | ||
'\u0446': 'ts' # ц | ||
'\u0447': 'ch' # ч | ||
'\u0448': 'sh' # ш | ||
'\u044a': "\u02BC" # ъ | ||
'\u044c': "\u02BC" # ь | ||
'\u044D': 'e' # э | ||
'\u044F': 'ya' # я | ||
'\u044E': 'yu' # ю | ||
|
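The map above combines context rules with a one-to-one character table. The following Python sketch illustrates that two-pass idea only. It is not the library's implementation: the function name `romanize`, the rule-before-character ordering, and the reduced character table are assumptions made for illustration, and the sketch covers only the "after a vowel, ъ or ь" context.

```python
import re

# Characters after which Е/е becomes Ye/ye in this sketch (vowels plus ъ and ь).
CONTEXT = (
    "\u0410\u0430\u0415\u0435\u0401\u0451\u0418\u0438\u041E\u043E\u0423\u0443"
    "\u042D\u044D\u042E\u044E\u042F\u044F\u042A\u044A\u042C\u044C"
)

# Context rules, assumed to run before the plain character table.
RULES = [
    (re.compile(f"(?<=[{CONTEXT}])\u0415"), "Ye"),
    (re.compile(f"(?<=[{CONTEXT}])\u0435"), "ye"),
]

# Hypothetical subset of the 1979 character table; the real file lists every letter.
CHARACTERS = {
    "\u043A": "k",       # к
    "\u0430": "a",       # а
    "\u0442": "t",       # т
    "\u043B": "l",       # л
    "\u044C": "\u02BC",  # ь
    "\u043F": "p",       # п
    "\u043E": "o",       # о
    "\u0435": "e",       # е
    "\u0437": "z",       # з
    "\u0434": "d",       # д
}


def romanize(text: str) -> str:
    """Apply the context rules, then map the remaining characters one by one."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    # Anything without an entry (including Latin output of the rules) passes through.
    return "".join(CHARACTERS.get(ch, ch) for ch in text)


print(romanize("\u043A\u0430\u043B\u044C\u0442\u0430"))  # кальта → kalʼta
print(romanize("\u043F\u043E\u0435\u0437\u0434"))        # поезд (illustrative input) → poyezd
```

Running the rules first means the lookbehind still sees the original Cyrillic neighbours; if the character table ran first, the preceding vowel would already be Latin and the pattern would no longer match.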
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
--- | ||
authority_id: bgnpcgn | ||
id: 2000 | ||
language: iso-639-2:uzb | ||
source_script: Cyrl | ||
destination_script: Latn | ||
name: BGN/PCGN Table of Correspondences -- Uzbek Cyrillic to Roman (2000 Agreement) | ||
description: | | ||
In 1995, the Uzbek government adopted the Roman alphabet to replace the existing Cyrillic alphabet. | ||
The presentation below provides a table of correspondences between the former Cyrillic alphabet and the | ||
current Roman alphabet. When Uzbek Roman-alphabet spellings are not available, this table can be used to | ||
convert Uzbek Cyrillic spellings. This table of correspondences supersedes the BGN/PCGN 1979 romanization | ||
system for Uzbek. | ||
url: http://transliteration.eki.ee/pdf/Uzbek.pdf | ||
creation_date: 2000 | ||
confirmation_date: 2017-11 | ||
|
||
notes: | ||
- The letter sequence ye is used initially, after the vowel characters а, е, ё, и, о, у, э, ю, я, and ў (rows 1, 6, 7, 10, 16, 21, 29, 30, 31, and 32), and after й and ь (rows 11 and 28). | ||
- The Unicode encoding of the apostrophe appearing in rows 27 and 28 is U+2019. The inverted apostrophe appearing in rows 32 (o‘) and 34 (g‘) is U+2018. | ||
- The Roman-script columns show only lowercase forms but, when applying the table, uppercase and lowercase Roman letters as appropriate should be used. | ||
|
||
tests: | ||
# https://ru.wikipedia.org/wiki/Узбекский_язык | ||
- source: Ўзбек ёзуви | ||
expected: O‘zbek yozuvi | ||
- source: Ўзбек тили | ||
expected: O‘zbek tili | ||
- source: катта | ||
expected: katta | ||
- source: куп | ||
expected: kup | ||
- source: кальта | ||
expected: kal’ta | ||
- source: Бори элға яхшилик қилғилки, мундин яхши йўқ Ким, дегайлар даҳр аро қолди фалондин яхшилик | ||
expected: Bori elg‘a yaxshilik qilg‘ilki, mundin yaxshi yo‘q Kim, degaylar dahr aro qoldi falondin yaxshilik | ||
- source: Бахр ул-худо | ||
expected: Baxr ul-xudo | ||
- source: Рисале-йи маариф-и Шейбани | ||
expected: Risale-yi maarif-i Sheybani | ||
- source: Карами Хакка нихоят йукдур | ||
expected: Karami Xakka nixoyat yukdur | ||
- source: Йахши | ||
expected: Yaxshi | ||
- source: Тутук белгись | ||
expected: Tutuk belgis’ | ||
- source: | | ||
Барча одамлар эркин, қадр-қиммат ва ҳуқуқларда тенг бўлиб туғиладилар. | ||
Улар ақл ва виждон соҳибидирлар ва бир-бирлари ила биродарларча муомала қилишлари зарур. | ||
expected: | | ||
Barcha odamlar erkin, qadr-qimmat va huquqlarda teng bo‘lib tug‘iladilar. | ||
Ular aql va vijdon sohibidirlar va bir-birlari ila birodarlarcha muomala qilishlari zarur. | ||
- source: ПАПАПАЧУКА Респект! | ||
expected: PAPAPACHUKA Respekt! | ||
|
||
map: | ||
inherit: bgnpcgn-uzb-Cyrl-Latn-1979 | ||
|
||
rules: | ||
# note[1] | ||
- pattern: \b\u0415 | ||
result: Ye | ||
- pattern: \b\u0435 | ||
result: ye | ||
- pattern: (?<=[АаЕеЁёИиОоУуЎўЭэЮюЯяЙйЬь])\u0415 | ||
result: Ye | ||
- pattern: (?<=[АаЕеЁёИиОоУуЎўЭэЮюЯяЙйЬь])\u0435 | ||
result: ye | ||
|
||
characters: | ||
'\u0412': 'V' # В | ||
'\u0492': "G\u2018" # Ғ | ||
'\u0416': 'J' # Ж | ||
'\u040E': "O\u2018" # Ў | ||
'\u0425': 'X' # Х | ||
'\u042a': "\u2019" # Ъ note[2] | ||
'\u042c': "\u2019" # Ь note[2] | ||
|
||
'\u0432': 'v' # в | ||
'\u0493': "g\u2018" # ғ | ||
'\u0436': 'j' # ж | ||
'\u045E': "o\u2018" # ў | ||
'\u0445': 'x' # х | ||
'\u044a': "\u2019" # ъ note[2] | ||
'\u044c': "\u2019" # ь note[2] | ||
|
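The 2000 map lists only the characters that differ from the 1979 map and pulls in the rest through `inherit:`. A minimal Python sketch of that overlay follows, assuming inheritance behaves like a dictionary merge in which child entries win. The helper `resolve_characters` and the two excerpt tables are hypothetical, and how the library actually resolves `inherit:` is not shown in this commit.

```python
# Hypothetical excerpt of the parent (1979) character table.
PARENT_1979 = {
    "\u0425": "Kh",  # Х
    "\u0445": "kh",  # х
    "\u0416": "Zh",  # Ж
    "\u0436": "zh",  # ж
    "\u041A": "K",   # К
    "\u043A": "k",   # к
}

# Hypothetical excerpt of the child (2000) overrides.
CHILD_2000 = {
    "\u0425": "X",  # Х
    "\u0445": "x",  # х
    "\u0416": "J",  # Ж
    "\u0436": "j",  # ж
}


def resolve_characters(parent: dict, child: dict) -> dict:
    """Return the parent table with every child entry layered on top."""
    merged = dict(parent)  # copy, so the parent table is left untouched
    merged.update(child)   # child entries replace parent entries with the same key
    return merged


table = resolve_characters(PARENT_1979, CHILD_2000)
print(table["\u0445"])  # overridden by the child: x
print(table["\u043A"])  # inherited from the parent: k
```

Under this reading, a letter absent from the child file keeps its 1979 romanization, so any pair that changed in 2000 needs an explicit entry for both its uppercase and lowercase form.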