UzTransliterator | State-of-the-art machine transliteration tool for Uzbek language

UlugbekSalaev/UzTransliterator

UzTransliterator | State-of-the-art machine transliteration tool for Uzbek language, Cyrillic<>Latin<>NewLatin

The main goal of this paper is to present a state-of-the-art machine transliteration tool for the low-resource Uzbek language that converts text between its three commonly used scripts: the old Cyrillic alphabet, the currently official Latin alphabet, and the newly announced New Latin alphabet. The tool combines rule-based and statistical approaches and is available as an open-source Python package, as well as a web-based application with a public API.

Feel free to use the tools presented in this project; more details on how they were created and how to use them are given in the accompanying paper.
If you find them useful, please cite the paper:

@article{salaev2022machine,
  title={A machine transliteration tool between Uzbek alphabets},
  author={Salaev, Ulugbek and Kuriyozov, Elmurod and G{\'o}mez-Rodr{\'\i}guez, Carlos},
  journal={arXiv preprint arXiv:2205.09578},
  year={2022}
}

About The Project

[Figure: Web interface of the tool]

A demo of the web-based transliteration tool is also available.

In this paper, we present Python code, a web tool, and an API for the Uzbek language that perform machine transliteration between the two widely used Cyrillic and Latin alphabets, as well as a newly reformed version of the Latin alphabet, to which, according to a government decree, all legal texts are to be fully adapted by 2023.
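The rule-based side of Cyrillic to Latin transliteration can be sketched as a letter table plus a few context rules. The Python sketch below is an illustration only, not UzTransliterator's implementation: the function name `cyr_to_lat` is hypothetical, and the two context rules (е becomes "ye" at the start of a word or after a vowel, ц becomes "ts" after a vowel and "s" elsewhere) are assumed simplifications of common Uzbek spelling conventions. Word-specific exceptions are not handled.

```python
# Minimal rule-based sketch of Uzbek Cyrillic -> Latin transliteration.
# Illustrative only: not the UzTransliterator implementation.

VOWELS = set("аеёиоуэюяў")

# Context-free letter rules (lowercase keys).
LETTERS = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ё": "yo",
    "ж": "j", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "x", "ч": "ch", "ш": "sh",
    "ъ": "\u02bc",  # hard sign -> modifier apostrophe ʼ
    "ь": "",        # soft sign is dropped
    "э": "e", "ю": "yu", "я": "ya",
    "ў": "o\u2018", "қ": "q", "ғ": "g\u2018", "ҳ": "h",
}


def cyr_to_lat(text: str) -> str:
    out = []
    prev = ""  # previous input character, lowercased ("" at the start)
    for ch in text:
        low = ch.lower()
        if low == "е":
            # Assumed rule: "ye" at a word start or after a vowel, else "e".
            lat = "ye" if (not prev.isalpha() or prev in VOWELS) else "e"
        elif low == "ц":
            # Assumed rule: "ts" after a vowel, else "s".
            lat = "ts" if prev in VOWELS else "s"
        else:
            lat = LETTERS.get(low, ch)  # unknown characters pass through
        if ch != low and lat:
            # Uppercase input: capitalize the first letter of the output.
            lat = lat[0].upper() + lat[1:]
        out.append(lat)
        prev = low
    return "".join(out)
```

For example, `cyr_to_lat("Тошкент")` returns `"Toshkent"` and `cyr_to_lat("концерт")` returns `"konsert"`. Capitalization is applied only to the first letter of each replacement, so all-caps input such as "ШАҲАР" comes out as "ShAHAR".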

Installation

Python

pip install UzTransliterator
Source: https://pypi.org/project/UzTransliterator/

Usage
from UzTransliterator import UzTransliterator
obj = UzTransliterator.UzTransliterator()
print(obj.transliterate("мактаб", from_="cyr", to="lat"))
# Output: maktab

Options

Supported conversion directions (cyr = Cyrillic, lat = Latin, nlt = New Latin):

from_='cyr', to='lat'
from_='cyr', to='nlt'
from_='lat', to='cyr'
from_='lat', to='nlt'
from_='nlt', to='cyr'
from_='nlt', to='lat'

Web Interface

https://nlp.urdu.uz/?menu=translit

API

URL: https://uz-translit.herokuapp.com/translit
Methods: GET, POST
Parameters: text:str, from_:str, to:str
Example Request: https://uz-translit.herokuapp.com/translit?text=мактаб&from_=cyr&to=lat
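A minimal client sketch for the GET method, using only the Python standard library. The function names are hypothetical and the request simply mirrors the example URL above. The response format is not described in this README, so the sketch returns the raw body (assumed to be UTF-8) without parsing it; POST is left out because its request body format is not documented here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://uz-translit.herokuapp.com/translit"
SCRIPTS = ("cyr", "lat", "nlt")  # values accepted by from_ and to


def build_request_url(text: str, from_: str, to: str) -> str:
    """Return the GET URL for /translit; query values are percent-encoded as UTF-8."""
    if from_ not in SCRIPTS or to not in SCRIPTS or from_ == to:
        raise ValueError(f"unsupported direction: {from_!r} -> {to!r}")
    return API_URL + "?" + urlencode({"text": text, "from_": from_, "to": to})


def transliterate_remote(text: str, from_: str, to: str, timeout: float = 10.0) -> str:
    """Send the GET request and return the response body as text (assumed UTF-8, not parsed)."""
    with urlopen(build_request_url(text, from_, to), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `build_request_url("мактаб", "cyr", "lat")` produces the example request above, with the Cyrillic text percent-encoded.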

Note

The New Latin alphabet differs from the current Latin alphabet in a few letters. The main changes are listed below (Latin → New Latin):
“G‘, g‘” → “Ḡ, ḡ”
“O‘, o‘” → “Ō, ō”
“Sh, sh” → “Ş, ş”
“Ch, ch” → “Ç, ç”
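Since the Latin to New Latin changes listed above are letter-for-letter substitutions, the lat/nlt directions can be sketched with a small lookup table. This is an illustration, not the package's implementation: the function names are hypothetical, o‘ and g‘ are assumed to be written with the ‘ character (U+2018) used in this README, and every s+h or c+h pair is treated as a digraph.

```python
import re

# Latin -> New Latin substitutions from the table above. Assumption: o‘ and g‘
# are written with U+2018 (‘), as in this README.
LAT_TO_NLT = {
    "G\u2018": "\u1e20", "g\u2018": "\u1e21",        # G‘ g‘ -> Ḡ ḡ
    "O\u2018": "\u014c", "o\u2018": "\u014d",        # O‘ o‘ -> Ō ō
    "SH": "\u015e", "Sh": "\u015e", "sh": "\u015f",  # Sh sh -> Ş ş
    "CH": "\u00c7", "Ch": "\u00c7", "ch": "\u00e7",  # Ch ch -> Ç ç
}
_LAT_PATTERN = re.compile("|".join(map(re.escape, LAT_TO_NLT)))

# New Latin -> Latin: each new letter maps back to one Latin spelling.
NLT_TO_LAT = str.maketrans({
    "\u1e20": "G\u2018", "\u1e21": "g\u2018",
    "\u014c": "O\u2018", "\u014d": "o\u2018",
    "\u015e": "Sh", "\u015f": "sh",
    "\u00c7": "Ch", "\u00e7": "ch",
})


def lat_to_nlt(text: str) -> str:
    # Every sh/ch pair is treated as a digraph (a simplification).
    return _LAT_PATTERN.sub(lambda m: LAT_TO_NLT[m.group(0)], text)


def nlt_to_lat(text: str) -> str:
    # All-caps digraphs are not restored: "ŞAHAR" becomes "ShAHAR".
    return text.translate(NLT_TO_LAT)
```

For example, `lat_to_nlt("Toshkent")` returns `"Toşkent"`, and `nlt_to_lat` turns it back into `"Toshkent"`.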

Built With

Programming language used: Python

These are the major libraries used inside Python:

License

Distributed under the MIT LICENSE. See LICENSE.txt for more information.
