GitHub - anoopkunchukuttan/crowd-indic-transliteration-data: Xlit-Crowd: Hindi-English Transliteration Corpus

Xlit-Crowd: Hindi-English Transliteration Corpus

The corpus contains Hindi-English transliteration pairs. The pairs were obtained via crowdsourcing, by asking workers to transliterate Hindi words into the Roman script. The tasks were run on Amazon Mechanical Turk and yielded a total of 14919 pairs.

The dataset is described in detail in the following paper. Please cite this paper if you use the dataset for research:

Mitesh M. Khapra, Ananthakrishnan Ramanathan, Anoop Kunchukuttan, Karthik Visweswariah, Pushpak Bhattacharyya. When Transliteration Met Crowdsourcing: An Empirical Study of Transliteration via Crowdsourcing using Efficient, Non-redundant and Fair Quality Control. Language Resources and Evaluation Conference (LREC 2014). 2014.

License

Xlit-Crowd: Hindi-English Transliteration Corpus by Mitesh Khapra is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Files

README.md
crowd_transliterations.hi-en.txt
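
The README does not document the layout of crowd_transliterations.hi-en.txt, so the following is only a minimal loading sketch under stated assumptions: the file is UTF-8 text, each non-empty line holds one pair, and the two fields are separated by a tab. The function names `parse_pairs` and `load_pairs` are hypothetical helpers, not part of the repository. Inspect a few lines of the file first and adjust `sep`, and the order of the two fields, if the actual layout differs.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str], sep: str = "\t") -> List[Tuple[str, str]]:
    """Split each non-empty line into its first two fields.

    ASSUMPTION: one pair per line, fields separated by `sep`.
    Lines with fewer than two fields are skipped rather than guessed at.
    """
    pairs = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split(sep)
        if len(fields) < 2:
            continue  # malformed line under the assumed layout
        pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


def load_pairs(path: str, sep: str = "\t") -> List[Tuple[str, str]]:
    """Read the corpus file as UTF-8 (assumed) and parse it into pairs."""
    with open(path, encoding="utf-8") as handle:
        return parse_pairs(handle, sep)
```

With a local copy of the file, `pairs = load_pairs("crowd_transliterations.hi-en.txt")` returns a list of two-element tuples. If the assumptions hold, `len(pairs)` should be close to the 14919 pairs reported above; a much smaller count suggests a different separator.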
