This repository contains an Aesthetics corpus created using text from the following sources:
- http://hindisamay.com, an e-library maintained by Mahatma Gandhi Antarrashtriya Hindi Vishwa Vidyalaya, Wardha
- http://premchand.co.in, a website dedicated to the popular novelist Premchand’s stories, and
- Bhandarkar Oriental Research Institute’s Digital Library (http://borilib.com)

The repository also contains an exhaustive stop word list prepared from the sources listed below:
- Wiktionary Top 1900
- https://1000mostcommonwords.com/1000-most-common-hindi-words
- https://blogs.transparent.com/hindi/first-100-high-frequency-words-in-hindi
- http://home.iitk.ac.in/~prasant/HindiCorpus/word.html
- https://github.com/oprogramador/most-common-words-by-language
- https://github.com/Alir3z4/stop-words
- https://github.com/stopwords-iso/stopwords-hi/blob/master/stopwords-hi.txt
- https://github.com/Xangis/extra-stopwords
- https://data.mendeley.com/datasets/bsr3frvvjc/1
- https://www.ranks.nl/stopwords/hindi
- Frequency list generated from Wiki Dump August 2019
- Aesthetics Corpus (custom)
- http://opus.nlpl.eu/
- CFILT Hindi Corpus (http://www.cfilt.iitb.ac.in/Downloads.html)
- CFILT Hindi English Parallel Corpus (Anoop Kunchukuttan, Pratik Mehta, Pushpak Bhattacharyya. The IIT Bombay English-Hindi Parallel Corpus. Language Resources and Evaluation Conference. 2018)
- TDIL English Hindi Tourism Text Corpus
- TDIL Hindi English ILCI II Corpus on Agriculture and Entertainment
- TDIL Hindi Monolingual Text Corpus ILCI II
- TDIL Hindi English Health ILCI

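
As a rough illustration of how a stop word list like this one might be applied, the sketch below filters stop words out of a whitespace-tokenized Hindi sentence. The function name `remove_stop_words` and the four-word sample set are hypothetical and chosen only for the example; they are not taken from the repository, whose actual list is far larger.

```python
def remove_stop_words(text, stop_words):
    """Return the whitespace-separated tokens of `text` that are not in `stop_words`."""
    return [token for token in text.split() if token not in stop_words]


# Tiny illustrative sample (an assumption, not the repository's actual list).
sample_stop_words = {"है", "का", "में", "और"}

sentence = "राम और श्याम घर में हैं"
filtered = remove_stop_words(sentence, sample_stop_words)
print(filtered)
```

Matching here is by exact string, so inflected forms that are not in the set are kept. Real text would probably also need punctuation handling and Unicode normalization before lookup.
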

The "Linguistic Resources" obtained from TDIL have been developed & made available by TDIL, MeitY, Government of India.

Co-Authors: Dr. Jatinderkumar R. Saini, Dr. Dhanya Pramod

Copyright © 2019, Gayatri Venugopal

This work is licensed under GNU GPL v3: https://www.gnu.org/licenses/gpl-3.0.html