GitHub - TamsilAmani/Language-Analysis-in-Code-Mixed-Text: Language Detection using Code Mixed Text

ANALYSIS OF PART OF SPEECH TAGS IN LANGUAGE IDENTIFICATION OF CODE MIXED TEXT :

This project was made as a part of my Minor & Major Assignments during my course in B.Tech (Computer Science) in Jamia Millia Islamia University, New Delhi. A research paper has also been published on it. Link - https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3462976

AIM - The project determines the language of the words given in a sentence. The sentence can be either in English, Hindi or Hinglish (that is Hindi written in English script).

APPROACH - Firstly, we trained the models to identify Enlish words based on a data set where X was (Word, POS) and Y was English. Secondly, we trained the models to identify Hindi words. We first trans-literated Hindi words (written in Hindi script) to English words and then obtained the final data set where X was (Words, POS (as in Hindi script)) and Y was Hindi.

TEACHNOLOGIES USED - NLTK library and Pandas in Python

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
20k.txt		20k.txt
English_Words.txt		English_Words.txt
Language_Identification.ipynb		Language_Identification.ipynb
README.md		README.md
Test_set.csv		Test_set.csv
iamsrk_tweets.csv		iamsrk_tweets.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

TamsilAmani/Language-Analysis-in-Code-Mixed-Text

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages