Project Title:

A Bangla Parts-of-Speech (POS) tagging tool

Project Description:

The Bangla Parts-of-Speech tagging tool takes Bangla text as input, analyzes it, and labels each word according to the role it plays in the sentence (based on morphology, syntax, and semantics). The labels include noun, verb, adjective, and so on. For example, it tags each word of the sentence “আমি ভাত খাই” (“I eat rice”) as আমি_PRO ভাত_NN খাই_VRB, where PRO, NN, and VRB are standard short forms for pronoun, noun, and verb, respectively.
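To make the `word_TAG` output format concrete, here is a minimal sketch of how tagged words could be joined into that form. The names `TaggedWord` and `formatTagged` are hypothetical and are not taken from the project's source; the sketch assumes words and tags are held as UTF-8 encoded `std::string` values.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: a (word, tag) pair, both stored as UTF-8 strings.
using TaggedWord = std::pair<std::string, std::string>;

// Joins tagged words into the "word_TAG word_TAG ..." form,
// separating entries with a single space.
std::string formatTagged(const std::vector<TaggedWord>& words) {
    std::string out;
    for (const TaggedWord& w : words) {
        assert(!w.second.empty());  // every word is expected to carry a tag
        if (!out.empty()) out += ' ';
        out += w.first + "_" + w.second;
    }
    return out;
}
```

Calling `formatTagged` with the pairs (আমি, PRO), (ভাত, NN), (খাই, VRB) produces the string from the example above.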

Because parts of speech are central to the grammar of a language, this tool can help in understanding the basics of the Bangla language. It can also support the use and analysis of corpora, Named Entity Recognition (NER), sentiment analysis, question answering, and word sense disambiguation.

I have used the algorithm proposed by Md. Nesarul Hoque and Md. Hanif Seddiqui in “Bangla Parts-of-Speech Tagging using Bangla Stemmer and Rule based Analyzer”, 18th International Conference on Computer and Information Technology (ICCIT), 2015.
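The sketch below only illustrates the general idea of combining suffix stripping with a lexicon lookup. It is not the algorithm from the paper and not this project's implementation. The function names, the `UNK` fallback tag, and the tiny suffix list and lexicon in the usage note are all assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns true if `word` ends with `suffix`. The comparison is byte-wise,
// which is safe for UTF-8 text because a valid UTF-8 sequence can only
// match at a character boundary.
bool endsWith(const std::string& word, const std::string& suffix) {
    return word.size() >= suffix.size() &&
           word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Strips the first matching suffix, always leaving a non-empty stem.
std::string stripSuffix(const std::string& word,
                        const std::vector<std::string>& suffixes) {
    for (const std::string& s : suffixes) {
        assert(!s.empty());  // an empty suffix would match every word
        if (word.size() > s.size() && endsWith(word, s))
            return word.substr(0, word.size() - s.size());
    }
    return word;
}

// Looks up the word itself, then its stem, in the lexicon.
// Falls back to the placeholder tag "UNK" (an assumption of this sketch).
std::string tagWord(const std::string& word,
                    const std::map<std::string, std::string>& lexicon,
                    const std::vector<std::string>& suffixes) {
    auto it = lexicon.find(word);
    if (it != lexicon.end()) return it->second;
    it = lexicon.find(stripSuffix(word, suffixes));
    if (it != lexicon.end()) return it->second;
    return "UNK";
}
```

For instance, with a lexicon that maps ভাত to NN and a suffix list containing the case ending ের, the inflected form ভাতের is stemmed to ভাত and tagged NN. A rule-based analyzer would add further rules for words that the lookup cannot resolve.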

I have used C++ as the programming language.

Project Outcome:

The tool takes a text file as input, tags each word with its PoS using the algorithm, and writes the result to another text file.
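The read, tag, and write flow can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: `tagStream` and `Tagger` are hypothetical names, words are assumed to be separated by whitespace, and the tagging step is passed in as a callback so that any tagger can be plugged in.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical callback type: maps a word to its POS tag.
using Tagger = std::function<std::string(const std::string&)>;

// Reads the input line by line, tags every whitespace-separated word,
// and writes "word_TAG" entries while preserving the line breaks.
void tagStream(std::istream& in, std::ostream& out, const Tagger& tagger) {
    assert(tagger);  // a callable tagger must be supplied
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string word;
        bool first = true;
        while (words >> word) {
            if (!first) out << ' ';
            out << word << '_' << tagger(word);
            first = false;
        }
        out << '\n';
    }
}
```

To work on files, one would open a `std::ifstream` for the input and a `std::ofstream` for the output and pass both to `tagStream`. Note that splitting on whitespace leaves punctuation such as the danda (।) attached to the preceding word, so a real tool would need an extra tokenization step.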

Accuracy:

Testing on 782 words collected from Prothom Alo Online yielded 64% accuracy. The accuracy was measured using a confusion matrix.
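As a reminder of how accuracy follows from a confusion matrix: it is the sum of the diagonal entries (correctly tagged words) divided by the sum of all entries. The sketch below assumes a square matrix indexed as `matrix[reference][predicted]`; the function name `accuracy` is hypothetical and unrelated to the project's `confusionMatrix.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// matrix[i][j] = number of words whose reference tag is i
// and whose predicted tag is j. Returns correct / total,
// or 0.0 for an empty matrix.
double accuracy(const std::vector<std::vector<long>>& matrix) {
    long correct = 0;
    long total = 0;
    for (std::size_t i = 0; i < matrix.size(); ++i) {
        assert(matrix[i].size() == matrix.size());  // must be square
        for (std::size_t j = 0; j < matrix[i].size(); ++j) {
            total += matrix[i][j];
            if (i == j) correct += matrix[i][j];
        }
    }
    return total == 0 ? 0.0 : static_cast<double>(correct) / total;
}
```

With the made-up two-tag matrix `{{3, 1}, {0, 4}}`, 7 of 8 words lie on the diagonal, so the function returns 0.875. These numbers are purely illustrative and are not results from this project.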

It is also worth mentioning that it was very hard to find authoritative reference data against which the tool's output could be checked. The reference data was therefore created with the help of volunteers, and its own accuracy remains uncertain.

Project Dependency:

No platform-dependent library was used, so the project should run on any platform. On Windows, however, it sometimes produces completely wrong output for unknown reasons.

SharifMAbdullah/Bangla-Parts-of-Speech-Tagger
