
# Bengali Handwritten Character Recognition using CNN

## Dataset

The dataset was obtained online from the CMATERdb pattern recognition database repository. It consists of a `Train` folder and a `Test` folder containing 12,000 and 3,000 images, respectively.
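A minimal sketch of indexing the two folders before training. It assumes, hypothetically, that `Train` and `Test` each contain one subfolder per character class with the images inside. The actual layout of the download may differ, and the helper names `index_split` and `summarize` are illustrative only.

```python
from pathlib import Path

# File extensions treated as image samples (an assumption; adjust to the files you have).
IMAGE_SUFFIXES = {".bmp", ".png", ".jpg", ".jpeg"}


def index_split(split_dir):
    """Return sorted (image_path, class_name) pairs for one split.

    Hypothetical layout: split_dir/<class_name>/<image file>.
    The label is the name of the folder that contains the image.
    """
    root = Path(split_dir)
    samples = []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for img in sorted(class_dir.iterdir()):
            # Skip non-image files such as notes or thumbnails.
            if img.is_file() and img.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((img, class_dir.name))
    return samples


def summarize(dataset_root):
    """Count images and distinct class labels in the Train and Test folders."""
    summary = {}
    for split in ("Train", "Test"):
        samples = index_split(Path(dataset_root) / split)
        summary[split] = {
            "images": len(samples),
            "classes": len({label for _, label in samples}),
        }
    return summary
```

If the assumed layout matches, `summarize("path/to/dataset")` should report 12,000 images for `Train` and 3,000 for `Test`, which is a quick sanity check that nothing was lost during download or extraction.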

## Project Overview

The purpose of this project is to apply a Convolutional Neural Network (CNN) architecture to the problem of handwritten Bengali character recognition (HBCR) and to achieve an accuracy of over 95%. To my knowledge, the highest reported accuracy on CMATERdb 3.2.1 is 95.84%, and the objective is to surpass it.

Bangla is one of the most widely spoken languages, ranked fifth in the world. It is also a language with a rich heritage: UNESCO proclaimed February 21st International Mother Language Day to honor those who died for the Bengali language in Bangladesh in 1952. Bangla is the second most spoken language in India and the official language of Bangladesh.
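To make the accuracy target concrete, the number of correct predictions needed to beat a given accuracy can be computed directly. This is a hypothetical helper, and it assumes accuracy is measured on the 3,000-image test set described in the Dataset section; the 95.84% figure may have been reported on a different split.

```python
def min_correct_to_beat(baseline_pct, n_test):
    """Smallest number of correct predictions whose accuracy strictly exceeds baseline_pct.

    Works in integer basis points (hundredths of a percent) to avoid float rounding:
    correct / n_test > baseline_bp / 10000  <=>  correct * 10000 > baseline_bp * n_test
    """
    baseline_bp = round(baseline_pct * 100)  # e.g. 95.84 -> 9584
    threshold = baseline_bp * n_test         # compare against correct * 10000
    # Floor division then +1 gives the smallest integer strictly above threshold / 10000.
    return threshold // 10000 + 1


def accuracy_pct(correct, n_test):
    """Accuracy as a percentage."""
    return 100.0 * correct / n_test
```

Under these assumptions, `min_correct_to_beat(95.84, 3000)` returns 2876, so at most 124 test images may be misclassified. With 2,875 correct the accuracy is about 95.83%, just short of the reported figure, while the 95% goal needs 2,851 correct.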

CNNs have been used extensively and successfully for English character recognition; however, there is considerable opportunity to apply them to Bengali. The main challenge in handwritten character classification is the enormous variety of handwriting styles across writers and languages. Furthermore, some complex scripts admit several different styles for writing words. Depending on the language, characters may be written in isolation from each other (e.g., Thai, Lao, and Japanese), while in other cases they are cursive and connected to one another (e.g., English, Bengali, and Arabic). Bengali handwritten character recognition is particularly challenging because many characters with distinct sounds have very similar shapes.