In this project, I use the Pomegranate library to build a hidden Markov model (HMM) for part-of-speech tagging with a "universal" tagset. I achieved >96% tag accuracy with larger tagsets on realistic text corpora. The project has three steps:
1. Process the raw texts.
2. Build a Most Frequent Class (MFC) tagger to use as a baseline.
3. Build an HMM part-of-speech tagger and compare it to the MFC baseline.
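The baseline step can be illustrated with a minimal sketch in plain Python. This is not the notebook's code: the function names (`train_mfc`, `mfc_tag`, `accuracy`), the `<UNK>` placeholder for unseen words, and the toy training sentences are all assumptions made for illustration. The idea is only that an MFC tagger assigns each word the tag it carried most often in training, and that the same accuracy measure can later be applied to the HMM tagger for comparison.

```python
from collections import Counter, defaultdict


def train_mfc(tagged_sentences):
    """Map each word to the tag it most often carries in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    # Keep only the single most frequent tag per word.
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def mfc_tag(table, words, unknown_tag="<UNK>"):
    """Tag a list of words; unseen words get a placeholder tag (an assumption)."""
    return [table.get(word, unknown_tag) for word in words]


def accuracy(table, tagged_sentences):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in tagged_sentences:
        words = [word for word, _ in sentence]
        gold = [tag for _, tag in sentence]
        predicted = mfc_tag(table, words)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0


# Toy data (hypothetical), using tag names from the universal tagset.
train = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("run", "NOUN"), ("ends", "VERB")],
    [("dogs", "NOUN"), ("run", "VERB")],
    [("they", "PRON"), ("run", "VERB")],
]

table = train_mfc(train)
print(mfc_tag(table, ["the", "cat", "run"]))
print(accuracy(table, train))
```

Because the baseline ignores context, it always tags "run" the same way regardless of its neighbors. An HMM tagger can improve on this by also modeling tag-to-tag transitions.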
All code is in the Jupyter notebook.