Homework 2
Jinho D. Choi edited this page Feb 16, 2015 · 1 revision
Part-of-speech (POS) tagging is the task of assigning a grammatical category to each word token in a sentence. Your task is to write a program that takes a sequence of word tokens (e.g., ["John", "bought", "a", "book"]) and generates a tag list of the same length (e.g., ["NNP", "VBD", "DT", "NN"]), where the i-th tag is the POS tag of the i-th word token. See this page for more details about different kinds of POS tags.
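To make the input and output concrete, here is a minimal sketch of a most-frequent-tag tagger. It is not one of the approaches named in this assignment and it does not extend `AbstractTagger`, whose interface is not shown on this page. The class `MostFrequentTagSketch`, its `train` and `tag` methods, and the `NN` fallback for unseen words are all hypothetical names and choices made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of the course code base. */
final class MostFrequentTagSketch {
    // word -> (tag -> number of times the word carried that tag in training)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // Tag assigned to words never seen in training (an assumption of this sketch).
    private final String defaultTag;

    MostFrequentTagSketch(String defaultTag) {
        this.defaultTag = defaultTag;
    }

    /** Counts every (word, tag) pair of one training sentence. */
    void train(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        for (int i = 0; i < words.size(); i++) {
            counts.computeIfAbsent(words.get(i), k -> new HashMap<>())
                  .merge(tags.get(i), 1, Integer::sum);
        }
    }

    /** Returns one tag per token, in the same order as the input tokens. */
    List<String> tag(List<String> words) {
        List<String> tags = new ArrayList<>(words.size());
        for (String word : words) {
            String best = defaultTag;
            int bestCount = 0;
            Map<String, Integer> tagCounts = counts.get(word);
            if (tagCounts != null) {
                for (Map.Entry<String, Integer> e : tagCounts.entrySet()) {
                    if (e.getValue() > bestCount) {
                        best = e.getKey();
                        bestCount = e.getValue();
                    }
                }
            }
            tags.add(best);
        }
        return tags;
    }

    public static void main(String[] args) {
        MostFrequentTagSketch tagger = new MostFrequentTagSketch("NN");
        tagger.train(Arrays.asList("John", "bought", "a", "book"),
                     Arrays.asList("NNP", "VBD", "DT", "NN"));
        // Prints [NNP, VBD, DT, NN]
        System.out.println(tagger.tag(Arrays.asList("John", "bought", "a", "book")));
    }
}
```

A tagger like this ignores context entirely, which is exactly what sequence models such as an HMM are meant to improve on.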
- Extend the `AbstractTagger` class and write your own POS tagger.
- Feel free to take any approach we have discussed (e.g., GreedyTagger, ExhaustiveTagger, HMMTagger).
- Feel free to use any classifier we discussed (e.g., Naive Bayes, Hidden Markov Model).
- Explore more features for the best result. Your grade will be based on how much improvement you make over the baseline `HMMTagger`.
- Download the following files.
  - Training: `trn.pos`.
  - Development: `dev.pos`.
- Use the training data for training your model.
- See `HW2Test`.
- Use the development data for evaluating your approach.
- Write a report that explains what improvements your program makes over the one we discussed in class.
- Submit your report and all necessary Java files.
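One way to quantify improvement over a baseline on the development data is token-level accuracy. This is only a sketch under that assumption: the page does not say which metric `HW2Test` reports, and the class `TaggingAccuracySketch` and its `accuracy` method are hypothetical names, not part of the course code. The gold and predicted tags below are toy values, not results from `dev.pos`.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; the actual evaluation in HW2Test may differ. */
final class TaggingAccuracySketch {
    /** Fraction of tokens, over all sentences, whose predicted tag equals the gold tag. */
    static double accuracy(List<List<String>> gold, List<List<String>> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("sentence counts differ");
        }
        int correct = 0;
        int total = 0;
        for (int s = 0; s < gold.size(); s++) {
            List<String> g = gold.get(s);
            List<String> p = predicted.get(s);
            if (g.size() != p.size()) {
                throw new IllegalArgumentException("tag counts differ in sentence " + s);
            }
            for (int i = 0; i < g.size(); i++) {
                if (g.get(i).equals(p.get(i))) {
                    correct++;
                }
                total++;
            }
        }
        // Define accuracy over zero tokens as 0.0 to avoid dividing by zero.
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        // Toy data: one sentence with four tokens.
        List<List<String>> gold = Arrays.asList(Arrays.asList("NNP", "VBD", "DT", "NN"));
        List<List<String>> baseline = Arrays.asList(Arrays.asList("NNP", "VBD", "DT", "VB"));
        List<List<String>> improved = Arrays.asList(Arrays.asList("NNP", "VBD", "DT", "NN"));

        double base = accuracy(gold, baseline);   // 3 of 4 tokens correct
        double mine = accuracy(gold, improved);   // 4 of 4 tokens correct
        System.out.println("baseline=" + base + " improved=" + mine + " gain=" + (mine - base));
    }
}
```

Reporting both numbers on the same development set, rather than only the new score, makes the size of the improvement easy to see in the report.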
©2015 Emory University