Homework 2

Part-of-speech Tagging

Part-of-speech (POS) tagging is the task of assigning a grammatical category to each word token in a sentence. Your task is to write a program that takes a sequence of word tokens (e.g., ["John", "bought", "a", "book"]) and generates a list of tags (e.g., ["NNP", "VBD", "DT", "NN"]), where the i-th tag is the POS tag of the i-th word token. See this page for more details about the different kinds of POS tags.
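To make the input and output concrete, here is a minimal sketch of a most-frequent-tag baseline: it tags each word with the tag it co-occurred with most often in training, and falls back to a default tag for unseen words. The class name MostFrequentTagSketch and its methods are hypothetical and are not part of the course code; in particular, this does not reflect the actual AbstractTagger interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the course's AbstractTagger API.
class MostFrequentTagSketch {
    // word -> (tag -> count), collected from training sentences
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // tag assigned to words never seen in training (an assumption of this sketch)
    private final String fallbackTag;

    MostFrequentTagSketch(String fallbackTag) {
        this.fallbackTag = fallbackTag;
    }

    // Record one training sentence given as parallel lists of tokens and gold tags.
    void observe(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        for (int i = 0; i < words.size(); i++) {
            counts.computeIfAbsent(words.get(i), k -> new HashMap<>())
                  .merge(tags.get(i), 1, Integer::sum);
        }
    }

    // Return one tag per token: the most frequent tag for known words,
    // the fallback tag for unknown words.
    List<String> tag(List<String> words) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            String best = fallbackTag;
            int bestCount = 0;
            Map<String, Integer> tagCounts = counts.get(word);
            if (tagCounts != null) {
                for (Map.Entry<String, Integer> e : tagCounts.entrySet()) {
                    if (e.getValue() > bestCount) {
                        best = e.getKey();
                        bestCount = e.getValue();
                    }
                }
            }
            result.add(best);
        }
        return result;
    }

    public static void main(String[] args) {
        MostFrequentTagSketch tagger = new MostFrequentTagSketch("NN");
        tagger.observe(Arrays.asList("John", "bought", "a", "book"),
                       Arrays.asList("NNP", "VBD", "DT", "NN"));
        System.out.println(tagger.tag(Arrays.asList("John", "bought", "a", "book")));
    }
}
```

A baseline like this ignores context entirely, which is why sequence models such as an HMM usually do better on ambiguous words.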

Task 1

Extend the AbstractTagger class and write your own POS tagger.
Feel free to take any approach we have discussed (e.g., GreedyTagger, ExhaustiveTagger, HMMTagger).
Feel free to use any classifier we discussed (e.g., Naive Bayes, Hidden Markov Model).
Explore additional features to get the best result. Your grade will be based on how much you improve over the baseline HMMTagger.
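As one possible starting point for feature exploration, the sketch below extracts a few commonly used token-level features (word form, suffixes, capitalization, digits, hyphens, and neighboring words) as strings. The class TokenFeatureSketch, the feature names, and the boundary markers <BOS> and <EOS> are all assumptions made for illustration; how such features plug into your tagger depends on the classifier you choose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical feature extractor; names and feature set are illustrative only.
class TokenFeatureSketch {
    // Build string-valued features for the token at position i of the sentence.
    static List<String> extract(List<String> words, int i) {
        String word = words.get(i);
        List<String> features = new ArrayList<>();

        // Surface forms
        features.add("word=" + word);
        features.add("lower=" + word.toLowerCase(Locale.ROOT));

        // Suffixes of length 1 to 3, each strictly shorter than the word
        for (int n = 1; n <= 3 && n < word.length(); n++) {
            features.add("suffix" + n + "=" + word.substring(word.length() - n));
        }

        // Orthographic cues, often helpful for proper nouns and numbers
        if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
            features.add("capitalized");
        }
        if (word.chars().anyMatch(Character::isDigit)) {
            features.add("hasDigit");
        }
        if (word.indexOf('-') >= 0) {
            features.add("hasHyphen");
        }

        // Neighboring words, with boundary markers at the sentence edges
        features.add("prev=" + (i > 0 ? words.get(i - 1) : "<BOS>"));
        features.add("next=" + (i + 1 < words.size() ? words.get(i + 1) : "<EOS>"));
        return features;
    }
}
```

Suffix and capitalization features mainly help with words that never appeared in the training data, where word-identity features carry no information.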

Task 2

Download the following files.
Training: trn.pos.
Development: dev.pos.
Use the training data to train your model.
See HW2Test.
Use the development data to evaluate your approach.
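Evaluation on the development data is typically reported as token-level accuracy: the percentage of tokens whose predicted tag matches the gold tag. The sketch below shows one way to accumulate that number across sentences. TaggingAccuracySketch is a hypothetical helper, not the course's HW2Test, and it assumes gold and predicted tags are available as parallel lists; it makes no assumption about the format of trn.pos or dev.pos.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for token-level accuracy; not part of the course code.
class TaggingAccuracySketch {
    private int correct = 0;
    private int total = 0;

    // Compare gold and predicted tags for one sentence and update the counts.
    void add(List<String> gold, List<String> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("gold and predicted must have the same length");
        }
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
            total++;
        }
    }

    // Accuracy as a percentage over all tokens seen so far (0.0 if none).
    double accuracy() {
        return total == 0 ? 0.0 : 100.0 * correct / total;
    }

    public static void main(String[] args) {
        TaggingAccuracySketch eval = new TaggingAccuracySketch();
        eval.add(Arrays.asList("NNP", "VBD", "DT", "NN"),
                 Arrays.asList("NNP", "VBD", "DT", "VB"));
        System.out.println(eval.accuracy()); // 3 of 4 tags match
    }
}
```

Computing accuracy over tokens rather than averaging per-sentence scores keeps long and short sentences weighted by their length.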

Task 3

Write a report that explains what improvements you made to your program over the one we discussed in class.
Submit your report and all necessary Java files.

Artificial Intelligence


Instructor

Jinho D. Choi

Emory University
