Jinho D. Choi edited this page Feb 16, 2015 · 1 revision

Part-of-speech Tagging

Part-of-speech tagging is the task of finding the grammatical category of each word token in a given sentence. Your task is to write a program that takes a sequence of word tokens (e.g., ["John", "bought", "a", "book"]) and generates a tag list of the same length (e.g., ["NNP", "VBD", "DT", "NN"]), where the i-th tag is the POS tag of the i-th word token. See this page for more details about different kinds of POS tags.
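
To make the input/output contract concrete, here is a minimal sketch of a most-frequent-tag tagger: it assigns each word the tag it was seen with most often in training. This is an illustration only, not one of the course taggers. The class name MostFrequentTagger, its train and tag methods, and the fallback tag "NN" for unseen words are all assumptions, and the class deliberately does not extend AbstractTagger because that class's interface is not shown on this page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical standalone class; it does NOT extend the course's AbstractTagger.
class MostFrequentTagger {
    // Assumption: unseen words fall back to the common-noun tag.
    private static final String DEFAULT_TAG = "NN";

    // word -> (tag -> number of times the word was seen with that tag)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Counts one training sentence; words and tags are parallel lists.
    void train(List<String> words, List<String> tags) {
        for (int i = 0; i < words.size(); i++) {
            counts.computeIfAbsent(words.get(i), k -> new HashMap<>())
                  .merge(tags.get(i), 1, Integer::sum);
        }
    }

    // Returns one tag per input word, in the same order.
    List<String> tag(List<String> words) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            Map<String, Integer> tagCounts = counts.get(word);
            if (tagCounts == null) {
                result.add(DEFAULT_TAG);
                continue;
            }
            String best = DEFAULT_TAG;
            int max = -1;
            for (Map.Entry<String, Integer> e : tagCounts.entrySet()) {
                if (e.getValue() > max) {
                    max = e.getValue();
                    best = e.getKey();
                }
            }
            result.add(best);
        }
        return result;
    }
}
```

A tagger like this ignores context entirely, which is why the sequence-based approaches named in Task 1 are expected to do better.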

Task 1

  • Extend the AbstractTagger class and write your own POS tagger.
  • Feel free to take any approach we have discussed (e.g., GreedyTagger, ExhaustiveTagger, HMMTagger).
  • Feel free to use any classifier we discussed (e.g., Naive Bayes, Hidden Markov Model).
  • Explore additional features to get the best result. Your grade will be based on how much you improve over the baseline HMMTagger.
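
As one possible starting point for feature exploration, the sketch below extracts a few simple string features for the token at a given position: the lowercased word form, its last two characters, whether it starts with an uppercase letter, and the previous word. The class TokenFeatures, the extract method, the feature names, and the "<s>" sentence-start marker are all hypothetical and are not part of the course code; which features actually help is something to measure on the development data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the course framework.
class TokenFeatures {
    // Returns string-valued features for the token at position i.
    static List<String> extract(List<String> words, int i) {
        String word = words.get(i);
        List<String> features = new ArrayList<>();

        // Lowercased word form.
        features.add("word=" + word.toLowerCase());

        // Last two characters (or the whole word if it is shorter).
        String suffix = word.length() >= 2 ? word.substring(word.length() - 2) : word;
        features.add("suffix2=" + suffix);

        // Capitalization often signals a proper noun.
        boolean capitalized = !word.isEmpty() && Character.isUpperCase(word.charAt(0));
        features.add("capitalized=" + capitalized);

        // Previous word, or a sentence-start marker (assumed to be "<s>").
        String prev = i > 0 ? words.get(i - 1).toLowerCase() : "<s>";
        features.add("prev=" + prev);

        return features;
    }
}
```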

Task 2

  • Download the following files:
      • Training: trn.pos.
      • Development: dev.pos.
  • Use the training data for training your model.
  • See HW2Test.
  • Use the development data for evaluating your approach.
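
One common way to evaluate a tagger on development data is token-level accuracy: the percentage of tokens whose predicted tag matches the gold tag. The sketch below assumes the gold and predicted tags are already available as parallel lists, one pair per sentence. The class TaggingAccuracy and its methods are hypothetical and are not taken from HW2Test; reading dev.pos is left out because its file format is not described on this page.

```java
import java.util.List;

// Hypothetical helper for token-level accuracy; not taken from HW2Test.
class TaggingAccuracy {
    private int correct = 0;
    private int total = 0;

    // Accumulates counts for one sentence; gold and predicted are parallel lists.
    void add(List<String> gold, List<String> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("gold and predicted tag lists differ in length");
        }
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        total += gold.size();
    }

    // Percentage of correctly tagged tokens seen so far (0.0 if none were added).
    double accuracy() {
        return total == 0 ? 0.0 : 100.0 * correct / total;
    }
}
```

Calling add once per development sentence and reading accuracy at the end gives a single number to compare against the baseline.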

Task 3

  • Write a report that explains what improvements your program makes over the one we discussed in class.
  • Submit your report and all necessary Java files.

Artificial Intelligence

Instructor


Emory University
