This repository contains the code, scripts, and instructions needed to reproduce the results in the paper
Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
Dan Garrette and Jason Baldridge
In Proceedings of EMNLP 2012
This code is frozen as of the version used to obtain the results in the paper. It will not be maintained.
To see the most up-to-date version of the code, visit this repository.
Set up English data
The English experiments rely on Penn Treebank data. This script prepares that data for use by the experiments. The treebank directory referenced when running the script should contain a folder combined
containing files wsj_0000.mrg
through wsj_2454.mrg
.
sh run.sh "en-data /path/to/treebank"
Run English experiments on sections 00-15
sh run.sh en-run16
Run English experiments on sections 00-07
sh run.sh en-run8
Run Italian experiments
The Italian data is already located in the data
directory, so this
experiment can be launched immediately without need for data setup.
sh run.sh it-run
If you have any questions, please contact Dan Garrette (dhg@cs.utexas.edu).