# Deep Semantic Role Labeling in Tensorflow

This repository contains a TensorFlow implementation of deep semantic role labeling.

## Getting Started

### Prerequisites

* Python 2.7
* virtualenv

### Virtualenv Installation

```bash
virtualenv ~/.venvs/tf-srl
source ~/.venvs/tf-srl/bin/activate
cd semantic-role-labeling
pip install -r requirements.txt
```
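
As a quick, optional sanity check (assuming the virtualenv paths above), you can confirm that the environment is active and uses the expected interpreter before going further:

```bash
# the active interpreter should live inside the virtualenv
which python      # expected: ~/.venvs/tf-srl/bin/python
python --version  # expected: Python 2.7.x
```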

### Download word embeddings

We use 100-dimensional GloVe vectors trained on 6B tokens. They can be downloaded with the following script:

```bash
./data/scripts/get-resources.sh
```
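
The standard distribution of these vectors is a plain-text file (conventionally named `glove.6B.100d.txt`) with one token per line followed by its 100 floating-point values; the exact location the script saves it to is determined by the script itself. To spot-check the download (the path below is a placeholder):

```bash
# each line: a token followed by 100 space-separated floats
head -n 1 path/to/glove.6B.100d.txt
```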

## Usage

### Training CoNLL-2005

To generate SRL training data for CoNLL-2005, you will need to download and extract the PTB corpus (LDC99T42), which is not publicly available.

To train a model based on *Deep Semantic Role Labeling: What Works and What's Next* (He et al., 2017), you can then use the following scripts:

```bash
# download and prepare training data (only needs to be run once)
./data/scripts/conll05-data.sh -i /path/to/ptb/
# extract features and train default model with CoNLL-2005 train/devel split
./data/scripts/train-srl.sh -i data/datasets/conll05/ -o data/experiments/conll05/
```

To train a phrase-constrained model, you need to override the default configuration file and mode:

```bash
./data/scripts/train-srl.sh -i data/datasets/conll05/ -o data/experiments/conll05-phrase/ -c data/configs/phrase.json -m phrase
```

### Training CoNLL-2012

To generate SRL training data corresponding to the train/dev/test split from CoNLL-2012, you will need to download and extract OntoNotes 5.0 (LDC2013T19).

Having done this, you can train a model as follows:

```bash
# download and prepare data (only needs to be run once)
./data/scripts/conll2012-data.sh -i /path/to/ontonotes-release-5.0/
# extract features and train default model with CoNLL-2012 train/devel split
./data/scripts/train-srl.sh -i data/datasets/conll2012/ -o data/experiments/conll2012/
```

### Training with other data

It's possible to train using CoNLL-style data in other formats (with different columns). To do this, you must specify a few required fields through a `.json` configuration file:

```json
{
  "columns": {
    "word": 0,
    "roleset": 4,
    "predicate": 5
  },
  "arg_start_col": 6
}
```

Here, "word": 0 means that words appear in the first column. Similarly, "roleset": 4 means that the roleset or sense for predicates appears in the 4th column. "predicate" provides the column index of the lemma of the predicate. Other columns can be added for use in feature extraction, but these are the bare minimum required. "arg_start_col" gives the first column containing argument labels. No additional columns can occur after argument columns.

Then, if you have a training file named `train.conll` and a dev file named `valid.conll` in `path/to/data/directory`, you can train as follows with a custom reader named `reader.json`:

```bash
./data/scripts/train-srl.sh -i path/to/data/directory -o path/to/output/directory -t train.conll -v valid.conll --custom reader.json
```
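
Here, the input directory passed with `-i` is simply expected to contain the two files named with `-t` and `-v`:

```
path/to/data/directory/
├── train.conll   # training split in your custom format
└── valid.conll   # development split
```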

## Evaluation

To simplify evaluation, `train-srl.sh` can be used directly. For CoNLL-2005, for example, you can test on the Brown corpus as follows:

```bash
./data/scripts/train-srl.sh -i data/datasets/conll05/ -o data/experiments/conll05/ --test test-brown.conll
```

where `test-brown.conll` must be located in `data/datasets/conll05/`.