
# QDMR Parsing

This directory contains the implementation of our baseline models and evaluation metrics.

## Environment set-up

Our experiments were conducted in a Python 3.6.8 environment. To set up the environment, please run the following commands, which download and install the required packages:

```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```
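
As an optional sanity check (not part of the original setup instructions), you can verify that spaCy and its English model installed correctly:

```python
# Optional sanity check: confirm the spaCy English model loads.
import spacy

nlp = spacy.load("en_core_web_sm")
print(nlp("what flights go from dallas to phoenix"))
```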

## Data pre-processing

Before training or evaluating the models, the data files should be processed with the script `utils/preprocess_examples.py`.

```
$ python utils/preprocess_examples.py -h

usage: preprocess_examples.py [-h] [--lexicon_file LEXICON_FILE]
                              [--output_file_base OUTPUT_FILE_BASE]
                              [--sample SAMPLE]
                              input_file output_dir

example command: python utils/preprocess_examples.py data/QDMR/train.csv data/
--lexicon_file data/QDMR/train_lexicon_tokens.json --output_file_base
train_dynamic

positional arguments:
  input_file            path to input file
  output_dir            path to output file, without file extension

optional arguments:
  -h, --help            show this help message and exit
  --lexicon_file LEXICON_FILE
                        path to lexicon json file with allowed tokens per
                        example
  --output_file_base OUTPUT_FILE_BASE
                        output file base name (without file extension)
  --sample SAMPLE       json-formatted string with dataset down-sampling
                        configuration, for example: {"ATIS": 0.5, "CLEVR":
                        0.2}
```
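
The `--sample` flag takes a JSON string, which can be awkward to quote in a shell. Below is a hedged sketch of building the same command from Python; the sampling ratios are illustrative, and the flags match the help text above:

```python
# Sketch: invoke the pre-processing script with a down-sampling config.
import json
import subprocess

sample_config = {"ATIS": 0.5, "CLEVR": 0.2}  # fraction of each dataset to keep
subprocess.run([
    "python", "utils/preprocess_examples.py",
    "data/QDMR/train.csv", "data/",
    "--lexicon_file", "data/QDMR/train_lexicon_tokens.json",
    "--output_file_base", "train_dynamic",
    "--sample", json.dumps(sample_config),  # JSON quoting handled for us
], check=True)
```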

### Lexicon file generation

The lexicon JSON files for our QDMR parsing models can be found in Break's dataset. To generate valid lexicon tokens for a new example, use the `valid_annotation_tokens` method here. Note that you would still need to format the valid lexicon tokens according to the lexicon file format `{"source": "NL question", "allowed_tokens": [valid lexicon tokens]}`, e.g.:

{"source": "what flights go from dallas to phoenix ", "allowed_tokens": "['higher than', 'same as', 'what ', 'and ', 'than ', 'at most', 'distinct', 'two', 'at least', 'or ', 'date', 'on ', '@@14@@', 'equal', 'hundred', 'those', 'sorted by', 'elevation', 'which ', '@@6@@', 'was ', 'dallas', 'did ', 'population', 'height', 'one', 'that ', 'on', 'did', 'who', 'true', '@@2@@', '100', 'false', 'and', 'was', 'who ', 'a ', 'the', 'number of ', '@@16@@', 'if ', 'where', '@@18@@', 'how', 'larger than', 'is ', 'from ', 'a', 'for each', 'less', 'are ', '@@19@@', '@@4@@', '@@11@@', 'distinct ', 'flight', 'to', 'not ', 'objects', 'with ', ', ', 'lowest', 'in', 'has ', 'zero', 'in ', 'there ', 'lower than', 'highest', 'go', '@@9@@', 'than', 'size', 'multiplication', 'with', 'besides ', ',', '@@1@@', 'what', 'have', 'those ', 'of', '@@3@@', 'that', 'there', '@@10@@', '@@5@@', 'both ', '@@15@@', 'number of', 'price', 'any', 'which', 'to ', 'how ', 'when ', 'of ', 'division', 'dallass', 'is', 'sum', 'or', 'if', 'more', '@@12@@', 'smaller than', 'flights', 'phoenix', '@@7@@', '@@17@@', 'for each ', 'from', '@@13@@', 'has', 'difference', 'when', 'are', 'any ', '@@8@@', 'both', 'the ', ',  ', 'besides', 'have ', 'where ', 'not']"}

## Model training and inference

There are five baseline models implemented in the paper, three neural seq2seq models and two rule-based (non-neural) models:

| model | type | implementation |
|---|---|---|
| copy | rule-based | `model/rule_based/copy_model.py` |
| rule-based | rule-based | `model/rule_based/rule_based_model.py` |
| seq2seq | neural | AllenNLP official model, configuration: `model/seq2seq/train-seq2seq.json` |
| copynet | neural | AllenNLP official model, configuration: `model/seq2seq/train-seq2seq-copynet.json` |
| seq2seq-dynamic | neural | AllenNLP-based model, configuration: `model/seq2seq/train-seq2seq-dynamic.json` |

To run our rule-based models, use the script `model/run_model.py`, either on an input file or by providing a question as an argument (see the examples in the Evaluation section).

### Configuration and pretrained models

Training and running the neural models can be done with the AllenNLP framework and the provided configurations.
The pretrained neural models described in our paper are provided below, along with their hyperparameter configurations:

| model | dataset | hyperparameters | download |
|---|---|---|---|
| seq2seq | Break | layers1_lr0.001_hd450_dop0.0 | seq2seq_low |
| copynet | Break | layers2_lr0.001_hd450_dop0.2 | copynet_low |
| seq2seq-dynamic | Break | layers1_lr0.001_hd450_dop0.2 | dynamic_low |
| seq2seq | Break high-level | layers1_lr0.001_hd300_dop0.0 | seq2seq_high |
| copynet | Break high-level | layers3_lr0.001_hd450_dop0.2 | copynet_high |
| seq2seq-dynamic | Break high-level | layers1_lr0.001_hd300_dop0.3 | dynamic_high |
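
For reference, here is a hedged sketch of launching training with the AllenNLP command-line interface and one of the provided configurations; it assumes the `allennlp` CLI is installed and on PATH, and the serialization directory name is an arbitrary choice:

```python
# Sketch: train the seq2seq baseline via the AllenNLP CLI.
import subprocess

subprocess.run([
    "allennlp", "train",
    "model/seq2seq/train-seq2seq.json",  # provided configuration file
    "-s", "trained_models/seq2seq/",     # serialization dir (illustrative name)
], check=True)
```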

## Evaluation

Evaluation of any model with the metrics described in the paper (i.e., EM, SARI, GED and GED+) can be done with the script `model/run_model.py`, by specifying the model to evaluate and passing the flag `--evaluate`. For further usage options, please check the script's help menu; sample commands are provided below.

Code for computing the SARI score was taken from the tensor2tensor library by Google.

### Note on evaluation speed

GED and GED+ are algorithms that approximate the distance between graphs. Their execution, particularly that of GED+, can take a long time, and for large graphs it might not be feasible at all. To handle this, we apply three mechanisms:

1. If the execution of either GED or GED+ takes longer than 10 minutes, it is interrupted and the example is skipped (it is also possible to interrupt evaluation of a specific example with Ctrl+C); a sketch of this timeout pattern follows the list.
2. Decomposition graphs with more than 5 nodes are skipped during GED+ computation.
3. GED+ can be computed using multiple processes (see the example in the section below).
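
The following is an illustrative sketch of the timeout mechanism, not the repository's actual code: a Unix-only pattern that abandons a slow distance computation after a fixed budget via `signal.alarm`:

```python
# Illustrative only: skip an example when a graph-distance computation
# exceeds a time budget (Unix-only, relies on SIGALRM).
import signal

class EvaluationTimeout(Exception):
    pass

def _on_alarm(signum, frame):
    raise EvaluationTimeout()

def distance_or_skip(distance_fn, graph1, graph2, budget_seconds=600):
    """Return distance_fn(graph1, graph2), or None to skip the example."""
    signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(budget_seconds)  # 600 s = the 10-minute limit above
    try:
        return distance_fn(graph1, graph2)
    except (EvaluationTimeout, KeyboardInterrupt):
        return None  # skipped; Ctrl+C interrupts just this example
    finally:
        signal.alarm(0)  # cancel any pending alarm
```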

### Example commands

1. Evaluating a trained seq2seq-dynamic model on 10 random examples from the development set:

   ```bash
   python model/run_model.py \
       --input_file data/dev_dynamic.tsv \
       --random_n 10 \
       --model dynamic \
       --model_dir trained_models/seq2seq_dynamic/ \
       --evaluate
   ```
2. Evaluating the copy baseline on the development set, with 10 processes for computing GED+:

   ```bash
   python model/run_model.py \
       --input_file data/dev.tsv \
       --model copy \
       --evaluate \
       --num_processes 10
   ```
3. Evaluating the rule-based baseline on a question provided as an argument:

   ```bash
   python model/run_model.py \
       --model rule_based \
       --evaluate \
       --question "Return the keywords which have been contained by more than 100 ACL papers" \
       --gold "papers @@SEP@@ @@1@@ in ACL @@SEP@@ keywords of @@2@@ @@SEP@@ number of @@2@@ for each @@3@@ @@SEP@@ @@3@@ where @@4@@ is more than 100"
   ```