Learning to Generate Programs for Table Fact Verification via Structure-Aware Semantic Parsing

This repository contains a preliminary release of the SASP code used for the experiments in our ACL 2022 paper "Learning to Generate Programs for Table Fact Verification via Structure-Aware Semantic Parsing". It is based on the open-source implementation of PyTorch Neural Symbolic Machines.

Differences from PyTorch Neural Symbolic Machines:

  • We postprocess the dependency tree to generate an operation-oriented tree, which is used to sample programs that bootstrap SASP and to bias program generation for TabFact.
  • We change the reward function from the binary reward to the Maximum Likelihood Most Violation Reward, which achieves better performance.
  • We use an interpreter better suited to TabFact. We also revised the data pre-processing script and processed the TabFact dataset with our implementation.
  • The original implementation first samples, then trains, and finally evaluates. We instead run the sampling, training, and evaluation threads in parallel using multiprocessing and thread locks. Moreover, the original implementation uses one GPU for training, one GPU for evaluation, and multiple GPUs for program sampling. Since program sampling is the most time-consuming step, we use one GPU for sampling and a second GPU for both training and evaluation; this keeps both GPUs almost fully loaded and makes training much faster (see the sketch after this list).
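
A minimal sketch of this producer-consumer arrangement is shown below; every name in it (sample_programs, the queue hand-off, the print stand-ins) is an illustrative placeholder, not the repository's actual API:

import multiprocessing as mp
import threading
import time

def sample_programs(batch_id):
    # Stand-in for program sampling on the dedicated sampling GPU.
    return ["program-%d-%d" % (batch_id, i) for i in range(4)]

def sampler(queue):
    # Producer process: keeps the queue of sampled programs filled.
    for batch_id in range(3):
        queue.put(sample_programs(batch_id))
    queue.put(None)  # sentinel: no more batches

def train_and_evaluate(queue):
    # Consumer side: training and evaluation share the second GPU.
    # The lock keeps the two from touching the model at the same time.
    model_lock = threading.Lock()

    def evaluate_loop():
        for _ in range(3):
            with model_lock:
                print("evaluating current model")
            time.sleep(0.01)

    eval_thread = threading.Thread(target=evaluate_loop)
    eval_thread.start()
    while True:
        batch = queue.get()
        if batch is None:
            break
        with model_lock:
            print("training on", batch)
    eval_thread.join()

if __name__ == "__main__":
    queue = mp.Queue()
    producer = mp.Process(target=sampler, args=(queue,))
    producer.start()
    train_and_evaluate(queue)
    producer.join()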

Requirements

Dataset Preparation

  • First, download the dataset from the official TabFact website and preprocess it with our script SASP/table/tabfact/preprocess_example.py, or just use the preprocessed data under SASP/data/tabfact/data_shard.
  • Second, generate the operation-oriented trees with the script SASP/table/tabfact/gen_with_func.py, or just use the processed data inside data/tabfact/data_shard.
  • Third, sample bootstrap programs for SASP, as described in our paper, with the script SASP/table/random_explore.py, or just download saved_program.jsonl and put it into data/tabfact/. (The commands after this list illustrate all three steps.)
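
As a rough illustration of the three steps, assuming you run from the SASP/ directory (these invocations are sketches; check each script for the flags it actually accepts):

python table/tabfact/preprocess_example.py
python table/tabfact/gen_with_func.py
python table/random_explore.py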

Training

Run the following command under SASP/:

python -m table.experiments train --seed=0 --cuda --config=data/config/config.vanilla_bert.json --work-dir=runs/demo_run --extra-config='{}'

Training takes around 20 hours.

Testing

Run the following command under SASP/:

python -m table.experiments test --cuda --eval-batch-size=128 --eval-beam-size=4 --save-decode-to=runs/demo_run/test_result.json --model=./runs/demo_run/model.best.bin --test-file=data/tabfact/data_shard/test.jsonl --extra-config='{}'

You can also download our trained model and put it into runs/demo_run for testing.

Reference

Please consider citing the following papers if you are using our codebase.

@incollection{NIPS2018_8204,
  title = {Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing},
  author = {Liang, Chen and Norouzi, Mohammad and Berant, Jonathan and Le, Quoc V and Lao, Ni},
  booktitle = {Advances in Neural Information Processing Systems 31},
  editor = {S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett},
  pages = {10015--10027},
  year = {2018},
  publisher = {Curran Associates, Inc.},
  url = {http://papers.nips.cc/paper/8204-memory-augmented-policy-optimization-for-program-synthesis-and-semantic-parsing.pdf}
}

@inproceedings{liang2017neural,
  title={Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision},
  author={Liang, Chen and Berant, Jonathan and Le, Quoc and Forbus, Kenneth D and Lao, Ni},
  booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  volume={1},
  pages={23--33},
  year={2017}
}

@inproceedings{yin20acl,
    title = {Ta{BERT}: Pretraining for Joint Understanding of Textual and Tabular Data},
    author = {Pengcheng Yin and Graham Neubig and Wen-tau Yih and Sebastian Riedel},
    booktitle = {Annual Conference of the Association for Computational Linguistics (ACL)},
    month = {July},
    year = {2020}
}
