Code for the paper "The Effect of Language Representation in Question Decomposition"
This was the final project in the introductory NLP course taught by Prof. Jonathan Berant in 2020-2021.
The authors of this project are:
- Itay Levy
- Gal Suchetzky
The Break It Down paper introduced the task of decomposing a complex question into a sequence of simple, natural language steps that can be executed in order to answer the original question. In this project, we investigate the effect of performing the same task with varying degrees of structure in the target representation. We trained several encoder-decoder models, each on a different representation of the gold decompositions. Our experiments show that more structured representations significantly improve out-of-distribution generalization.
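For illustration, here is an example in the spirit of the task (our own, not drawn from the Break dataset). A question such as "What is the capital of the country with the largest population?" could be decomposed into:

1. return countries
2. return population of #1
3. return #1 where #2 is highest
4. return capital of #3

where #n refers to the result of step n, so the steps can be executed in sequence to answer the original question.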
Prof. Jonathan Berant - provided guidance and pointed us to the right people for our questions.
The base code for this project was adapted from this PyTorch template project. The framework lets us define and implement many models, metrics, trainers, testers, and data loaders, and specify which combination to run with a simple, extensible configuration file.
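To give a flavor of how a run is specified, a configuration file in this template style typically looks roughly like the sketch below. All names and values here (model type, data loader, hyperparameters) are illustrative placeholders, not the project's actual settings:

```json
{
    "name": "basic_seq2seq",
    "arch": {"type": "Seq2SeqModel", "args": {"hidden_dim": 256}},
    "data_loader": {"type": "BreakDataLoader", "args": {"batch_size": 32}},
    "optimizer": {"type": "Adam", "args": {"lr": 0.001}},
    "metrics": ["exact_match"],
    "trainer": {"epochs": 50, "save_dir": "saved/"}
}
```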
The models and code used in this project can be found under the `model/` directory.
Our models were adapted from the course's homework assignments.
We implemented a very basic model and allowed additional mechanisms to be enabled through the config files.
This yields many models that differ only slightly from one another, letting us test the contribution of each
mechanism to the success of the model.
Training a model is as easy as running the training script with the appropriate configuration file:

`python train.py -c config.json`
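If the template's standard command-line options were kept, training can also typically be resumed from a saved checkpoint; the flag and the placeholder path below follow the template's conventions and are an assumption on our part:

`python train.py --resume <path/to/checkpoint.pth>`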
To set up the environment, follow these steps (a consolidated shell sketch appears after the list):
- Create a new conda env with Python 3.8
- Install PyTorch using pip, following the directions on the official site
- `pip install nlp`
- `pip install spacy`
- `pip install edit_distance`
- `pip install networkx`
- `pip install matplotlib`
- `pip install torchtext`
- `pip install tensorboardx`
- `pip install scipy`
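For convenience, the same steps as a single shell session might look like the following sketch. The environment name `qdecomp` is our own placeholder, and the PyTorch line should be replaced with the exact command generated on the official site for your CUDA/CPU setup:

```bash
# create and activate a fresh environment
conda create -n qdecomp python=3.8
conda activate qdecomp

# install PyTorch (replace with the command from pytorch.org for your platform)
pip install torch

# install the remaining dependencies
pip install nlp spacy edit_distance networkx matplotlib torchtext tensorboardx scipy
```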