Code for the paper "The Effect of Language Representation in Question Decomposition"
This was the final project in the introductory NLP course taught by Prof. Jonathan Berant in 2020-2021.
The authors of this project are:
- Itay Levy
- Gal Suchetzky
The Break It Down paper introduced the task of decomposing a complex question into a sequence of simple, natural language steps that can be executed in order to answer the original question. In this project, we investigate the effect of performing the same task with varying degrees of structure in the target representation. We trained several encoder-decoder models, each on a different representation of the gold decompositions. Our experiments show that more structured representations significantly improve out-of-distribution generalization.
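For illustration, here is an example in the spirit of the task (our own, not drawn from the Break dataset). A question such as "What is the capital of the country with the largest population?" could be decomposed into:

1. return countries
2. return population of #1
3. return #1 where #2 is highest
4. return capital of #3

where #n refers to the result of step n, so the steps can be executed in sequence to answer the original question.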
Prof. Jonathan Berant - provided guidance and pointed us to the right people for our questions.
The base code for this project was adapted from this PyTorch template project. The framework lets us define and implement many models, metrics, trainers, testers, and data loaders, and specify which combination to run with a simple, extensible configuration file.
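To give a flavor of how a run is specified, a configuration file in this template style typically looks roughly like the sketch below. All names and values here (model type, data loader, hyperparameters) are illustrative placeholders, not the project's actual settings:

```json
{
    "name": "basic_seq2seq",
    "arch": {"type": "Seq2SeqModel", "args": {"hidden_dim": 256}},
    "data_loader": {"type": "BreakDataLoader", "args": {"batch_size": 32}},
    "optimizer": {"type": "Adam", "args": {"lr": 0.001}},
    "metrics": ["exact_match"],
    "trainer": {"epochs": 50, "save_dir": "saved/"}
}
```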
The models and code used in this project can be found under the `model/` directory.
Our models were adapted from the course's homework assignments.
We implemented a very basic model and allowed additional mechanisms to be enabled through the config files.
This yields many models that differ only slightly from one another, letting us test the contribution of each
mechanism to the success of the model.
Training a model is as easy as running the training script with the appropriate configuration file:

`python train.py -c config.json`
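If the template's standard command-line options were kept, training can also typically be resumed from a saved checkpoint; the flag and the placeholder path below follow the template's conventions and are an assumption on our part:

`python train.py --resume <path/to/checkpoint.pth>`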
To set up the environment, follow these steps (a consolidated shell sketch appears after the list):
- Create a new conda env with Python 3.8
- Install PyTorch using pip, following the directions on the official site
- `pip install nlp`
- `pip install spacy`
- `pip install edit_distance`
- `pip install networkx`
- `pip install matplotlib`
- `pip install torchtext`
- `pip install tensorboardx`
- `pip install scipy`
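For convenience, the same steps as a single shell session might look like the following sketch. The environment name `qdecomp` is our own placeholder, and the PyTorch line should be replaced with the exact command generated on the official site for your CUDA/CPU setup:

```bash
# create and activate a fresh environment
conda create -n qdecomp python=3.8
conda activate qdecomp

# install PyTorch (replace with the command from pytorch.org for your platform)
pip install torch

# install the remaining dependencies
pip install nlp spacy edit_distance networkx matplotlib torchtext tensorboardx scipy
```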