CompAQT is a method to improve compositional generalization for multi-step quantitative reasoning for question answering over tabular data. This repository contains:
- A dataset composed of multi-step quantitative reasoning samples from four previously published datasets.
- Code and instructions on how to apply the CompAQT method to QA models.
You can download the data from the links below:
The dataset is composed of four previously released datasets that have been filtered and processed to focus on multi-step quantitative reasoning.
- FinQA (paper, github repo)
- TAT-QA (paper, github repo)
- HiTab (paper, github repo)
- MultiHiertt (paper, github repo)
All datasets except FinQA have been filtered to only include samples that require quantitative reasoning. The samples have also been reformatted to match the FinQA format.
Each sample has the following format:
{
  "source": the original source of the dataset (`finqa`, `tatqa`, `hitab`, or `multihiertt`)
  "pre_text": the text before the table
  "post_text": the text after the table
  "table_ori": the original table, represented as a nested array
  "table": the normalized table, where the first row represents the column headers and the left-most column represents the row headers
  "id": unique example id; the id matches the id of each sample in the original dataset
  "qa": {
    "question": the question
    "program": the reasoning program
    "gold_inds": the gold supporting facts
    "exe_ans": the gold execution result
  }
}
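
As a minimal sketch of how the normalized `table` field can be read, the snippet below looks up a cell by its row header and column header. The sample values, the field types (for example, plain strings for `pre_text` and `post_text`), and the helper name `lookup_cell` are assumptions made for illustration; they are not taken from the dataset or from this repository's code.

```python
# Hypothetical sample following the format above; every value is invented for illustration,
# and the field types (e.g. plain strings for pre_text/post_text) are assumptions.
sample = {
    "source": "finqa",
    "pre_text": "Text before the table.",
    "post_text": "Text after the table.",
    "table_ori": [["", "2019", "2018"], ["Revenue", "120", "100"]],
    "table": [["", "2019", "2018"], ["revenue", "120", "100"]],
    "id": "example-0",
    "qa": {
        "question": "What was the change in revenue from 2018 to 2019?",
        "program": "subtract(120, 100)",
        "gold_inds": {"table_1": "revenue: 2019 is 120; 2018 is 100"},
        "exe_ans": 20.0,
    },
}


def lookup_cell(table, row_header, column_header):
    """Return the cell for the given headers, or None if either header is missing.

    Assumes the first row holds the column headers and the left-most
    column holds the row headers, as described for the "table" field.
    """
    header_row = table[0]
    if column_header not in header_row:
        return None
    col = header_row.index(column_header)
    for row in table[1:]:
        if row and row[0] == row_header:
            return row[col]
    return None


print(lookup_cell(sample["table"], "revenue", "2019"))
```

Cells are returned exactly as stored, so any numeric conversion is left to the caller.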
To select a particular dataset, set the `source` parameter in `generator/config.py` to one of these options: `finqa`, `tatqa`, `hitab`, `multihiertt`, or `all`.
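
The effect of this setting can be pictured as a filter on each sample's `source` field. The sketch below is an assumption about the behavior, not the code in `generator/config.py`; the helper name `filter_by_source` and the toy samples are hypothetical.

```python
# Allowed values for the source setting, as listed above.
VALID_SOURCES = ("finqa", "tatqa", "hitab", "multihiertt", "all")


def filter_by_source(samples, source):
    """Keep only the samples whose "source" field matches; "all" keeps everything."""
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown source {source!r}; expected one of {VALID_SOURCES}")
    if source == "all":
        return list(samples)
    return [s for s in samples if s["source"] == source]


# Toy samples carrying only the fields needed for this illustration.
samples = [
    {"id": "a", "source": "finqa"},
    {"id": "b", "source": "tatqa"},
    {"id": "c", "source": "finqa"},
]

finqa_only = filter_by_source(samples, "finqa")
print([s["id"] for s in finqa_only])
```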
You can set up the environment by installing all requirements: `pip install -r requirements.txt`
The code is largely adapted from FinQA.
All configurations are modifiable within generator/config.py
- First navigate to the generator:
cd generator
- Run
chmod +x run_finqa.sh
- To run FinQA with CompAQT:
./run_finqa.sh
- First navigate to the generator:
cd generator
- Run
chmod +x run_pvn.sh
- To run PVN with CompAQT:
./run_pvn.sh