# mcQA: Multiple Choice Question Answering


Answering multiple choice questions with Language Models.

## News 📢

- 🚧 This project is currently under development. Stay tuned! 🤩

### Jun 6th, 2020

- Refactored the data subpackage; the library now supports the RACE, Synonym, SWAG, and ARC datasets.
- Upgraded to transformers==2.10.0.

## Installation

### With pip

```sh
pip install mcqa
```

### From source

```sh
git clone https://github.com/mcqa-suite/mcqa.git
cd mcqa
pip install -e .
```

## Getting started

### Data preparation

To train a mcQA model, you need a CSV file with n+2 columns, where n is the number of choices per question: the first column holds the context sentence, the next n columns hold the choices for that question, and the last column holds the correct answer.

Below is an example of a 3-choice question (taken from the CoS-E dataset):

| Context sentence | Choice 1 | Choice 2 | Choice 3 | Label |
|---|---|---|---|---|
| People do what during their time off from work? | take trips | brow shorter | become hysterical | take trips |

If you have a trained mcQA model and want to run inference on a dataset, that dataset should have the same format as the training data, but without the label column.
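
As an illustration, here is a minimal sketch of writing such CSV files with pandas. The file names are placeholders, and whether mcqa expects a header row may depend on the library version, so treat this as a sketch of the column layout only:

```python
import pandas as pd

# Training data: context sentence, the n choices, then the correct answer.
train_rows = [
    ("People do what during their time off from work?",
     "take trips", "brow shorter", "become hysterical",
     "take trips"),
]
pd.DataFrame(train_rows).to_csv("train.csv", index=False, header=False)

# Inference data: same layout, but without the final label column.
test_rows = [
    ("People do what during their time off from work?",
     "take trips", "brow shorter", "become hysterical"),
]
pd.DataFrame(test_rows).to_csv("test.csv", index=False, header=False)
```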

Once the CSV files are ready, load them with MCQAData:

```python
from mcqa.data import MCQAData

mcqa_data = MCQAData(bert_model="bert-base-uncased",
                     lower_case=True,
                     max_seq_length=256)

train_dataset = mcqa_data.read(data_file='swagaf/data/train.csv', is_training=True)
test_dataset = mcqa_data.read(data_file='swagaf/data/test.csv', is_training=False)
```

### Model training

```python
from mcqa.models import Model

mdl = Model(bert_model="bert-base-uncased",
            device="cuda")

mdl.fit(train_dataset, train_batch_size=32, num_train_epochs=20)
```

### Prediction

```python
preds = mdl.predict(test_dataset, eval_batch_size=32)
```

### Evaluation

```python
from sklearn.metrics import accuracy_score
from mcqa.data import get_labels

# accuracy_score expects (y_true, y_pred). This assumes the evaluated
# file included a label column so get_labels can recover the answers.
print(accuracy_score(get_labels(test_dataset), preds))
```
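
If the inference file has no label column (as described above), one option is to hold out part of the labeled training CSV as a validation set and score on that instead. Below is a minimal sketch, assuming a header-less CSV as in the layout above and reusing `mcqa_data` and `mdl` from the earlier snippets; the split file names are hypothetical:

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from mcqa.data import get_labels

# Hold out 10% of the labeled training CSV as a validation set
# (assumes a header-less CSV, as in the layout described above).
full = pd.read_csv('swagaf/data/train.csv', header=None)
valid = full.sample(frac=0.1, random_state=0)
train = full.drop(valid.index)
train.to_csv('train_split.csv', index=False, header=False)
valid.to_csv('valid_split.csv', index=False, header=False)

# Read the validation split with is_training=True so the labels are kept,
# then score predictions against them (mcqa_data and mdl come from above).
valid_dataset = mcqa_data.read(data_file='valid_split.csv', is_training=True)
valid_preds = mdl.predict(valid_dataset, eval_batch_size=32)
print(accuracy_score(get_labels(valid_dataset), valid_preds))
```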

## References

| Type | Title | Author | Year |
|------|-------|--------|------|
| 📰 Paper | Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets | Mor Geva, Yoav Goldberg, Jonathan Berant | 2019 |
| 📰 Paper | Explain Yourself! Leveraging Language Models for Commonsense Reasoning | Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, Richard Socher | 2019 |
| 📰 Paper | SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference | Rowan Zellers, Yonatan Bisk, Roy Schwartz, Yejin Choi | 2018 |
| 📰 Paper | Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | Todor Mihaylov, Peter Clark, Tushar Khot, Ashish Sabharwal | 2018 |
| 📰 Paper | CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | Alon Talmor, Jonathan Herzig, Nicholas Lourie, Jonathan Berant | 2018 |
| 📰 Paper | RACE: Large-scale ReAding Comprehension Dataset From Examinations | Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy | 2017 |
| 💻 Framework | Scikit-learn: Machine Learning in Python | Pedregosa et al. | 2011 |
| 💻 Framework | PyTorch | Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan | 2016 |
| 💻 Framework | Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch | Hugging Face | 2018 |
| 📹 Video | Stanford CS224N: NLP with Deep Learning – Lecture 10: Question Answering | Christopher Manning | 2019 |

## LICENSE

Apache-2.0

## Contributing

Read our Contributing Guidelines.

## Citation

```bibtex
@misc{Taycir2019,
  author = {mcQA-suite},
  title = {mcQA},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/mcQA-suite/mcQA/}}
}
```