GitHub

Context-Aware Answer Extraction in Question Answering

Yeon Seonwoo, Ji-Hoon Kim, Jung-Woo Ha, Alice Oh | Paper

KAIST, NAVER CLOVA, NAVER AI LAB

Abstract

Extractive QA models have shown very promising performance in predicting the correct answer to a question for a given passage. However, they sometimes result in predicting the correct answer text but in a context irrelevant to the given question. This discrepancy becomes especially important as the number of occurrences of the answer text in a passage increases. To resolve this issue, we propose BLANC (BLock AttentioN for Context prediction) based on two main ideas: context prediction as an auxiliary task in multi-task learning manner, and a block attention method that learns the context prediction task. With experiments on reading comprehension, we show that BLANC outperforms the state-of-the-art QA models, and the performance gap increases as the number of answer text occurrences increases. We also conduct an experiment of training the models using SQuAD and predicting the supporting facts on HotpotQA and show that BLANC outperforms all baseline models in this zero-shot setting.

Overview of BLANC

Getting Started

Build train/dev/test set from NaturalQ-MRQA dataset

Download NaturalQ (MRQA) train/dev dataset (link) and locate the train/dev set in data/naturalQ/ directory
Rename the dev set to test.jsonl.gz
Run split_data.py in code/src/preprocessor as follows:

python split_data.py \
    --source ../../../data/naturalQ/train.jsonl.gz \
    --train_output train.jsonl \
    --dev_output dev.jsonl \
    --data_type mrqa

Compress train.jsonl and dev.jsonl with gzip (gzip train.jsonl; gzip dev.jsonl)
Locate these two files into data/naturalQ/ directory
Now you have train/dev/test set in data/naturalQ directory

Train BLANC on NaturalQ

Run BLANC script in code/ as follows:

LABEL=trial_001 GEOP=0.99 WINS=3 LMB=0.8 bash run_blanc_naturalqa.sh

Test BLANC

Indicate which checkpoint you want to test.

In this case, we test BLANC with trial_001 checkpoint.

LABEL=trial_001 GEOP=0.99 WINS=3 LMB=0.8 bash run_blanc_naturalqa_test.sh

Code Reference

https://github.com/facebookresearch/SpanBERT

Copyright

Acknowledgement

This work was partly supported by NAVER Corp. and Institute for Information & communications Technology Promotion(IITP) grant funded by the Korea government(MSIP) (2017-0-01780, The technology development for event recognition/relational reasoning and learning knowledge based system for video understanding).

How to cite

@inproceedings{seonwoo-etal-2020-context,
    title = "Context-Aware Answer Extraction in Question Answering",
    author = "Seonwoo, Yeon  and
      Kim, Ji-Hoon  and
      Ha, Jung-Woo  and
      Oh, Alice",
    booktitle = "EMNLP",
    year = "2020",
}

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
code		code
img		img
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Context-Aware Answer Extraction in Question Answering

Abstract

Overview of BLANC

Getting Started

Build train/dev/test set from NaturalQ-MRQA dataset

Train BLANC on NaturalQ

Test BLANC

Code Reference

Copyright

Acknowledgement

How to cite

About

Releases

Packages

Languages

License

yeonsw/BLANC

Folders and files

Latest commit

History

Repository files navigation

Context-Aware Answer Extraction in Question Answering

Abstract

Overview of BLANC

Getting Started

Build train/dev/test set from NaturalQ-MRQA dataset

Train BLANC on NaturalQ

Test BLANC

Code Reference

Copyright

Acknowledgement

How to cite

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages