DisGeM: Distractor Generation for Multiple Choice Questions with Span Masking

A distractor generation framework that uses pre-trained language models (PLMs) trained with a masked language modeling (MLM) objective.
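The span-masking idea can be pictured with a small sketch. The answer span in a passage is replaced by a run of mask tokens, giving one masked input per candidate span length, so that an MLM could propose fillers of different lengths. The helper name `mask_answer_span`, the `<mask>` token string, and the span-length range are illustrative assumptions, not part of the DisGeM codebase. The real mask token depends on the model's tokenizer (for example `[MASK]` for BERT, `<mask>` for RoBERTa).

```python
from typing import List


def mask_answer_span(text: str, answer: str, mask_token: str = "<mask>",
                     max_len: int = 3) -> List[str]:
    """Return one masked copy of `text` per span length 1..max_len.

    Hypothetical helper (not DisGeM's API): the first occurrence of `answer`
    is replaced by n space-separated mask tokens, for n = 1, ..., max_len.
    """
    start = text.find(answer)
    if start == -1:
        raise ValueError("answer not found in text")
    end = start + len(answer)
    masked = []
    for n in range(1, max_len + 1):
        span = " ".join([mask_token] * n)  # e.g. "<mask> <mask>" for n = 2
        masked.append(text[:start] + span + text[end:])
    return masked


if __name__ == "__main__":
    for m in mask_answer_span("The cat sat on the mat.", "mat", max_len=2):
        print(m)
```

Each masked string would then be passed to an MLM, whose predictions for the masked positions serve as distractor candidates.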

Paper

Abstract

Recent advancements in Natural Language Processing (NLP) have impacted numerous sub-fields such as natural language generation, natural language inference, question answering, and more. However, in the field of question generation, the creation of distractors for multiple-choice questions (MCQ) remains a challenging task. In this work, we present a simple, generic framework for distractor generation using readily available Large Language Models (LLMs). Unlike previous methods, our framework relies solely on pre-trained language models and does not require additional training on specific datasets. Building upon previous research, we introduce a two-stage framework consisting of candidate generation and candidate selection. Our proposed distractor generation framework outperforms previous methods without the need for training or fine-tuning. Human evaluations confirm that our approach produces more effective and engaging distractors. The related codebase is publicly available at https://github.com/obss/disgem.
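The second stage of the two-stage design (candidate selection) can be illustrated with a minimal sketch. It assumes candidates have already been generated and scored by an MLM. The function name `select_distractors` and the ranking rule are illustrative assumptions, not the selection criteria used in the paper. The rule keeps the highest scores first and drops the correct answer and duplicates. Its `top_k` argument is named after the `--top-k` CLI flag but is not guaranteed to match that flag's semantics.

```python
from typing import Dict, List


def select_distractors(candidates: Dict[str, float], answer: str,
                       top_k: int = 3) -> List[str]:
    """Pick up to top_k highest-scoring candidates that differ from the answer.

    Hypothetical selection step: `candidates` maps candidate strings to model
    scores; comparison with the answer and with earlier picks ignores case
    and surrounding whitespace.
    """
    seen = {answer.strip().lower()}
    selected: List[str] = []
    # Highest score first; sorted() is stable, so ties keep insertion order.
    for text, _score in sorted(candidates.items(), key=lambda kv: kv[1], reverse=True):
        if len(selected) >= top_k:
            break
        key = text.strip().lower()
        if key in seen:
            continue  # skip the correct answer and duplicate candidates
        seen.add(key)
        selected.append(text.strip())
    return selected
```

For example, with scores `{"mat": 0.4, "rug": 0.3, "floor": 0.25, "sofa": 0.2}` and answer `"mat"`, `select_distractors(..., top_k=3)` keeps `["rug", "floor", "sofa"]`.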

Installation

Clone the repository.

git clone https://github.com/obss/disgem.git
cd disgem

In the project root, create a virtual environment (preferably using conda) as follows:

conda env create -f environment.yml

Datasets

Download the datasets with the following command. The script downloads the CLOTH and DGen datasets.

bash scripts/download_data.sh
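To inspect the downloaded CLOTH data before generating distractors, a small reader can iterate over the blanks. It assumes the layout of the public CLOTH release: one JSON file per passage with `article`, `options`, and `answers` keys, where answers are letters A to D. Verify this against the files fetched by scripts/download_data.sh. The function name `iter_cloth_blanks` is hypothetical, and the DGen format is not covered.

```python
import json
from pathlib import Path
from typing import Iterator, List, Tuple


def iter_cloth_blanks(split_dir: str) -> Iterator[Tuple[str, int, List[str], str]]:
    """Yield (file name, blank index, options, correct option) for every blank.

    Assumes one JSON file per passage with "options" (a list of option lists)
    and "answers" (letters "A"-"D"), as in the public CLOTH release.
    """
    for path in sorted(Path(split_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            item = json.load(f)
        for i, (opts, ans) in enumerate(zip(item["options"], item["answers"])):
            # Map the answer letter to the option text at that position.
            yield path.name, i, opts, opts["ABCD".index(ans)]


if __name__ == "__main__":
    # Print the first blank of the test-high split, if the data is present.
    for name, i, opts, correct in iter_cloth_blanks("data/CLOTH/test/high"):
        print(name, i, correct, [o for o in opts if o != correct])
        break
```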

Generate Distractors

To list all generation arguments, run python -m generate --help.

The following example generates distractors for the CLOTH test-high split. Adjust the --top-k and --dispersion parameters as needed.

python -m generate data/CLOTH/test/high --data-format cloth --top-k 3 --dispersion 0 --output-path cloth_test_outputs.json
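To compare several top-k settings, the documented command can be scripted from Python. This sketch only uses the arguments shown in the example command. The helper names `build_generate_cmd` and `run_sweep` and the output file naming are hypothetical. It assumes it is run from the repository root inside the conda environment.

```python
import subprocess
import sys
from typing import Iterable, List


def build_generate_cmd(data_path: str, top_k: int, dispersion: float,
                       output_path: str, data_format: str = "cloth") -> List[str]:
    """Assemble the documented `python -m generate` command as an argv list."""
    return [
        sys.executable, "-m", "generate", data_path,
        "--data-format", data_format,
        "--top-k", str(top_k),
        "--dispersion", str(dispersion),
        "--output-path", output_path,
    ]


def run_sweep(data_path: str, top_ks: Iterable[int], dispersion: float = 0,
              output_prefix: str = "cloth_test_outputs") -> None:
    """Run generation once per top-k value, writing a separate output file each time."""
    for k in top_ks:
        out = f"{output_prefix}_k{k}.json"
        # check=True raises CalledProcessError if a run fails.
        subprocess.run(build_generate_cmd(data_path, k, dispersion, out), check=True)
```

For instance, `run_sweep("data/CLOTH/test/high", [1, 3, 5])` would write one JSON output file per top-k value. `python -m generate --help` remains the authoritative list of arguments.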

Contributing

Use the following commands to check and format the code style of the codebase.

To check the code style:

python -m scripts.run_code_style check

To format the codebase:

python -m scripts.run_code_style format
