This multiple-choice question dataset about a fictional organ, the Glianorex, is used to assess how well models answer questions about knowledge they have never encountered.
The data generation pipeline is provided in this repository. It supports OpenAI and Anthropic models.
While we provide a pre-generated dataset, we encourage researchers and model evaluators to generate their own private data, to ensure that their models have not been contaminated by the dataset and content found in this repository.
Paper: Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data
The dataset is composed of 976 questions, each with 4 options, of which only 1 is correct.
Questions can be in English or French and are tagged by the `language` column.
In addition, a `generator` column contains the name of the model used to generate the data.
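As a rough illustration of how the `language` and `generator` columns can be used to slice the data, here is a minimal sketch over in-memory rows. Only the two column names come from the description above; the `question` field, the `"en"`/`"fr"` language codes, and the generator values are assumptions made for the example and may not match the actual dataset.

```python
from collections import Counter

# Hypothetical rows: only the `language` and `generator` column names come from
# the dataset description; every other field and all values are illustrative.
rows = [
    {"question": "Q1", "language": "en", "generator": "gpt-4"},
    {"question": "Q2", "language": "fr", "generator": "gpt-4"},
    {"question": "Q3", "language": "en", "generator": "claude"},
]


def select(rows, language=None, generator=None):
    """Return the rows matching the requested language and/or generator."""
    return [
        row
        for row in rows
        if (language is None or row["language"] == language)
        and (generator is None or row["generator"] == generator)
    ]


def count_by(rows, column):
    """Count rows per distinct value of the given column."""
    return Counter(row[column] for row in rows)


english = select(rows, language="en")
print(len(english))
print(count_by(rows, "generator"))
```

Keeping the filter as a plain function makes it easy to report results separately per language or per generator model.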
- Run `cargo build`.
- Set your OpenAI API key with `export OPENAI_API_KEY=sk-...`, or your Anthropic API key with `export ANTHROPIC_API_KEY=...`.
- Run `cargo run <model>`, with `model` being one of the GPT models (for example `gpt-4`, `gpt-3.5-turbo`) or `claude`.
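The steps above can be wrapped in a small pre-flight check that refuses to launch the pipeline when the relevant API key is missing. This is only a sketch: the helper names `required_key` and `build_command` are hypothetical, and the rule that `claude` needs `ANTHROPIC_API_KEY` while every other model needs `OPENAI_API_KEY` is an assumption inferred from the instructions, not something the pipeline documents.

```python
import os


def required_key(model):
    """Guess which API key variable a model needs (assumption, not documented)."""
    return "ANTHROPIC_API_KEY" if model.startswith("claude") else "OPENAI_API_KEY"


def build_command(model, env=None):
    """Return the cargo command for `model`, or raise if its key is not set."""
    env = os.environ if env is None else env
    key = required_key(model)
    if not env.get(key):
        raise RuntimeError(f"{key} is not set; export it before running the pipeline")
    return ["cargo", "run", model]


if __name__ == "__main__":
    try:
        print(" ".join(build_command("gpt-4")))
    except RuntimeError as error:
        print(error)
```

The returned list can be passed to `subprocess.run` from the repository root once the check succeeds.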
Multiple `.json` files will be generated, containing the different data generated.
- `book-toc.json` contains the table of contents of the book.
- `book-content-en.json` is the entire book in English, respecting the same structure as `book-toc.json`.
- `book-content-fr.json` is the entire book in French, respecting the same structure as `book-toc.json`.
- `book-qa-en.json` contains the question and answer samples generated in English based on chapters in `book-content-en.json`.
- `book-qa-fr.json` contains the question and answer samples translated from English.
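After a run, it can be handy to check which of these files were actually produced. The sketch below is a minimal example under two assumptions: that the files are written to the current working directory, and that each one is valid JSON. Their internal schema is not described here, so the hypothetical helper `summarize_outputs` only reports the top-level JSON type and its length.

```python
import json
from pathlib import Path

# File names as listed above; their internal schema is not documented here,
# so this sketch only reports the top-level JSON type and size.
EXPECTED_FILES = [
    "book-toc.json",
    "book-content-en.json",
    "book-content-fr.json",
    "book-qa-en.json",
    "book-qa-fr.json",
]


def summarize_outputs(directory="."):
    """Map each expected file name to (type name, size), or None if it is absent."""
    summary = {}
    for name in EXPECTED_FILES:
        path = Path(directory) / name
        if not path.is_file():
            summary[name] = None
            continue
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Lists and objects have a meaningful length; scalars count as one item.
        size = len(data) if isinstance(data, (list, dict)) else 1
        summary[name] = (type(data).__name__, size)
    return summary


for name, info in summarize_outputs().items():
    print(name, "missing" if info is None else info)
```

A `None` entry indicates that the corresponding generation step did not produce its file in the directory being inspected.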