SpacedAce quiz generation LLM training

This repository contains every step needed to reproduce the trained LLM.

Steps to reproduce

1. Data scraping and preprocessing

The data was scraped from Wikipedia's vital articles list (level 2). These articles provide the contexts from which questions are generated.
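As a rough illustration of the preprocessing step, the sketch below groups the paragraphs of an article's plain text into length-bounded contexts. The function name `split_into_contexts` and the `min_chars` / `max_chars` thresholds are assumptions made for this example; the actual logic lives in the "scraping" directory and may differ.

```python
def split_into_contexts(text, min_chars=200, max_chars=1500):
    """Group consecutive paragraphs into contexts of bounded length.

    Hypothetical helper: the thresholds are illustrative, not the
    values used in this repository.
    """
    contexts, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = " ".join(paragraph.split())  # collapse internal whitespace
        if not paragraph:
            continue
        candidate = f"{current} {paragraph}".strip()
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the limit: close the context.
            contexts.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        contexts.append(current)
    # Drop contexts that are too short to support a meaningful question.
    return [c for c in contexts if len(c) >= min_chars]
```

A single paragraph longer than `max_chars` is kept as one context rather than being cut mid-sentence.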

2. Question generation

The output of a larger, more capable model is used to generate quiz questions, which serve as reference outputs for the model being trained.
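The sketch below shows one way such reference data could be assembled: a prompt is built from a context, and the larger model's raw reply is validated before it becomes a training record. The prompt wording, the multiple-choice JSON layout, and the names `build_prompt` and `parse_reference` are all assumptions for illustration. The call to the larger model itself is left out.

```python
import json

# Hypothetical prompt; the wording used in this repository may differ.
PROMPT_TEMPLATE = (
    "Write one multiple-choice quiz question based only on the context below.\n"
    'Answer with JSON containing the keys "question", "options" and "answer".\n\n'
    "Context:\n{context}"
)

REQUIRED_KEYS = {"question", "options", "answer"}


def build_prompt(context):
    """Fill the prompt template with a single context."""
    return PROMPT_TEMPLATE.format(context=context.strip())


def parse_reference(context, raw_response):
    """Turn the larger model's raw reply into a training record.

    Returns None when the reply is not valid JSON or lacks the
    assumed structure, so malformed references can be skipped.
    """
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS <= data.keys():
        return None
    if not isinstance(data["options"], list) or data["answer"] not in data["options"]:
        return None
    return {"prompt": build_prompt(context), "completion": json.dumps(data)}
```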

3. Context scoring

Some of the generated questions were of subpar quality, usually because the underlying context was poor. To filter out these lower-quality questions, the quality of each context was scored so that the dataset can be filtered on that score.
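Once each context has a score, the filtering itself can be very small. The following is a minimal sketch, assuming each record is a dictionary with a numeric `context_score` field; both the field name and the threshold value are hypothetical.

```python
def filter_by_context_score(records, min_score=3.0):
    """Keep only records whose context score reaches the threshold.

    Records without a score are treated as failing the filter.
    """
    return [
        record
        for record in records
        if record.get("context_score", float("-inf")) >= min_score
    ]
```

Treating unscored records as failures keeps the filtered dataset conservative; the opposite choice would also be reasonable if scoring is incomplete.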

4. LLM training

LoRA was used to fine-tune Llama-3-8B-Instruct for question generation on the generated data.
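To give a sense of why LoRA makes fine-tuning an 8B model practical, the sketch below counts the trainable parameters LoRA adds to a single weight matrix. For a frozen matrix of shape d_out x d_in, LoRA trains two low-rank factors of shapes d_out x r and r x d_in. The rank of 16 is an assumption for this example, not necessarily the value used in the "train-llm" directory; 4096 is the hidden size of Llama-3-8B.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters added by LoRA to one d_out x d_in weight matrix.

    LoRA learns B (d_out x rank) and A (rank x d_in) while the
    original matrix stays frozen.
    """
    return rank * (d_out + d_in)


hidden = 4096  # hidden size of Llama-3-8B
rank = 16      # assumed rank, for illustration only

full = hidden * hidden
lora = lora_trainable_params(hidden, hidden, rank)
print(f"full matrix: {full} parameters")        # full matrix: 16777216 parameters
print(f"LoRA factors: {lora} parameters")       # LoRA factors: 131072 parameters
print(f"trainable fraction: {lora / full:.4%}") # trainable fraction: 0.7812%
```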

5. LLM response evaluation

Gemini-1.5-flash was tasked with preference scoring of the generated questions, rating their quality along multiple dimensions.
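Per-question scores on several dimensions are usually summarized before models are compared. The sketch below averages scores per dimension, assuming each evaluation is a dictionary mapping a dimension name to a numeric score. The dimension names in the usage example are hypothetical; the actual scale is described in the "eval" directory.

```python
from collections import defaultdict
from statistics import mean


def average_scores(evaluations):
    """Average the judge's scores per dimension across all questions."""
    by_dimension = defaultdict(list)
    for scores in evaluations:
        for dimension, value in scores.items():
            by_dimension[dimension].append(value)
    return {dimension: mean(values) for dimension, values in by_dimension.items()}


# Hypothetical dimension names, for illustration only.
example = [
    {"relevance": 4, "clarity": 2},
    {"relevance": 2, "clarity": 4},
]
print(average_scores(example))  # {'relevance': 3, 'clarity': 3}
```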

Repository structure

Each step of the training, from scraping to response evaluation, has its own subdirectory. For more information about a step, consult the README.md in its directory.

Data scraping and preprocessing - "scraping" directory
Question generation - "question_generation" directory
Context scoring - "context_scoring" directory
LLM training - "train-llm" directory
LLM response evaluation - "eval" directory

spaced-ace/llm-training

Folders and files

Latest commit

History

Repository files navigation

SpacedAce quiz generation LLM trainig

Steps to reproduce

1. Data scraping and preprocessing

2. Question generation

3. Context scoring

4. LLM training

5. LLM response evaluation

Repository structure

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages