Aristote is a personal learning companion! By analyzing video course transcripts, Aristote crafts concise summaries and generates engaging quizzes, helping you grasp key concepts with ease. Whether you're reviewing lectures or exploring new topics, Aristote is here to make learning more interactive, efficient, and enjoyable.
Key Features:
- Metadata Generation: Extracts title, description, topics, and main discipline from video transcripts.
- Quiz Creation: Generates diverse quiz questions on various topics covered in the video.
- Quiz Evaluation: Utilizes GPT-4 to assess quiz quality against specific criteria, accelerating quiz selection.
Disclaimer: Aristote can make mistakes. Aristote generates as many quizzes as possible so that a human can select the best ones and gather as many new quiz ideas as possible.
Table of Contents
- How does it work?
- Getting Started: Use Aristote with Docker (Recommended)
- CLI Usage (for local testing without Docker)
- Contributing
- Credits
Here is a detailed description of the Aristote pipeline for each key feature:
- Chunk Processing: Splits transcripts into manageable segments.
- Text Enhancement: Rewrites each chunk with a Large Language Model (LLM) to improve readability.
- Summarization: Generates a summary for each chunk.
- Metadata Extraction: Extracts title, description, and other metadata from the set of summaries.
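The metadata pipeline above can be sketched in a few lines of Python. This is an illustrative outline, not Aristote's actual code: `call_llm`, the prompt texts, and the chunking strategy are all placeholders.

```python
from typing import Callable, List


def chunk_transcript(transcript: str, chunk_size: int = 2000) -> List[str]:
    """Split the raw transcript into roughly fixed-size character chunks."""
    return [transcript[i:i + chunk_size] for i in range(0, len(transcript), chunk_size)]


def generate_metadata(transcript: str, call_llm: Callable[[str], str]) -> dict:
    """Run the chunk -> rewrite -> summarize -> extract pipeline."""
    chunks = chunk_transcript(transcript)
    # 1. Rewrite each chunk with the LLM to improve readability.
    rewritten = [call_llm(f"Rewrite this transcript excerpt clearly:\n{c}") for c in chunks]
    # 2. Summarize each rewritten chunk.
    summaries = [call_llm(f"Summarize:\n{c}") for c in rewritten]
    # 3. Extract metadata from the set of summaries.
    joined = "\n".join(summaries)
    return {
        "title": call_llm(f"Give a title for this course:\n{joined}"),
        "description": call_llm(f"Give a short description:\n{joined}"),
        "topics": call_llm(f"List the main topics covered:\n{joined}"),
    }
```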
- Chunk Processing: Splits transcripts into chunks of varying lengths to create quizzes from local and global information.
- Text Enhancement: Rewrites each chunk with a Large Language Model (LLM) to improve readability.
- Quiz Generation: Generates a quiz from each rewritten chunk by sequentially asking for a question, the correct answer, three incorrect answers and an explanation for the correct answer.
- Quiz Assessment: Evaluates quizzes using GPT-4 based on specific boolean criteria:
- whether the question is really a question,
- whether the question is related to the subject of the course,
- whether the question is self-contained,
- whether the language is clear,
- whether the answers are all different,
- whether the answers are related,
- whether the fake answers are not obvious,
- whether the quiz is about a theoretical concept or a specific course example.
- Score Aggregation: Computes the number of criteria that pass.
- Quiz Ranking: Ranks quizzes based on the number of successful criteria.
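The aggregation and ranking steps amount to counting satisfied boolean criteria and sorting. A minimal sketch, assuming each quiz's GPT-4 evaluation has already been parsed into a dict of booleans (the criterion names below are shortened paraphrases, not the actual prompt wording):

```python
from typing import Dict, List

# Shortened names for the eight boolean evaluation criteria.
CRITERIA = [
    "is_a_question", "related_to_course", "self_contained", "clear_language",
    "answers_all_different", "answers_related", "fake_answers_not_obvious",
    "theoretical_concept",
]


def aggregate_score(evaluation: Dict[str, bool]) -> int:
    """Count how many boolean criteria the quiz satisfies."""
    return sum(evaluation.get(name, False) for name in CRITERIA)


def rank_quizzes(evaluations: List[Dict[str, bool]]) -> List[int]:
    """Return quiz indices sorted from highest to lowest score."""
    return sorted(
        range(len(evaluations)),
        key=lambda i: aggregate_score(evaluations[i]),
        reverse=True,
    )
```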
This diagram shows how the pipeline works:
Change the VLLM_API_URL environment variable in .env to the URL of the API (e.g. https://api.openai.com/v1/chat/completions) and optionally set VLLM_TOKEN if authentication is needed.
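For example, with an OpenAI-compatible endpoint, the relevant .env entries would look like this (the token value is a placeholder):

```shell
# URL of the chat-completions endpoint of your LLM server
VLLM_API_URL=https://api.openai.com/v1/chat/completions
# Only needed if the endpoint requires authentication
VLLM_TOKEN=sk-your-token-here
```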
docker build -t aristote -f server/Dockerfile . && docker run --env-file .env --network="host" -p 3000:3000 --name aristote aristote
Warning: --network="host" only works on Linux.
The scripts below do the following:
- Request a job from AristoteAPI
- If a job is returned by AristoteAPI, extract the relevant information from the response and handle the job
- Send the result of the processing back to AristoteAPI
docker exec -it aristote python aristote/generate_quizz.py
docker exec -it aristote python aristote/translate_quizz.py
docker exec -it aristote python aristote/evaluate_quizz.py
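The polling loop those scripts implement can be sketched as follows. The function signatures and job schema here are illustrative stand-ins, not the actual Aristote client API:

```python
from typing import Callable, Dict, Optional


def run_worker_once(
    fetch_job: Callable[[], Optional[Dict]],
    handle: Callable[[Dict], Dict],
    send_result: Callable[[str, Dict], None],
) -> bool:
    """Poll AristoteAPI once: fetch a job, process it, post the result back.

    Returns True if a job was handled, False if no job was available.
    """
    job = fetch_job()              # ask AristoteAPI for a pending job
    if job is None:
        return False               # nothing to do this round
    result = handle(job)           # generate / translate / evaluate quizzes
    send_result(job["id"], result) # report the outcome back to AristoteAPI
    return True
```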
- Docker and Docker Compose (recommended)
- 16GB+ GPU VRAM for running Llama 3 8B
- Copy .env.dist to .env and configure as needed.
- Run:
docker compose up
- Go to http://localhost:3000/docs to access the API documentation.
You can launch the LLM service and the Metadata/Quiz Generation service separately.
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p 8000:8000 \
--ipc=host \
--env "HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}" \
vllm/vllm-openai:latest \
--model meta-llama/Meta-Llama-3-8B-Instruct \
--tokenizer meta-llama/Meta-Llama-3-8B-Instruct \
--dtype bfloat16 \
--tensor-parallel-size 1
docker build -t aristote -f server/Dockerfile . && docker run --env-file .env --network="host" -p 3000:3000 --name aristote aristote
Warning: --network="host" only works on Linux.
- Access to an LLM deployed following the OpenAI endpoint conventions. You can deploy your own LLM as shown here, or simply use OpenAI's API.
- Set up a virtual environment and install dependencies (we recommend installing dependencies with uv).
uv pip install .
You can also simply install the dependencies with pip:
pip install .
- Copy .env.dist to .env and configure as needed. Load your environment variables with:
export $(cat .env | xargs)
aristote generate-metadata {METADATA_YML_CONFIG_PATH}
Examples of configs are here to use an OpenAI model and here for a HuggingFace model deployed through a /v1/chat/completions route.
aristote generate-quizzes {QUIZ_GEN_YML_CONFIG_PATH}
Examples of configs are here to use an OpenAI model and here for a HuggingFace model deployed through a /v1/chat/completions route.
To contribute, we recommend installing the just command runner.
Then you can setup the dev dependencies with:
just install
Launch tests with:
just test
Check linting with:
just lint
And reformat files with:
just format
This project is based on a prototype made by four students from the Paris Digital Lab: Antoine Vaglio, Liwei Sun, Mohammed Bahhad and Pierre-Louis Veyrenc.
Illuin Technology refined the project and made it available in its current form for CentraleSupélec. The main contributors are Mohamed-Ali Barka, António Loison and Bruno Hays from Illuin Technology.