Automated Essay Scoring Notebooks

This repository contains notebooks for two approaches to the Automated Essay Scoring Kaggle competition. The goal of the competition is to train a model that scores student essays efficiently and accurately. A reliable automated scoring system can significantly reduce the time and cost of grading. That would make it practical to include essay questions in standardized tests; essays are crucial indicators of student learning but are often avoided because they are hard to grade.

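Evaluation is based on the quadratic weighted kappa (QWK), a metric that measures the agreement between two sets of ratings, here the predicted scores and the human-assigned scores. Disagreements are penalized in proportion to the square of the distance between the two ratings, so a prediction that is off by two points costs four times as much as one that is off by one.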
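To make the metric concrete, here is a minimal pure-Python sketch of quadratic weighted kappa. It is an illustration rather than the code used in the notebooks; the function name and the `min_rating`/`max_rating` parameters are assumptions chosen for this example.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """Illustrative QWK for integer ratings in [min_rating, max_rating]."""
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("need at least two distinct rating levels")

    # Observed confusion matrix: rows are true ratings, columns are predictions.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1

    total = len(y_true)
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty: grows with the squared distance between ratings.
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Expected count if the two raters were statistically independent.
            expected = hist_true[i] * hist_pred[j] / total
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters gave a single identical rating to every essay.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


perfect = quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4)
reversed_order = quadratic_weighted_kappa([1, 2, 3], [3, 2, 1], 1, 3)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. In practice, `sklearn.metrics.cohen_kappa_score(y_true, y_pred, weights="quadratic")` computes the same quantity.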

For more details about the competition, see the official Kaggle page.

Approach 1: Fine-tuned DeBERTa-v3-small

Base model(s): deberta-v3-small
Methodology:
- Used the HuggingFace Trainer API to fine-tune the model.
- Added new tokens to the tokenizer for "new paragraph" and "double space", because the DeBERTa tokenizer otherwise removes these from essays.
Score: 0.80
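The tokenizer change in Approach 1 might look roughly like the sketch below. Several details are assumptions not stated in this README: the hub id `microsoft/deberta-v3-small`, the use of `"\n"` for "new paragraph" and two spaces for "double space", the helper name `load_tokenizer_and_model`, and the `num_labels` default. The notebook may differ on any of these.

```python
# Assumption: Hugging Face hub id for the deberta-v3-small checkpoint.
MODEL_NAME = "microsoft/deberta-v3-small"

# Assumption: the literal strings standing in for "new paragraph" and "double space".
WHITESPACE_TOKENS = ["\n", "  "]


def load_tokenizer_and_model(model_name=MODEL_NAME, num_labels=1):
    """Hypothetical helper: load a tokenizer that keeps the whitespace tokens.

    num_labels=1 assumes a regression head; a classification head would
    use one label per score level instead.
    """
    # Imported lazily so the constants above can be inspected without
    # the transformers and tokenizers packages installed.
    from tokenizers import AddedToken
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # normalized=False asks the tokenizer to match these strings before
    # normalization, which would otherwise collapse the whitespace.
    tokenizer.add_tokens(
        [AddedToken(token, normalized=False) for token in WHITESPACE_TOKENS]
    )

    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels
    )
    # The embedding matrix must grow to cover the newly added token ids.
    model.resize_token_embeddings(len(tokenizer))
    return tokenizer, model
```

The returned tokenizer and model can then be passed to the HuggingFace `Trainer` as usual. The embeddings for the new tokens start out untrained and are learned during fine-tuning.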

Approach 2: Multi-LLM Embedding Extraction + LightGBM

Base model(s): deberta-base, deberta-large, deberta-v3-large, longformer-base-4096, bigbird-roberta-base, bigbird-roberta-large
Methodology:
- Extracted embeddings from each base model using the HuggingFace API.
- Concatenated the embeddings and used them as input features for a LightGBM model.
- Applied threshold post-processing as discussed here.
Score: 0.81
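Threshold post-processing generally means turning a continuous model output into an integer score by cutting it at chosen boundaries. The sketch below shows one simple form of that idea. The score range starting at 1, the cut points, and the function name `apply_thresholds` are all assumptions for illustration; the referenced discussion may tune the thresholds differently, for example by searching for the cut points that maximize QWK on validation data.

```python
from bisect import bisect_right


def apply_thresholds(predictions, thresholds, min_score=1):
    """Map continuous predictions to integer scores.

    A prediction below the first threshold gets min_score; each threshold
    it reaches or exceeds raises the score by one.
    """
    cuts = sorted(thresholds)
    return [min_score + bisect_right(cuts, p) for p in predictions]


# Hypothetical raw regressor outputs and evenly spaced starting thresholds.
raw_predictions = [0.7, 1.49, 1.5, 2.6, 3.4, 5.9, 7.2]
cut_points = [1.5, 2.5, 3.5, 4.5, 5.5]
scores = apply_thresholds(raw_predictions, cut_points)
# scores == [1, 1, 2, 3, 3, 6, 6]
```

Because QWK penalizes large misses heavily, moving the cut points away from the midpoints between integers can raise the metric even when the underlying regressor is unchanged.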

Common Techniques Used in Both Approaches

Cross-Validation: Employed Stratified K-Fold Cross-Validation.
Evaluation Metric: Implemented Quadratic Weighted Kappa (QWK) as the evaluation metric.
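Stratified K-Fold keeps the distribution of essay scores roughly the same in every fold, which matters when some scores are rare. The following is a simplified pure-Python stand-in that shows the idea; the function name `stratified_fold_ids` and its parameters are assumptions, and the usual tool for this is `sklearn.model_selection.StratifiedKFold`.

```python
import random
from collections import defaultdict


def stratified_fold_ids(labels, n_splits=5, seed=42):
    """Assign each sample a fold id so every fold has a similar label mix."""
    rng = random.Random(seed)

    # Group sample indices by their label (the essay score).
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    fold_of = [None] * len(labels)
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal the shuffled samples of this label across folds in turn.
        for position, index in enumerate(indices):
            fold_of[index] = position % n_splits
    return fold_of


# Hypothetical, imbalanced score labels.
example_labels = [1] * 10 + [2] * 5 + [3] * 5
fold_ids = stratified_fold_ids(example_labels, n_splits=5)
```

Fold `k` is used as the validation set by selecting the samples whose fold id equals `k` and training on the rest. Each fold's predictions can then be scored with QWK and the results averaged.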
