jdpsc/essay-scoring-notebooks
Automated Essay Scoring Notebooks

This repository contains notebooks for two approaches to the Automated Essay Scoring Kaggle competition, whose goal is to train a model that scores student essays efficiently and accurately. A reliable automated scoring system could substantially reduce the time and cost of grading, making it practical to include essay questions in standardized tests. Essay questions are important indicators of student learning, but they are often avoided because they are difficult to grade.

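Evaluation is based on the quadratic weighted kappa (QWK), a metric that measures the agreement between two sets of ratings: here, the predicted scores and the human-assigned scores. QWK is 1 for perfect agreement, about 0 for agreement no better than chance, and negative when agreement is worse than chance. Disagreements are penalized by the square of the distance between the two scores, so predicting 2 for an essay scored 5 costs far more than predicting 4.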
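QWK is computed as kappa = 1 - (sum of w_ij * O_ij) / (sum of w_ij * E_ij), where O is the observed confusion matrix between true and predicted scores, E is the matrix expected by chance (the outer product of the two score histograms divided by the number of essays), and w_ij = (i - j)^2 / (N - 1)^2 for N possible scores. The sketch below is a minimal pure-Python illustration, assuming integer scores in a known range; the function name is hypothetical. scikit-learn's `cohen_kappa_score(y_true, y_pred, weights="quadratic")` computes the same quantity when its `labels` argument covers the full score range.

```python
from typing import Sequence


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int],
                             min_rating: int, max_rating: int) -> float:
    """Cohen's kappa with quadratic weights between two integer ratings (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("need at least two possible ratings")
    total = len(y_true)

    # Observed confusion matrix: observed[i][j] counts essays with true=i, predicted=j.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        if not (min_rating <= t <= max_rating and min_rating <= p <= max_rating):
            raise ValueError(f"rating out of range: {t}, {p}")
        observed[t - min_rating][p - min_rating] += 1

    # Marginal histograms of each rater.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2          # quadratic disagreement penalty
            expected = hist_true[i] * hist_pred[j] / total  # chance-level count
            numerator += weight * observed[i][j]
            denominator += weight * expected
    # denominator is 0 only in degenerate cases (e.g. both raters use one score).
    return 1.0 - numerator / denominator
```

For example, `quadratic_weighted_kappa([1, 2, 3], [3, 2, 1], 1, 3)` returns -1.0 (up to floating-point rounding), since the predictions are the exact reverse of the true scores.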

For more details about the competition, see the official Kaggle page.

Approach 1: Fine-tuned DeBERTa-v3-small

  • Base model(s): deberta-v3-small
  • Methodology:
    • Used the HuggingFace Trainer API to fine-tune the model.
    • Added new tokens to the tokenizer to represent "new paragraph" and "double space", because DeBERTa's tokenizer otherwise removes this whitespace from the essays.
  • Score (QWK): 0.80
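The README does not show how the new tokens are inserted into the text. A minimal sketch, assuming the whitespace is replaced by placeholder strings before tokenization: the token names `[PARAGRAPH]` and `[DOUBLE_SPACE]` and the function `mark_whitespace` are hypothetical, any run of newlines is treated as one paragraph break, and any run of two or more spaces inside a line becomes one double-space marker.

```python
import re

PARAGRAPH_TOKEN = "[PARAGRAPH]"        # hypothetical token name
DOUBLE_SPACE_TOKEN = "[DOUBLE_SPACE]"  # hypothetical token name

# First alternative: optional spaces/tabs, a newline, then any following whitespace
# (treated as one paragraph break). Second alternative: 2+ spaces inside a line.
_WHITESPACE_PATTERN = re.compile(r"[ \t]*\n\s*|[ ]{2,}")


def _replace(match: re.Match) -> str:
    # Pick the marker based on whether the matched whitespace contains a newline.
    token = PARAGRAPH_TOKEN if "\n" in match.group(0) else DOUBLE_SPACE_TOKEN
    return f" {token} "


def mark_whitespace(text: str) -> str:
    """Replace paragraph breaks and double spaces with explicit marker strings."""
    # A single regex pass, so spaces added around one marker are never re-matched.
    return _WHITESPACE_PATTERN.sub(_replace, text).strip()
```

The markers would then be registered with the HuggingFace tokenizer via `tokenizer.add_tokens([PARAGRAPH_TOKEN, DOUBLE_SPACE_TOKEN])`, followed by `model.resize_token_embeddings(len(tokenizer))` so the model has embedding rows for the new token IDs.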

Approach 2: Multi-LLM Embedding Extraction + LightGBM

  • Base model(s): deberta-base, deberta-large, deberta-v3-large, longformer-base-4096, bigbird-roberta-base, bigbird-roberta-large
  • Methodology:
    • Extracted embeddings using the HuggingFace API.
    • Concatenated the embeddings from all models and used them as input features for a LightGBM model.
    • Applied threshold post-processing as discussed here.
  • Score (QWK): 0.81
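The specific threshold procedure from the linked discussion is not reproduced here. The sketch below shows the general idea, assuming the LightGBM model is trained as a regressor whose continuous outputs must be mapped to consecutive integer scores starting at `min_label`: cut points are nudged one at a time by a greedy coordinate search, and a change is kept only if it improves a validation metric. The function names, step size, and search ranges are hypothetical choices.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence, Tuple

# metric(y_true, y_pred) -> score to maximize, e.g. a QWK function (assumption).
Metric = Callable[[Sequence[int], Sequence[int]], float]


def apply_thresholds(preds: Sequence[float], thresholds: Sequence[float],
                     min_label: int = 1) -> List[int]:
    """Map continuous predictions to integer labels using sorted cut points."""
    # bisect_right counts how many thresholds are <= p, so each crossed
    # cut point raises the label by one step above min_label.
    return [min_label + bisect_right(thresholds, p) for p in preds]


def tune_thresholds(preds: Sequence[float], y_true: Sequence[int], metric: Metric,
                    init: Sequence[float], step: float = 0.05, n_steps: int = 10,
                    n_rounds: int = 3, min_label: int = 1) -> Tuple[List[float], float]:
    """Greedy coordinate search: nudge one threshold at a time, keep improvements."""
    thresholds = sorted(init)
    best = metric(y_true, apply_thresholds(preds, thresholds, min_label))
    for _ in range(n_rounds):
        for i in range(len(thresholds)):
            center = thresholds[i]
            # Neighbouring cut points bound the search so thresholds stay strictly increasing.
            lower = thresholds[i - 1] if i > 0 else float("-inf")
            upper = thresholds[i + 1] if i + 1 < len(thresholds) else float("inf")
            for k in range(-n_steps, n_steps + 1):
                candidate = center + k * step
                if not (lower < candidate < upper):
                    continue
                trial = thresholds[:i] + [candidate] + thresholds[i + 1:]
                score = metric(y_true, apply_thresholds(preds, trial, min_label))
                if score > best:
                    best, thresholds = score, trial
    return thresholds, best
```

In practice `metric` would be a QWK function evaluated on out-of-fold predictions, and `init` could start at the midpoints between adjacent scores (1.5, 2.5, ...). Tuning on out-of-fold predictions rather than training predictions keeps the cut points from overfitting.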

Common Techniques Used in Both Approaches

  • Cross-Validation: Employed Stratified K-Fold Cross-Validation.
  • Evaluation Metric: Implemented Quadratic Weighted Kappa (QWK) as the evaluation metric.
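Stratification keeps the distribution of scores roughly the same in every fold, which matters when some score values are rare. scikit-learn's `StratifiedKFold(n_splits=5, shuffle=True, random_state=...)` is the usual tool for this; the dependency-free sketch below only illustrates the idea, assuming stratification on the essay score, and does not reproduce scikit-learn's exact fold assignment. The function name and default seed are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_kfold_assign(labels: Sequence[int], n_splits: int = 5,
                            seed: int = 42) -> List[int]:
    """Return a fold id in [0, n_splits) for every sample, balancing each label across folds."""
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    rng = random.Random(seed)

    # Group sample indices by label.
    by_label: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    folds = [-1] * len(labels)
    offset = 0
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal each class round-robin so its per-fold counts differ by at most one;
        # the running offset spreads the leftover samples of different classes
        # across different folds.
        for position, idx in enumerate(indices):
            folds[idx] = (offset + position) % n_splits
        offset += len(indices)
    return folds
```

Each fold k then serves once as the validation set while the remaining folds are used for training; collecting the validation predictions from every fold gives out-of-fold predictions on which QWK can be measured.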