Skip to content

distilbert trained on the IMDB dataset

Notifications You must be signed in to change notification settings

jzonthemtn/distilbert-imdb

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

44 Commits
 
 
 
 
 
 
 
 

Repository files navigation

distilbert-imdb

This repository contains code to fine tune a classifier using the imdb dataset. The resulting model can then be converted to ONNX (and used by Apache OpenNLP).

For a trained model, see https://huggingface.co/jzonthemtn/distilbert-imdb.

For a walkthrough, see the blog post at https://medium.com/@jzonthemtn/hugging-face-transformers-now-enabled-in-apache-opennlp-by-onnx-runtime-f3ff923c7ff8.

Requirements

python3 -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
python3 -m pip install transformers onnxruntime torch sklearn datasets

Train a Model

python3 train.py

Convert the Trained Model to ONNX

python3 -m transformers.onnx --model=checkpoints/checkpoint-376/ --feature sequence-classification exported-to-onnx

Training and Evaluation Summaries

Epochs Training Evaluation
1 {'eval_loss': 0.21822933852672577, 'eval_accuracy': 0.93224, 'eval_f1': 0.9321042084168337, 'eval_runtime': 225.0407, 'eval_samples_per_second': 111.091, 'eval_steps_per_second': 13.886}
5 {'train_runtime': 3537.2631, 'train_samples_per_second': 35.338, 'train_steps_per_second': 2.209, 'train_loss': 0.11836099642938479, 'epoch': 5.0} {'eval_loss': 0.3719911277294159, 'eval_accuracy': 0.93136, 'eval_f1': 0.9316987740805606, 'eval_runtime': 235.7789, 'eval_samples_per_second': 106.032, 'eval_steps_per_second': 6.629, 'epoch': 5.0}

About

distilbert trained on the IMDB dataset

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages