Iterative training on pseudo-labeled data experiment on the MNIST-dataset

NiklasvonM/Self-Training

Self-Training

This repository implements an iterative pseudo-labeling experiment. A CNN is trained on a small initial subset of the MNIST training set, say 1000 samples instead of the full 60000. The model then predicts labels for the remaining 59000 samples. Every data point the model classifies with high enough confidence, say above 95 %, is added to the training set with its predicted (as opposed to true) label. The process is then repeated, and the model's accuracy on the test set is measured at each iteration.
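The loop described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the names `select_confident` and `self_training`, and the `fit` and `predict_proba` callables, are hypothetical stand-ins for whatever trains the CNN and returns class probabilities. It also assumes that confident samples are moved out of the unlabeled pool and keep their pseudo-label in later iterations, which the description above does not spell out.

```python
import numpy as np


def select_confident(probs, threshold):
    """Return indices and pseudo-labels of rows whose top probability exceeds threshold."""
    confidence = probs.max(axis=1)        # highest class probability per sample
    pseudo_labels = probs.argmax(axis=1)  # predicted class per sample
    keep = np.flatnonzero(confidence > threshold)
    return keep, pseudo_labels[keep]


def self_training(fit, predict_proba, x_labeled, y_labeled, x_pool,
                  threshold=0.95, n_iterations=10):
    """Hypothetical pseudo-labeling loop; fit and predict_proba are user-supplied."""
    x_train, y_train = x_labeled, y_labeled
    train_sizes = []
    model = None
    for _ in range(n_iterations):
        model = fit(x_train, y_train)     # retrain on the current training set
        train_sizes.append(len(x_train))
        if len(x_pool) == 0:
            break
        probs = predict_proba(model, x_pool)
        keep, labels = select_confident(probs, threshold)
        # Assumption: confident samples join the training set with their
        # predicted labels and leave the unlabeled pool.
        x_train = np.concatenate([x_train, x_pool[keep]])
        y_train = np.concatenate([y_train, labels])
        x_pool = np.delete(x_pool, keep, axis=0)
    return model, x_train, y_train, train_sizes
```

In a real run, `fit` would train the CNN and the test accuracy would be recorded after each call to it. The sketch only tracks the size of the training set.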

[Figure: process diagram]

Getting Started

Installation

Install Poetry, then install the required packages by running `poetry install`.

There is no need to download any data as that is done automatically when you run the experiment for the first time.

Running the Experiment

Execute `poetry run python scripts/run_experiment.py` to run one experiment. To repeatedly run experiments with random confidence thresholds, run `poetry run python scripts/run_multiple_experiments.py`.

These scripts save their results to `./output`. The results can then be plotted with the functions in `self_training.plot`; see `scripts/plot_all.py`.

Results

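| Initial Training Samples | Iteration | Confidence Threshold | Test Accuracy |
|--------------------------|-----------|----------------------|---------------|
| 60000                    | 1         | NA                   | 98.52 %       |
| 1000                     | 1         | 99 %                 | 90.73 %       |
| 1000                     | 10        | 99 %                 | 95.60 %       |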
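To put the table in perspective, the short calculation below derives the gain from self-training and the remaining gap to the full-data baseline. It uses only the three accuracies reported above and assumes that the two 1000-sample rows come from the same run at iterations 1 and 10. The variable names are illustrative.

```python
# Test accuracies in percent, copied from the results table.
baseline_full = 98.52     # 60000 labeled samples, no pseudo-labeling
first_iteration = 90.73   # 1000 labeled samples, iteration 1
tenth_iteration = 95.60   # 1000 labeled samples, iteration 10

# Gain in percentage points from ten rounds of pseudo-labeling (about 4.87).
absolute_gain = tenth_iteration - first_iteration

# Share of the initial test error that was removed (about 52.5 %).
error_first = 100 - first_iteration
relative_error_reduction = absolute_gain / error_first

# Remaining gap to training on all true labels (about 2.92 points).
remaining_gap = baseline_full - tenth_iteration

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative error reduction: {relative_error_reduction:.1%}")
print(f"remaining gap to full-data baseline: {remaining_gap:.2f} points")
```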

[Figure: accuracy improvement over first iteration by confidence threshold and iteration]

[Figure: accuracy on high confidence predictions, by iteration]
[Figure: accuracy on low confidence predictions, by iteration]
[Figure: share of train labels that are correct, by iteration]
[Figure: number of train data points, by iteration]
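The per-iteration diagnostics named in these plot titles could be computed along the following lines. This is a sketch under assumptions and not the code in `self_training.plot`: the function names are hypothetical, and "high confidence" is taken to mean that the top class probability exceeds the threshold.

```python
import numpy as np


def accuracy_by_confidence(probs, y_true, threshold):
    """Return (accuracy on high-confidence rows, accuracy on low-confidence rows)."""
    probs = np.asarray(probs)
    y_true = np.asarray(y_true)
    correct = probs.argmax(axis=1) == y_true
    high = probs.max(axis=1) > threshold

    def mean_or_nan(mask):
        # An empty subset has no defined accuracy, so report NaN.
        return float(correct[mask].mean()) if mask.any() else float("nan")

    return mean_or_nan(high), mean_or_nan(~high)


def share_of_correct_labels(train_labels, true_labels):
    """Fraction of training labels (true or pseudo) that match the ground truth."""
    return float(np.mean(np.asarray(train_labels) == np.asarray(true_labels)))
```

Because MNIST provides true labels for all training images, the share of correct training labels can be tracked even though the model itself only ever sees the pseudo-labels.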
