Knowledge_distillation_via_TF2.0

  • I am currently fixing issues and refining the code so that it is easier to understand how each KD algorithm works.
  • All algorithms have been re-implemented, but they still need to be verified with further hyperparameter tuning.
    • The algorithms that already have experimental results below have been confirmed.
  • This repository will be an upgraded version of my previous benchmark repository (link).

Implemented Knowledge Distillation Methods

Knowledge defined by the neural response of a hidden layer or the output layer of the network
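
For example, the simplest response-based method (Soft-logits) matches the temperature-softened output distributions of the teacher and the student. The sketch below is a minimal TF2 low-level illustration of that loss, not this repository's implementation; the function name and the default temperature are assumptions.

```python
import tensorflow as tf

def soft_logits_loss(student_logits, teacher_logits, temperature=4.0):
    """Hinton-style soft-logits distillation loss (minimal sketch).

    Both logit tensors have shape [batch, num_classes]; `temperature`
    softens the distributions before they are compared with KL divergence.
    """
    t = temperature
    teacher_prob = tf.nn.softmax(teacher_logits / t, axis=-1)
    student_log_prob = tf.nn.log_softmax(student_logits / t, axis=-1)
    # KL(teacher || student), scaled by T^2 as in the original paper so its
    # gradient magnitude stays comparable to the hard-label cross-entropy.
    kl = tf.reduce_sum(
        teacher_prob * (tf.math.log(teacher_prob + 1e-8) - student_log_prob),
        axis=-1)
    return tf.reduce_mean(kl) * t * t
```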

Experimental Results

  • WResNet-40-4 and WResNet-16-4 are used as the teacher and the student network, respectively.
  • All algorithms are trained with the sample configuration described in "train_w_distillation.py", and only each algorithm's own hyperparameters are tuned. I tried only a few settings to reach acceptable performance, so the reported results are probably not optimal. A sketch of what such a training step looks like is given after this list.
  • Although some papers combine their method with soft-logits, I used only each paper's proposed knowledge distillation term to keep the comparison fair.
  • Initialization-based methods start from a much higher accuracy but end with a lower final accuracy because of overfitting. Therefore, an initialized student should also be trained with a regularizing term such as Soft-logits.
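
As referenced above, the following is a hedged sketch of what a single distillation training step with the TF2 low-level API can look like. It is not taken from "train_w_distillation.py"; `student`, `teacher`, `optimizer`, `kd_loss_fn`, and `kd_weight` are illustrative names.

```python
import tensorflow as tf

def train_step(student, teacher, optimizer, images, labels,
               kd_loss_fn, kd_weight=1.0):
    """One distillation step: hard-label cross-entropy plus one KD term.

    `student` and `teacher` are assumed to be tf.keras.Model instances
    (the teacher frozen), and `kd_loss_fn` maps (student_logits,
    teacher_logits) to a scalar loss such as soft_logits_loss above.
    """
    teacher_logits = teacher(images, training=False)
    with tf.GradientTape() as tape:
        student_logits = student(images, training=True)
        ce = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=labels, logits=student_logits))
        # Only the evaluated algorithm's distillation term is added,
        # which keeps the benchmark comparison fair.
        kd = kd_loss_fn(student_logits, teacher_logits)
        loss = ce + kd_weight * kd
    grads = tape.gradient(loss, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))
    return loss
```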

Training/Validation accuracy (last accuracy, %)

Methods        Full Dataset    50% Dataset     25% Dataset     10% Dataset
Teacher        78.59           -               -               -
Student        76.25           -               -               -
Soft_logits    76.57           -               -               -
FitNet         75.78           -               -               -
AT             78.14           -               -               -
FSP            76.08           -               -               -
DML            -               -               -               -
KD_SVD         -               -               -               -
FT             77.30           -               -               -
AB             76.52           -               -               -
RKD            77.69           -               -               -
VID            -               -               -               -
MHGD           -               -               -               -
CO             78.54           -               -               -

Plan to do

  • Check all the algorithms.
  • Run the remaining experiments.

About

Code for recent knowledge distillation algorithms and benchmark results, implemented with the TensorFlow 2.0 low-level API.
