This repo contains the code for the paper "Spot-adaptive Knowledge Distillation" (IEEE TIP 2022). We benchmark 11 state-of-the-art knowledge distillation methods with spot-adaptive KD in PyTorch, including:
- (FitNet) - FitNets: Hints for Thin Deep Nets
- (AT) - Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
- (SP) - Similarity-Preserving Knowledge Distillation
- (CC) - Correlation Congruence for Knowledge Distillation
- (VID) - Variational Information Distillation for Knowledge Transfer
- (RKD) - Relational Knowledge Distillation
- (PKT) - Probabilistic Knowledge Transfer for deep representation learning
- (FT) - Paraphrasing Complex Network: Network Compression via Factor Transfer
- (FSP) - A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning
- (NST) - Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- (CRD) - Contrastive Representation Distillation
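
For context, the snippet below is a minimal, generic sketch of the classic logit-based KD objective that methods like these typically extend. It is not the loss implementation shipped in this repo; the temperature `T` and weight `alpha` are placeholder values.

```python
# Generic logit-based KD loss sketch (not this repo's implementation):
# a soft KL term between temperature-scaled teacher/student logits plus
# the usual cross-entropy on the ground-truth labels.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```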
1. Fetch the pretrained teacher models by:

```bash
sh train_single.sh
```

which will run the code and save the models to `./run/$dataset/$seed/$model/ckpt`.
The flags in `train_single.sh` are explained as follows:

- `seed`: specifies the random seed.
- `dataset`: specifies the training dataset.
- `num_classes`: the number of categories of the above dataset.
- `model`: specifies the model; see `models/__init__.py` to check the available model types.

Note: the default settings can be found in the config files at `configs/$dataset/seed-$seed/single/$model.yml`.
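
For illustration, a single-model config following the path pattern above might look roughly like the sketch below. The dataset, seed, and model values (`cifar100`, `0`, `resnet32x4`) are hypothetical examples, and the exact keys in the shipped config files may differ.

```yaml
# Hypothetical sketch of configs/cifar100/seed-0/single/resnet32x4.yml;
# the real config files in this repo may use different keys and values.
seed: 0                # random seed
dataset: cifar100      # training dataset
num_classes: 100       # number of categories in the dataset
model: resnet32x4      # model name, see models/__init__.py
```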
2. Run our spot-adaptive KD by:

```bash
sh train.sh
```
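
For intuition only, spot-adaptive KD can be pictured as weighting per-layer distillation losses with per-sample gates, so that each sample decides at which "spots" (layers) distillation is applied. The sketch below is a simplified stand-in, not the routing policy implemented in this repo: the gate head, matching feature shapes, and MSE-based layer loss are all illustrative assumptions.

```python
# Conceptual sketch only: per-sample, per-layer gating of distillation losses.
# The gate head, matching feature shapes, and MSE layer loss are assumptions
# made for illustration, not the mechanism actually used in this repository.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedLayerKD(nn.Module):
    def __init__(self, feat_dim, num_layers):
        super().__init__()
        # Tiny policy head: pooled student feature -> one gate per candidate layer.
        self.gate_head = nn.Linear(feat_dim, num_layers)

    def forward(self, student_feats, teacher_feats, student_summary):
        # student_feats / teacher_feats: lists of [B, C] features (one per layer,
        # assumed to have matching shapes); student_summary: [B, feat_dim].
        gates = torch.sigmoid(self.gate_head(student_summary))  # [B, num_layers]
        loss = 0.0
        for i, (fs, ft) in enumerate(zip(student_feats, teacher_feats)):
            # Per-sample distillation error at this layer, gated per sample.
            per_sample = F.mse_loss(fs, ft.detach(), reduction="none").mean(dim=1)  # [B]
            loss = loss + (gates[:, i] * per_sample).mean()
        return loss
```

In practice a harder, discrete routing decision could replace the soft sigmoid gate; this sketch keeps it soft purely for simplicity.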
If you find this work useful for your research, please cite our paper:

```bibtex
@article{song2022spot,
  title={Spot-adaptive knowledge distillation},
  author={Song, Jie and Chen, Ying and Ye, Jingwen and Song, Mingli},
  journal={IEEE Transactions on Image Processing},
  volume={31},
  pages={3359--3370},
  year={2022},
  publisher={IEEE}
}
```