Implementation for the paper "Light Multi-segment Activation for model compression", which has been accepted by AAAI'2020.
If you find this code useful in your research, please cite the paper:
Zhenhui Xu, Guolin Ke, Jia Zhang, Jiang Bian, and Tie-Yan Liu. "Light Multi-segment Activation for Model Compression." In Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI, 2020.
This repo contains the experimental code for our paper, including all model implementations, data acquisition, and parameter settings. We thank the authors of the original codebase, from which our repo is improved.
There are three main functional folders: `cnn_models`, `translation_models`, and `onmt`. `cnn_models` contains the detailed implementations of the CNN architectures; `translation_models` contains the high-level definitions of the translation models; and `onmt` is part of the OpenNMT-py codebase, containing most of the detailed translation-related implementation (transformers, etc.). Besides, after running the code, folders named `summary`, `models`, `datas`, and `manager` may be created to store results.
Moreover, there are several `main_*.py` scripts serving as the experiment entry points. `main.py` is for the experiments on CIFAR-10, `main_wrn.py` is for running Wide Residual Networks on CIFAR-100, `main_opennmt.py` is for the translation experiments on both OpenNMT and WMT13, and `main_joint.py` is used for testing the effectiveness of jointly using Quantized Distillation and LMA. Besides, `activations.py` contains the implementations of LMA and the other baseline activations, and `model_manager.py` mainly supports the distillation framework, managing the teacher model and the student models.
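For intuition only, the snippet below sketches a generic piecewise-linear (multi-segment) activation with learnable slopes. It is a simplified illustration of the multi-segment idea, not the paper's LMA formulation; the class name, fixed bin range, and parameterization are assumptions, and the actual implementation lives in `activations.py`.

```python
import torch
import torch.nn as nn

class PiecewiseLinearAct(nn.Module):
    """Illustrative multi-segment (piecewise-linear) activation with learnable
    slopes. A simplified sketch of the multi-segment idea, NOT the paper's
    exact LMA formulation from activations.py."""

    def __init__(self, num_bins=8, x_min=-3.0, x_max=3.0):
        super().__init__()
        # num_bins segments need num_bins - 1 interior breakpoints (fixed here).
        self.register_buffer("breaks", torch.linspace(x_min, x_max, num_bins - 1))
        self.slopes = nn.Parameter(torch.ones(num_bins))  # one learnable slope per segment
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # Base linear term uses the leftmost slope; each hinge adds the slope change.
        out = self.bias + self.slopes[0] * x
        for k in range(len(self.breaks)):
            out = out + (self.slopes[k + 1] - self.slopes[k]) * torch.relu(x - self.breaks[k])
        return out
```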
FYI, `scripts` is for batch running; `perl_scripts` is for BLEU computation; `helpers` contains some common utilities; `datasets` is for data acquisition and supports downloading and decompressing the data automatically; `quantization` is mainly inherited from the original codebase and can be used for the joint experiments.
The models are based on Python and PyTorch. To run the code, please set up the environment first:

- Install Anaconda with Python >= 3.6. Miniconda is a quick way to get started.
- Clone the repository:

  ```
  git clone https://github.com/LMA-AAAI/LMA
  ```

- Create the conda environment from the provided file:

  ```
  cd LMA
  conda env create -f environment.yml
  ```
To train a teacher model, use the following script:

```
python main.py -train_teacher -manager 0906
```

It will produce a newly trained teacher model saved in `models`, and the model manager will be saved in `manager`. Arguments that are not specified fall back to the default settings in the main file; the default settings for the translation models are specified separately.
After the above step, we can train student models under supervision from both the teacher and the ground truth. For example:
```
for sed in 1 2 3 4 5
do
    for stmodel in 0 1 2
    do
        python main.py -train_student -manager 0906 -stModel $stmodel -stud_act relu -plot_title 0523 -seed $sed
        python main.py -train_student -manager 0906 -stModel $stmodel -stud_act lma -plot_title 0523 -seed $sed
        python main.py -train_student -manager 0906 -stModel $stmodel -stud_act swish -plot_title 0523 -seed $sed
        python main.py -train_student -manager 0906 -stModel $stmodel -stud_act aplu -plot_title 0523 -seed $sed -num_bins 8
        python main.py -train_student -manager 0906 -stModel $stmodel -stud_act prelu -plot_title 0523 -seed $sed -num_bins 8
    done
done
```
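Each of these runs combines the hard-label loss with a distillation term from the teacher's outputs (handled by `model_manager.py` and the main files). As a reference point only, a minimal sketch of a standard soft-target distillation loss in PyTorch is shown below; the function name, temperature `T`, and weight `alpha` are illustrative assumptions, not values taken from this repo.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.7):
    """Generic Hinton-style knowledge-distillation loss: KL divergence between
    temperature-softened teacher/student logits plus cross-entropy on the labels.
    T and alpha are illustrative values, not the settings used in this repo."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```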
The following table shows example results from running the script above on CIFAR-10 with Student 1.
Method | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 | Mean | Std |
---|---|---|---|---|---|---|---|
ReLU | 88.64 | 88.91 | 88.35 | 88.98 | 88.83 | 88.74 | 0.25 |
PReLU | 89.49 | 89.20 | 89.41 | 89.69 | 88.76 | 89.31 | 0.35 |
Swish | 89.18 | 88.92 | 88.97 | 89.11 | 89.00 | 89.03 | 0.11 |
APLU-8 | 89.67 | 90.12 | 89.75 | 89.94 | 90.13 | 89.92 | 0.21 |
LMA-8 | 90.59 | 90.27 | 90.84 | 90.62 | 90.53 | 90.57 | 0.20 |
Thanks for visiting. If you have any questions, please open an issue.