This repository is the official PyTorch implementation of the NeurIPS 2022 paper Structural Pruning via Latency-Saliency Knapsack.
Please check the LICENSE file. HALP may be used non-commercially, meaning for research or evaluation purposes only. For business inquiries, please contact researchinquiries@nvidia.com.
- Prepare Environment
To run the code and reproduce the results, it is highly recommended to build the Docker image using the provided Dockerfile.
Alternatively, run the code in a virtual environment with Python 3.6 and install the necessary packages:
pip install torch==1.4.0
pip install torchvision==0.5.0
pip install numpy
pip install Pillow
pip install PyYAML
pip install pandas
Additionally, install the NVIDIA APEX library for FP16 support (see Installing NVIDIA APEX).
- Download Pretrained Models
We provide the pretrained baseline models in Google Drive. Please download and put the pretrained models in the folder model_ckpt/.
- Download Latency LUT
The latency lookup table is provided: ResNet50_on_TitanV
Please download the latency lookup table file and put it under folder LUT/.
- Prepare Data
Download the ImageNet1K dataset and set data_root in the config file to the correct path.
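Before launching a long training or pruning run, it can help to confirm that the setup steps above were completed. The following is a minimal sketch, not part of this repository: the helper names (missing_modules, missing_or_empty_dirs, preflight) are hypothetical, and it assumes the script is run from the repository root. It only checks that the listed packages are importable and that the model_ckpt/ and LUT/ folders exist and are non-empty; it does not validate package versions or file contents.

```python
import importlib.util
import sys
from pathlib import Path

# Import names for the packages in the install step (Pillow -> PIL, PyYAML -> yaml).
REQUIRED_MODULES = ["torch", "torchvision", "numpy", "PIL", "yaml", "pandas"]
# APEX is only needed for FP16 support, so it is reported separately.
OPTIONAL_MODULES = ["apex"]
# Folders named in the setup steps, relative to the repository root (assumption).
REQUIRED_DIRS = ["model_ckpt", "LUT"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_or_empty_dirs(root, names):
    """Return the folders under root that do not exist or are empty."""
    problems = []
    for name in names:
        folder = Path(root) / name
        if not folder.is_dir() or not any(folder.iterdir()):
            problems.append(name)
    return problems


def preflight(root="."):
    """Collect warnings as strings; an empty list means nothing was flagged."""
    warnings = []
    if sys.version_info[:2] != (3, 6):
        warnings.append(
            "Python %d.%d in use; the README suggests 3.6" % sys.version_info[:2]
        )
    for name in missing_modules(REQUIRED_MODULES):
        warnings.append("required package not importable: " + name)
    for name in missing_modules(OPTIONAL_MODULES):
        warnings.append("optional package not importable: " + name)
    for name in missing_or_empty_dirs(root, REQUIRED_DIRS):
        warnings.append("folder missing or empty: " + name + "/")
    return warnings


if __name__ == "__main__":
    for line in preflight():
        print("WARNING:", line)
```

Running the script prints one line per flagged item and nothing if all checks pass. The data_root setting is deliberately not checked here, since the layout of the config file is not described in this README.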
Train a ResNet50 baseline
python multiproc.py --nproc_per_node 8 main.py --exp configs/exp_configs/rn50_imagenet_baseline.yaml --no_prune
Prune a ResNet50
python multiproc.py --nproc_per_node 8 main.py --exp configs/exp_configs/rn50_imagenet_prune.yaml --pretrained model_ckpt/resnet50_full.pth
Evaluate a pruned ResNet50 before removing the zero weights
python multiproc.py --nproc_per_node 8 main.py --pretrained model_ckpt/resnet50_halp55.pth --eval_only
Evaluate a pruned ResNet50 after removing the zero weights
python multiproc.py --nproc_per_node 8 main.py --pretrained model_ckpt/resnet50_halp55_clean.pth --mask model_ckpt/resnet50_halp55_group_mask.pkl --eval_only
Measure the actual latency of a pruned model
python profile.py --model_path model_ckpt/resnet50_halp55_clean.pth --mask_path model_ckpt/resnet50_halp55_group_mask.pkl
Model | FLOPs | Top-1 Acc | Top-5 Acc | FPS | Checkpoint |
---|---|---|---|---|---|
ResNet50 | 2.998G | 77.44 | 93.74 | 1213 | RN50-HALP80 |
ResNet50 | 1.957G | 76.47 | 93.11 | 1674 | RN50-HALP55 |
ResNet50 | 1.113G | 74.41 | 91.85 | 2610 | RN50-HALP30 |
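To read the trade-off in the table at a glance, the rows can be expressed relative to one another. The short sketch below is illustrative only: it copies the FLOPs, Top-1, and FPS values from the table and compares each checkpoint against RN50-HALP80. The table does not list the unpruned baseline, so these ratios are between pruned models, not against the dense ResNet50. The function name compare is hypothetical.

```python
# (checkpoint, GFLOPs, top-1 accuracy in %, FPS) copied from the results table.
ROWS = [
    ("RN50-HALP80", 2.998, 77.44, 1213),
    ("RN50-HALP55", 1.957, 76.47, 1674),
    ("RN50-HALP30", 1.113, 74.41, 2610),
]


def compare(rows, reference=0):
    """Express each row relative to rows[reference]."""
    _, ref_flops, ref_top1, ref_fps = rows[reference]
    out = []
    for name, flops, top1, fps in rows:
        out.append({
            "name": name,
            "flops_ratio": flops / ref_flops,  # fraction of reference FLOPs
            "speedup": fps / ref_fps,          # throughput relative to reference
            "top1_drop": ref_top1 - top1,      # accuracy points lost vs reference
        })
    return out


if __name__ == "__main__":
    for r in compare(ROWS):
        print("{name}: {flops_ratio:.2f}x FLOPs, {speedup:.2f}x FPS, "
              "{top1_drop:.2f} points lower top-1".format(**r))
```

For instance, RN50-HALP30 reaches roughly 2.15x the FPS of RN50-HALP80 at about 37% of its FLOPs, at a cost of about 3.03 points of Top-1 accuracy.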
@inproceedings{shen2022structural,
title={Structural Pruning via Latency-Saliency Knapsack},
author={Shen, Maying and Yin, Hongxu and Molchanov, Pavlo and Mao, Lei and Liu, Jianna and Alvarez, Jose},
booktitle={Advances in Neural Information Processing Systems},
year={2022}
}