yifu-ding/awesome-model-quantization

 
 


Awesome Model Quantization

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving it. Pull requests adding missing works (papers, repositories) are welcome.
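For readers new to the topic, the sketch below shows uniform affine (asymmetric) quantization, in which a real value r is approximated as S * (q - Z) with a scale S and an integer zero-point Z. This is the scheme used in "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference" (2018 list). It is a minimal illustration, not code from any listed repo: the function names, the per-tensor min/max calibration, and the 8-bit unsigned default are all assumptions.

```python
def affine_params(x_min, x_max, num_bits=8):
    """Compute scale S and zero-point Z so that a real value r ~= S * (q - Z)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    if x_max == x_min:
        return 1.0, 0
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map real values to unsigned integer codes, clamping to [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map integer codes back to approximate real values."""
    return [scale * (q - zero_point) for q in codes]


# Usage: calibrate on the observed range [-1.0, 2.0], then round-trip a few values.
s, z = affine_params(-1.0, 2.0)
codes = quantize([-1.0, 0.0, 0.5, 2.0], s, z)
approx = dequantize(codes, s, z)
```

For values inside the calibrated range, the round-trip error is at most S/2, and 0.0 maps back to exactly 0.0, which matters for zero padding.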

Paper list

2020

Paper Tags Code Year
A Novel In-DRAM Accelerator Architecture for Binary Neural Network Hardware -- 2020
An Energy-Efficient and High Throughput in-Memory Computing Bit-Cell With Excellent Robustness Under Process Variations for Binary Neural Network Hardware -- 2020
BNN Pruning: Pruning Binary Neural Network Guided by Weight Flipping Frequency Binarization Link 2020
Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml Hardware -- 2020
End-to-end Learned Image Compression with Fixed Point Weight Quantization Low-bit Quantization -- 2020
Low-bit Quantization Needs Good Distribution Low-bit Quantization -- 2020
SIMBA: A Skyrmionic In-Memory Binary Neural Network Accelerator Hardware -- 2020
Training Binary Neural Networks with Real-to-Binary Convolutions Binarization Link 2020
Training with Quantization Noise for Extreme Model Compression Low-bit Quantization Link 2020
Phoenix: A Low-Precision Floating-Point Quantization Oriented Architecture for Convolutional Neural Networks Low-bit Quantization -- 2020
Towards Lossless Binary Convolutional Neural Networks Using Piecewise Approximation Binarization Not yet 2020
IMAC: In-Memory Multi-Bit Multiplication and ACcumulation in 6T SRAM Array Hardware -- 2020
Understanding Learning Dynamics of Binary Neural Networks via Information Bottleneck Binarization -- 2020
Training high-performance and large-scale deep neural networks with full 8-bit integers Low-bit Quantization -- 2020
MoBiNet: A Mobile Binary Network for Image Classification Binarization -- 2020
Controlling information capacity of binary neural network Binarization -- 2020
BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations Binarization Link 2020
Binary Neural Networks: A Survey Binarization -- 2020
An Energy-Efficient Bagged Binary Neural Network Accelerator Hardware, Binarization -- 2020
Forward and Backward Information Retention for Accurate Binary Neural Networks Binarization Link 2020
MeliusNet: Can Binary Neural Networks Achieve MobileNet-level Accuracy? Binarization Link 2020
Design of High Robustness BNN Inference Accelerator Based on Binary Memristors Hardware -- 2020
RPR: Random Partition Relaxation for Training Binary and Ternary Weight Neural Networks Binarization, Low-bit Quantization -- 2020
OrthrusPE: Runtime Reconfigurable Processing Elements for Binary Neural Networks Hardware -- 2020
Distillation Guided Residual Learning for Binary Convolutional Neural Networks Binarization -- 2020
A Resource-Efficient Inference Accelerator for Binary Convolutional Neural Networks Hardware -- 2020
How Does Batch Normalization Help Binary Training? Binarization -- 2020
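"Training with Quantization Noise for Extreme Model Compression" (listed above) quantizes only a random subset of weights at each forward pass, so only that subset is affected by the straight-through estimator's approximation. The sketch below illustrates that idea under simplifying assumptions: it masks individual weights (the paper also works with blocks of weights), uses plain symmetric per-tensor int-k quantization as the noise, and the names `quant_noise` and `p` are hypothetical. In real training the gradient would flow through the quantized entries via the straight-through estimator, which this pure-Python forward sketch does not model.

```python
import random


def symmetric_quantize(w, num_bits=8):
    """Symmetric per-tensor fake quantization: snap each weight to a grid of step `scale`."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in w)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in w]


def quant_noise(w, p, num_bits=8, rng=random):
    """Replace a random fraction p of the weights by their quantized values.

    The remaining weights stay in full precision for this forward pass.
    """
    wq = symmetric_quantize(w, num_bits)
    return [q if rng.random() < p else v for v, q in zip(w, wq)]


# Usage: each forward pass draws a fresh random subset to quantize.
noisy = quant_noise([0.3, -1.2, 0.05, 0.9], p=0.5, rng=random.Random(0))
```

Setting p = 1 recovers ordinary quantization-aware training on every weight, while p = 0 is plain full-precision training.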

2019

Paper Tags Code Year
Product Engine for Energy-Efficient Execution of Binary Neural Networks Using Resistive Memories Hardware, Binarization -- 2019
A Systematic Study of Binary Neural Networks' Optimisation Binarization -- 2019
Accurate and Compact Convolutional Neural Networks with Trained Binarization Binarization -- 2019
Balanced Circulant Binary Convolutional Networks Binarization -- 2019
Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit? Binarization -- 2019
BNN+: Improved Binary Network Training Binarization -- 2019
Circulant Binary Convolutional Networks: Enhancing the Performance of 1-bit DCNNs with Circulant Back Propagation Binarization -- 2019
daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices Hardware, Binarization Link 2019
Deep Binary Reconstruction for Cross-Modal Hashing Binarization -- 2019
Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks Low-bit Quantization -- 2019
Dual Path Binary Neural Network Binarization -- 2019
Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices Hardware -- 2019
Fully Quantized Network for Object Detection Low-bit Quantization -- 2019
Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine Hardware -- 2019
Improved training of binary networks for human pose estimation and image recognition Binarization -- 2019
Learning Channel-wise Interactions for Binary Convolutional Neural Networks Binarization -- 2019
MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization Low-bit Quantization Link 2019
ProxQuant: Quantized Neural Networks via Proximal Operators Low-bit Quantization, Binarization Link 2019
PXNOR: Perturbative Binary Neural Network Binarization Link 2019
Quantization Networks Low-bit Quantization Link 2019
Recursive Binary Neural Network Training Model for Efficient Usage of On-Chip Memory Binarization -- 2019
SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity through Low-Bit Quantization Low-bit Quantization -- 2019
Self-Binarizing Networks Binarization -- 2019
Towards Unified INT8 Training for Convolutional Neural Network Low-bit Quantization -- 2019
Training Accurate Binary Neural Networks from Scratch Binarization Link 2019
Using Neuroevolved Binary Neural Networks to solve reinforcement learning environments Binarization Link 2019
Xcel-RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays Hardware -- 2019
XNOR-Net++: Improved binary neural networks Binarization -- 2019
An Energy-Efficient Reconfigurable Processor for Binary-and Ternary-Weight Neural Networks With Flexible Data Bit Width Binarization, Low-bit Quantization -- 2019
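Binary inference frameworks and accelerators such as daBNN and Xcel-RAM rely on the fact that a dot product of two {-1, +1} vectors can be computed with XOR (or XNOR) and a population count instead of multiply-accumulate. The sketch below is a generic illustration of that identity, not daBNN's kernel. The bit encoding (bit = 1 means +1) and the helper names are assumptions.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int: bit i is 1 iff values[i] == +1."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
        elif v != -1:
            raise ValueError("binary vectors must contain only +1/-1")
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    Agreeing positions contribute +1 and differing positions contribute -1,
    so with d = popcount(a XOR b) the result is (n - d) - d = n - 2 * d.
    """
    d = bin((a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return n - 2 * d


# Usage: compare against the ordinary multiply-accumulate result.
a = [1, -1, 1, 1, -1]
b = [1, 1, -1, 1, -1]
fast = binary_dot(pack_signs(a), pack_signs(b), len(a))
slow = sum(x * y for x, y in zip(a, b))
```

On hardware, one machine word processes 32 or 64 positions per XOR and popcount instruction, which is where the speedup reported for binary networks comes from.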

2018

Paper Tags Code Year
Two-Step Quantization for Low-bit Neural Networks Low-bit Quantization -- 2018
Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM Low-bit Quantization Link 2018
PACT: Parameterized Clipping Activation for Quantized Neural Networks Low-bit Quantization -- 2018
Towards Fast and Energy-Efficient Binarized Neural Network Inference on FPGA Hardware, Binarization -- 2018
A Main/Subsidiary Network Framework for Simplifying Binary Neural Networks Binarization -- 2018
A Survey of FPGA-based Accelerators for Convolutional Neural Networks Hardware -- 2018
An Energy-Efficient Architecture for Binary Weight Convolutional Neural Networks Binarization -- 2018
Analysis and Implementation of Simple Dynamic Binary Neural Networks Binarization -- 2018
Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy Low-bit Quantization -- 2018
BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU Binarization -- 2018
BitStream: Efficient Computing Architecture for Real-Time Low-Power Inference of Binary Neural Networks on CPUs Binarization, Hardware -- 2018
Blended Coarse Gradient Descent for Full Quantization of Deep Neural Networks Low-bit Quantization, Binarization -- 2018
BRein Memory: A Single-Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W Hardware -- 2018
FBNA: A Fully Binarized Neural Network Accelerator Hardware -- 2018
FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks Hardware -- 2018
Loss-aware Binarization of Deep Networks Binarization -- 2018
ReBNet: Residual Binarized Neural Network Binarization Link 2018
Model compression via distillation and quantization Low-bit Quantization Link 2018
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference Low-bit Quantization -- 2018
Stochastic weights binary neural networks on FPGA Binarization -- 2018
Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation Binarization -- 2018
SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks Low-bit Quantization Link 2018
Training Binary Weight Networks via Semi-Binary Decomposition Binarization -- 2018
Training Competitive Binary Neural Networks from Scratch Binarization Link 2018
XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference Hardware -- 2018
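PACT (2018 list) clips activations to a learnable range [0, alpha] and then quantizes uniformly to k bits, with y = 0.5 * (|x| - |x - alpha| + alpha) and y_q = round(y * (2^k - 1) / alpha) * alpha / (2^k - 1). The sketch below shows the forward pass and the gradient of y_q with respect to alpha under the straight-through estimator (1 where x >= alpha, 0 otherwise). The function names and the 4-bit default are illustrative assumptions; a real implementation would run inside an autograd framework.

```python
def pact_forward(x, alpha, num_bits=4):
    """PACT activation: clip x to [0, alpha], then quantize uniformly to num_bits."""
    levels = 2 ** num_bits - 1
    y = 0.5 * (abs(x) - abs(x - alpha) + alpha)  # equals min(max(x, 0), alpha)
    return round(y * levels / alpha) * alpha / levels


def pact_grad_alpha(x, alpha):
    """d y_q / d alpha under the straight-through estimator: 1 if x >= alpha, else 0."""
    return 1.0 if x >= alpha else 0.0


# Usage: negative inputs clip to 0, large inputs clip to alpha.
outputs = [pact_forward(v, alpha=2.0) for v in (-0.5, 0.7, 3.0)]
```

Because alpha receives gradient only from inputs above the clipping level, training can shrink or grow the range to balance clipping error against rounding error.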

2017

Paper Tags Code Year
Ternary Neural Networks with Fine-Grained Quantization Low-bit Quantization -- 2017
ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks Low-bit Quantization Link 2017
Towards Accurate Binary Convolutional Neural Network Binarization Link 2017
Deep Learning with Low Precision by Half-wave Gaussian Quantization Low-bit Quantization Link 2017
Performance Guaranteed Network Acceleration via High-Order Residual Quantization Low-bit Quantization -- 2017
From Hashing to CNNs: Training Binary Weight Networks via Hashing Binarization -- 2017
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights Low-bit Quantization Link 2017
Trained Ternary Quantization Low-bit Quantization Link 2017
On-chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA Hardware -- 2017
FP-BNN: Binarized neural network on FPGA Hardware -- 2017
WRPN: Wide Reduced-Precision Networks Low-bit Quantization -- 2017
Deep Learning Binary Neural Network on an FPGA Hardware, Binarization -- 2017
A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks Hardware, Binarization -- 2017
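Several 2017 entries (for example Incremental Network Quantization and ShiftCNN) restrict weights to zero or signed powers of two, so that multiplication becomes a bit shift. The sketch below is a generic power-of-two quantizer, not the exact codebook construction of either paper: the exponent range [e_min, e_max], the rounding in the log2 domain, and the zero threshold of half the smallest level are all assumptions chosen for illustration.

```python
import math


def pow2_quantize(w, e_min=-4, e_max=0):
    """Round each weight to 0 or +/- 2**e with e clipped to [e_min, e_max].

    Rounding is done in the log2 domain. Weights smaller than half of the
    smallest level 2**e_min are set to zero (an assumed threshold).
    """
    out = []
    for v in w:
        mag = abs(v)
        if mag < 2.0 ** e_min / 2:
            out.append(0.0)
            continue
        e = min(e_max, max(e_min, round(math.log2(mag))))
        out.append(math.copysign(2.0 ** e, v))
    return out


# Usage: 0.3 -> 0.25, -0.7 -> -0.5, tiny weights -> 0, large weights clip to 2**e_max.
levels = pow2_quantize([0.3, -0.7, 0.01, 1.5])
```

Since every nonzero weight is ±2^e, multiplying a fixed-point activation by it needs only a shift and possibly a negation, which is the hardware motivation behind these designs.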

2016

Paper Tags Code Year
Ternary weight networks Low-bit Quantization Link 2016
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients Low-bit Quantization Link 2016
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks Binarization Link 2016
Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 Binarization Link 2016
BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 Binarization Link 2016
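Two of the 2016 entries define simple closed-form weight approximations. XNOR-Net approximates a filter as alpha * sign(W) with alpha = mean(|W|). Ternary weight networks keep only weights whose magnitude exceeds a threshold delta (approximated as 0.7 * mean(|W|)), set the rest to zero, and scale the survivors by alpha, the mean magnitude of the kept weights. The sketch below applies both to a flat list of weights. Treating sign(0) as +1 and working per tensor rather than per filter are simplifying assumptions.

```python
def xnor_binarize(w):
    """XNOR-Net-style weight binarization: W ~= alpha * sign(W), alpha = mean(|W|)."""
    alpha = sum(abs(v) for v in w) / len(w)
    return [alpha if v >= 0 else -alpha for v in w]  # sign(0) taken as +1


def twn_ternarize(w):
    """TWN-style ternarization with the approximate threshold delta = 0.7 * mean(|W|)."""
    delta = 0.7 * sum(abs(v) for v in w) / len(w)
    kept = [abs(v) for v in w if abs(v) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return [alpha if v > delta else -alpha if v < -delta else 0.0 for v in w]


# Usage: mean(|w|) = 0.775, so delta = 0.5425 and only -1.5 and 1.0 survive ternarization.
w = [0.5, -1.5, 0.1, 1.0]
binary = xnor_binarize(w)
ternary = twn_ternarize(w)  # [0.0, -1.25, 0.0, 1.25]
```

The ternary version spends one extra state (zero) to skip small weights entirely, which typically reduces approximation error compared with forcing every weight to ±alpha.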

2015

Paper Tags Code Year
Bitwise Neural Networks Binarization -- 2015
BinaryConnect: Training Deep Neural Networks with binary weights during propagations Binarization Link 2015
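BinaryConnect keeps real-valued weights for the parameter update and binarizes them for the forward and backward passes. Binarization is either deterministic (sign) or stochastic, taking +1 with probability given by the hard sigmoid clip((w + 1) / 2, 0, 1). The gradient computed with the binary weights is applied to the real weights, which are then clipped to [-1, 1]. The sketch below shows these pieces on a flat list. The helper names are hypothetical, and the gradient is passed in rather than computed by a real network.

```python
import random


def hard_sigmoid(x):
    """clip((x + 1) / 2, 0, 1): probability of binarizing x to +1."""
    return max(0.0, min(1.0, (x + 1.0) / 2.0))


def binarize(w, stochastic=False, rng=random):
    """Deterministic: sign(w) with sign(0) = +1. Stochastic: +1 with prob hard_sigmoid(w)."""
    if stochastic:
        return [1.0 if rng.random() < hard_sigmoid(v) else -1.0 for v in w]
    return [1.0 if v >= 0 else -1.0 for v in w]


def sgd_step(w_real, grad_wrt_binary, lr):
    """Apply the gradient computed with binary weights to the real-valued weights,
    then clip them to [-1, 1] so they stay in the range where binarization is meaningful."""
    return [max(-1.0, min(1.0, v - lr * g)) for v, g in zip(w_real, grad_wrt_binary)]


# Usage: one illustrative step with a made-up gradient.
w_real = [0.9, -0.5]
w_bin = binarize(w_real)             # used in the forward/backward pass
w_real = sgd_step(w_real, [-1.0, 1.0], lr=0.5)
```

Keeping the real-valued copy lets many small gradient steps accumulate until a weight crosses zero and flips its binary value, which a purely binary update could not do.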

Related Codes

Code From Description
PyTorch-Quant.py https://github.com/Ewenwan/pytorch-playground/blob/master/utee/quant.py Various quantization methods implemented in PyTorch.
ZF-Net https://support.alpha-data.com/pub/appnotes/cnn/ An Open Source FPGA CNN Library

Docs

Doc Description
QuantizationMethods.md Quantization Methods
Embedded Deep Learning.md Running BNNs on FPGAs
An Open Source FPGA CNN Library.pdf Documentation of An Open Source FPGA CNN Library (code: ZF-Net)
Accelerating CNN inference on FPGAs- A Survey.pdf Accelerating CNN inference on FPGAs: A Survey.


Our Team

Our team is part of the DIG group of the State Key Laboratory of Software Development Environment (SKLSDE), supervised by Prof. Xianglong Liu. Our main research goal is to compress and accelerate models across multiple scenarios.

Members

Ruihao Gong

Ruihao Gong is currently a third-year graduate student at Beihang University under the supervision of Prof. Xianglong Liu. Since 2017, he has worked on building computer vision systems and on model quantization as an intern at SenseTime Research, where he enjoyed working with talented researchers and grew a great deal with the help of Fengwei Yu, Wei Wu, and Junjie Yan. Early in the internship, he independently took responsibility for developing the intelligent video analysis system Sensevideo. Later, he began researching model quantization, which can speed up the inference and even the training of neural networks on edge devices. He is now devoted to further improving the accuracy of extremely low-bit models and to the automatic deployment of quantized models.

Haotong Qin

I am a Ph.D. student (Sep 2019 - ) in the State Key Laboratory of Software Development Environment (SKLSDE) and ShenYuan Honors College at Beihang University, supervised by Prof. Wei Li and Prof. Xianglong Liu. I obtained a B.Eng. degree in computer science and engineering from Beihang University. I was a research intern (Jun 2020 - Aug 2020) at the WeiXin Group of Tencent. During my undergraduate studies, I interned at the Speech Group of Microsoft Research Asia (MSRA), supervised by Dr. Wenping Hu. I'm interested in deep learning, computer vision, and model compression. My research goal is to enable state-of-the-art neural network models to be deployed successfully on resource-limited hardware, which includes compressing and accelerating models for multiple tasks and deploying them flexibly and efficiently on multiple hardware platforms.

Xiangguo Zhang

Xiangguo Zhang is a second-year graduate student in the School of Computer Science of Beihang University, under the guidance of Prof. Xianglong Liu. He received a bachelor's degree from Shandong University in 2019 and entered Beihang University the same year. He is currently interested in computer vision and post-training quantization.

Yifu Ding

Yifu Ding is a senior student in the School of Computer Science and Engineering at Beihang University. She works in the State Key Laboratory of Software Development Environment (SKLSDE) under the supervision of Prof. Xianglong Liu. She is currently interested in computer vision and model quantization. She believes that highly compressed neural network models can be deployed on resource-constrained devices, and that among compression methods, quantization is a promising one.

Our Work

Binary Neural Networks: A Survey [PDF]

H. Qin, R. Gong, X. Liu*, X. Bai, J. Song, N. Sebe

Pattern Recognition (PR), 2020

Forward and Backward Information Retention for Accurate Binary Neural Networks [PDF]

H. Qin, R. Gong, X. Liu*, M. Shen, Z. Wei, F. Yu, J. Song

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020

Boosting Temporal Binary Coding for Large-scale Video Search

Y. Wu, X. Liu*, H. Qin, K. Xia, S. Hu, Y. Ma, M. Wang

IEEE Transactions on Multimedia (TMM), 2020

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Ruihao Gong, Xianglong Liu*, Shenghu Jiang, Tianxiang Li, Peng Hu, Jiazhen Lin, Fengwei Yu, Junjie Yan

IEEE ICCV 2019

Towards Unified INT8 Training for Convolutional Neural Network

Feng Zhu, Ruihao Gong, Fengwei Yu, Xianglong Liu, Yanfei Wang, Zhelong Li, Xiuqi Yang, Junjie Yan

IEEE CVPR 2020

DMS: Differentiable Dimension Search for Binary Neural Networks

Yuhang Li, Ruihao Gong, Fengwei Yu, Xin Dong, Xianglong Liu

ICLR 2020 NAS workshop

Rotation Consistent Margin Loss for Efficient Low-bit Face Recognition

Yudong Wu, Yichao Wu, Ruihao Gong, Yuanhao Lv, Ken Chen, Ding Liang, Xiaolin Hu, Xianglong Liu, Junjie Yan

IEEE CVPR 2020

Balanced Binary Neural Networks with Gated Residual

Mingzhu Shen, Xianglong Liu, Ruihao Gong, Kai Han

ICASSP 2020
