Awesome Model Quantization

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving the project. PRs adding works (papers, repositories) that the repo has missed are welcome.

Paper list

2020

Paper	Tags	Code	Years
A Novel In-DRAM Accelerator Architecture for Binary Neural Network	Hardware	--	2020
An Energy-Efficient and High Throughput in-Memory Computing Bit-Cell With Excellent Robustness Under Process Variations for Binary Neural Network	Hardware	--	2020
BNN Pruning: Pruning Binary Neural Network Guided by Weight Flipping Frequency	Binarization	Link	2020
Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml	Hardware	--	2020
End-to-end Learned Image Compression with Fixed Point Weight Quantization	Low-bit Quantization	--	2020
Low-bit Quantization Needs Good Distribution	Low-bit Quantization	--	2020
SIMBA: A Skyrmionic In-Memory Binary Neural Network Accelerator	Hardware	--	2020
Training Binary Neural Networks with Real-to-Binary Convolutions	Binarization	Link	2020
Training with Quantization Noise for Extreme Model Compression	Low-bit Quantization	Link	2020
Phoenix: A Low-Precision Floating-Point Quantization Oriented Architecture for Convolutional Neural Networks	Low-bit Quantization	--	2020
Towards Lossless Binary Convolutional Neural Networks Using Piecewise Approximation	Binarization	Not yet	2020
IMAC: In-Memory Multi-Bit Multiplication and ACcumulation in 6T SRAM Array	Hardware	--	2020
Understanding Learning Dynamics of Binary Neural Networks via Information Bottleneck	Binarization	--	2020
Training high-performance and large-scale deep neural networks with full 8-bit integers	Low-bit Quantization	--	2020
MoBiNet: A Mobile Binary Network for Image Classification	Binarization	--	2020
Controlling information capacity of binary neural network	Binarization	--	2020
BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations	Binarization	Link	2020
Binary Neural Networks: A Survey	Binarization	--	2020
An Energy-Efficient Bagged Binary Neural Network Accelerator	Hardware, Binarization	--	2020
Forward and Backward Information Retention for Accurate Binary Neural Networks	Binarization	Link	2020
MeliusNet: Can Binary Neural Networks Achieve MobileNet-level Accuracy?	Binarization	Link	2020
Design of High Robustness BNN Inference Accelerator Based on Binary Memristors	Hardware	--	2020
RPR: Random Partition Relaxation for Training Binary and Ternary Weight Neural Networks	Binarization, Low-bit Quantization	--	2020
OrthrusPE: Runtime Reconfigurable Processing Elements for Binary Neural Networks	Hardware	--	2020
Distillation Guided Residual Learning for Binary Convolutional Neural Networks	Binarization	--	2020
A Resource-Efficient Inference Accelerator for Binary Convolutional Neural Networks	Hardware	--	2020
How Does Batch Normalization Help Binary Training?	Binarization	--	2020

2019

Paper	Tags	Code	Years
Product Engine for Energy-Efficient Execution of Binary Neural Networks Using Resistive Memories	Hardware, Binarization	--	2019
A Systematic Study of Binary Neural Networks' Optimisation	Binarization	--	2019
Accurate and Compact Convolutional Neural Networks with Trained Binarization	Binarization	--	2019
Balanced Circulant Binary Convolutional Networks	Binarization	--	2019
Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?	Binarization	--	2019
BNN+: Improved Binary Network Training	Binarization	--	2019
Circulant Binary Convolutional Networks: Enhancing the Performance of 1-bit DCNNs with Circulant Back Propagation	Binarization	--	2019
daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices	Hardware, Binarization	Link	2019
Deep Binary Reconstruction for Cross-Modal Hashing	Binarization	--	2019
Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks	Low-bit Quantization	--	2019
Dual Path Binary Neural Network	Binarization	--	2019
Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices	Hardware	--	2019
Fully Quantized Network for Object Detection	Low-bit Quantization	--	2019
Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine	Hardware	--	2019
Improved training of binary networks for human pose estimation and image recognition	Binarization	--	2019
Learning Channel-wise Interactions for Binary Convolutional Neural Networks	Binarization	--	2019
MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization	Low-bit Quantization	Link	2019
Proxquant: Quantized neural networks via proximal operators	Low-bit Quantization, Binarization	Link	2019
PXNOR: Perturbative Binary Neural Network	Binarization	Link	2019
Quantization Networks	Low-bit Quantization	Link	2019
Recursive Binary Neural Network Training Model for Efficient Usage of On-Chip Memory	Binarization	--	2019
SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity through Low-Bit Quantization	Low-bit Quantization	--	2019
Self-Binarizing Networks	Binarization	--	2019
Towards Unified INT8 Training for Convolutional Neural Network	Low-bit Quantization	--	2019
Training Accurate Binary Neural Networks from Scratch	Binarization	Link	2019
Using Neuroevolved Binary Neural Networks to solve reinforcement learning environments	Binarization	Link	2019
Xcel-RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays	Hardware	--	2019
XNOR-Net++: Improved binary neural networks	Binarization	--	2019
An Energy-Efficient Reconfigurable Processor for Binary-and Ternary-Weight Neural Networks With Flexible Data Bit Width	Binarization, Low-bit Quantization	--	2019

2018

Paper	Tags	Code	Years
Two-Step Quantization for Low-bit Neural Networks	Low-bit Quantization	--	2018
Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM	Low-bit Quantization	Link	2018
PACT: Parameterized Clipping Activation for Quantized Neural Networks	Low-bit Quantization	--	2018
A Main/Subsidiary Network Framework for Simplifying Binary Neural Networks	Binarization	--	2018
A Survey of FPGA-based Accelerators for Convolutional Neural Networks	Hardware	--	2018
An Energy-Efficient Architecture for Binary Weight Convolutional Neural Networks	Binarization	--	2018
Analysis and Implementation of Simple Dynamic Binary Neural Networks	Binarization	--	2018
Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy	Low-bit Quantization	--	2018
BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU	Binarization	--	2018
BitStream: Efficient Computing Architecture for Real-Time Low-Power Inference of Binary Neural Networks on CPUs	Binarization, Hardware	--	2018
Blended Coarse Gradient Descent for Full Quantization of Deep Neural Networks	Low-bit Quantization, Binarization	--	2018
BRein Memory: A Single-Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W	Hardware	--	2018
FBNA: A Fully Binarized Neural Network Accelerator	Hardware	--	2018
FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks	Hardware	--	2018
Loss-aware Binarization of Deep Networks	Binarization	--	2018
ReBNet: Residual Binarized Neural Network	Binarization	Link	2018
Model compression via distillation and quantization	Low-bit Quantization	Link	2018
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference	Low-bit Quantization	--	2018
Stochastic weights binary neural networks on FPGA	Binarization	--	2018
Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation	Binarization	--	2018
SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks	Low-bit Quantization	Link	2018
Towards Fast and Energy-Efficient Binarized Neural Network Inference on FPGA	Binarization, Hardware	--	2018
Training Binary Weight Networks via Semi-Binary Decomposition	Binarization	--	2018
Training Competitive Binary Neural Networks from Scratch	Binarization	Link	2018
XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference	Hardware	--	2018

2017

Paper	Tags	Code	Years
Ternary Neural Networks with Fine-Grained Quantization	Low-bit Quantization	--	2017
ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks	Low-bit Quantization	Link	2017
Towards Accurate Binary Convolutional Neural Network	Binarization	Link	2017
Deep Learning with Low Precision by Half-wave Gaussian Quantization	Low-bit Quantization	Link	2017
Performance Guaranteed Network Acceleration via High-Order Residual Quantization	Low-bit Quantization	--	2017
From Hashing to CNNs: Training Binary Weight Networks via Hashing	Binarization	--	2017
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights	Low-bit Quantization	Link	2017
Trained Ternary Quantization	Low-bit Quantization	Link	2017
On-chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA	Hardware	--	2017
FP-BNN: Binarized neural network on FPGA	Hardware	--	2017
WRPN: Wide Reduced-Precision Networks	Low-bit Quantization	--	2017
Deep Learning Binary Neural Network on an FPGA	Hardware, Binarization	--	2017
A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks	Hardware, Binarization	--	2017

2016

Paper	Tags	Code	Years
Ternary weight networks	Low-bit Quantization	Link	2016
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients	Low-bit Quantization	Link	2016
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks	Binarization	Link	2016
Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1	Binarization	Link	2016
BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1	Binarization	Link	2016

2015

Paper	Tags	Code	Years
Bitwise Neural Networks	Binarization	--	2015
BinaryConnect: Training Deep Neural Networks with binary weights during propagations	Binarization	Link	2015

Related Codes

Code	From	Description
PyTorch-Quant.py	https://github.com/Ewenwan/pytorch-playground/blob/master/utee/quant.py	Different quantization methods implemented in PyTorch.
ZF-Net	https://support.alpha-data.com/pub/appnotes/cnn/	An Open Source FPGA CNN Library

Docs

Doc	Description
QuantizationMethods.md	Quantization Methods
Embedded Deep Learning.md	Run BNN in FPGA
An Open Source FPGA CNN Library.pdf	Code: ZF-Net, Doc of An Open Source FPGA CNN Library
Accelerating CNN inference on FPGAs- A Survey.pdf	Accelerating CNN inference on FPGAs: A Survey.

Reference

Our Team

Our team is part of the DIG group of the State Key Laboratory of Software Development Environment (SKLSDE), supervised by Prof. Xianglong Liu. The main research goal of our team is compressing and accelerating models across multiple scenarios.

Members

Ruihao Gong

Ruihao Gong is currently a third-year graduate student at Beihang University under the supervision of Prof. Xianglong Liu. Since 2017, he has worked on building computer vision systems and on model quantization as an intern at Sensetime Research, where he enjoyed working with talented researchers and grew a lot with the help of Fengwei Yu, Wei Wu, and Junjie Yan. Early in the internship, he independently took responsibility for developing the intelligent video analysis system Sensevideo. Later, he began research on model quantization, which can speed up the inference and even the training of neural networks on edge devices. He is now devoted to further improving the accuracy of extremely low-bit models and the auto-deployment of quantized models.

Haotong Qin

I am a Ph.D. student (Sep 2019 - ) in the State Key Laboratory of Software Development Environment (SKLSDE) and ShenYuan Honors College at Beihang University, supervised by Prof. Wei Li and Prof. Xianglong Liu. I obtained a B.Eng. degree in computer science and engineering from Beihang University. I was a research intern (Jun 2020 - Aug 2020) at the WeiXin Group of Tencent. During my undergraduate study, I interned at the Speech group of Microsoft Research Asia (MSRA), supervised by Dr. Wenping Hu. I'm interested in deep learning, computer vision, and model compression. My research goal is to enable state-of-the-art neural network models to be successfully deployed on resource-limited hardware. This includes compressing and accelerating models on multiple tasks, and flexible and efficient deployment on multiple hardware platforms.

Xiangguo Zhang

Xiangguo Zhang is a second-year graduate student in the School of Computer Science of Beihang University, under the guidance of Prof. Xianglong Liu. He received a bachelor's degree from Shandong University in 2019 and entered Beihang University in the same year. Currently, he is interested in computer vision and post-training quantization.

Yifu Ding

Yifu Ding is a senior student in the School of Computer Science and Engineering at Beihang University. She is in the State Key Laboratory of Software Development Environment (SKLSDE), under the supervision of Prof. Xianglong Liu. Currently, she is interested in computer vision and model quantization. She believes that highly compressed neural network models can be deployed on resource-constrained devices, and that quantization is a promising approach among compression methods.

Our Work

Binary Neural Networks: A Survey [PDF]

H. Qin, R. Gong, X. Liu*, X. Bai, J. Song, N. Sebe

Pattern Recognition (PR), 2020

Forward and Backward Information Retention for Accurate Binary Neural Networks [PDF]

H. Qin, R. Gong, X. Liu*, M. Shen, Z. Wei, F. Yu, J. Song

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020

Boosting Temporal Binary Coding for Large-scale Video Search

Y. Wu, X. Liu*, H. Qin, K. Xia, S. Hu, Y. Ma, M. Wang

IEEE Transactions on Multimedia (TMM), 2020

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Ruihao Gong, Xianglong Liu*, Shenghu Jiang, Tianxiang Li, Peng Hu, Jiazhen Lin, Fengwei Yu, Junjie Yan

IEEE ICCV 2019

Towards Unified INT8 Training for Convolutional Neural Network

Feng Zhu, Ruihao Gong, Fengwei Yu, Xianglong Liu, Yanfei Wang, Zhelong Li, Xiuqi Yang, Junjie Yan

IEEE CVPR 2020

DMS: Differentiable Dimension Search for Binary Neural Networks

Yuhang Li, Ruihao Gong, Fengwei Yu, Xin Dong, Xianglong Liu

ICLR 2020 NAS workshop

Rotation Consistent Margin Loss for Efficient Low-bit Face Recognition

Yudong Wu, Yichao Wu, Ruihao Gong, Yuanhao Lv, Ken Chen, Ding Liang, Xiaolin Hu, Xianglong Liu, Junjie Yan

IEEE CVPR 2020

Balanced Binary Neural Networks with Gated Residual

Mingzhu Shen, Xianglong Liu, Ruihao Gong, Kai Han

ICASSP 2020
