A curated list of neural network pruning resources.
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously improved; PRs for works (papers, repositories) the repo has missed are welcome.
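As a concrete reference point for the basic operation most of the listed quantization works build on, here is a minimal NumPy sketch of symmetric per-tensor INT8 post-training quantization; the function names and 127-level clipping are illustrative choices, not taken from any specific repo above.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q."""
    scale = max(np.abs(w).max(), 1e-8) / 127.0   # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```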
A list of high-quality, up-to-date AutoML works and lightweight models, including 1) Neural Architecture Search, 2) Lightweight Structures, 3) Model Compression, Quantization and Acceleration, 4) Hyperparameter Optimization, and 5) Automated Feature Engineering.
Papers for deep neural network compression and acceleration
MUSCO: MUlti-Stage COmpression of neural networks
Resources for our survey paper "A Comprehensive Survey on AI Integration at the Edge: Techniques, Applications, and Challenges"
[NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
📚 Collection of awesome generation acceleration resources.
CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
A list of papers, docs, and code about diffusion distillation. This repo collects various distillation methods for diffusion models; PRs for works (papers, repositories) the repo has missed are welcome.
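For orientation, a generic distillation objective looks like the sketch below: the student's task loss is blended with a term that matches the teacher's outputs. This is the textbook formulation with a plain MSE matching term, not the specific method of any paper collected in the repo.

```python
import numpy as np

def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Blend the task loss with a match-the-teacher term (plain MSE here)."""
    task_loss = np.mean((student_out - target) ** 2)
    match_loss = np.mean((student_out - teacher_out) ** 2)
    return alpha * task_loss + (1.0 - alpha) * match_loss

rng = np.random.default_rng(0)
s, t, y = (rng.standard_normal((8, 16)) for _ in range(3))
print(distillation_loss(s, t, y))
```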
Deep Learning Compression and Acceleration SDK -- deep model compression for Edge and IoT embedded systems, and deep model acceleration for cloud and private servers
(NeurIPS 2019 MicroNet Challenge, 3rd-place winner) Open source code for "SIPA: A simple framework for efficient networks"
Bayesian Optimization-Based Global Optimal Rank Selection for Compression of Convolutional Neural Networks, IEEE Access
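The rank being selected there is the one knob in a truncated-SVD factorization like the hypothetical sketch below; the Bayesian-optimization search over ranks is the paper's contribution and is not shown.

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Approximate W (out x in) by A @ B with A: (out, rank), B: (rank, in)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # fold singular values into A
    B = Vt[:rank, :]
    return A, B

W = np.random.randn(256, 512).astype(np.float32)
A, B = low_rank_factorize(W, rank=32)
print(f"params: {W.size} -> {A.size + B.size}, "
      f"rel. error: {np.linalg.norm(W - A @ B) / np.linalg.norm(W):.3f}")
```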
[IJCNN'19, IEEE JSTSP'19] Caffe code for our paper "Structured Pruning for Efficient ConvNets via Incremental Regularization"; [BMVC'18] "Structured Probabilistic Pruning for Convolutional Neural Network Acceleration"
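As context for what "structured pruning" removes, here is a minimal L1-norm filter-pruning baseline: whole output channels are dropped, so the remaining tensor stays dense. Note the papers above learn the sparsity via incremental regularization during training rather than pruning one-shot like this.

```python
import numpy as np

def prune_filters_l1(conv_w: np.ndarray, keep_ratio: float = 0.5):
    """Drop whole output filters with the smallest L1 norms.
    conv_w has shape (out_ch, in_ch, kH, kW)."""
    out_ch = conv_w.shape[0]
    n_keep = max(1, int(out_ch * keep_ratio))
    norms = np.abs(conv_w).reshape(out_ch, -1).sum(axis=1)  # per-filter L1 norm
    keep = np.sort(np.argsort(norms)[-n_keep:])             # kept filter indices
    return conv_w[keep], keep

w = np.random.randn(64, 32, 3, 3).astype(np.float32)
pruned, kept = prune_filters_l1(w)
print(pruned.shape)  # (32, 32, 3, 3)
```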
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
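As a quick illustration of the LoRA idea mentioned above (independent of LoRAX itself), a minimal NumPy forward pass: the frozen weight is augmented with a trainable rank-r update scaled by alpha/r, and with B initialized to zero the adapter starts as a no-op. Dimensions and names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 512, 512, 8, 16

W = rng.standard_normal((d_out, d_in)).astype(np.float32)       # frozen base weight
A = (0.01 * rng.standard_normal((r, d_in))).astype(np.float32)  # trainable down-projection
B = np.zeros((d_out, r), dtype=np.float32)                      # trainable up-projection, init 0

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Base projection plus the rank-r LoRA update, scaled by alpha / r."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in).astype(np.float32)
print(lora_forward(x).shape)  # (512,)
```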
Vision-language model example code.
This sample shows how to convert a TensorFlow model to an OpenVINO IR model and how to quantize the resulting OpenVINO model.
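A rough sketch of that flow, assuming a recent OpenVINO release with NNCF installed; the SavedModel path, input shape, and toy calibration data are placeholders, so adapt them to the actual model before running.

```python
import numpy as np
import openvino as ov
import nncf

# Convert a TensorFlow SavedModel directory into an in-memory OpenVINO model.
ov_model = ov.convert_model("saved_model_dir")  # placeholder path

# Toy calibration source; use real preprocessed inputs in practice.
data_source = [np.random.rand(1, 224, 224, 3).astype(np.float32) for _ in range(10)]
calibration = nncf.Dataset(data_source, lambda item: item)

quantized = nncf.quantize(ov_model, calibration)  # post-training INT8 quantization
ov.save_model(quantized, "model_int8.xml")        # writes the .xml/.bin IR pair
```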
Reduces model complexity by 612 times and memory footprint by 19.5 times compared to the base model, while meeting a worst-case accuracy threshold.
On Efficient Variants of Segment Anything Model