This repo contains the code for studying the interplay between quantization and sparsity methods

parsa-epfl/quantization-sparsity-interplay

Effective Interplay between Sparsity and Quantization: From Theory to Practice

This repository is the official implementation of the analysis and experiments in the paper: Effective Interplay between Sparsity and Quantization: From Theory to Practice.

The paper mathematically investigates the relationship between quantization and sparsity techniques, and how their errors combine when both techniques are used together. The theoretical analysis is validated by experimental results on a wide range of models.

Code Coming Soon!

We are excited to share our code with the community and are working on preparing it for release. Please stay tuned for updates, and thank you for your patience!

About Our Work

Quantization and sparsity have emerged as promising approaches to compressing models, especially in the era of LLMs. This paper studies what happens when the two techniques are applied together, as part of the broader research effort to shrink the memory footprint of LLMs and make them more accessible. Our mathematical analysis and extensive empirical study with large language models (OPT, LLaMA) and vision transformers (ViT) demonstrate that quantization and sparsity are not orthogonal and that their combined use can adversely affect model accuracy. Our findings provide insights for optimizing the compression of large models while preserving accuracy.
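As a rough illustration of how the two error sources can be measured side by side, the toy sketch below applies sparsity and quantization to a random vector, alone and in both orders, and reports the L2 error of each result against the original. It is not the code from the paper and does not reproduce its analysis: magnitude pruning, symmetric max-abs integer quantization, the Gaussian test vector, and all function names are assumptions chosen for illustration.

```python
import random


def magnitude_prune(x, sparsity):
    """Zero out the fraction `sparsity` of entries with the smallest magnitude."""
    k = int(len(x) * sparsity)
    if k == 0:
        return list(x)
    # Indices of the k smallest-magnitude entries.
    drop = set(sorted(range(len(x)), key=lambda i: abs(x[i]))[:k])
    return [0.0 if i in drop else v for i, v in enumerate(x)]


def quantize_int(x, bits):
    """Symmetric uniform quantization with a single max-abs scale (illustrative)."""
    if bits < 2:
        raise ValueError("bits must be at least 2 for a symmetric signed grid")
    max_abs = max(abs(v) for v in x)
    if max_abs == 0.0:
        return list(x)
    levels = 2 ** (bits - 1) - 1
    scale = max_abs / levels
    # Snap each value to the nearest point on the grid, then map it back.
    return [round(v / scale) * scale for v in x]


def l2_error(x, y):
    """Euclidean distance between two equally sized vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def compare(x, sparsity=0.5, bits=4):
    """Error of each technique alone and of both composition orders."""
    pruned = magnitude_prune(x, sparsity)
    quantized = quantize_int(x, bits)
    return {
        "sparsity_only": l2_error(x, pruned),
        "quantization_only": l2_error(x, quantized),
        "sparsity_then_quantization": l2_error(x, quantize_int(pruned, bits)),
        "quantization_then_sparsity": l2_error(x, magnitude_prune(quantized, sparsity)),
    }


rng = random.Random(0)  # fixed seed so repeated runs match
weights = [rng.gauss(0.0, 1.0) for _ in range(4096)]
errors = compare(weights)
for name, err in errors.items():
    print(f"{name:>28}: {err:.4f}")
```

Comparing the two combined rows with each other and with the two single-technique rows gives a quick feel for how the errors interact on a given tensor. A single random vector says nothing about end-to-end model accuracy, which is what the paper evaluates.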

Citation

If you find the analysis and experimental results useful for your own research, please cite our paper:

@article{quant-sparse-interplay:2024,
    title        = {{Effective Interplay between Sparsity and Quantization: From Theory to Practice}},
    author       = {Harma, Simla Burcu and Chakraborty, Ayan and Kostenok, Elizaveta and Mishin, Danila and Ha, Dongho and Falsafi, Babak and Jaggi, Martin and Liu, Ming and Oh, Yunho and Subramanian, Suvinay and Yazdanbakhsh, Amir},
    year         = 2024,
    journal      = {arXiv preprint}
}
