We show that layer-wise quantization, where different layers are quantized at different bit-widths, lets a model as a whole reach bit-levels beyond integer values.
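As a rough illustration of how mixing integer bit-widths across layers gives a fractional model-wide bit-level, here is a minimal sketch. It is not the paper's implementation: the function name `average_bits`, the 32-layer model, and the 20/12 split between 4-bit and 2-bit layers are all hypothetical, and layers are assumed to be equally sized unless sizes are passed in.

```python
def average_bits(layer_bits, layer_sizes=None):
    """Return the size-weighted mean bit-width across layers.

    layer_bits:  bit-width chosen for each layer (e.g. 2, 4, 8)
    layer_sizes: parameter count per layer; assumed equal if omitted
    """
    if layer_sizes is None:
        layer_sizes = [1] * len(layer_bits)
    if len(layer_bits) != len(layer_sizes):
        raise ValueError("layer_bits and layer_sizes must have the same length")
    total_size = sum(layer_sizes)
    weighted_bits = sum(b * s for b, s in zip(layer_bits, layer_sizes))
    return weighted_bits / total_size


# Hypothetical 32-layer model: 20 layers kept at 4 bits, 12 reduced to 2 bits.
layer_bits = [4] * 20 + [2] * 12
print(average_bits(layer_bits))  # 3.25
```

Changing how many layers sit at each bit-width moves the average in small steps between the two integer levels, which is what makes non-integer targets reachable.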
The code will be available here in the near future.
The paper is available at https://arxiv.org/abs/2406.17415 and is under review at EMNLP 2024.
If you use this work, please consider citing it:
@misc{dumitru2024layerwisequantizationpragmaticeffective,
  title={Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels},
  author={Razvan-Gabriel Dumitru and Vikas Yadav and Rishabh Maheshwary and Paul-Ioan Clotan and Sathwik Tejaswi Madhusudhan and Mihai Surdeanu},
  year={2024},
  eprint={2406.17415},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2406.17415},
}