[BUG] 3bit quantization not working #1207
Comments
I have the same issue with Llama 3.x and Qwen2.5 models.
@sidhantls @benjamin-marie I will check this tomorrow; I ran out of time today. I also want to add that we are aware of a potential issue with 3-bit where it has lower accuracy than 2-bit in our previous CI regression tests. So even after I fix this, we may still have a lingering 3-bit quality problem (model accuracy via llm-eval tests) where it actually scores lower than 2-bit, which requires more investigation.
@sidhantls Unknown for now. We normally only test/use 4 and 8 bits, so 2/3-bit outlier issues haven't gotten the attention that maybe they should have. We will do more testing to confirm one way or another. If memory serves me correct, we did an

But regardless, 2/3-bit scores much, much lower than 4-bit, which is also the reason I haven't bothered to use it myself. =P

edit: ref huggingface/transformers#35460 (comment)
3-bit packing fixed in https://github.com/ModelCloud/GPTQModel/pull/1218/files, but there is a bigger issue where 3-bit inference is broken, causing massive quality/ppl degradation. In short, do not use 3-bit until we have fixed this regression. The 3-bit quant error_loss appears to be normal, so we are going to backtrack to where the divergence from AutoGPTQ broke 3-bit (it could still be packing related, but it is more likely inference related).
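For context on why 3-bit packing is a common source of bugs: unlike 2, 4, and 8 bits, 3 does not divide 32, so packed values periodically straddle 32-bit word boundaries. The toy sketch below is not GPTQModel's actual kernel or layout, just an illustration of the split read/write that 2/4/8-bit packing never needs:

```python
# Toy illustration (assumption: NOT GPTQModel's real packing layout) of
# why 3-bit packing is fiddlier than 2/4/8-bit: 32 % 3 != 0, so some
# values straddle a 32-bit word boundary and need a split read/write.

def pack3(values):
    """Pack 3-bit ints (0..7) into 32-bit words, LSB-first."""
    words, bitpos = [0], 0
    for v in values:
        word_idx, offset = divmod(bitpos, 32)
        while len(words) <= (bitpos + 2) // 32:  # grow to hold all 3 bits
            words.append(0)
        words[word_idx] |= (v << offset) & 0xFFFFFFFF
        if offset > 29:  # high bits spill into the next word
            words[word_idx + 1] |= v >> (32 - offset)
        bitpos += 3
    return words

def unpack3(words, count):
    out = []
    for i in range(count):
        word_idx, offset = divmod(3 * i, 32)
        v = (words[word_idx] >> offset) & 0b111
        if offset > 29:  # reassemble the bits from the next word
            v |= (words[word_idx + 1] << (32 - offset)) & 0b111
        out.append(v)
    return out

vals = [i % 8 for i in range(64)]
assert unpack3(pack3(vals), len(vals)) == vals  # round-trips correctly
```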
@Qubitium Great, thanks for letting me know. I've validated 3-bit vs 2-bit quality on Llama-3.1-8B for AutoGPTQ on MMLU and NQOpen. Given the severe performance degradation of both 3-bit and 2-bit, I'd recommend using a model larger than 1B parameters to validate whether 3-bit works better than 2-bit.
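For anyone who wants to rerun that kind of comparison, one option is EleutherAI's lm-evaluation-harness, which ships both tasks; the checkpoint path below is a placeholder, not the exact setup used above:

```python
# One way to rerun the comparison (assumption: EleutherAI's
# lm-evaluation-harness, `pip install lm-eval`; the pretrained path
# below is a placeholder for the quantized checkpoint under test).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=path/to/llama-3.1-8b-gptq-3bit",
    tasks=["mmlu", "nq_open"],
)
print(results["results"])  # per-task accuracy numbers
```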
@sidhantls @benjamin-marie Fixed on
Describe the bug
I'm trying to quantize an LLM to 3 bits, but the quantization code fails with an error at the end. When I set bits=4 in the same code, it works.
Software Info
Windows 10, Python 3.10
To Reproduce
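The original reproduction script is not preserved above. As a stand-in, here is a minimal sketch following GPTQModel's documented quantize/save flow; the model id, calibration text, and output path are placeholder assumptions, not values from the report:

```python
# Minimal sketch, assuming GPTQModel's documented quantize/save flow.
# Model id, calibration text, and output path are placeholders, not
# values from the original report.
from gptqmodel import GPTQModel, QuantizeConfig

calibration = ["GPTQModel is an LLM quantization toolkit."] * 256  # toy calibration set

quant_config = QuantizeConfig(bits=3, group_size=128)  # bits=4 succeeds, bits=3 errors

model = GPTQModel.load("Qwen/Qwen2.5-0.5B-Instruct", quant_config)
model.quantize(calibration)
model.save("qwen2.5-0.5b-gptq-3bit")
```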
Error: