Add FLUX nf4 quantization support #525
Comments
I'm not familiar with nf4. From what I managed to understand, it's basically just 4-bit fixed-point numbers with a range of [-1, 1]? Or am I misunderstanding?
Not fixed-point, apparently. Just a very limited, non-standard float.
https://www.ai-bites.net/qlora-train-your-llms-on-a-single-gpu/#normalfloat This looks somewhat similar to GGML's IQ4_NL type in principle? It's not quite the same though.
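For anyone else trying to picture it, here is a rough sketch of how NF4 decoding works as far as I understand it: each 4-bit code indexes a fixed 16-entry table of values in [-1, 1], and the result is multiplied by a per-block absmax scale. The table values are the ones I recall from the QLoRA reference code, so verify them before relying on this. The nibble order (high nibble first), the block size of 64, and the function name `dequantize_nf4` are assumptions for illustration, not the layout of any particular checkpoint.

```python
# 16 NF4 levels, assumed from the QLoRA reference code (verify before use).
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def dequantize_nf4(packed, absmax, block_size=64):
    """Decode packed 4-bit codes to floats via table lookup and per-block scales.

    packed: bytes, two 4-bit codes per byte.
    absmax: one float scale per block of `block_size` decoded values.
    """
    out = []
    for byte in packed:
        # Assumption: the high nibble holds the first value of each pair.
        for code in (byte >> 4, byte & 0x0F):
            scale = absmax[len(out) // block_size]
            out.append(NF4_LEVELS[code] * scale)
    return out
```

So it is neither fixed point nor a float with exponent and mantissa bits; it is a lookup table with non-uniform spacing, which is why the comparison with IQ4_NL comes up.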
Pretty sure all that's needed is the ability to read it and convert to whatever quant the user wants.
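The "convert" half could look something like the sketch below: once the weights are decoded to floats, requantize them block by block into the target type. This uses a simplified 8-bit scheme with one scale per block, loosely in the spirit of a Q8_0-style format. The function names and the block size of 32 are hypothetical, and this is not the actual storage layout used by any library.

```python
def quantize_q8_blocks(values, block_size=32):
    """Quantize floats to signed 8-bit codes with one scale per block."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Map the largest magnitude in the block to 127; avoid dividing by zero.
        scale = amax / 127.0 if amax > 0 else 1.0
        codes = [round(v / scale) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_q8_blocks(blocks):
    """Reverse of quantize_q8_blocks, up to rounding error."""
    return [scale * code for scale, codes in blocks for code in codes]
```

Going from 4-bit to 8-bit this way should lose very little beyond what NF4 already lost, but converting to another 4-bit type means quantizing twice, so quality may be worse than quantizing the original weights directly.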
Please add FLUX nf4 quantization support