Add FLUX nf4 quantization support #525

Open
agv-zx82 opened this issue Dec 17, 2024 · 4 comments

Comments

@agv-zx82

Please add FLUX nf4 quantization support

@stduhpf
Contributor

stduhpf commented Dec 17, 2024

I'm not familiar with nf4. From what I managed to understand, it's basically just 4-bit fixed-point numbers with a range of [-1, 1]? Or am I misunderstanding?
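For comparison with the replies below, here is a minimal sketch of the reading proposed in the question: a hypothetical 4-bit code whose 16 values are evenly spaced over [-1, 1]. The names `UNIFORM_LEVELS`, `quantize_uniform` and `dequantize_uniform` are illustrative only; this is not how NF4 is defined.

```python
# Hypothetical format from the question: 16 evenly spaced levels over [-1, 1].
# Illustration only; this is NOT the NF4 definition.
UNIFORM_LEVELS = [-1.0 + 2.0 * i / 15 for i in range(16)]


def quantize_uniform(x):
    """Map x to the 4-bit index (0..15) of the nearest evenly spaced level."""
    x = max(-1.0, min(1.0, x))        # clamp into the representable range
    return round((x + 1.0) * 15 / 2)  # adjacent levels are 2/15 apart


def dequantize_uniform(q):
    """Look the 4-bit index back up in the level table."""
    return UNIFORM_LEVELS[q]
```

One consequence of 16 evenly spaced levels that include both endpoints: 0 itself is not representable, and the closest levels are ±1/15.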

@cb88

cb88 commented Dec 19, 2024

> I'm not familiar with nf4. From what I managed to understand it's basically just 4 bit fixed point numbers with a range of [-1,1]? Or am I misunderstanding?

Not fixed point, apparently. It's more like a very limited, non-standard float.

https://huggingface.co/blog/4bit-transformers-bitsandbytes
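In practice, NF4 (as described for QLoRA/bitsandbytes) is neither fixed point nor a float with exponent bits: each 4-bit value is an index into a fixed table of 16 values in [-1, 1], derived from quantiles of a normal distribution, and each block of weights is rescaled by its absolute maximum. The sketch below assumes a block size of 64 and a plain float absmax per block. `quantize_nf4`/`dequantize_nf4` are hypothetical helper names, not the bitsandbytes API, and the table values should be checked against the bitsandbytes source.

```python
# NF4 code values as published with QLoRA / bitsandbytes: 16 sorted entries
# in [-1, 1] (derived from normal-distribution quantiles), with an exact 0.
NF4_CODE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

BLOCK_SIZE = 64  # assumption: block size commonly used for NF4


def nearest_index(t):
    """Index of the NF4 table entry closest to t (a lookup, not a float encode)."""
    return min(range(len(NF4_CODE)), key=lambda i: abs(t - NF4_CODE[i]))


def quantize_nf4(x):
    """Blockwise absmax NF4. Returns (4-bit indices, one absmax per block)."""
    if len(x) % BLOCK_SIZE:
        raise ValueError("input length must be a multiple of BLOCK_SIZE")
    indices, absmaxes = [], []
    for start in range(0, len(x), BLOCK_SIZE):
        block = x[start:start + BLOCK_SIZE]
        absmax = max(abs(v) for v in block)
        scale = absmax if absmax > 0 else 1.0  # all-zero block: avoid 0/0
        indices.extend(nearest_index(v / scale) for v in block)  # v/scale in [-1, 1]
        absmaxes.append(absmax)
    return indices, absmaxes


def dequantize_nf4(indices, absmaxes):
    """Table lookup, then rescale by the owning block's absmax."""
    return [NF4_CODE[q] * absmaxes[i // BLOCK_SIZE] for i, q in enumerate(indices)]
```

Real checkpoints additionally pack two indices per byte and can optionally quantize the absmax values themselves ("double quantization"); both are left out here to keep the table lookup visible.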

@stduhpf
Contributor

stduhpf commented Dec 20, 2024

https://www.ai-bites.net/qlora-train-your-llms-on-a-single-gpu/#normalfloat

This looks somewhat similar in principle to GGML's IQ4_NL type? It's not quite the same, though.
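The resemblance is that both formats decode the same way: a 4-bit index into a 16-entry non-uniform table, multiplied by a per-block scale. The tables differ, though. The IQ4_NL values below are ggml's `kvalues_iq4nl` table as I recall it from `ggml-common.h` (verify against the current source), where blocks hold 32 values with an fp16 scale. `decode` is an illustrative helper, not a ggml function.

```python
# Both formats decode as: value = scale * TABLE[4-bit index].
NF4_CODE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

# IQ4_NL's int8 table (kvalues_iq4nl in ggml), assumed to match ggml-common.h.
IQ4_NL_VALUES = [-127, -104, -83, -65, -49, -35, -22, -10,
                 1, 13, 25, 38, 53, 69, 89, 113]


def decode(table, indices, scale):
    """Shared decode step: 16-entry table lookup times a per-block scale."""
    return [scale * table[q] for q in indices]


# Normalize IQ4_NL by 127 so the two tables can be compared side by side.
iq4nl_normed = [v / 127 for v in IQ4_NL_VALUES]

for i, (a, b) in enumerate(zip(NF4_CODE, iq4nl_normed)):
    print(f"{i:2d}  nf4 {a:+.3f}   iq4_nl/127 {b:+.3f}")
```

Side by side, NF4 is symmetric at the ends (±1) and has an exact zero, while the IQ4_NL table is asymmetric and has no zero entry. So an NF4 index cannot simply be relabeled as an IQ4_NL index; the weights would have to be requantized.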

@Green-Sky
Contributor

Pretty sure all that's needed is the ability to read it and convert it to whatever quant the user wants.
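A minimal sketch of that "read and convert" path, assuming NF4 blocks of 64 with a plain float absmax per block: decode NF4 to floats, then requantize into a Q8_0-style layout (blocks of 32, scale `d = amax / 127`, int8 values `round(x / d)`). Q8_0 is used only because it is a simple target. ggml stores `d` as fp16, which this sketch does not do, and all function names here are hypothetical.

```python
NF4_CODE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]
NF4_BLOCK = 64  # assumption: NF4 block size of the source checkpoint
Q8_BLOCK = 32   # Q8_0-style target block size


def dequantize_nf4(indices, absmaxes):
    """NF4 -> float: table lookup times the owning block's absmax."""
    return [NF4_CODE[q] * absmaxes[i // NF4_BLOCK] for i, q in enumerate(indices)]


def quantize_q8_0_like(x):
    """Float -> Q8_0-style blocks: one scale per 32 values, int8 in [-127, 127].
    The scale stays a Python float here (ggml would store it as fp16)."""
    scales, qs = [], []
    for start in range(0, len(x), Q8_BLOCK):
        block = x[start:start + Q8_BLOCK]
        d = max(abs(v) for v in block) / 127.0
        inv = 1.0 / d if d else 0.0  # all-zero block encodes as zeros
        scales.append(d)
        qs.extend(max(-127, min(127, round(v * inv))) for v in block)
    return scales, qs


def convert_nf4_to_q8_0_like(indices, absmaxes):
    """Read NF4, decode to float, re-encode in the target format."""
    return quantize_q8_0_like(dequantize_nf4(indices, absmaxes))
```

Converting to an 8-bit target adds little error on top of NF4. Converting to another 4-bit type through float, though, stacks a second rounding step on top of the first.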
