Unable to run models with the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats on an ARM device. #1117
Comments
At the moment, there is no flag to remove llamafile. I will add one. For now, you need to remove all matches of this
Can we not just delete the llama.cpp folder, clone it again and run `make` again?
With the latest version I was able to run Q4_0_4_4 just by compiling from source. Thanks!
@gustrd, which quantization exactly is Q4_0_4_4? What quantization config do you have to specify to run it? And how fast is it compared to other quantizations on ARM?
@Abhrant, I'm not a specialist on this, but AFAIK Q4_0_4_4 is a special type of Q4 that takes advantage of some ARM optimizations present on some newer devices. Q4_0_4_8 uses i8mm and Q4_0_8_8 uses SVE, which are even newer technologies. I could only test Q4_0_4_4, and it gave a large prompt-processing speedup and a minor generation speedup. With a Snapdragon 8G1 I'm getting around 35 t/s for prompt processing and 9 t/s for generation with a 3B model.
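For anyone unsure which of the three formats their device can use, here is a rough sketch that reads the CPU feature flags Linux exposes in `/proc/cpuinfo` (this also works in Termux on Android). The `i8mm` and `sve` requirements follow the comment above. Requiring `asimd` (NEON) for Q4_0_4_4 is my own assumption, and the names `REQUIRED_FEATURES`, `common_cpu_features` and `usable_formats` are made up for this example. Treat the result as a hint only, since the build flags still decide whether the kernels are compiled in.

```python
from pathlib import Path

# Assumed mapping from format to the /proc/cpuinfo flags it needs.
# i8mm and sve come from the comment above; "asimd" (NEON) for
# Q4_0_4_4 is an assumption, not something stated in this thread.
REQUIRED_FEATURES = {
    "Q4_0_4_4": {"asimd"},
    "Q4_0_4_8": {"asimd", "i8mm"},
    "Q4_0_8_8": {"asimd", "sve"},
}


def common_cpu_features(cpuinfo_text):
    """Return the flags shared by every 'Features' line in the text.

    Big and little cores can report different flags, so the
    intersection is the conservative choice.
    """
    per_core = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            per_core.append(set(value.split()))
    return set.intersection(*per_core) if per_core else set()


def usable_formats(cpuinfo_text):
    """List the formats whose assumed requirements all cores satisfy."""
    features = common_cpu_features(cpuinfo_text)
    return [fmt for fmt, needed in REQUIRED_FEATURES.items() if needed <= features]


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    print(usable_formats(text))
```

An empty list just means no matching `Features` line was found, which is what you get on x86 machines, where the line is called `flags` instead.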
Describe the Issue
Upstream now has a new feature, ARM-optimized model formats (Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8). I tried to run each of them on my Snapdragon 8G1, but none of them would run with koboldcpp.
Additional Information:
Checking upstream, I saw the new documentation (ggerganov#9321), which shows that some flags must be set at compile time. Could you please explain how to compile koboldcpp with those flags so I can try again?
To support `Q4_0_4_4`, you must build with `GGML_NO_LLAMAFILE=1` (`make`) or `-DGGML_LLAMAFILE=OFF` (`cmake`).
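To make the quoted line concrete, here is a small sketch that spells out the two build invocations as argument lists. Only `GGML_NO_LLAMAFILE=1` and `-DGGML_LLAMAFILE=OFF` come from the upstream documentation quoted above. The helper name `llamafile_off_commands`, the `build` directory and the `--config Release` step are assumptions for illustration, and the commands are meant for an upstream llama.cpp checkout, not necessarily koboldcpp's own Makefile.

```python
import shlex


def llamafile_off_commands(build_system, build_dir="build"):
    """Return the build commands that disable llamafile, as argv lists.

    Hypothetical helper: only the two llamafile flags are taken from
    the upstream documentation; everything else is an assumption.
    """
    if build_system == "make":
        return [["make", "GGML_NO_LLAMAFILE=1"]]
    if build_system == "cmake":
        return [
            # Configure step with llamafile switched off.
            ["cmake", "-B", build_dir, "-DGGML_LLAMAFILE=OFF"],
            # Compile step for the configured tree.
            ["cmake", "--build", build_dir, "--config", "Release"],
        ]
    raise ValueError(f"unknown build system: {build_system!r}")


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed.
    for system in ("make", "cmake"):
        for cmd in llamafile_off_commands(system):
            print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from inside the source directory once you are happy with the commands.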