
Unable to run models with the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats at ARM device. #1117

Closed
gustrd opened this issue Sep 6, 2024 · 5 comments

Comments

@gustrd

gustrd commented Sep 6, 2024

Describe the Issue
Upstream we have the new feature of ARM-optimized models (Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8). I tried to run every one of them on my Snapdragon 8G1, but I was unable to run any of them with koboldcpp.

Additional Information:
Checking upstream, I saw the new documentation (ggerganov#9321), which shows that some flags must be set at compile time. Can you please explain how to compile koboldcpp with those flags so I can try again?

To support `Q4_0_4_4`, you must build with `GGML_NO_LLAMAFILE=1` (`make`) or `-DGGML_LLAMAFILE=OFF` (`cmake`).
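For reference, this is roughly how those flags are passed when building upstream llama.cpp itself (a sketch only; build directory and targets may differ between versions):

```sh
# Makefile-based build: disable llamafile via the documented variable
make GGML_NO_LLAMAFILE=1

# Or with CMake: turn the llamafile option off, then build
cmake -B build -DGGML_LLAMAFILE=OFF
cmake --build build --config Release
```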

gustrd changed the title from "Unable to run models with theQ4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formatsformats at ARM device." to "Unable to run models with the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats at ARM device." on Sep 6, 2024
@LostRuins
Owner

At the moment there is no flag to disable llamafile; I will add one. For now, you need to remove all matches of `-DGGML_USE_LLAMAFILE` from the Makefile, and then rebuild.
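A rough sketch of that edit (this assumes the flag only appears as a compile definition in koboldcpp's Makefile; review the change before rebuilding):

```sh
# Strip the llamafile define from the Makefile (GNU sed syntax), then rebuild
sed -i 's/-DGGML_USE_LLAMAFILE//g' Makefile
make clean && make
```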

@Abhrant

Abhrant commented Oct 3, 2024

Can we not just delete the llama.cpp folder, clone it again, and run `make` again?

@gustrd
Author

gustrd commented Oct 3, 2024

With the latest version I was able to run Q4_0_4_4 just by compiling from source. Thanks!

gustrd closed this as completed on Oct 3, 2024
@Abhrant

Abhrant commented Oct 3, 2024

@gustrd, which quantization exactly is Q4_0_4_4? What quantization config do you have to specify to run it? And how fast is it compared to other quantizations on ARM?

@gustrd
Author

gustrd commented Oct 4, 2024

@Abhrant, I'm not a specialist on this, but AFAIK Q4_0_4_4 is a special type of Q4_0 that takes advantage of ARM optimizations present on some newer devices.

Q4_0_4_8 uses i8mm and Q4_0_8_8 uses SVE, which are even newer technologies.

I could only test Q4_0_4_4, and it gave a great prompt-processing increase and a minor generation-speed increase.

With a Snapdragon 8G1 I'm getting around 35 t/s prompt processing and 9 t/s generation for a 3B model.
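In case it helps, this is a sketch of how such a file is typically produced with upstream llama.cpp's `llama-quantize` tool (the file names here are placeholders, and the exact binary name may vary between builds):

```sh
# Re-quantize an f16 GGUF into the ARM-optimized Q4_0_4_4 layout
./llama-quantize model-f16.gguf model-Q4_0_4_4.gguf Q4_0_4_4
```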
