idea for a possibly better 5bit dequant function #8173
Swight1423
started this conversation in
Ideas
I will preface this by saying that I don't know the code well enough to fully implement this.
The current `dequantize_row_q5_0` function in ggml-quants.c is implemented with one of the 5 bits of each value stored in a separate variable, which leads to some extra processing to reconstruct the value. I think that if the values for a block were held in a 20-byte array ((5 × 32) / 8 = 20) without splitting them, we might be able to process them more efficiently. Here is my quick notepad-level attempt at redefining the function with this idea in mind, though I haven't benchmarked it.
```c
void dequantize_row_q5_0(const block_q5_0 * restrict x, float * restrict y, int64_t k) {
    int qk = QK5_0;
}
```
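A minimal, self-contained sketch of the packed-array idea follows. It is not the function above and has not been benchmarked. The struct `block_q5_packed`, the helper `pack_q5`, and the function `dequantize_row_q5_packed` are hypothetical names, not part of ggml. The sketch assumes a plain `float` scale (ggml stores the scale as fp16), the same offset of 16 that q5_0 uses, and a little-endian bit order inside each 5-byte group.

```c
#include <assert.h>
#include <stdint.h>

#define QK5_PACKED 32  /* values per block, chosen to match QK5_0 */

/* Hypothetical packed layout, NOT the real block_q5_0. */
typedef struct {
    float   d;                       /* scale (assumption: float, not fp16) */
    uint8_t qs[QK5_PACKED * 5 / 8];  /* 32 values x 5 bits = 20 bytes */
} block_q5_packed;

/* Pack 32 values in [0, 31]: 8 values per 5-byte group, value 0 of each
 * group in the lowest 5 bits of the group's 40-bit word. */
static void pack_q5(const uint8_t *q, uint8_t *qs) {
    for (int g = 0; g < 4; ++g) {
        uint64_t buf = 0;
        for (int v = 0; v < 8; ++v) {
            buf |= (uint64_t)(q[g * 8 + v] & 0x1F) << (5 * v);
        }
        for (int b = 0; b < 5; ++b) {
            qs[g * 5 + b] = (uint8_t)(buf >> (8 * b));
        }
    }
}

static void dequantize_row_q5_packed(const block_q5_packed *x, float *y, int64_t k) {
    assert(k % QK5_PACKED == 0);
    const int64_t nb = k / QK5_PACKED;

    for (int64_t i = 0; i < nb; ++i) {
        const float d = x[i].d;

        /* Walk the four 5-byte groups from the last (byte offset 15) to the first. */
        for (int backwards = 15; backwards >= 0; backwards -= 5) {
            /* Load 5 bytes (40 bits) into the 64-bit buffer, highest byte first. */
            uint64_t buf = 0;
            for (int b = 4; b >= 0; --b) {
                buf = (buf << 8) | x[i].qs[backwards + b];
            }

            /* Pop eight 5-bit values off the low end of the buffer. */
            float *dst = y + i * QK5_PACKED + (backwards / 5) * 8;
            for (int v = 0; v < 8; ++v) {
                const int q = (int)(buf & 0x1F);
                buf >>= 5;
                dst[v] = (float)(q - 16) * d;
            }
        }
    }
}
```

Each group needs one mask and one shift per value, with no separate high-bit array to merge back in. Whether this is actually faster than the current split layout would need measuring, especially against the SIMD paths.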
I am using a 64-bit buffer and reading in reverse, treating it as if I am retrieving values from a stack, while targeting the even byte boundary at 40 bits (8 values × 5 bits = 5 bytes). Adapting this for 5_1 should be simple. This could probably be modified for 7-bit with a few changes if desired. For 7-bit:
1. The backwards variable would start at 21 and decrement by 7 each time.
2. You would read 7 bytes into the buffer.
3. Use 127 for the mask.
4. Shift by 7.
So this could be handled with 3 variables if you wanted something reusable for both.
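As a rough illustration of the reusable version, the sketch below derives the start offset, the number of bytes read, the mask, and the shift from a single `bits` parameter. The names `unpack_block` and `pack_block` are hypothetical, the block size of 32 values is assumed to match QK5_0, and the bit order inside each group is an assumption. It only unpacks the raw integers and leaves scaling out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unpack 32 values of `bits` bits each (1..8)
 * from a packed array of 4 * bits bytes. */
static void unpack_block(const uint8_t *qs, int bits, uint8_t *out) {
    assert(bits >= 1 && bits <= 8);
    const int      group_bytes = bits;               /* 8 values * bits / 8 */
    const uint64_t mask        = (1u << bits) - 1u;  /* 31 for 5-bit, 127 for 7-bit */

    /* Start at the last group: offset 15 for 5-bit, 21 for 7-bit. */
    for (int backwards = 3 * group_bytes; backwards >= 0; backwards -= group_bytes) {
        uint64_t buf = 0;
        for (int b = group_bytes - 1; b >= 0; --b) {
            buf = (buf << 8) | qs[backwards + b];
        }
        uint8_t *dst = out + (backwards / group_bytes) * 8;
        for (int v = 0; v < 8; ++v) {
            dst[v] = (uint8_t)(buf & mask);
            buf >>= bits;
        }
    }
}

/* Matching packer, used only to build test input for the sketch. */
static void pack_block(const uint8_t *in, int bits, uint8_t *qs) {
    for (int g = 0; g < 4; ++g) {
        uint64_t buf = 0;
        for (int v = 0; v < 8; ++v) {
            buf |= (uint64_t)in[g * 8 + v] << (bits * v);
        }
        for (int b = 0; b < bits; ++b) {
            qs[g * bits + b] = (uint8_t)(buf >> (8 * b));
        }
    }
}
```

With `bits = 7` the loop starts at byte offset 21, reads 7 bytes, masks with 127, and shifts by 7, which matches the four changes listed above.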
Going higher than 8 bits with this strategy may require a larger buffer (the buffer needs to be able to hold the lowest common multiple of the target bit width and 8 if you want to target byte boundaries neatly) or more complexity dealing with carries.
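The buffer-size rule can be checked with a small helper. The function names `gcd_int`, `group_bits`, and `fits_in_u64` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Greatest common divisor (Euclid's algorithm). */
static int gcd_int(int a, int b) {
    while (b != 0) {
        const int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Lowest common multiple of the bit width and 8: the smallest number of
 * bits that holds a whole number of values and a whole number of bytes. */
static int group_bits(int bits) {
    return bits / gcd_int(bits, 8) * 8;
}

/* Nonzero if one such group fits in a single 64-bit buffer. */
static int fits_in_u64(int bits) {
    return group_bits(bits) <= 64;
}
```

For 5-bit values the group is 40 bits and for 7-bit it is 56 bits, so both fit in a 64-bit buffer. For 9-bit values the group is 72 bits, which no longer fits and would need a wider buffer or carry handling across loads.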