
Question about dequantization overhead #23

Open
DD-DuDa opened this issue Jul 6, 2024 · 3 comments
DD-DuDa commented Jul 6, 2024

Thanks for your great work. I'd like to learn how the dequantization overhead is calculated, as in Figure 18, given that the dequantization process happens within a single kernel.

[image: Figure 18 from the paper]
ys-2020 (Contributor) commented Jul 13, 2024

Hi @DD-DuDa , thank you very much for your interest in QServe. We measured the dequantization overheads of the above kernels directly: we compared the actual throughput of the GEMM kernels with dequantization against variants in which the dequantization ops are skipped. The throughput difference between the two versions of each kernel is taken as the dequantization overhead.
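For anyone who wants to reproduce this kind of ablation, below is a minimal sketch of the methodology, not QServe's actual kernel: the same toy W4 kernel is compiled in two variants via a template flag, one with the unpack-and-dequantize arithmetic and one with it compiled out, and the throughput gap is attributed to dequantization. The kernel body, the group size of 128, the zero point of 8, and the buffer sizes are all illustrative assumptions.

```cuda
// dequant_ablation.cu -- a sketch of measuring dequant overhead by ablation.
// Not QServe's kernel: the toy workload, group size 128, and zero point 8
// are assumptions chosen only to make the example self-contained.
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// DO_DEQUANT toggles the dequantization arithmetic at compile time, so both
// variants issue identical memory traffic and differ only in the extra math.
template <bool DO_DEQUANT>
__global__ void w4_kernel(const uint8_t* packed_w, const uint8_t* scales,
                          const int8_t* act, int32_t* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    uint8_t byte = packed_w[i >> 1];               // two uint4 weights per byte
    int w = (i & 1) ? (byte >> 4) : (byte & 0xF);
    if (DO_DEQUANT)
        w = (w - 8) * scales[i >> 7];              // assumed zp = 8, group = 128
    out[i] = w * act[i];
}

template <bool DO_DEQUANT>
float time_variant(const uint8_t* w, const uint8_t* s, const int8_t* a,
                   int32_t* o, int n, int reps = 100) {
    cudaEvent_t beg, end;
    cudaEventCreate(&beg);
    cudaEventCreate(&end);
    cudaEventRecord(beg);
    for (int r = 0; r < reps; ++r)
        w4_kernel<DO_DEQUANT><<<(n + 255) / 256, 256>>>(w, s, a, o, n);
    cudaEventRecord(end);
    cudaEventSynchronize(end);
    float ms = 0.f;
    cudaEventElapsedTime(&ms, beg, end);
    return ms;
}

int main() {
    const int n = 1 << 24;
    uint8_t *w, *s;  int8_t *a;  int32_t *o;
    cudaMalloc(&w, n / 2);                // packed uint4 weights
    cudaMalloc(&s, n / 128);              // one uint8 scale per group of 128
    cudaMalloc(&a, n);                    // int8 activations
    cudaMalloc(&o, n * sizeof(int32_t));  // contents irrelevant: timing only
    float with_dq = time_variant<true>(w, s, a, o, n);
    float no_dq   = time_variant<false>(w, s, a, o, n);
    printf("with dequant: %.3f ms  without: %.3f ms  overhead: %.1f%%\n",
           with_dq, no_dq, 100.f * (with_dq - no_dq) / with_dq);
    cudaFree(w); cudaFree(s); cudaFree(a); cudaFree(o);
    return 0;
}
```

Compiling the flag away (rather than branching at runtime) matters here: it keeps the two variants' memory traffic and launch configuration identical, so the timing delta isolates the dequantization arithmetic itself.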

DD-DuDa (Author) commented Jul 14, 2024

Got it! Thank you for your response!


brisker commented Jul 29, 2024

@ys-2020
[screenshot: formula (5) from the paper]
In formula (5) in your paper, why is the per-group scale uint8? How can (uint4 - uint4) multiplied by uint8 still be sint8?
Is that a typo? This is quite confusing. (In my understanding, the per-group scale should also be 4-bit to produce an sint8 weight.)
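As a quick sanity check of the ranges as literally written (this only restates the arithmetic behind the question; it says nothing about how the paper actually constrains the scale):

```cuda
// range_check.cu -- host-only arithmetic on the bit widths in formula (5).
#include <cstdio>

int main() {
    const int uint4_max = 15;                    // uint4 spans [0, 15]
    const int uint8_max = 255;                   // uint8 spans [0, 255]
    const int diff_max  = uint4_max - 0;         // (w_uint4 - z_uint4) spans [-15, 15]
    const int prod_max  = diff_max * uint8_max;  // 15 * 255 = 3825
    printf("max |(w - z) * s| = %d, but sint8 tops out at 127\n", prod_max);
    return 0;
}
```

So with a full-range uint8 scale the product can reach 3825, far outside sint8's [-128, 127], which is exactly why the widths look inconsistent at face value.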
