Thanks for your great work. I would like to learn how the dequantization overhead in Figure 18 is calculated, since the dequantization happens inside a single kernel.
Hi @DD-DuDa , thank you very much for your interest in QServe. We measured the dequantization overhead of the above kernels directly: we compared the actual throughput of the GEMM kernels with dequantization against variants of the same kernels in which the dequantization ops are skipped. The throughput difference between the two versions of the kernel is reported as the dequantization overhead.
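For anyone trying to reproduce this kind of ablation, here is a minimal timing sketch of the methodology described above. The two kernel entry points (`gemm_w4a8`, `gemm_w4a8_no_dequant`) and their arguments are hypothetical stand-ins for the two kernel builds, not actual QServe APIs:

```python
import torch

def bench_ms(fn, *args, warmup=10, iters=100):
    """Average wall-clock time of a CUDA kernel launch, in milliseconds."""
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# `gemm_w4a8` and `gemm_w4a8_no_dequant` are placeholders: the full W4A8 GEMM
# kernel, and the same kernel built with its dequantization instructions
# compiled out (its output is meaningless; it exists only for timing).
t_full = bench_ms(gemm_w4a8, act, qweight, scales, zeros)
t_nodq = bench_ms(gemm_w4a8_no_dequant, act, qweight, scales, zeros)
print(f"dequant overhead: {(t_full - t_nodq) / t_full * 100:.1f}% of kernel time")
```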
@ys-2020
In formula (5) of your paper, why is the per-group scale uint8? How can (uint4 − uint4) multiplied by uint8 still be sint8?
Is that a typo? This is quite confusing. (In my understanding, the per-group scale should also be 4-bit to produce an sint8 weight.)
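To spell out where the confusion comes from, this is how I read the per-group dequantization in formula (5) (my own notation, not copied from the paper):

$$\hat{W}_{\mathrm{s8}} \;=\; s^{(1)}_{\mathrm{u8}} \cdot \bigl(W_{\mathrm{u4}} - z_{\mathrm{u4}}\bigr)$$

Since $(W_{\mathrm{u4}} - z_{\mathrm{u4}})$ lies in $[-15, 15]$ and a uint8 scale can be as large as 255, the product does not obviously fit into the sint8 range, which is why I expected the per-group scale to be 4-bit as well.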