Hi, thank you for your excellent work!
I encountered an issue while running qserve_benchmark.py:
I downloaded several models with the W4A8 per-channel quantization type provided in the QServe Model Zoo. When I set `--group-size` to `-1`, I consistently ran into `RuntimeError: probability tensor contains either 'inf', 'nan' or element < 0`. Interestingly, changing the parameter to `128` allowed the script to run successfully.
My understanding is that for the W4A8 per-channel quantization type, setting this parameter to -1 is the correct choice, while for the W4A8-g128 type, the correct setting should be 128.
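To make that understanding concrete, here is a minimal sketch of how I read the two settings. It is not taken from the QServe source: the function name `num_weight_scales` and the layer shape are hypothetical, and it only counts scale factors under the usual convention that `-1` means one scale per output channel, while a positive group size means one scale per group of consecutive input elements.

```python
def num_weight_scales(out_features: int, in_features: int, group_size: int) -> int:
    """Count weight scale factors for a linear layer of shape (out_features, in_features).

    Hypothetical helper, assuming the common convention:
      group_size == -1 : per-channel, one scale per output channel
      group_size  >  0 : per-group, one scale per `group_size` input elements
                         within each output channel
    """
    if group_size == -1:
        return out_features
    if group_size <= 0 or in_features % group_size != 0:
        raise ValueError("group_size must be -1 or a positive divisor of in_features")
    return out_features * (in_features // group_size)


# Example with an assumed 4096 x 4096 projection layer.
print(num_weight_scales(4096, 4096, -1))   # per-channel
print(num_weight_scales(4096, 4096, 128))  # g128
```

If this reading is right, the two settings expect differently shaped scale tensors, so I would expect a checkpoint to work only with the setting it was quantized with.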
Could you please help explain what might be causing this issue?