A question about the parameter "--group-size" in qserve_benchmark.py #48

oasis-Linmi · 2024-12-16T10:20:33Z

Hi, thank you for your excellent work!
I encountered an issue while running qserve_benchmark.py:
I downloaded several models with the W4A8 per-channel quantization type provided in the QServe Model Zoo. When I tried to set --group-size to -1, I consistently ran into a RuntimeError: probability tensor contains either 'inf', 'nan' or element < 0. Interestingly, changing the parameter to 128 allowed the script to run successfully.
My understanding is that for the W4A8 per-channel quantization type, setting this parameter to -1 is the correct choice, while for the W4A8-g128 type, the correct setting should be 128.
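To make that understanding concrete, here is a minimal sketch of the convention I am assuming. It is not QServe's actual code: `scale_shape` is a hypothetical helper, and the 4096 x 4096 layer size is only an illustrative value. It shows how the group size would determine the layout of the weight scales.

```python
from typing import Tuple


def scale_shape(out_features: int, in_features: int, group_size: int) -> Tuple[int, int]:
    """Shape of the weight-scale tensor under the assumed convention.

    group_size == -1: per-channel, one scale per output channel.
    group_size  >  0: group-wise, one scale per `group_size` input
                      elements within each output channel.
    """
    if group_size == -1:
        return (out_features, 1)
    if group_size <= 0 or in_features % group_size != 0:
        raise ValueError(
            f"group_size must be -1 or a positive divisor of in_features, got {group_size}"
        )
    return (out_features, in_features // group_size)


# Hypothetical 4096 x 4096 linear layer.
print(scale_shape(4096, 4096, -1))   # (4096, 1)
print(scale_shape(4096, 4096, 128))  # (4096, 32)
```

Under this assumption the two settings imply different scale layouts, which is why I expected -1 to be the value that matches a per-channel checkpoint.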
Could you please help explain what might be causing this issue?

oasis-Linmi closed this as completed Dec 18, 2024

oasis-Linmi reopened this Dec 18, 2024
