A question about the parameter "--group-size" in qserve_benchmark.py #48

Open
oasis-Linmi opened this issue Dec 16, 2024 · 0 comments

Hi, thank you for your excellent work!
I encountered an issue while running qserve_benchmark.py:
I downloaded several W4A8 per-channel models from the QServe Model Zoo. With --group-size set to -1, I consistently hit this error: RuntimeError: probability tensor contains either 'inf', 'nan' or element < 0. Changing the parameter to 128 lets the script run successfully with the same models.
My understanding is that -1 is the correct setting for the W4A8 per-channel quantization type, and 128 is the correct setting for the W4A8-g128 type.
Could you please help explain what might be causing this issue?
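
To make the distinction concrete, here is a minimal sketch of how I understand the two settings. It only illustrates the general per-channel vs. per-group convention; `scale_shape` is a hypothetical helper and not part of QServe, and the 4096 x 4096 layer size is an assumed example.

```python
def scale_shape(out_features: int, in_features: int, group_size: int) -> tuple[int, int]:
    """Shape of the weight-scale tensor under the assumed convention.

    group_size == -1  -> per-channel: one scale per output channel.
    group_size == g   -> per-group: one scale per g consecutive input
                         elements within each output channel.
    """
    if group_size == -1:
        return (out_features, 1)
    if group_size <= 0 or in_features % group_size != 0:
        raise ValueError("group_size must be -1 or a positive divisor of in_features")
    return (out_features, in_features // group_size)


# Assumed example layer: 4096 output channels, 4096 input features.
print(scale_shape(4096, 4096, -1))   # (4096, 1)
print(scale_shape(4096, 4096, 128))  # (4096, 32)
```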
