[Bug]: Nonsense output for Qwen2.5 72B after upgrading to latest vllm 0.6.3.post1 [REPROs] #9769
Comments
nvidia-smi:
The other 4 GPUs are running Qwen VL 2 76B.
Even after restarting the Docker container, I get the same result, so the above script is a good repro. It isn't the only way, of course; all our longer inputs fail with 0.6.3.post1.
Note that this model is an extremely competitive model for coding and agents, so it really needs to be a first-class citizen for the vLLM team in terms of testing, etc.
I just posted a similar issue but with totally different params. I wonder if it's related at all: issue
Facing similar problems.
I had issues with long context. They are related to the issue fixed in this PR: #9549
Got it, I can try that if I want to upgrade again, but I'll stick to 0.6.2 for this model for now.
I fixed my nonsense-output issue by installing the latest dev version of vLLM: #9732 (comment). Maybe that fixes your issue too, @pseudotensor.
Same situation when processing 32K context input on qwen-2.5-7B.
I have this problem when using AWQ and GPTQ. Adding --enforce-eager works around it, but it is slower.
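For reference, eager mode can also be enabled through the offline Python API; a minimal sketch is below (the model path, parallelism, and prompt are only illustrative):

```python
from vllm import LLM, SamplingParams

# enforce_eager=True disables CUDA graph capture (same effect as the
# --enforce-eager server flag); it works around the issue at some speed cost.
llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct-AWQ",  # illustrative model path
    tensor_parallel_size=4,
    enforce_eager=True,
)

params = SamplingParams(temperature=0.0, max_tokens=128)
out = llm.generate(["Summarize the vLLM project in one sentence."], params)
print(out[0].outputs[0].text)
```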
The issue is resolved in main with this fix: #9549. You can install the nightly or use --enforce-eager until v0.6.4. You may be able to revert to 0.6.2, but I had issues with 0.6.2 due to a transformers change that breaks Qwen2.5 when you enable long context (>32k).
Same problem.
@cedonley --enforce-eager does the same thing in more general cases.
Closing as #9549 has been released. Please upgrade vLLM to v0.6.4 or above.
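As a quick sanity check after upgrading, something like the following (a sketch; assumes the `packaging` package is installed) confirms the running version includes the fix:

```python
import vllm
from packaging.version import Version

# The #9549 fix first shipped in the v0.6.4 release.
assert Version(vllm.__version__) >= Version("0.6.4"), (
    f"vLLM {vllm.__version__} predates the fix; run `pip install -U vllm`."
)
print(f"vLLM {vllm.__version__} OK")
```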
Your current environment
docker 0.6.3.post1
8*A100
Model Input Dumps
No response
🐛 Describe the bug
No such issues with prior vLLM 0.6.2.
Trivial queries work:
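A minimal sketch of such a trivial query against the OpenAI-compatible endpoint (the base URL, port, and served model name are assumptions; the actual repro is in the attached script below):

```python
from openai import OpenAI

# Assumed endpoint and model name; adjust to match the actual deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=[{"role": "user", "content": "Who are you?"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```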
But longer inputs lead to nonsense only in the new vLLM:
qwentest1.py.zip
Gives:
Full logs from that running state. It had just been running overnight and was running some benchmarks.
qwen25_72b.bad.log.zip
Related or not? #9732