[Bug]: Nonsense output for Qwen2.5 72B after upgrading to latest vllm 0.6.3.post1 [REPROs] #9769

Closed
pseudotensor opened this issue Oct 28, 2024 · 14 comments
Labels: bug Something isn't working

@pseudotensor

Your current environment

docker 0.6.3.post1
8*A100

docker pull vllm/vllm-openai:latest
docker stop qwen25_72b ; docker remove qwen25_72b
docker run -d --restart=always \
    --runtime=nvidia \
    --gpus '"device=4,5,6,7"' \
    --shm-size=10.24gb \
    -p 5001:5001 \
        -e NCCL_IGNORE_DISABLED_P2P=1 \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -u `id -u`:`id -g` \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -v "${HOME}"/.cache:$HOME/.cache/ -v "${HOME}"/.config:$HOME/.config/   -v "${HOME}"/.triton:$HOME/.triton/  \
    --network host \
    --name qwen25_72b \
     vllm/vllm-openai:latest \
        --port=5001 \
        --host=0.0.0.0 \
        --model=Qwen/Qwen2.5-72B-Instruct \
        --tensor-parallel-size=4 \
        --seed 1234 \
        --trust-remote-code \
        --max-model-len=32768 \
        --max-num-batched-tokens 131072 \
        --max-log-len=100 \
        --api-key=EMPTY \
        --download-dir=$HOME/.cache/huggingface/hub &>> logs.vllm_server.qwen25_72b.txt
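
Since the container runs detached with output appended to a log file, a quick smoke test helps confirm the server actually came up. A minimal sketch, assuming vLLM's standard /health endpoint and the port and API key from the command above:

# Sketch: confirm the server is live before sending real traffic.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5001/health
# List the served model (the server was started with --api-key=EMPTY).
curl -s http://localhost:5001/v1/models -H "Authorization: Bearer EMPTY"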

Model Input Dumps

No response

🐛 Describe the bug

No such issues with prior vLLM 0.6.2.

Trivial queries work:

from openai import OpenAI

client = OpenAI(base_url='FILL ME', api_key='FILL ME')

messages = [
    {
        "role": "user",
        "content": "Who are you?",
    }
]

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=messages,
    temperature=0.0,
    max_tokens=4096,
)

print(response.choices[0])

But longer inputs lead to nonsense output only with the new vLLM:

qwentest1.py.zip

Gives:

Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='A\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text\n</text>\n\n</text>\n\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text>\n\n</text\n</text>\n\n</text\n</text\n\n</text\n</text>\n\n</text\n</text\n</text\n</text>\n\n</text\n</text\n</text>\n\n</text>\n\n</text\n\n</text\n</text\n</text\n</text>\n\n</text>\n\n</text\n</text>\n\n</text\n\n\n</text>\n\n</text\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text\n</text\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text\n</text</text</text\n</text</text\n</text</text</text\n</text\n</text\n</text>\n\n</text</text</text</text>\n\n</text>\n\n</text</text>\n\n</text>\n\n</text\n</text</text\n</text\n</text>\n\n</text\n</text>\n\n</text\n</text>\n\n</text\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text</text>\n\n</text</text</text</text</text</text>\n\n</text</text</text</text</text</text</text</text</text</text</text>\n\n</text>\n\n</text</text>\n\n</text</text</text</text</text>\n\n</text</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text\n</text>\n\n</text\n</text>\n\n</text>\n\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text>\n', refusal=None, role='assistant', function_call=None, tool_calls=[]), stop_reason=None)
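
The attached qwentest1.py is not inlined above; as a rough stand-in, a long-input request of the same shape reproduces this class of failure. The filler text, token count, and endpoint below are assumptions, not the contents of the attached script:

# Build a prompt of very roughly 20K tokens by repeating filler text (rough estimate only).
FILLER=$(printf 'The quick brown fox jumps over the lazy dog. %.0s' {1..2000})
curl -s http://localhost:5001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d "{\"model\": \"Qwen/Qwen2.5-72B-Instruct\",
       \"messages\": [{\"role\": \"user\", \"content\": \"Summarize the following text: ${FILLER}\"}],
       \"temperature\": 0.0, \"max_tokens\": 1024}"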

Full logs from that running state; the server had been running overnight, handling some benchmarks.

qwen25_72b.bad.log.zip

Related or not? #9732

@pseudotensor pseudotensor added the bug Something isn't working label Oct 28, 2024
@pseudotensor
Author

pseudotensor commented Oct 28, 2024

nvidia-smi:

ubuntu@h2ogpt-a100-node-1:~$ nvidia-smi
Mon Oct 28 19:41:45 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  |   00000000:0F:00.0 Off |                    0 |
| N/A   43C    P0             69W /  400W |   69883MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          On  |   00000000:15:00.0 Off |                    0 |
| N/A   41C    P0             71W /  400W |   69787MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100-SXM4-80GB          On  |   00000000:50:00.0 Off |                    0 |
| N/A   41C    P0             72W /  400W |   69787MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100-SXM4-80GB          On  |   00000000:53:00.0 Off |                    0 |
| N/A   41C    P0             67W /  400W |   69499MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A100-SXM4-80GB          On  |   00000000:8C:00.0 Off |                    0 |
| N/A   68C    P0            332W /  400W |   77735MiB /  81920MiB |     96%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A100-SXM4-80GB          On  |   00000000:91:00.0 Off |                    0 |
| N/A   60C    P0            318W /  400W |   77639MiB /  81920MiB |     92%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A100-SXM4-80GB          On  |   00000000:D6:00.0 Off |                    0 |
| N/A   63C    P0            331W /  400W |   77639MiB /  81920MiB |     93%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A100-SXM4-80GB          On  |   00000000:DA:00.0 Off |                    0 |
| N/A   72C    P0            331W /  400W |   77351MiB /  81920MiB |     94%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   1815338      C   /usr/bin/python3                            69864MiB |
|    1   N/A  N/A   1815472      C   /usr/bin/python3                            69768MiB |
|    2   N/A  N/A   1815473      C   /usr/bin/python3                            69768MiB |
|    3   N/A  N/A   1815474      C   /usr/bin/python3                            69480MiB |
|    4   N/A  N/A   1980777      C   /usr/bin/python3                            77716MiB |
|    5   N/A  N/A   1981060      C   /usr/bin/python3                            77620MiB |
|    6   N/A  N/A   1981061      C   /usr/bin/python3                            77620MiB |
|    7   N/A  N/A   1981062      C   /usr/bin/python3                            77332MiB |
+-----------------------------------------------------------------------------------------+

The other 4 GPUs are running Qwen VL 2 76B.

ubuntu@h2ogpt-a100-node-1:~$ docker ps
CONTAINER ID   IMAGE                     COMMAND                  CREATED        STATUS        PORTS     NAMES
78dce1c637ec   vllm/vllm-openai:latest   "python3 -m vllm.ent…"   27 hours ago   Up 27 hours             qwen25_72b
d2918b1209aa   vllm/vllm-openai:latest                  "python3 -m vllm.ent…"   4 weeks ago    Up 5 days               qwen72bvll

@pseudotensor
Author

Even after restarting the docker image, I get back the same result.

So the above script is a fine repro. It isn't the only way, of course; all our longer inputs fail with 0.6.3.post1.

@pseudotensor pseudotensor changed the title [Bug]: Nonsense output for Qwen2.5 72B after upgrading to latest vllm 0.6.3.post1 [Bug]: Nonsense output for Qwen2.5 72B after upgrading to latest vllm 0.6.3.post1 [REPROs] Oct 28, 2024
@pseudotensor
Author

Note this model is an extremely good, competitive model for coding and agents, so it really needs to be a first-class citizen for the vLLM team in terms of testing etc.

@osilverstein

I just posted a similar issue, but with totally different params. I wonder if it's related at all: issue

@HoboRiceone

Facing similar problems.

@cedonley
Contributor

I had issues with long context. They are related to the issue fixed in this PR: #9549
If you get better results with --enforce-eager, then this is likely the culprit. I've seen several similar issues over the past few days.
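
For anyone who wants to test that workaround against the original setup, a trimmed sketch of relaunching the same server with CUDA graphs disabled (the container name here is made up, several flags from the original command are omitted for brevity, and throughput will be lower):

docker run -d --runtime=nvidia --gpus '"device=4,5,6,7"' \
    --shm-size=10.24gb --network host \
    --name qwen25_72b_eager \
    vllm/vllm-openai:latest \
        --port=5001 \
        --host=0.0.0.0 \
        --model=Qwen/Qwen2.5-72B-Instruct \
        --tensor-parallel-size=4 \
        --max-model-len=32768 \
        --api-key=EMPTY \
        --enforce-eager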

@pseudotensor
Author

Got it. I can try that if I want to upgrade again, but I'll stick to 0.6.2 for this model for now.

@SinanAkkoyun

I fixed my nonsense issue by installing the latest dev version of vLLM #9732 (comment)

Maybe that fixes your issue too @pseudotensor
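
For reference, one way to get the development version at that point was a source install straight from the main branch; this is only a sketch (it requires a CUDA build toolchain) and not necessarily the wheel used in the linked comment:

pip install -U git+https://github.com/vllm-project/vllm.git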

@why11699

Same situation when processing 32K-context input on Qwen2.5-7B.
It works fine after rolling vLLM back to 0.6.2.
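
For anyone rolling back the same way, a sketch of pinning to the previous release instead of :latest (the v0.6.2 image tag and PyPI version are assumed to follow vLLM's usual release naming):

# Docker deployment: pin the image tag instead of :latest.
docker pull vllm/vllm-openai:v0.6.2
# Or, for a pip-managed install:
pip install vllm==0.6.2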

@frei-x

frei-x commented Oct 31, 2024

I have this problem when using AWQ and GPTQ. Adding --enforce-eager resolves it, but it is slower.

@cedonley
Contributor

The issue is resolved in main with this fix: #9549

You can install the nightly or use --enforce-eager until v0.6.4. You may be able to revert to 0.6.2, but I had issues with 0.6.2 due to a transformers change that breaks Qwen2.5 when you enable long context (>32k).

@xinfanmeng

same problem

@pseudotensor
Author

@cedonley --enforce-eager produces the same nonsense output in more general cases.

@DarkLight1337
Member

DarkLight1337 commented Dec 25, 2024

Closing as #9549 has been released. Please upgrade vLLM to v0.6.4 or above.
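
For completeness, a sketch of that upgrade path; the image tag and version specifier are assumed to follow vLLM's usual release naming:

# Docker deployment: pull a release at or above v0.6.4 (or re-pull :latest).
docker pull vllm/vllm-openai:v0.6.4
# Pip-managed install:
pip install -U "vllm>=0.6.4"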
