Workaround from this comment worked in my case:
vllm serve "meta-llama/Meta-Llama-3.1-70B-Instruct" --port 7000 --max-num-seqs 128 --tensor-parallel-size=8 --max_model_len=32768 --max-seq-len-to-capture=32768 --distributed-executor-backend=mp --dtype=half
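A likely reason the workaround helps is the added --max-seq-len-to-capture=32768 flag: the block table captured for CUDA graphs is sized from that limit, and the numbers in the error line up with this reading. The sketch below is only arithmetic under two assumptions about this vLLM version: a KV-cache block size of 16 tokens and a default --max-seq-len-to-capture of 8192.

```python
# Assumptions (not confirmed by the log): block_size = 16 tokens per KV-cache
# block, and --max-seq-len-to-capture defaults to 8192 in this vLLM version.
block_size = 16
default_capture_len = 8192
max_model_len = 32768

# Width of the block table allocated for CUDA-graph capture with defaults:
capture_slots = default_capture_len // block_size
print(capture_slots)  # 512 -> matches "shape (512,)" in the error

# Tokens covered by the failing sequence's 944-block table:
print(944 * block_size)  # 15104 -> past the 8192-token capture limit,
                         # but well within max_model_len=32768

# Slots after raising --max-seq-len-to-capture to max_model_len:
print(max_model_len // block_size)  # 2048 -> room for the 944-block table
```

If this reading is right, any sequence growing past the capture limit (here, beyond 8192 tokens) would overflow the 512-slot table, which is consistent with the crash appearing only on long generations.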
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
vllm serve "meta-llama/Meta-Llama-3.1-70B-Instruct" --port 7000 --max-num-seqs 64 --tensor-parallel-size=8 --max_model_len=32768 --distributed-executor-backend=mp --dtype=half
Log:
ERROR 10-30 11:32:21 engine.py:158] ValueError('could not broadcast input array from shape (944,) into shape (512,)')
ERROR 10-30 11:32:21 engine.py:158] Traceback (most recent call last):
ERROR 10-30 11:32:21 engine.py:158] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 156, in start
ERROR 10-30 11:32:21 engine.py:158] self.run_engine_loop()
ERROR 10-30 11:32:21 engine.py:158] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 219, in run_engine_loop
ERROR 10-30 11:32:21 engine.py:158] request_outputs = self.engine_step()
ERROR 10-30 11:32:21 engine.py:158] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 237, in engine_step
ERROR 10-30 11:32:21 engine.py:158] raise e
ERROR 10-30 11:32:21 engine.py:158] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 228, in engine_step
ERROR 10-30 11:32:21 engine.py:158] return self.engine.step()
ERROR 10-30 11:32:21 engine.py:158] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1389, in step
ERROR 10-30 11:32:21 engine.py:158] outputs = self.model_executor.execute_model(
ERROR 10-30 11:32:21 engine.py:158] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 82, in execute_model
ERROR 10-30 11:32:21 engine.py:158] driver_outputs = self._driver_execute_model(execute_model_req)
ERROR 10-30 11:32:21 engine.py:158] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 155, in _driver_execute_model
ERROR 10-30 11:32:21 engine.py:158] return self.driver_worker.execute_model(execute_model_req)
ERROR 10-30 11:32:21 engine.py:158] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 303, in execute_model
ERROR 10-30 11:32:21 engine.py:158] inputs = self.prepare_input(execute_model_req)
ERROR 10-30 11:32:21 engine.py:158] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 291, in prepare_input
ERROR 10-30 11:32:21 engine.py:158] return self._get_driver_input_and_broadcast(execute_model_req)
ERROR 10-30 11:32:21 engine.py:158] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
ERROR 10-30 11:32:21 engine.py:158] self.model_runner.prepare_model_input(
ERROR 10-30 11:32:21 engine.py:158] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1586, in prepare_model_input
ERROR 10-30 11:32:21 engine.py:158] model_input = self._prepare_model_input_tensors(
ERROR 10-30 11:32:21 engine.py:158] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1196, in _prepare_model_input_tensors
ERROR 10-30 11:32:21 engine.py:158] return builder.build() # type: ignore
ERROR 10-30 11:32:21 engine.py:158] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 867, in build
ERROR 10-30 11:32:21 engine.py:158] attn_metadata = self.attn_metadata_builder.build(
ERROR 10-30 11:32:21 engine.py:158] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/attention/backends/utils.py", line 215, in build
ERROR 10-30 11:32:21 engine.py:158] input_block_tables[i, :len(block_table)] = block_table
ERROR 10-30 11:32:21 engine.py:158] ValueError: could not broadcast input array from shape (944,) into shape (512,)
ERROR 10-30 11:32:21 serving_chat.py:603] Error in chat completion stream generator.
ERROR 10-30 11:32:21 serving_chat.py:603] Traceback (most recent call last):
ERROR 10-30 11:32:21 serving_chat.py:603] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_chat.py", line 342, in chat_completion_stream_generator
ERROR 10-30 11:32:21 serving_chat.py:603] async for res in result_generator:
ERROR 10-30 11:32:21 serving_chat.py:603] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/utils.py", line 458, in iterate_with_cancellation
ERROR 10-30 11:32:21 serving_chat.py:603] item = await awaits[0]
ERROR 10-30 11:32:21 serving_chat.py:603] File "/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/multiprocessing/client.py", line 598, in _process_request
ERROR 10-30 11:32:21 serving_chat.py:603] raise request_output
ERROR 10-30 11:32:21 serving_chat.py:603] ValueError: could not broadcast input array from shape (944,) into shape (512,)
ERROR 10-30 11:32:22 multiproc_worker_utils.py:116] Worker VllmWorkerProcess pid 222405 died, exit code: -15
INFO 10-30 11:32:22 multiproc_worker_utils.py:120] Killing local vLLM worker processes
ERROR 10-30 11:32:30 client.py:250] TimeoutError('No heartbeat received from MQLLMEngine')
ERROR 10-30 11:32:30 client.py:250] NoneType: None
CRITICAL 10-30 11:32:31 launcher.py:99] MQLLMEngine is already dead, terminating server process
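The ValueError itself is a plain NumPy broadcasting failure at the line shown in the traceback (input_block_tables[i, :len(block_table)] = block_table): the slice is clipped to the table's 512 columns, so a 944-element block table cannot be assigned into it. A minimal sketch of the failure, with the 512-column width and 944-block table taken from the log (the 64-row count is an arbitrary stand-in for the batch dimension):

```python
import numpy as np

# Preallocated block table, 512 columns wide as in the error message.
# The row count (64) is an arbitrary placeholder for the batch size.
input_block_tables = np.zeros((64, 512), dtype=np.int32)

# A per-sequence block table that has grown to 944 entries.
block_table = np.arange(944, dtype=np.int32)

try:
    # The slice :944 is clipped to the 512 available columns, so NumPy
    # cannot broadcast the 944-element source into the 512-element target.
    input_block_tables[0, :len(block_table)] = block_table
except ValueError as e:
    print(e)  # could not broadcast input array from shape (944,) into shape (512,)
```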