
[BugFix] Fix server crash on empty prompt #7746

Merged: 10 commits into vllm-project:main on Aug 23, 2024

Conversation

@maxdebayser (Contributor) commented:

Fixes #7632

To reproduce, start the server with python -m vllm.entrypoints.openai.api_server --model gpt2 and send an empty prompt:

$ curl http://localhost:8000/v1/completions    -H "Content-Type: application/json"    -d '{
     "model": "gpt2",
     "prompt": [""],
     "max_tokens": 20,
     "temperature": 0
}'
Internal Server Error

On the server side, the following log appears and the server is left dead:

INFO 08-21 14:49:19 logger.py:36] Received request cmpl-470c7a46582b4554b7926ce4559b0337-0: prompt: '', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=20, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [], lora_request: None, prompt_adapter_request: None.
INFO 08-21 14:49:19 async_llm_engine.py:208] Added request cmpl-470c7a46582b4554b7926ce4559b0337-0.
ERROR 08-21 14:49:19 async_llm_engine.py:65] Engine background task failed
ERROR 08-21 14:49:19 async_llm_engine.py:65] Traceback (most recent call last):
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
ERROR 08-21 14:49:19 async_llm_engine.py:65]     return_value = task.result()
ERROR 08-21 14:49:19 async_llm_engine.py:65]                    ^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 930, in run_engine_loop
ERROR 08-21 14:49:19 async_llm_engine.py:65]     result = task.result()
ERROR 08-21 14:49:19 async_llm_engine.py:65]              ^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 873, in engine_step
ERROR 08-21 14:49:19 async_llm_engine.py:65]     request_outputs = await self.engine.step_async(virtual_engine)
ERROR 08-21 14:49:19 async_llm_engine.py:65]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 301, in step_async
ERROR 08-21 14:49:19 async_llm_engine.py:65]     virtual_engine].schedule()
ERROR 08-21 14:49:19 async_llm_engine.py:65]                     ^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 1039, in schedule
ERROR 08-21 14:49:19 async_llm_engine.py:65]     scheduler_outputs = self._schedule()
ERROR 08-21 14:49:19 async_llm_engine.py:65]                         ^^^^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 1013, in _schedule
ERROR 08-21 14:49:19 async_llm_engine.py:65]     return self._schedule_default()
ERROR 08-21 14:49:19 async_llm_engine.py:65]            ^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 857, in _schedule_default
ERROR 08-21 14:49:19 async_llm_engine.py:65]     prefills = self._schedule_prefills(budget,
ERROR 08-21 14:49:19 async_llm_engine.py:65]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 752, in _schedule_prefills
ERROR 08-21 14:49:19 async_llm_engine.py:65]     num_new_tokens = self._get_num_new_tokens(seq_group,
ERROR 08-21 14:49:19 async_llm_engine.py:65]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 1349, in _get_num_new_tokens
ERROR 08-21 14:49:19 async_llm_engine.py:65]     assert num_new_tokens > 0
ERROR 08-21 14:49:19 async_llm_engine.py:65]            ^^^^^^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65] AssertionError
Exception in callback functools.partial(<function _log_task_completion at 0x7f1abf494220>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f1aa38d0050>>)
handle: <Handle functools.partial(<function _log_task_completion at 0x7f1abf494220>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f1aa38d0050>>)>
Traceback (most recent call last):
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
    return_value = task.result()
                   ^^^^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 930, in run_engine_loop
    result = task.result()
             ^^^^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 873, in engine_step
    request_outputs = await self.engine.step_async(virtual_engine)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 301, in step_async
    virtual_engine].schedule()
                    ^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 1039, in schedule
    scheduler_outputs = self._schedule()
                        ^^^^^^^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 1013, in _schedule
    return self._schedule_default()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 857, in _schedule_default
    prefills = self._schedule_prefills(budget,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 752, in _schedule_prefills
    num_new_tokens = self._get_num_new_tokens(seq_group,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 1349, in _get_num_new_tokens
    assert num_new_tokens > 0
           ^^^^^^^^^^^^^^^^^^
AssertionError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 67, in _log_task_completion
    raise AsyncEngineDeadError(
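
To connect the dots in the traceback: the request was admitted with prompt_token_ids: [], so when the scheduler computes how many prompt tokens are left to prefill it gets zero, and the assert num_new_tokens > 0 invariant in _get_num_new_tokens kills the engine background loop. A stand-alone toy illustration of that condition (not vLLM code, just the arithmetic):

```python
# Toy illustration of the failing scheduler invariant; not vLLM code.
prompt_token_ids: list[int] = []   # the request was logged with prompt_token_ids: []
num_computed_tokens = 0            # nothing has been prefilled yet

# The prefill scheduler expects every waiting sequence to contribute at
# least one new token to the batch.
num_new_tokens = len(prompt_token_ids) - num_computed_tokens
print(num_new_tokens)  # 0 -> `assert num_new_tokens > 0` fails and the
                       #      AsyncLLMEngine background task dies
```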

After the fix, an error 400 is returned:

$ curl http://localhost:8000/v1/completions    -H "Content-Type: application/json"    -d '{
     "model": "gpt2",
     "prompt": [""],
     "max_tokens": 20,
     "temperature": 0
}'
{"object":"error","message":"Empty prompt","type":"BadRequestError","param":null,"code":400}```

This avoids an async loop crash that takes down the server.

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
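
For illustration only, here is one way an OpenAI-compatible FastAPI frontend could translate a validation error raised during request processing into the 400 body shown above. The exception handler below is an assumption made for this sketch and is not the actual code added by this PR.

```python
# Illustrative sketch only: turning a validation error into the HTTP 400
# body shown above. This is NOT the actual vLLM frontend code from this PR.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

@app.exception_handler(ValueError)
async def bad_request_handler(request: Request, exc: ValueError) -> JSONResponse:
    # Mirrors the error object returned after the fix, e.g. for "Empty prompt".
    return JSONResponse(
        status_code=400,
        content={
            "object": "error",
            "message": str(exc),
            "type": "BadRequestError",
            "param": None,
            "code": 400,
        },
    )
```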

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@DarkLight1337 (Member) commented Aug 22, 2024:

Let's perform the check inside add_request instead of process_model_inputs to move it closer to the cause of the crash.
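
To make the suggestion concrete, here is a simplified sketch of validating at request-admission time; the class and signature below are reduced stand-ins, not vLLM's actual LLMEngine.add_request API:

```python
# Simplified stand-in for the engine; not vLLM's actual class or signature.
class EngineSketch:
    def add_request(self, request_id: str, prompt: str | None = None,
                    prompt_token_ids: list[int] | None = None) -> None:
        # Reject empty prompts at admission time, before scheduling, so the
        # scheduler's `assert num_new_tokens > 0` can never be reached.
        if not prompt and not prompt_token_ids:
            raise ValueError("Empty prompt")
        # ... normal request processing would continue here
```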

fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Aug 22, 2024
Validate that the input prompts aren't empty

This avoids an async loop crash that takes down the server

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Jefferson Fialho <jfialho@ibm.com>
@njhill (Member) commented Aug 22, 2024:

@maxdebayser could you add a simple test for this too?

maxdebayser and others added 2 commits August 22, 2024 11:12
Also add unit tests for LLM and OpenAI entrypoints

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@maxdebayser (Contributor, Author) commented:

I've added the tests and moved the validation as requested.
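
For reference, a sketch of the kind of regression test this covers (illustrative only; the tests actually added in this PR use vLLM's own entrypoints and fixtures and may look different). It assumes a vLLM OpenAI-compatible server is already running locally on port 8000:

```python
# Illustrative regression test; not the exact test added in this PR.
import requests

BASE_URL = "http://localhost:8000"  # assumes a running vLLM OpenAI server

def test_empty_prompt_returns_400():
    resp = requests.post(
        f"{BASE_URL}/v1/completions",
        json={"model": "gpt2", "prompt": [""], "max_tokens": 20},
    )
    # The server must stay up and reply with a client error instead of
    # letting the engine background loop crash.
    assert resp.status_code == 400
    assert resp.json()["type"] == "BadRequestError"
```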

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Review threads on vllm/engine/async_llm_engine.py and vllm/engine/llm_engine.py (outdated, resolved)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Aug 22, 2024
@njhill (Member) left a comment:

Thanks @maxdebayser!

@njhill (Member) left a comment:

Just noticed an additional simplification.

Review thread on vllm/engine/llm_engine.py (outdated, resolved)
@njhill changed the title from "Fix server crash on empty prompt" to "[BugFix] Fix server crash on empty prompt" on Aug 22, 2024
@njhill mentioned this pull request on Aug 23, 2024
@njhill enabled auto-merge (squash) on August 23, 2024 02:35
@njhill merged commit e25fee5 into vllm-project:main on Aug 23, 2024
45 checks passed
omrishiv pushed a commit to omrishiv/vllm that referenced this pull request Aug 26, 2024
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@maxdebayser deleted the fix_empty_prompt_crash branch on August 27, 2024 16:10
Anyonering added a commit to iidsample/chatdatagen that referenced this pull request Oct 2, 2024
Tested on local machines. An assertion failed on the server side, which causes the vLLM worker to crash. This may be caused by sending an empty prompt to the server, as described in vllm-project/vllm#7632 and vllm-project/vllm#7746. Needs further inspection later.
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Alvant <alvasian@yandex.ru>
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)
Projects: none yet
Successfully merging this pull request may close this issue: [Bug]: assert num_new_tokens > 0 crashes entire worker instead of just failing single API call
4 participants