
[BugFix] Fix server crash on empty prompt #7746

Merged: 10 commits into vllm-project:main on Aug 23, 2024

Conversation

@maxdebayser (Contributor) commented:

Fixes #7632

To reproduce, start the server with python -m vllm.entrypoints.openai.api_server --model gpt2 and send an empty prompt:

$ curl http://localhost:8000/v1/completions    -H "Content-Type: application/json"    -d '{
     "model": "gpt2",
     "prompt": [""],
     "max_tokens": 20,
     "temperature": 0
}'
Internal Server Error

On the server side, the following log appears and the server is left dead:

INFO 08-21 14:49:19 logger.py:36] Received request cmpl-470c7a46582b4554b7926ce4559b0337-0: prompt: '', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=20, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [], lora_request: None, prompt_adapter_request: None.
INFO 08-21 14:49:19 async_llm_engine.py:208] Added request cmpl-470c7a46582b4554b7926ce4559b0337-0.
ERROR 08-21 14:49:19 async_llm_engine.py:65] Engine background task failed
ERROR 08-21 14:49:19 async_llm_engine.py:65] Traceback (most recent call last):
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
ERROR 08-21 14:49:19 async_llm_engine.py:65]     return_value = task.result()
ERROR 08-21 14:49:19 async_llm_engine.py:65]                    ^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 930, in run_engine_loop
ERROR 08-21 14:49:19 async_llm_engine.py:65]     result = task.result()
ERROR 08-21 14:49:19 async_llm_engine.py:65]              ^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 873, in engine_step
ERROR 08-21 14:49:19 async_llm_engine.py:65]     request_outputs = await self.engine.step_async(virtual_engine)
ERROR 08-21 14:49:19 async_llm_engine.py:65]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 301, in step_async
ERROR 08-21 14:49:19 async_llm_engine.py:65]     virtual_engine].schedule()
ERROR 08-21 14:49:19 async_llm_engine.py:65]                     ^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 1039, in schedule
ERROR 08-21 14:49:19 async_llm_engine.py:65]     scheduler_outputs = self._schedule()
ERROR 08-21 14:49:19 async_llm_engine.py:65]                         ^^^^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 1013, in _schedule
ERROR 08-21 14:49:19 async_llm_engine.py:65]     return self._schedule_default()
ERROR 08-21 14:49:19 async_llm_engine.py:65]            ^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 857, in _schedule_default
ERROR 08-21 14:49:19 async_llm_engine.py:65]     prefills = self._schedule_prefills(budget,
ERROR 08-21 14:49:19 async_llm_engine.py:65]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 752, in _schedule_prefills
ERROR 08-21 14:49:19 async_llm_engine.py:65]     num_new_tokens = self._get_num_new_tokens(seq_group,
ERROR 08-21 14:49:19 async_llm_engine.py:65]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65]   File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 1349, in _get_num_new_tokens
ERROR 08-21 14:49:19 async_llm_engine.py:65]     assert num_new_tokens > 0
ERROR 08-21 14:49:19 async_llm_engine.py:65]            ^^^^^^^^^^^^^^^^^^
ERROR 08-21 14:49:19 async_llm_engine.py:65] AssertionError
Exception in callback functools.partial(<function _log_task_completion at 0x7f1abf494220>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f1aa38d0050>>)
handle: <Handle functools.partial(<function _log_task_completion at 0x7f1abf494220>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f1aa38d0050>>)>
Traceback (most recent call last):
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
    return_value = task.result()
                   ^^^^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 930, in run_engine_loop
    result = task.result()
             ^^^^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 873, in engine_step
    request_outputs = await self.engine.step_async(virtual_engine)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 301, in step_async
    virtual_engine].schedule()
                    ^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 1039, in schedule
    scheduler_outputs = self._schedule()
                        ^^^^^^^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 1013, in _schedule
    return self._schedule_default()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 857, in _schedule_default
    prefills = self._schedule_prefills(budget,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 752, in _schedule_prefills
    num_new_tokens = self._get_num_new_tokens(seq_group,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/core/scheduler.py", line 1349, in _get_num_new_tokens
    assert num_new_tokens > 0
           ^^^^^^^^^^^^^^^^^^
AssertionError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/home/mbayser/IBMProjects/FoundationModels/inference/vllm/vllm/engine/async_llm_engine.py", line 67, in _log_task_completion
    raise AsyncEngineDeadError(
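
To connect the dots in the traceback: the request was admitted with prompt_token_ids: [], so when the scheduler computes how many prompt tokens are left to prefill it gets zero, and the assert num_new_tokens > 0 invariant in _get_num_new_tokens kills the engine background loop. A stand-alone toy illustration of that condition (not vLLM code, just the arithmetic):

```python
# Toy illustration of the failing scheduler invariant; not vLLM code.
prompt_token_ids: list[int] = []   # the request was logged with prompt_token_ids: []
num_computed_tokens = 0            # nothing has been prefilled yet

# The prefill scheduler expects every waiting sequence to contribute at
# least one new token to the batch.
num_new_tokens = len(prompt_token_ids) - num_computed_tokens
print(num_new_tokens)  # 0 -> `assert num_new_tokens > 0` fails and the
                       #      AsyncLLMEngine background task dies
```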

After the fix, an error 400 is returned:

$ curl http://localhost:8000/v1/completions    -H "Content-Type: application/json"    -d '{
     "model": "gpt2",
     "prompt": [""],
     "max_tokens": 20,
     "temperature": 0
}'
{"object":"error","message":"Empty prompt","type":"BadRequestError","param":null,"code":400}```

This avoids an async loop crash that takes down the server.

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
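
For illustration only, here is one way an OpenAI-compatible FastAPI frontend could translate a validation error raised during request processing into the 400 body shown above. The exception handler below is an assumption made for this sketch and is not the actual code added by this PR.

```python
# Illustrative sketch only: turning a validation error into the HTTP 400
# body shown above. This is NOT the actual vLLM frontend code from this PR.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

@app.exception_handler(ValueError)
async def bad_request_handler(request: Request, exc: ValueError) -> JSONResponse:
    # Mirrors the error object returned after the fix, e.g. for "Empty prompt".
    return JSONResponse(
        status_code=400,
        content={
            "object": "error",
            "message": str(exc),
            "type": "BadRequestError",
            "param": None,
            "code": 400,
        },
    )
```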

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@DarkLight1337 (Member) commented Aug 22, 2024:

Let's perform the check inside add_request instead of process_model_inputs to move it closer to the cause of the crash.
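
To make the suggestion concrete, here is a simplified sketch of validating at request-admission time; the class and signature below are reduced stand-ins, not vLLM's actual LLMEngine.add_request API:

```python
# Simplified stand-in for the engine; not vLLM's actual class or signature.
class EngineSketch:
    def add_request(self, request_id: str, prompt: str | None = None,
                    prompt_token_ids: list[int] | None = None) -> None:
        # Reject empty prompts at admission time, before scheduling, so the
        # scheduler's `assert num_new_tokens > 0` can never be reached.
        if not prompt and not prompt_token_ids:
            raise ValueError("Empty prompt")
        # ... normal request processing would continue here
```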

fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Aug 22, 2024
Validate that the input prompts aren't empty

This avoids an async loop crash that takes down the server

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Jefferson Fialho <jfialho@ibm.com>
@njhill (Member) commented Aug 22, 2024:

@maxdebayser could you add a simple test for this too?

maxdebayser and others added 2 commits August 22, 2024 11:12
Also add unit tests for LLM and OpenAI entrypoints

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@maxdebayser (Contributor, Author) commented:

I've added the tests and moved the validation as requested.
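
For reference, a sketch of the kind of regression test this covers (illustrative only; the tests actually added in this PR use vLLM's own entrypoints and fixtures and may look different). It assumes a vLLM OpenAI-compatible server is already running locally on port 8000:

```python
# Illustrative regression test; not the exact test added in this PR.
import requests

BASE_URL = "http://localhost:8000"  # assumes a running vLLM OpenAI server

def test_empty_prompt_returns_400():
    resp = requests.post(
        f"{BASE_URL}/v1/completions",
        json={"model": "gpt2", "prompt": [""], "max_tokens": 20},
    )
    # The server must stay up and reply with a client error instead of
    # letting the engine background loop crash.
    assert resp.status_code == 400
    assert resp.json()["type"] == "BadRequestError"
```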

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Review threads on vllm/engine/async_llm_engine.py and vllm/engine/llm_engine.py (outdated, resolved)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Aug 22, 2024
@njhill (Member) left a comment:

Thanks @maxdebayser!

@njhill (Member) left a comment:

Just noticed an additional simplification.

Review thread on vllm/engine/llm_engine.py (outdated, resolved)
@njhill changed the title from "Fix server crash on empty prompt" to "[BugFix] Fix server crash on empty prompt" on Aug 22, 2024
@njhill mentioned this pull request on Aug 23, 2024
@njhill enabled auto-merge (squash) on August 23, 2024 02:35
@njhill merged commit e25fee5 into vllm-project:main on Aug 23, 2024
45 checks passed
omrishiv pushed a commit to omrishiv/vllm that referenced this pull request Aug 26, 2024
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@maxdebayser deleted the fix_empty_prompt_crash branch on August 27, 2024 16:10
Anyonering added a commit to iidsample/chatdatagen that referenced this pull request Oct 2, 2024
Tested on local machines. An assertion failed on the server side, which causes the vLLM worker to crash. This may be caused by sending an empty prompt to the server, as described in vllm-project/vllm#7632 and vllm-project/vllm#7746. Needs further inspection later.
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Alvant <alvasian@yandex.ru>
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)
Projects: none yet
Successfully merging this pull request may close this issue: [Bug]: assert num_new_tokens > 0 crashes entire worker instead of just failing single API call
4 participants