[Bug]: Empty prompt kills vllm server (AsyncEngineDeadError: Background loop is stopped.) #7283
Comments
This also happens for the offline LLM entrypoint:
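For context, a minimal sketch of that offline reproduction (the exact snippet isn't preserved here, and the choice of gpt2-medium is an assumption):

```python
from vllm import LLM

# Assumed model for illustration; any model whose tokenizer has no BOS token
# should behave the same way.
llm = LLM(model="gpt2-medium")

# A normal prompt generates fine.
print(llm.generate(["Hello, my name is"]))

# An empty prompt crashes the engine instead of returning an error
# or an empty completion.
print(llm.generate([""]))
```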
Just curious, I think LLM always starts with a BOS token?
@youkaichao this depends on the tokenizer. I just tested llama 3.1 8b instruct and it doesn't have this issue because it has a BOS token:
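A quick way to see the difference, as a sketch using Hugging Face tokenizers (the model names and the exact BOS id shown are assumptions):

```python
from transformers import AutoTokenizer

# Llama 3.1 Instruct prepends a BOS token, so even an empty prompt
# still yields at least one prompt token id.
tok_with_bos = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
print(tok_with_bos("").input_ids)   # e.g. [128000] (the BOS id)

# GPT-2 does not add a BOS token by default, so an empty prompt
# tokenizes to an empty list, which is what trips up the engine.
tok_no_bos = AutoTokenizer.from_pretrained("gpt2-medium")
print(tok_no_bos("").input_ids)     # []
```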
You can see the prompt is empty but the prompt token ids are not. Either way, I think we should return an empty response or otherwise follow what OpenAI does for an empty prompt. Crashing the LLM or the server is not good behavior.
Agreed. We should never let a user request crash the engine.
Good catch that this depends on the tokenizer. The models I tested do not have the BOS token defined in tokenizer_config.json.
Same as this: #7632
Thanks for the ping, closing as resolved.
Your current environment
🐛 Describe the bug
Spin up the vllm server in a pod using the vllm base image (vllm/vllm-openai:v0.5.3.post1), where $MODEL_PATH points to some model. I've tried gpt2-medium and Meta-Llama-3-8B.
Generation works fine, but if you pass in an empty prompt, it immediately kills the server (AsyncEngineDeadError: Background loop is stopped.) and is unrecoverable.
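A rough reproduction sketch against the OpenAI-compatible endpoint, assuming the server from the image above is listening on localhost:8000 and serving gpt2-medium:

```python
import requests

url = "http://localhost:8000/v1/completions"

# A normal request succeeds.
ok = requests.post(url, json={"model": "gpt2-medium", "prompt": "Hello", "max_tokens": 8})
print(ok.status_code, ok.json())

# An empty prompt returns a 500 and leaves the engine dead; subsequent
# requests keep failing with AsyncEngineDeadError until the server is restarted.
bad = requests.post(url, json={"model": "gpt2-medium", "prompt": "", "max_tokens": 8})
print(bad.status_code, bad.text)
```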
Expected Behavior
If an empty prompt is not allowed, I would expect a 400 invalid input response vs. a 500 that stops the server.
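As an illustration only (this is not vLLM's actual server code), the kind of guard that would produce that 400 before the request ever reaches the engine:

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class CompletionRequest(BaseModel):
    model: str
    prompt: str = ""
    max_tokens: int = 16

@app.post("/v1/completions")
async def create_completion(req: CompletionRequest):
    if not req.prompt.strip():
        # Reject invalid input up front so it never reaches the engine.
        raise HTTPException(status_code=400, detail="prompt must not be empty")
    # Placeholder response; the real handler would pass the validated
    # request on to the engine.
    return {"id": "stub", "choices": []}
```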