vLLM setup:
```yaml
servingEngineSpec:
  modelSpec:
    - name: "llama3"
      repository: "vllm/vllm-openai"
      tag: "latest"
      modelURL: "meta-llama/Llama-3.2-11B-Vision"
      replicaCount: 4
      requestCPU: 50
      requestMemory: "1000Gi"
      requestGPU: 8
      pvcStorage: "750Gi"
      vllmConfig:
        enableChunkedPrefill: false
        enablePrefixCaching: false
        maxModelLen: 4096
        dtype: "bfloat16"
        extraArgs: ["--disable-log-requests", "--gpu-memory-utilization", "0.9", "--tensor-parallel-size", "1", "--max-num-seqs", "1"]
```
Exception:
```
INFO:     10.0.135.38:48284 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO:     10.0.146.90:32768 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:42448 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.135.38:42064 - "GET /metrics HTTP/1.1" 200 OK
INFO:     10.0.146.90:37466 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:44134 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:52820 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.135.38:53544 - "GET /metrics HTTP/1.1" 200 OK
INFO:     10.0.146.90:36512 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:39906 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:51268 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.135.38:49522 - "GET /metrics HTTP/1.1" 200 OK
INFO:     10.0.146.90:49260 - "GET /health HTTP/1.1" 200 OK
ERROR 01-29 20:28:04 serving_chat.py:175] Error in preprocessing prompt inputs
ERROR 01-29 20:28:04 serving_chat.py:175] Traceback (most recent call last):
ERROR 01-29 20:28:04 serving_chat.py:175]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 159, in create_chat_completion
ERROR 01-29 20:28:04 serving_chat.py:175]     ) = await self._preprocess_chat(
ERROR 01-29 20:28:04 serving_chat.py:175]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-29 20:28:04 serving_chat.py:175]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 409, in _preprocess_chat
ERROR 01-29 20:28:04 serving_chat.py:175]     request_prompt = apply_hf_chat_template(
ERROR 01-29 20:28:04 serving_chat.py:175]                      ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-29 20:28:04 serving_chat.py:175]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 967, in apply_hf_chat_template
ERROR 01-29 20:28:04 serving_chat.py:175]     raise ValueError(
ERROR 01-29 20:28:04 serving_chat.py:175] ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.
INFO:     10.0.135.38:40834 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
```
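The `ValueError` comes from vLLM's Hugging Face chat-template handling: since transformers v4.44 there is no implicit default template, so `/v1/chat/completions` only works if the tokenizer ships a template or one is supplied explicitly. A minimal sketch to check whether this particular checkpoint defines one (it assumes access to the gated meta-llama repo and a configured HF token):

```python
# Sketch: verify whether the tokenizer for the deployed checkpoint defines a chat template.
# Assumes access to the gated meta-llama repo and a configured Hugging Face token.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-11B-Vision")
print("chat_template defined:", tok.chat_template is not None)
```

If this prints `False`, the 400 is expected for chat requests against the base checkpoint; the `-Instruct` variant ships a chat template and is the one normally served behind an OpenAI-style chat endpoint.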
This seems related to vllm-project/vllm#7978.
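If staying on the base checkpoint is a requirement, a template can also be supplied explicitly. The sketch below exercises the same `apply_chat_template` path that fails inside `serving_chat.py`, with an illustrative Jinja template (hypothetical, not Meta's official one):

```python
# Sketch: pass an explicit chat template to the same HF call that vLLM uses during
# preprocessing. The template string below is illustrative only.
from transformers import AutoTokenizer

EXAMPLE_TEMPLATE = (
    "{% for message in messages %}"
    "<|start_header_id|>{{ message['role'] }}<|end_header_id|>\n\n"
    "{{ message['content'] }}<|eot_id|>"
    "{% endfor %}"
)

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-11B-Vision")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Describe the attached image."}],
    chat_template=EXAMPLE_TEMPLATE,
    tokenize=False,
)
print(prompt)
```

On the serving side, vLLM's OpenAI-compatible server accepts a `--chat-template` argument pointing at a Jinja template file, which could be appended to `extraArgs` in the values above so the server uses that template instead of returning 400.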
I was able to work this out with a new deployment. Chalking it up to a transient issue for now; sorry for the noise.