Chat template exception with llama-3.2-11b-vision #43

Closed

nrober734 opened this issue Jan 30, 2025 · 1 comment

@nrober734
vLLM setup:

servingEngineSpec:
  modelSpec:
  - name: "llama3"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "meta-llama/Llama-3.2-11B-Vision"
    replicaCount: 4

    requestCPU: 50
    requestMemory: "1000Gi"
    requestGPU: 8

    pvcStorage: "750Gi"

    vllmConfig:
      enableChunkedPrefill: false
      enablePrefixCaching: false
      maxModelLen: 4096
      dtype: "bfloat16"
      extraArgs: ["--disable-log-requests", "--gpu-memory-utilization", "0.9", "--tensor-parallel-size", "1", "--max-num-seqs", "1"]

Exception:

INFO:     10.0.135.38:48284 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO:     10.0.146.90:32768 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:42448 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.135.38:42064 - "GET /metrics HTTP/1.1" 200 OK
INFO:     10.0.146.90:37466 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:44134 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:52820 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.135.38:53544 - "GET /metrics HTTP/1.1" 200 OK
INFO:     10.0.146.90:36512 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:39906 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:51268 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.135.38:49522 - "GET /metrics HTTP/1.1" 200 OK
INFO:     10.0.146.90:49260 - "GET /health HTTP/1.1" 200 OK
ERROR 01-29 20:28:04 serving_chat.py:175] Error in preprocessing prompt inputs
ERROR 01-29 20:28:04 serving_chat.py:175] Traceback (most recent call last):
ERROR 01-29 20:28:04 serving_chat.py:175]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 159, in create_chat_completion
ERROR 01-29 20:28:04 serving_chat.py:175]     ) = await self._preprocess_chat(
ERROR 01-29 20:28:04 serving_chat.py:175]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-29 20:28:04 serving_chat.py:175]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 409, in _preprocess_chat
ERROR 01-29 20:28:04 serving_chat.py:175]     request_prompt = apply_hf_chat_template(
ERROR 01-29 20:28:04 serving_chat.py:175]                      ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-29 20:28:04 serving_chat.py:175]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 967, in apply_hf_chat_template
ERROR 01-29 20:28:04 serving_chat.py:175]     raise ValueError(
ERROR 01-29 20:28:04 serving_chat.py:175] ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.
INFO:     10.0.135.38:40834 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

This seems related to vllm-project/vllm#7978.
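
For reference, a possible workaround (not verified in this deployment) would be to supply a chat template explicitly via vLLM's --chat-template flag, since the ValueError indicates the tokenizer does not define one; the base Llama-3.2-11B-Vision checkpoint, unlike the -Instruct variant, may not ship a chat template in its tokenizer config. A sketch against the values file above, where the template path is hypothetical:

    vllmConfig:
      enableChunkedPrefill: false
      enablePrefixCaching: false
      maxModelLen: 4096
      dtype: "bfloat16"
      extraArgs: ["--disable-log-requests", "--gpu-memory-utilization", "0.9",
                  "--tensor-parallel-size", "1", "--max-num-seqs", "1",
                  "--chat-template", "/templates/llama-3.2-vision.jinja"]  # hypothetical template path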

@nrober734 (Author)

I was able to work this out with a new deployment. Chalking it up to a transient issue for now; sorry for the noise.
