vLLM setup:
```yaml
servingEngineSpec:
  modelSpec:
    - name: "llama3"
      repository: "vllm/vllm-openai"
      tag: "latest"
      modelURL: "meta-llama/Llama-3.2-11B-Vision"
      replicaCount: 4
      requestCPU: 50
      requestMemory: "1000Gi"
      requestGPU: 8
      pvcStorage: "750Gi"
      vllmConfig:
        enableChunkedPrefill: false
        enablePrefixCaching: false
        maxModelLen: 4096
        dtype: "bfloat16"
        extraArgs: ["--disable-log-requests", "--gpu-memory-utilization", "0.9", "--tensor-parallel-size", "1", "--max-num-seqs", "1"]
```
Exception:
```
INFO:     10.0.135.38:48284 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO:     10.0.146.90:32768 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:42448 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.135.38:42064 - "GET /metrics HTTP/1.1" 200 OK
INFO:     10.0.146.90:37466 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:44134 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:52820 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.135.38:53544 - "GET /metrics HTTP/1.1" 200 OK
INFO:     10.0.146.90:36512 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:39906 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.146.90:51268 - "GET /health HTTP/1.1" 200 OK
INFO:     10.0.135.38:49522 - "GET /metrics HTTP/1.1" 200 OK
INFO:     10.0.146.90:49260 - "GET /health HTTP/1.1" 200 OK
ERROR 01-29 20:28:04 serving_chat.py:175] Error in preprocessing prompt inputs
ERROR 01-29 20:28:04 serving_chat.py:175] Traceback (most recent call last):
ERROR 01-29 20:28:04 serving_chat.py:175]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 159, in create_chat_completion
ERROR 01-29 20:28:04 serving_chat.py:175]     ) = await self._preprocess_chat(
ERROR 01-29 20:28:04 serving_chat.py:175]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-29 20:28:04 serving_chat.py:175]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 409, in _preprocess_chat
ERROR 01-29 20:28:04 serving_chat.py:175]     request_prompt = apply_hf_chat_template(
ERROR 01-29 20:28:04 serving_chat.py:175]                      ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-29 20:28:04 serving_chat.py:175]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 967, in apply_hf_chat_template
ERROR 01-29 20:28:04 serving_chat.py:175]     raise ValueError(
ERROR 01-29 20:28:04 serving_chat.py:175] ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.
INFO:     10.0.135.38:40834 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
```
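The `ValueError` comes from vLLM's Hugging Face chat-template handling: since transformers v4.44 there is no implicit default template, so `/v1/chat/completions` only works if the tokenizer ships a template or one is supplied explicitly. A minimal sketch to check whether this particular checkpoint defines one (it assumes access to the gated meta-llama repo and a configured HF token):

```python
# Sketch: verify whether the tokenizer for the deployed checkpoint defines a chat template.
# Assumes access to the gated meta-llama repo and a configured Hugging Face token.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-11B-Vision")
print("chat_template defined:", tok.chat_template is not None)
```

If this prints `False`, the 400 is expected for chat requests against the base checkpoint; the `-Instruct` variant ships a chat template and is the one normally served behind an OpenAI-style chat endpoint.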
This seems related to vllm-project/vllm#7978.
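If staying on the base checkpoint is a requirement, a template can also be supplied explicitly. The sketch below exercises the same `apply_chat_template` path that fails inside `serving_chat.py`, with an illustrative Jinja template (hypothetical, not Meta's official one):

```python
# Sketch: pass an explicit chat template to the same HF call that vLLM uses during
# preprocessing. The template string below is illustrative only.
from transformers import AutoTokenizer

EXAMPLE_TEMPLATE = (
    "{% for message in messages %}"
    "<|start_header_id|>{{ message['role'] }}<|end_header_id|>\n\n"
    "{{ message['content'] }}<|eot_id|>"
    "{% endfor %}"
)

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-11B-Vision")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Describe the attached image."}],
    chat_template=EXAMPLE_TEMPLATE,
    tokenize=False,
)
print(prompt)
```

On the serving side, vLLM's OpenAI-compatible server accepts a `--chat-template` argument pointing at a Jinja template file, which could be appended to `extraArgs` in the values above so the server uses that template instead of returning 400.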
I was able to work this out with a new deployment. Chalking it up to a transient issue for now; sorry for the noise.