
[Usage]: run gguf model need template,how to write? #7978

Closed
1 task done
lonngxiang opened this issue Aug 29, 2024 · 8 comments · Fixed by #8618
Labels
usage How to use vllm

Comments

@lonngxiang

Your current environment

BadRequestError: Error code: 400 - {'object': 'error', 'message': 'As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.', 'type': 'BadRequestError', 'param': None, 'code': 400}

How would you like to use vllm

CUDA_VISIBLE_DEVICES=1 vllm serve /ai/qwen1.5-1.8b.gguf --host 0.0.0.0 --port 10868 --max-model-len 4096 --trust-remote-code --tensor-parallel-size 1 --dtype=half --quantization gguf --load-format gguf

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@lonngxiang lonngxiang added the usage How to use vllm label Aug 29, 2024
@I321065

I321065 commented Aug 29, 2024

+1

@Isotr0py
Collaborator

This may result from a missing chat_template in the tokenizer, which is a bug fixed by transformers#32908.
Can you check whether installing the latest transformers from source fixes this issue?
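
For a non-Docker setup, a source install of transformers is a standard pip-from-git invocation (pin to a specific commit if you need reproducibility):

pip install git+https://github.com/huggingface/transformers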

@I321065

I321065 commented Aug 29, 2024

@Isotr0py thanks for your reply. I am currently running vLLM with Docker; could you push a temporary Docker image for this issue?

@Isotr0py
Collaborator

I think you just need to add this RUN instruction to the Dockerfile:

RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install git+https://github.com/huggingface/transformers

Add it before lines 39-44 of the Dockerfile, i.e. before this block:

# install build and runtime dependencies
COPY requirements-common.txt requirements-common.txt
COPY requirements-adag.txt requirements-adag.txt
COPY requirements-cuda.txt requirements-cuda.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install -r requirements-cuda.txt
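
After rebuilding the image, a quick sanity check confirms the container actually picked up the source build (a sketch; my-vllm:dev is a placeholder for whatever tag you built):

docker run --rm --entrypoint python3 my-vllm:dev -c "import transformers; print(transformers.__version__)"
# a source install typically reports a .dev version rather than a plain release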

@simaotwx

simaotwx commented Sep 9, 2024

Why does this not work out of the box? How does one specify such a template?
Is it really necessary to work around this issue by using the transformers trunk?

EDIT:
Here's some info I found: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html?ref=blog.mozilla.ai#chat-template
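
For reference, the flag described on that page is --chat-template. Applied to the command from the original report, the workaround would look roughly like this sketch (the .j2 path is a placeholder for a template file you create yourself):

CUDA_VISIBLE_DEVICES=1 vllm serve /ai/qwen1.5-1.8b.gguf --host 0.0.0.0 --port 10868 \
    --max-model-len 4096 --trust-remote-code --tensor-parallel-size 1 --dtype=half \
    --quantization gguf --load-format gguf \
    --chat-template /ai/qwen-chat-template.j2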

@haitwang-cloud
Contributor

haitwang-cloud commented Sep 11, 2024

@I321065 @simaotwx @lonngxiang I am using Kubernetes to deploy LLM models via vLLM, and I mount a ConfigMap into the vLLM pod to provide the chat template, which fixed this issue for me. Let me know if you need the full YAML for the deployment. Here is the relevant part of the pod spec and the ConfigMap:

      - name: ssdl-mistral-7b
        image: vllm/vllm-openai:latest
        command: ["/bin/sh", "-c"]
        args: [
          "vllm serve mistralai/Mistral-7B-v0.3 --chat-template /etc/chat-template-config/chat-template.j2 --trust-remote-code --enable-chunked-prefill --max_num_batched_tokens 1024"
        ]
        env:
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-token-secret
              key: token
        - name: VLLM_NO_USAGE_STATS
          value: "1"
        - name: DO_NOT_TRACK
          value: "1"
        - name: PYTHONPATH
          value: "/app/deps"
        ports:
        - containerPort: 8000
        resources:
          limits:
            cpu: "10"
            memory: 20G
            nvidia.com/mig-3g.40gb: "1"
          requests:
            cpu: "2"
            memory: 6G
            nvidia.com/mig-3g.40gb: "1"
        volumeMounts:
        - mountPath: /root/.cache/huggingface
          name: cache-volume
        - name: shm
          mountPath: /dev/shm
        - name: config-volume
          mountPath: /etc/config
        - name: deps-volume
          mountPath: /app/deps
        - name: chat-template-volume
          mountPath: /etc/chat-template-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: chat-template-config
  namespace: ssdl-llm
data:
  chat-template.j2: |
    {%- if messages[0]["role"] == "system" %}
        {%- set system_message = messages[0]["content"] %}
        {%- set loop_messages = messages[1:] %}
    {%- else %}
        {%- set loop_messages = messages %}
    {%- endif %}
    {%- if not tools is defined %}
        {%- set tools = none %}
    {%- endif %}
    {%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}

    {%- for message in loop_messages | rejectattr("role", "equalto", "tool") | rejectattr("role", "equalto", "tool_results") | selectattr("tool_calls", "undefined") %}
        {%- if (message["role"] == "user") != (loop.index0 % 2 == 0) %}
            {{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
        {%- endif %}
    {%- endfor %}

    {{- bos_token }}
    {%- for message in loop_messages %}
        {%- if message["role"] == "user" %}
            {%- if tools is not none and (message == user_messages[-1]) %}
                {{- "[AVAILABLE_TOOLS] [" }}
                {%- for tool in tools %}
                    {%- set tool = tool.function %}
                    {{- '{"type": "function", "function": {' }}
                    {%- for key, val in tool.items() if key != "return" %}
                        {%- if val is string %}
                            {{- '"' + key + '": "' + val + '"' }}
                        {%- else %}
                            {{- '"' + key + '": ' + val|tojson }}
                        {%- endif %}
                        {%- if not loop.last %}
                            {{- ", " }}
                        {%- endif %}
                    {%- endfor %}
                    {{- "}}" }}
                    {%- if not loop.last %}
                        {{- ", " }}
                    {%- else %}
                        {{- "]" }}
                    {%- endif %}
                {%- endfor %}
                {{- "[/AVAILABLE_TOOLS]" }}
            {%- endif %}
            {%- if loop.last and system_message is defined %}
                {{- "[INST] " + system_message + "\n\n" + message["content"] + "[/INST]" }}
            {%- else %}
                {{- "[INST] " + message["content"] + "[/INST]" }}
            {%- endif %}
        {%- elif message["role"] == "tool_calls" or message.tool_calls is defined %}
            {%- if message.tool_calls is defined %}
                {%- set tool_calls = message.tool_calls %}
            {%- else %}
                {%- set tool_calls = message.content %}
            {%- endif %}
            {{- "[TOOL_CALLS] [" }}
            {%- for tool_call in tool_calls %}
                {%- set out = tool_call.function|tojson %}
                {{- out[:-1] }}
                {%- if not tool_call.id is defined or tool_call.id|length < 9 %}
                    {{- raise_exception("Tool call IDs should be alphanumeric strings with length >= 9! (1)" + tool_call.id) }}
                {%- endif %}
                {{- ', "id": "' + tool_call.id[-9:] + '"}' }}
                {%- if not loop.last %}
                    {{- ", " }}
                {%- else %}
                    {{- "]" + eos_token }}
                {%- endif %}
            {%- endfor %}
        {%- elif message["role"] == "assistant" %}
            {{- " " + message["content"] + eos_token }}
        {%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
            {%- if message.content is defined and message.content.content is defined %}
                {%- set content = message.content.content %}
            {%- else %}
                {%- set content = message.content %}
            {%- endif %}
            {{- '[TOOL_RESULTS] {"content": ' + content|string + ", " }}
            {%- if not message.tool_call_id is defined or message.tool_call_id|length < 9 %}
                {{- raise_exception("Tool call IDs should be alphanumeric strings with length >= 9! (2)" + message.tool_call_id) }}
            {%- endif %}
            {{- '"call_id": "' + message.tool_call_id[-9:] + '"}[/TOOL_RESULTS]' }}
        {%- else %}
            {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
        {%- endif %}
    {%- endfor %}
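
Once the pod is running, a quick smoke test against the OpenAI-compatible endpoint shows whether the mounted template is being applied (a sketch; substitute your own service address):

curl http://<service-address>:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "mistralai/Mistral-7B-v0.3", "messages": [{"role": "user", "content": "Hello"}]}'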

@Mahamadoulng

Hi, same issue for me. I am trying vLLM with facebook/opt-125m using the OpenAI-compatible server; can someone help? ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.

@byerose

byerose commented Oct 24, 2024

Hi, same issue for me. I am trying vLLM with facebook/opt-125m using the OpenAI-compatible server; can someone help? ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.

A solution: https://blog.csdn.net/yuanlulu/article/details/142929234
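
If the goal is simply to unblock a base model such as facebook/opt-125m, a minimal hand-written template passed via --chat-template also works. The sketch below uses a generic role/content layout, not any model's official chat format:

cat > /tmp/simple-chat-template.j2 <<'EOF'
{%- for message in messages %}
{{ message['role'] }}: {{ message['content'] }}
{%- endfor %}
assistant:
EOF

vllm serve facebook/opt-125m --chat-template /tmp/simple-chat-template.j2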
