
[Usage]: run gguf model need template,how to write? #7978

Closed
1 task done
lonngxiang opened this issue Aug 29, 2024 · 8 comments · Fixed by #8618
Labels
usage How to use vllm

Comments

@lonngxiang

Your current environment

BadRequestError: Error code: 400 - {'object': 'error', 'message': 'As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.', 'type': 'BadRequestError', 'param': None, 'code': 400}

How would you like to use vllm

CUDA_VISIBLE_DEVICES=1 vllm serve /ai/qwen1.5-1.8b.gguf --host 0.0.0.0 --port 10868 --max-model-len 4096 --trust-remote-code --tensor-parallel-size 1 --dtype=half --quantization gguf --load-format gguf

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@lonngxiang lonngxiang added the usage How to use vllm label Aug 29, 2024
@I321065

I321065 commented Aug 29, 2024

+1

@Isotr0py
Collaborator

This may result from a missing chat_template in the tokenizer, which is a bug fixed by transformers#32908.
Can you check whether installing the latest transformers from source fixes this issue?
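
For a non-Docker setup, a source install of transformers is a standard pip-from-git invocation (pin to a specific commit if you need reproducibility):

pip install git+https://github.com/huggingface/transformers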

@I321065

I321065 commented Aug 29, 2024

@Isotr0py thanks for your reply. I am currently running vLLM with Docker; could you push a temporary Docker image for this issue?

@Isotr0py
Collaborator

I think you just need to add this RUN instruction to the Dockerfile:

RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install git+https://github.com/huggingface/transformers

Add it before lines 39-44 of the Dockerfile, i.e. before this block:

# install build and runtime dependencies
COPY requirements-common.txt requirements-common.txt
COPY requirements-adag.txt requirements-adag.txt
COPY requirements-cuda.txt requirements-cuda.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install -r requirements-cuda.txt
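
After rebuilding the image, a quick sanity check confirms the container actually picked up the source build (a sketch; my-vllm:dev is a placeholder for whatever tag you built):

docker run --rm --entrypoint python3 my-vllm:dev -c "import transformers; print(transformers.__version__)"
# a source install typically reports a .dev version rather than a plain release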

@simaotwx

simaotwx commented Sep 9, 2024

Why does this not work out of the box? How does one specify such a template?
Is it really necessary to work around this issue by using the transformers trunk?

EDIT:
Here's some info I found: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html?ref=blog.mozilla.ai#chat-template
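
For reference, the flag described on that page is --chat-template. Applied to the command from the original report, the workaround would look roughly like this sketch (the .j2 path is a placeholder for a template file you create yourself):

CUDA_VISIBLE_DEVICES=1 vllm serve /ai/qwen1.5-1.8b.gguf --host 0.0.0.0 --port 10868 \
    --max-model-len 4096 --trust-remote-code --tensor-parallel-size 1 --dtype=half \
    --quantization gguf --load-format gguf \
    --chat-template /ai/qwen-chat-template.j2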

@haitwang-cloud
Contributor

haitwang-cloud commented Sep 11, 2024

@I321065 @simaotwx @lonngxiang I am using Kubernetes to deploy LLM models via vLLM, and I mount a ConfigMap into the vLLM pod to provide the chat template, which fixed this issue for me. Let me know if you need the full YAML for the deployment. Here is the relevant part of the pod spec and the ConfigMap:

      - name: ssdl-mistral-7b
        image: vllm/vllm-openai:latest
        command: ["/bin/sh", "-c"]
        args: [
          "vllm serve mistralai/Mistral-7B-v0.3 --chat-template /etc/chat-template-config/chat-template.j2 --trust-remote-code --enable-chunked-prefill --max_num_batched_tokens 1024"
        ]
        env:
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-token-secret
              key: token
        - name: VLLM_NO_USAGE_STATS
          value: "1"
        - name: DO_NOT_TRACK
          value: "1"
        - name: PYTHONPATH
          value: "/app/deps"
        ports:
        - containerPort: 8000
        resources:
          limits:
            cpu: "10"
            memory: 20G
            nvidia.com/mig-3g.40gb: "1"
          requests:
            cpu: "2"
            memory: 6G
            nvidia.com/mig-3g.40gb: "1"
        volumeMounts:
        - mountPath: /root/.cache/huggingface
          name: cache-volume
        - name: shm
          mountPath: /dev/shm
        - name: config-volume
          mountPath: /etc/config
        - name: deps-volume
          mountPath: /app/deps
        - name: chat-template-volume
          mountPath: /etc/chat-template-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: chat-template-config
  namespace: ssdl-llm
data:
  chat-template.j2: |
    {%- if messages[0]["role"] == "system" %}
        {%- set system_message = messages[0]["content"] %}
        {%- set loop_messages = messages[1:] %}
    {%- else %}
        {%- set loop_messages = messages %}
    {%- endif %}
    {%- if not tools is defined %}
        {%- set tools = none %}
    {%- endif %}
    {%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}

    {%- for message in loop_messages | rejectattr("role", "equalto", "tool") | rejectattr("role", "equalto", "tool_results") | selectattr("tool_calls", "undefined") %}
        {%- if (message["role"] == "user") != (loop.index0 % 2 == 0) %}
            {{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
        {%- endif %}
    {%- endfor %}

    {{- bos_token }}
    {%- for message in loop_messages %}
        {%- if message["role"] == "user" %}
            {%- if tools is not none and (message == user_messages[-1]) %}
                {{- "[AVAILABLE_TOOLS] [" }}
                {%- for tool in tools %}
                    {%- set tool = tool.function %}
                    {{- '{"type": "function", "function": {' }}
                    {%- for key, val in tool.items() if key != "return" %}
                        {%- if val is string %}
                            {{- '"' + key + '": "' + val + '"' }}
                        {%- else %}
                            {{- '"' + key + '": ' + val|tojson }}
                        {%- endif %}
                        {%- if not loop.last %}
                            {{- ", " }}
                        {%- endif %}
                    {%- endfor %}
                    {{- "}}" }}
                    {%- if not loop.last %}
                        {{- ", " }}
                    {%- else %}
                        {{- "]" }}
                    {%- endif %}
                {%- endfor %}
                {{- "[/AVAILABLE_TOOLS]" }}
            {%- endif %}
            {%- if loop.last and system_message is defined %}
                {{- "[INST] " + system_message + "\n\n" + message["content"] + "[/INST]" }}
            {%- else %}
                {{- "[INST] " + message["content"] + "[/INST]" }}
            {%- endif %}
        {%- elif message["role"] == "tool_calls" or message.tool_calls is defined %}
            {%- if message.tool_calls is defined %}
                {%- set tool_calls = message.tool_calls %}
            {%- else %}
                {%- set tool_calls = message.content %}
            {%- endif %}
            {{- "[TOOL_CALLS] [" }}
            {%- for tool_call in tool_calls %}
                {%- set out = tool_call.function|tojson %}
                {{- out[:-1] }}
                {%- if not tool_call.id is defined or tool_call.id|length < 9 %}
                    {{- raise_exception("Tool call IDs should be alphanumeric strings with length >= 9! (1)" + tool_call.id) }}
                {%- endif %}
                {{- ', "id": "' + tool_call.id[-9:] + '"}' }}
                {%- if not loop.last %}
                    {{- ", " }}
                {%- else %}
                    {{- "]" + eos_token }}
                {%- endif %}
            {%- endfor %}
        {%- elif message["role"] == "assistant" %}
            {{- " " + message["content"] + eos_token }}
        {%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
            {%- if message.content is defined and message.content.content is defined %}
                {%- set content = message.content.content %}
            {%- else %}
                {%- set content = message.content %}
            {%- endif %}
            {{- '[TOOL_RESULTS] {"content": ' + content|string + ", " }}
            {%- if not message.tool_call_id is defined or message.tool_call_id|length < 9 %}
                {{- raise_exception("Tool call IDs should be alphanumeric strings with length >= 9! (2)" + message.tool_call_id) }}
            {%- endif %}
            {{- '"call_id": "' + message.tool_call_id[-9:] + '"}[/TOOL_RESULTS]' }}
        {%- else %}
            {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
        {%- endif %}
    {%- endfor %}
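
Once the pod is running, a quick smoke test against the OpenAI-compatible endpoint shows whether the mounted template is being applied (a sketch; substitute your own service address):

curl http://<service-address>:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "mistralai/Mistral-7B-v0.3", "messages": [{"role": "user", "content": "Hello"}]}'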

@Mahamadoulng

Hi, same issue for me. I am trying vLLM with facebook/opt-125m using the OpenAI-compatible server; can someone help? ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.

@byerose

byerose commented Oct 24, 2024

Hi, same issue for me. I am trying vLLM with facebook/opt-125m using the OpenAI-compatible server; can someone help? ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.

A solution: https://blog.csdn.net/yuanlulu/article/details/142929234
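
If the goal is simply to unblock a base model such as facebook/opt-125m, a minimal hand-written template passed via --chat-template also works. The sketch below uses a generic role/content layout, not any model's official chat format:

cat > /tmp/simple-chat-template.j2 <<'EOF'
{%- for message in messages %}
{{ message['role'] }}: {{ message['content'] }}
{%- endfor %}
assistant:
EOF

vllm serve facebook/opt-125m --chat-template /tmp/simple-chat-template.j2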
