[Usage]: Running a GGUF model needs a chat template; how do I write one? #7978
Comments
+1
This may result from the missing chat_template in the tokenizer, which is a bug fixed by transformers#32908.
@Isotr0py thanks for your reply. I am currently running vLLM with Docker; could you push a temporary Docker image that includes this fix?
I think you just need to add this line to the Dockerfile:

RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install git+https://github.com/huggingface/transformers

before lines 39-44:

# install build and runtime dependencies
COPY requirements-common.txt requirements-common.txt
COPY requirements-adag.txt requirements-adag.txt
COPY requirements-cuda.txt requirements-cuda.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install -r requirements-cuda.txt
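For anyone hitting this outside of Docker, the same workaround should apply locally. A minimal sketch, assuming a pip-managed environment, is to install transformers from source so that it includes the fix from transformers#32908:

# Install the development version of transformers containing the chat_template fix (sketch).
python3 -m pip install --upgrade git+https://github.com/huggingface/transformers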
Why does this not work out of the box? How does one specify such a template? EDIT:
@I321065 @simaotwx @lonngxiang I am using Kubernetes to deploy LLM models via vLLM, and I mount a ConfigMap into the vLLM pod to fix the template issue. Let me know if you want the full YAML file showing how to deploy it.
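To make the ConfigMap approach concrete, here is a rough sketch only: the resource names, image tag, and paths are hypothetical, and the volume holding the GGUF model itself is omitted for brevity. The idea is to create a ConfigMap from a local Jinja chat-template file, mount it into the vLLM container, and point --chat-template at the mounted file.

# Create a ConfigMap from a local Jinja chat-template file (hypothetical names).
kubectl create configmap chat-template --from-file=template.jinja

# Mount the ConfigMap into the vLLM pod and pass --chat-template to the server.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: vllm-server
spec:
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest
      args:
        - --model=/models/qwen1.5-1.8b.gguf   # model volume omitted for brevity
        - --chat-template=/templates/template.jinja
      volumeMounts:
        - name: chat-template
          mountPath: /templates
  volumes:
    - name: chat-template
      configMap:
        name: chat-template
EOF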
Hi, same issue for me. I am trying vLLM with facebook/opt-125m via the OpenAI-compatible server; can someone help? ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.
A solution: https://blog.csdn.net/yuanlulu/article/details/142929234
Your current environment
BadRequestError: Error code: 400 - {'object': 'error', 'message': 'As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.', 'type': 'BadRequestError', 'param': None, 'code': 400}
How would you like to use vllm
CUDA_VISIBLE_DEVICES=1 vllm serve /ai/qwen1.5-1.8b.gguf --host 0.0.0.0 --port 10868 --max-model-len 4096 --trust-remote-code --tensor-parallel-size 1 --dtype=half --quantization gguf --load-format gguf
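One way to get past this 400 error is to supply the chat template yourself and pass it to the serve command with vLLM's --chat-template flag. Below is a minimal sketch: the file path is arbitrary, and the template is a generic ChatML-style one (the format the Qwen family typically uses), so it should be checked against the model card before use.

# Write a ChatML-style Jinja chat template to a file (path is arbitrary).
cat > /ai/chat_template.jinja <<'EOF'
{% for message in messages %}{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
EOF

# Same serve command as above, now pointing --chat-template at that file.
CUDA_VISIBLE_DEVICES=1 vllm serve /ai/qwen1.5-1.8b.gguf \
  --host 0.0.0.0 --port 10868 --max-model-len 4096 --trust-remote-code \
  --tensor-parallel-size 1 --dtype=half --quantization gguf --load-format gguf \
  --chat-template /ai/chat_template.jinja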