[Feature] GGUF support #1616
Comments
It should be easy. Could you give us an example of the command you'd want us to support?
@remixer-dec @merrymercy But how do we run inference with sglang? @merrymercy could you please provide a command for that?
If you haven't tried it, please don't reply. This doesn't work at all.
@XYZliang maybe read the previous comment? It's not supposed to be working; I replied with the command I was asked for.
Sorry, I didn't pay attention to the earlier comments...
Doesn't work. The error is:

(python311) whk@VM-2-13-ubuntu:~/code/qwen25-3b$ python -m sglang.launch_server --model-path Qwen2.5-3B-Instruct-q5_k_m.gguf --port 8075 --host 0.0.0.0 --mem-fraction-static 0.2 --chat-template template.json

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
It should be easy to support. Contributions are welcome! Or you can convert the model to HF format.
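For the conversion route, recent transformers releases can de-quantize a GGUF file into a regular HF checkpoint that sglang loads natively. A minimal sketch, assuming a transformers build with GGUF support installed; the file name is taken from the report above, while the directory and output names are made up:

```python
# Sketch: de-quantize a GGUF checkpoint into a standard HF checkpoint,
# then point sglang's --model-path at the resulting directory.
# Assumes transformers was installed with GGUF support (pip install transformers gguf).
from transformers import AutoModelForCausalLM, AutoTokenizer

gguf_dir = "."                                 # directory containing the GGUF file (assumption)
gguf_name = "Qwen2.5-3B-Instruct-q5_k_m.gguf"  # file name from the report above

# gguf_file tells transformers to load and de-quantize the GGUF weights.
tokenizer = AutoTokenizer.from_pretrained(gguf_dir, gguf_file=gguf_name)
model = AutoModelForCausalLM.from_pretrained(gguf_dir, gguf_file=gguf_name)

# Save as a plain HF checkpoint; sglang can then serve this directory directly.
out_dir = "Qwen2.5-3B-Instruct-hf"             # output name is made up
model.save_pretrained(out_dir)
tokenizer.save_pretrained(out_dir)
```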
Please take a look at this PR: #2215
Supported by #2215.
Let's go! Thank you!
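For later readers: with #2215 merged, a server launched on a GGUF model (as in the command earlier in the thread) can be queried like any other sglang deployment, for example through the OpenAI-compatible API. A minimal sketch; the port matches the earlier command and the prompt is made up:

```python
# Sketch: query a running sglang server via its OpenAI-compatible endpoint.
# Assumes the server was launched as in the earlier command (listening on port 8075).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8075/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="default",  # single-model server; the exact name is not load-bearing (assumption)
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```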
@remixer-dec
@zhengy001 It is better (at least no crash), but it keeps generating text without ever stopping (by default). With </s> passed as a stop string in each request, it does stop correctly, but that information should be loaded from the model metadata and, if specified, from the --chat-template template. Currently, when a custom chat template is specified:
The model EOS is not loaded correctly. Please check this PR.
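To illustrate the workaround discussed above while the EOS-metadata fix is pending: the stop string can be passed explicitly with every request. A minimal sketch against sglang's native /generate endpoint; the prompt is made up and the port matches the earlier command:

```python
# Sketch: compensate for EOS not being read from model metadata by passing
# an explicit stop string in the sampling params of every request.
import requests

resp = requests.post(
    "http://localhost:8075/generate",
    json={
        "text": "Hello! How are you?",
        "sampling_params": {
            "max_new_tokens": 128,
            "stop": ["</s>"],  # should ideally come from model metadata / --chat-template
        },
    },
)
print(resp.json()["text"])
```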
Checklist
Motivation
Hi! Since the .gguf format is already supported by vLLM, would it be possible to add support for it in the SGLang server?
Related resources
No response