Support vllm engine. #40
Conversation
Test results
Benchmark:
So is there any solution for the fake stream output? Maybe we can implement true streaming output by referring to Qwen's implementation or vLLM's OpenAI API server implementation.
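For reference, here is a minimal sketch of what true streaming could look like with vLLM's AsyncLLMEngine (the vLLM 0.2.x API), loosely following the pattern of vLLM's OpenAI API server. The model path, request id, and sampling values are placeholders, not taken from this PR:

```python
# Minimal true-streaming sketch with vLLM's AsyncLLMEngine (vLLM 0.2.x API).
# Model path, request id, and sampling values are placeholders.
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(
        model="./models/Sakura-13B-LNovel-v0_8-4bit",  # placeholder path
        quantization="gptq",
        trust_remote_code=True,
    )
)

async def stream(prompt: str) -> None:
    params = SamplingParams(temperature=0.1, top_p=0.3, max_tokens=512)
    sent = ""
    # engine.generate is an async generator: it yields a RequestOutput each
    # time new tokens are produced, so every iteration is a real streaming
    # step rather than a post-hoc split of a finished completion.
    async for output in engine.generate(prompt, params, request_id="req-0"):
        text = output.outputs[0].text  # cumulative text so far
        print(text[len(sent):], end="", flush=True)  # emit only the delta
        sent = text

asyncio.run(stream("こんにちは"))
```

The key difference from fake streaming is that the deltas are flushed as the engine produces them, instead of slicing up an already-finished generation.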
vllm 0.2.7 requires pydantic==1.10.13, but you have pydantic 2.5.3 which is incompatible.
vllm 0.2.7 requires transformers>=4.36.0, but you have transformers 4.33.2 which is incompatible.

AFAIK, vllm 0.2.7's requirements conflict with the transformers==4.33.2 pin this project needs.
OK, my fault, it's just because the 3090 runs out of memory when trying to run:
python3 server.py --listen 0.0.0.0:5000 --trust_remote_code --model_name_or_path ./models/Sakura-13B-LNovel-v0_8-4bit --model_version 0.8 --no-auth --log debug --vllm
It's strange that a 3090 would OOM. It seems that it's because vLLM pre-allocates most of the available GPU memory for weights and the KV cache by default.
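If the pre-allocation is indeed the cause, vLLM exposes a gpu_memory_utilization knob (default 0.9) that caps how much VRAM it reserves. A hypothetical way to lower it, with the path and the 0.8 value as placeholders:

```python
# Hypothetical snippet: vLLM reserves ~90% of VRAM up front by default
# (gpu_memory_utilization=0.9); lowering it can avoid OOM on a 24 GB
# RTX 3090. The model path and the 0.8 value are placeholders.
from vllm import LLM

llm = LLM(
    model="./models/Sakura-13B-LNovel-v0_8-4bit",  # placeholder path
    quantization="gptq",
    trust_remote_code=True,
    gpu_memory_utilization=0.8,
)
```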
OK, the fake stream output problem should be solved now.
This can be very helpful for those who don't know much about how the params work.
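As an illustration of how vLLM's sampling parameters line up with the transformers-style names most users already know (the values here are illustrative, not the project's recommended defaults):

```python
# Illustrative mapping between vLLM SamplingParams and transformers-style
# generation parameters; the values are examples only.
from vllm import SamplingParams

params = SamplingParams(
    temperature=0.1,         # same role as transformers' temperature
    top_p=0.3,               # nucleus sampling, like transformers' top_p
    top_k=40,                # like transformers' top_k
    max_tokens=512,          # counterpart of transformers' max_new_tokens
    presence_penalty=0.0,    # OpenAI-style penalties; together these
    frequency_penalty=0.05,  # roughly stand in for repetition_penalty
)
```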
LGTM, tested on 0.8 4-bit GPTQ.
Still, we need to solve the transformers==4.33.2 issue sooner or later.
I'll update README.md and the PyInstaller settings soon.
Related issue: #39
TODO:
- Streaming output (AsyncLLMEngine) -> Update: need to fix -> Fixed
- vLLM GPTQ/AWQ quantized model inference (see the sketch below)
- Add requirements -> Update: conflicts with transformers==4.33.2, won't add to requirements.txt

Not all tests are finished yet, so submitting this as a draft first. -> Done.
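A sketch for the GPTQ/AWQ item above: in vLLM 0.2.x the quantized kernel is selected via the quantization argument. Both model paths here are placeholders:

```python
# Sketch for the GPTQ/AWQ TODO item: vLLM 0.2.x selects the quantized
# kernel via the `quantization` argument. Model paths are placeholders.
from vllm import LLM

gptq_llm = LLM(
    model="./models/Sakura-13B-LNovel-v0_8-4bit",  # placeholder GPTQ weights
    quantization="gptq",
    trust_remote_code=True,
)

# For AWQ weights, only the flag (and of course the checkpoint) changes:
# awq_llm = LLM(model="./models/some-awq-model", quantization="awq",
#               trust_remote_code=True)
```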
Install
Before running the server, install the pinned dependencies:
pip3 install transformers==4.33.2 sentencepiece xformers
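After installing, the server can be started with the --vllm flag, as in the command used earlier in this thread:

```
python3 server.py --listen 0.0.0.0:5000 --trust_remote_code --model_name_or_path ./models/Sakura-13B-LNovel-v0_8-4bit --model_version 0.8 --no-auth --log debug --vllm
```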