-
Notifications
You must be signed in to change notification settings - Fork 588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Input_embeds support #2052
Input_embeds support #2052
Conversation
Thanks for the contribution. There is a related PR recently. Can you take a review on that? #2082 |
It seems that this PR update the documents and did more alteration with batch in mind. If their implementation doesn't require radix disabled then it'd be the perfect one among the 2 pr. My changes follow the guide in that open issue and is pretty minimal, with test to try out the feature. Also can you please launch the checks again, I fixed some conditions since the last check you triggered, thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution. I feel your solution is better than the other one. We can try to merge yours! I left a few comments.
|
I apologize for the slow reply. |
Motivation
Adding input_embeds support for sglang, motivated and inspired from
#745
Modifications
Changes in /manager to support new io_struct
Changes in /model_executor to support new io_struct
test cases under test/srt to demonstrate usage
In this implementation, curl request has to be made to use input_embeds.
This implementation DOES NOT have ARGS made to compliment input_embeds (I don't know where to make this edit),
When serving, to use input embeds please add
--disable-radix
tagi.e.
python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-0.5B --port 30000 --disable-radix
test_input_embeddings.py is built and tested on Qwen2.5-0.5B, please edit the model name according to the model that you'd like to test
input_embeddings.json is my test case embeddings for Qwen2.5-0.5B model and comparison_response.txt is a comparison between text based input and input_embeds input.
Checklist