Update model_loader deps and qqq quantization deps #2220
There are some failures due to |
Overall LGTM, left some comments.
Except for rope, vllm.distributed and quant, everything else related to vllm needs to be removed, such as some utils.
BTW python/sglang/srt/models/phi3_small.py should also be handled
b17b685 to 29e0eed
I have updated the code according to your review.
30d8990 to b6089f9
@HandH1998 Could you change the permissions of this PR to allow maintainers to update your branch's code? That way, we can also help fix it and speed up the merging process.
@zhyncs OK, I have added the permission.
I think I have fixed all the issues in the CR. Please review the latest code.
b6089f9 to 71bcc5f
I'll merge this PR into |
9255020 into sgl-project:HandH1998/sgl_model_loader
Motivation
Update the model_loader deps and qqq quantization deps for SGLang.
Modifications
We modified the relevant code primarily following vLLM. Thanks to the vLLM team for their significant contributions. The main modifications are listed here.
- The `model_loader` code is taken from https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/model_loader and adapted for SGLang. The updated `model_loader` code is located at `python/sglang/srt/model_loader`.
- `registry.py` is at `python/sglang/srt/models/registry.py`, and all the models are registered in the class `ModelRegistry`. Consequently, we removed all monkey patches in `python/sglang/srt/model_executor/model_runner.py`.
- `load_config.py` and `device_config.py` are now in `python/sglang/srt/configs`. Additionally, we removed `LoraConfig`, `CacheConfig`, `ParallelConfig`, and `SchedulerConfig`, as they are set to `None` and are not used.
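
To illustrate why a central registry can replace monkey patches, here is a minimal sketch of the registry pattern. It is not the actual contents of `python/sglang/srt/models/registry.py`: the method names (`register`, `resolve`, `get_supported_archs`) and the `ToyLlamaForCausalLM` class are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Type


class ModelRegistry:
    """Hypothetical minimal registry mapping architecture names to model classes."""

    def __init__(self) -> None:
        self._models: Dict[str, Type] = {}

    def register(self, arch: str, model_cls: Type) -> None:
        # Refuse silent overrides so that two models cannot claim the same name.
        if arch in self._models:
            raise ValueError(f"Architecture {arch!r} is already registered")
        self._models[arch] = model_cls

    def get_supported_archs(self) -> List[str]:
        return sorted(self._models)

    def resolve(self, architectures: List[str]) -> Type:
        # Return the class for the first architecture name that is registered.
        for arch in architectures:
            if arch in self._models:
                return self._models[arch]
        raise ValueError(
            f"None of {architectures} is supported; "
            f"known architectures: {self.get_supported_archs()}"
        )


class ToyLlamaForCausalLM:
    """Placeholder standing in for a real model class (assumption for this sketch)."""


registry = ModelRegistry()
registry.register("ToyLlamaForCausalLM", ToyLlamaForCausalLM)

# A loader would look the class up by name instead of patching another module.
model_cls = registry.resolve(["UnknownArch", "ToyLlamaForCausalLM"])
print(model_cls.__name__)
```

With a lookup like this, the model runner only needs to ask the registry for a class by architecture name, so it no longer has to patch another library's model table at import time.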