Update model_loader deps and qqq quantization deps #2220

Merged

Conversation

Collaborator

@HandH1998 HandH1998 commented Nov 27, 2024

Motivation

Update the model_loader deps and qqq quantization deps for SGLang.

Modifications

We modified the relevant code primarily following vLLM. Thanks to the vLLM team for their significant contributions. The main modifications are listed below.

  • We took the model_loader code from https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/model_loader and adapted it for SGLang. The updated model_loader code is located at python/sglang/srt/model_loader.
  • We added registry.py at python/sglang/srt/models/registry.py and registered all the models into class ModelRegistry. Consequently, we removed all monkey patches in python/sglang/srt/model_executor/model_runner.py.
  • We added load_config.py and device_config.py to python/sglang/srt/configs. Additionally, we removed LoraConfig, CacheConfig, ParallelConfig, and SchedulerConfig because they were always set to None and never used.
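
To illustrate the registry idea from the second bullet, here is a minimal sketch of how an architecture-name-to-class registry can replace monkey patching. It is not the code in python/sglang/srt/models/registry.py: `SimpleModelRegistry`, its methods, and `ToyLlamaForCausalLM` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Type


@dataclass
class SimpleModelRegistry:
    """Hypothetical stand-in for a model registry (not SGLang's actual class)."""

    # Maps an architecture name (as found in a model config) to its class.
    models: Dict[str, Type] = field(default_factory=dict)

    def register(self, arch: str, model_cls: Type) -> None:
        # Refuse silent overrides so that duplicate registrations are visible.
        if arch in self.models:
            raise ValueError(f"Architecture {arch!r} is already registered")
        self.models[arch] = model_cls

    def resolve(self, architectures: List[str]) -> Type:
        # Return the class of the first architecture name that is registered.
        for arch in architectures:
            if arch in self.models:
                return self.models[arch]
        raise ValueError(
            f"None of {architectures} is supported; "
            f"known architectures: {self.supported_archs()}"
        )

    def supported_archs(self) -> List[str]:
        return sorted(self.models)


class ToyLlamaForCausalLM:
    """Placeholder model class, assumed for this example only."""


registry = SimpleModelRegistry()
registry.register("LlamaForCausalLM", ToyLlamaForCausalLM)

# A loader would look up the class by the names listed in the model config.
model_cls = registry.resolve(["UnknownArch", "LlamaForCausalLM"])
print(model_cls.__name__)
```

With a lookup of this kind, the loader asks the registry for a class instead of patching another library's lookup function at runtime, which is the motivation given above for removing the monkey patches.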

@HandH1998
Collaborator Author

Some checks in the CR fail with the error "cannot import name 'marlin_qqq_gemm' from 'torchao.ops' (/usr/local/lib/python3.10/dist-packages/torchao/ops.py)". This happens because the installed torchao version is v0.6.1, which does not support marlin_qqq_gemm. Although our marlin_qqq_gemm has been merged into the main branch of torchao, the torchao team has not yet released a version that includes it.
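
One common way to tolerate a missing symbol like this is a guarded import. The sketch below is only an illustration under assumptions, not what this PR does: `load_optional_op` and `QQQ_KERNEL_AVAILABLE` are hypothetical names, and the only names taken from the error above are `torchao.ops` and `marlin_qqq_gemm`.

```python
import importlib
from typing import Any, Optional


def load_optional_op(module_name: str, op_name: str) -> Optional[Any]:
    """Return `module_name.op_name` if it can be imported, else None.

    Hypothetical helper: it covers both a missing package and an installed
    package whose version does not yet provide the requested symbol.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Package is not installed (ModuleNotFoundError is a subclass).
        return None
    # Package is installed, but the symbol may be absent in this version.
    return getattr(module, op_name, None)


# Names taken from the error message above; the flag name is an assumption.
marlin_qqq_gemm = load_optional_op("torchao.ops", "marlin_qqq_gemm")
QQQ_KERNEL_AVAILABLE = marlin_qqq_gemm is not None

if not QQQ_KERNEL_AVAILABLE:
    print("marlin_qqq_gemm is unavailable; QQQ quantization would be disabled")
```

Code that depends on the kernel can then check the flag and raise a clear error when QQQ is requested, instead of failing at import time for every user.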

Member

@zhyncs zhyncs left a comment

Overall LGTM; I left some comments.
Except for rope, vllm.distributed, and quant, everything else that depends on vllm needs to be removed, such as some utils.
BTW, python/sglang/srt/models/phi3_small.py should also be handled.

python/sglang/srt/layers/quantization/qqq.py (outdated, resolved)
python/sglang/srt/layers/quantization/qqq.py (outdated, resolved)
python/sglang/srt/model_loader/__init__.py (outdated, resolved)
python/sglang/srt/model_loader/loader.py (outdated, resolved)
python/sglang/srt/model_loader/loader.py (outdated, resolved)
python/sglang/srt/model_loader/utils.py (outdated, resolved)
python/sglang/srt/model_loader/weight_utils.py (outdated, resolved)
@HandH1998
Collaborator Author

HandH1998 commented Nov 29, 2024

I have updated the code according to your review.

@HandH1998 HandH1998 force-pushed the sgl_model_loader branch 2 times, most recently from 30d8990 to b6089f9 on December 2, 2024 13:44
@zhyncs
Member

zhyncs commented Dec 2, 2024

@HandH1998 Could you change the permissions of this PR to allow maintainers to update your branch? That way, we can help fix issues and speed up the merge.

@HandH1998
Collaborator Author

@zhyncs OK, I have granted the permission.

@HandH1998
Collaborator Author

I think I have fixed all the issues in the CR. Please review the latest code.

@zhyncs zhyncs changed the base branch from main to HandH1998/sgl_model_loader December 2, 2024 14:35
@zhyncs
Member

zhyncs commented Dec 2, 2024

I'll merge this PR into sgl-project:HandH1998/sgl_model_loader first for the nightly gsm8k evaluation. Then, I'll grant you permission to update the PR. @HandH1998 cc @merrymercy @Ying1123

@zhyncs zhyncs merged commit 9255020 into sgl-project:HandH1998/sgl_model_loader Dec 2, 2024
15 checks passed
@zhyncs
Member

zhyncs commented Dec 2, 2024

zhyncs added a commit that referenced this pull request Dec 2, 2024
Co-authored-by: HandH1998 <1335248067@qq.com>
5 participants