Update model_loader deps and qqq quantization deps #2220

HandH1998 · 2024-11-27T09:27:00Z

Motivation

Update the model_loader deps and qqq quantization deps for SGLang.

Modifications

We modified the relevant code primarily by following vLLM. Thanks to the vLLM team for their significant contributions. The main modifications are listed below.

We adapted the model_loader code from https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/model_loader for SGLang. The updated model_loader code is located at python/sglang/srt/model_loader.
We added registry.py at python/sglang/srt/models/registry.py and registered all models in the ModelRegistry class. As a result, we removed all monkey patches in python/sglang/srt/model_executor/model_runner.py.
We added load_config.py and device_config.py to python/sglang/srt/configs. We also removed LoraConfig, CacheConfig, ParallelConfig, and SchedulerConfig, because they were always set to None and never used.
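For readers unfamiliar with the registry pattern mentioned above, here is a minimal sketch of the general idea. It is not the code in python/sglang/srt/models/registry.py; the class name ToyModelRegistry, its methods, and ToyLlamaForCausalLM are hypothetical and exist only for illustration. The point is that a loader can look up a model class by architecture name instead of patching another library's lookup at runtime.

```python
from typing import Dict, List, Type


class ToyModelRegistry:
    """Hypothetical registry mapping architecture names to model classes."""

    def __init__(self) -> None:
        self._models: Dict[str, Type] = {}

    def register(self, arch: str, model_cls: Type) -> None:
        # Later registrations for the same name overwrite earlier ones.
        self._models[arch] = model_cls

    def resolve(self, architectures: List[str]) -> Type:
        # Return the class for the first architecture name that is registered.
        for arch in architectures:
            if arch in self._models:
                return self._models[arch]
        raise ValueError(f"Unsupported architectures: {architectures}")


class ToyLlamaForCausalLM:
    """Placeholder standing in for a real model implementation."""


registry = ToyModelRegistry()
registry.register("LlamaForCausalLM", ToyLlamaForCausalLM)
model_cls = registry.resolve(["LlamaForCausalLM"])
```

With a lookup like this owned by the project itself, supporting a new model is a registration call rather than a patch applied to a third-party module.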

HandH1998 · 2024-11-27T09:51:29Z

Some checks in the CR fail with the error "cannot import name 'marlin_qqq_gemm' from 'torchao.ops' (/usr/local/lib/python3.10/dist-packages/torchao/ops.py)". The installed torchao version is v0.6.1, which does not include marlin_qqq_gemm. Our marlin_qqq_gemm has been merged into the main branch of torchao, but the torchao team has not yet released a version that includes it.
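One common way to tolerate an optional symbol that is missing from an installed release is to probe for it before use. The sketch below is an assumption about how such a guard could look, not what this PR does; has_op and HAS_MARLIN_QQQ are hypothetical names, and only the standard library is used.

```python
import importlib


def has_op(module_name: str, attr: str) -> bool:
    """Return True if module_name imports cleanly and exposes attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module is not installed, or it failed while importing its own deps.
        return False
    return hasattr(module, attr)


# False on torchao v0.6.1 according to the error above, and also False
# when torchao is not installed at all.
HAS_MARLIN_QQQ = has_op("torchao.ops", "marlin_qqq_gemm")
```

A caller could then check HAS_MARLIN_QQQ and raise a clear error only when QQQ quantization is actually requested, rather than failing at import time for every user.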

zhyncs

Overall LGTM; I left some comments.
Except for rope, vllm.distributed, and quant, everything else that depends on vLLM needs to be removed, such as some utils.
BTW, python/sglang/srt/models/phi3_small.py should also be handled.

python/sglang/srt/layers/quantization/qqq.py

python/sglang/srt/model_loader/__init__.py

python/sglang/srt/model_loader/loader.py

python/sglang/srt/model_loader/utils.py

python/sglang/srt/model_loader/weight_utils.py

HandH1998 · 2024-11-29T08:33:05Z

I have updated the code according to your review.

python/sglang/srt/configs/load_config.py

python/sglang/srt/model_loader/weight_utils.py

zhyncs · 2024-12-02T13:47:04Z

@HandH1998 Could you change the permissions of this PR to allow maintainers to update your branch? That way, we can help fix issues and speed up the merge.

HandH1998 · 2024-12-02T13:52:43Z

@zhyncs OK, I have granted the permission.

HandH1998 · 2024-12-02T14:14:47Z

I think I have fixed all the issues in the CR. Please review the latest code.

zhyncs · 2024-12-02T14:37:20Z

I'll merge this PR into sgl-project:HandH1998/sgl_model_loader first for the nightly gsm8k evaluation. Then, I'll grant you permission to update the PR. @HandH1998 cc @merrymercy @Ying1123

zhyncs · 2024-12-02T14:41:09Z

ref https://github.com/sgl-project/sglang/actions/runs/12121646155

Co-authored-by: HandH1998 <1335248067@qq.com>

HandH1998 requested review from merrymercy, Ying1123, hnyls2002, zhyncs, ispobock and ByronHsu as code owners November 27, 2024 09:27

zhyncs assigned zhyncs and ispobock Nov 27, 2024

zhyncs added the high priority label Nov 27, 2024

zhyncs assigned Ying1123 Nov 27, 2024

zhyncs reviewed Nov 27, 2024

HaiShaw self-requested a review November 28, 2024 09:02

zhyncs mentioned this pull request Nov 28, 2024

[Track] progress in removing vLLM dependencies #2245

Open

2 tasks

HandH1998 force-pushed the sgl_model_loader branch from b17b685 to 29e0eed Compare November 29, 2024 08:17

merrymercy reviewed Dec 1, 2024

python/sglang/srt/configs/load_config.py (outdated)

merrymercy reviewed Dec 1, 2024

python/sglang/srt/model_loader/weight_utils.py (outdated)

HandH1998 force-pushed the sgl_model_loader branch 2 times, most recently from 30d8990 to b6089f9 Compare December 2, 2024 13:44

remove model_loader deps on vllm

71bcc5f

HandH1998 force-pushed the sgl_model_loader branch from b6089f9 to 71bcc5f Compare December 2, 2024 14:20

zhyncs changed the base branch from main to HandH1998/sgl_model_loader December 2, 2024 14:35

zhyncs approved these changes Dec 2, 2024

zhyncs merged commit 9255020 into sgl-project:HandH1998/sgl_model_loader Dec 2, 2024
15 checks passed

zhyncs added a commit that referenced this pull request Dec 2, 2024

Update model_loader deps and qqq quantization deps (#2220) (#2318)

85e1a6f

Co-authored-by: HandH1998 <1335248067@qq.com>
