[FEAT] Support GGUF format #2215

zhengy001 · 2024-11-27T07:23:56Z

Motivation

Modifications

Support GGUF format

Checklist

Format your code according to the Contributor Guide.
Add unit tests as outlined in the Contributor Guide.
Update documentation as needed, including docstrings or example tutorials.

zhengy001 · 2024-11-27T08:23:59Z

lm_head.weight is used directly in many places; however, vLLM changes it to qweight for GGUF. This would be an issue.

merrymercy · 2024-11-27T11:32:11Z

Thanks for the contributions. Can you fix the CI errors?

zhengy001 · 2024-11-27T13:11:44Z

> Thanks for the contributions. Can you fix the CI errors?

How do I trigger the CI?

zhengy001 · 2024-11-27T14:10:45Z

> lm_head.weight is used directly in many places; however, vLLM changes it to qweight for GGUF. This would be an issue.

Pass lm_head to LogitsProcessor and check the weight inside
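A minimal sketch of that idea, under stated assumptions: the processor receives the head module itself and decides how to project based on which attribute exists. Every name here (`LogitsProcessorSketch`, `QuantHead`, `project`, `matvec`) is hypothetical and is not the sglang or vLLM API, and plain Python lists stand in for tensors.

```python
from types import SimpleNamespace


def matvec(weight, hidden):
    """Multiply a [vocab, hidden] weight (list of rows) by a hidden vector."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]


class QuantHead:
    """Hypothetical quantized head: it exposes `qweight`, not `weight`."""

    def __init__(self, qweight, scale):
        self.qweight = qweight
        self.scale = scale

    def project(self, hidden):
        # Toy dequantization: scale the stored values, then project.
        dequant = [[q * self.scale for q in row] for row in self.qweight]
        return matvec(dequant, hidden)


class LogitsProcessorSketch:
    """Hypothetical processor that takes the lm_head module, not its weight."""

    def compute_logits(self, hidden, lm_head):
        if hasattr(lm_head, "weight"):
            # Unquantized head: project with the dense weight directly.
            return matvec(lm_head.weight, hidden)
        # Quantized head: no `weight` attribute, so defer to the module.
        return lm_head.project(hidden)


processor = LogitsProcessorSketch()
dense_head = SimpleNamespace(weight=[[1.0, 0.0], [0.0, 2.0]])
quant_head = QuantHead(qweight=[[2, 0], [0, 4]], scale=0.5)

dense_logits = processor.compute_logits([3.0, 4.0], dense_head)
quant_logits = processor.compute_logits([3.0, 4.0], quant_head)
```

With this shape, callers never touch `lm_head.weight` themselves, so a renamed parameter only matters in one place.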

merrymercy · 2024-11-27T23:39:07Z

@zhengy001 CI won't be triggered for you automatically because you are a first-time contributor. You can send a random typo fix PR and I can merge that for you so your future commits can trigger CI automatically.

merrymercy · 2024-11-30T00:04:39Z

@zhengy001 Can you fix the CI errors?

zhengy001 · 2024-11-30T00:06:55Z

> @zhengy001 Can you fix the CI errors?

@merrymercy Sure, working on it.

merrymercy · 2024-11-30T00:39:35Z

#2269 adds you as a new contributor so your future commits will trigger CI automatically

zhengy001 · 2024-11-30T00:42:03Z

> #2269 adds you as a new contributor so your future commits will trigger CI automatically

@merrymercy :)

zhengy001 · 2024-11-30T04:17:17Z

python/sglang/srt/models/olmo.py

-            # With tie_word_embeddings, we can skip lm_head.weight
-            # The weight might appear unnecessarily in the files if the model is
-            # processed with quantization, LoRA, fine-tuning, etc.
-            if self.config.tie_word_embeddings and "lm_head.weight" in name:


There won't be an "lm_head.weight" entry when self.config.tie_word_embeddings is True, so this check is unnecessary.
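For context, the removed guard behaves roughly like the sketch below. `filter_checkpoint_names` is a hypothetical helper written for illustration, not sglang code; it only mirrors the condition shown in the diff above.

```python
def filter_checkpoint_names(names, tie_word_embeddings):
    """Return the checkpoint tensor names that a loader would actually load."""
    kept = []
    for name in names:
        # The removed guard: ignore a stray lm_head.weight when embeddings are tied.
        if tie_word_embeddings and "lm_head.weight" in name:
            continue
        kept.append(name)
    return kept


names = ["model.embed_tokens.weight", "lm_head.weight"]
tied = filter_checkpoint_names(names, tie_word_embeddings=True)
untied = filter_checkpoint_names(names, tie_word_embeddings=False)
```

If a tied checkpoint never contains "lm_head.weight", the guard is a no-op and both calls return the same names for that checkpoint.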

zhengy001 · 2024-11-30T04:20:57Z

test/srt/test_gguf.py

+        outputs = engine.generate(prompt, sampling_params)["text"]
+        engine.shutdown()
+
+        self.assertEqual(outputs, " it. I have a lot of work")


I compared the result with vLLM's output. Please suggest a better way if there is one.

zhengy001 requested review from merrymercy, Ying1123, zhyncs, hnyls2002, ispobock and ByronHsu as code owners November 27, 2024 07:23

zhengy001 mentioned this pull request Nov 27, 2024

[Feature] GGUF support #1616

Closed

2 tasks

zhengy001 force-pushed the zyang_dev branch from 8462af3 to 75d9978 on November 27, 2024 08:20

zhengy001 force-pushed the zyang_dev branch 2 times, most recently from 5c616a5 to 2cffa70, on November 27, 2024 13:10

merrymercy requested changes Nov 30, 2024

test/srt/run_suite.py (outdated)

test/srt/test_gguf.py

python/sglang/srt/server_args.py (outdated)

python/sglang/srt/layers/vocab_parallel_embedding.py (outdated)

zhengy001 commented Nov 30, 2024

zhengy001 force-pushed the zyang_dev branch from 804c608 to bc4b19f on November 30, 2024 05:05

Yang Zheng(SW)(Alex) and others added 7 commits November 30, 2024 05:23

Support GGUF format

33fc64e

Update test

ec2dd82

Format

28328bf

ServerArgs: remove gguf_file

8f42879

Pass lm_head to LogitsProcessor

8bf1890

Fix CI

02a56b0

Fix CI and cleanup tie_word_embeddings

f1ffb30

zhengy001 added 2 commits November 30, 2024 05:24

Update run_suite.py order

f027e7d

Check quant_config properity

8f8ceba

zhengy001 force-pushed the zyang_dev branch from bc4b19f to 8f8ceba on November 30, 2024 05:25

Merge branch 'main' into zyang_dev

2a34061

merrymercy approved these changes Nov 30, 2024

merrymercy enabled auto-merge (squash) November 30, 2024 07:47

Update llama.py

b3eaf49

merrymercy reviewed Nov 30, 2024

python/sglang/srt/layers/logits_processor.py (outdated)

merrymercy added 3 commits November 30, 2024 00:10

Apply suggestions from code review

7d79172

Merge branch 'main' into zyang_dev

540dd3e

Merge branch 'main' into zyang_dev

8a2e8ef

merrymercy disabled auto-merge November 30, 2024 08:44

merrymercy merged commit 883c955 into sgl-project:main Nov 30, 2024
10 of 13 checks passed

merrymercy added a commit that referenced this pull request Dec 1, 2024

Revert "[FEAT] Support GGUF format (#2215)"

0b295e5

This reverts commit 883c955.

merrymercy mentioned this pull request Dec 1, 2024

Revert "[FEAT] Support GGUF format" #2285

Merged
