[FEAT] Support GGUF format #2215

Merged on Nov 30, 2024 (14 commits)

@zhengy001
Contributor

zhengy001 commented Nov 27, 2024

Motivation

#1616

Modifications

Support GGUF format

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@zhengy001
Contributor Author

lm_head.weight is used directly in many places; however, vLLM changes it to qweight for GGUF. This would be an issue.

@merrymercy
Contributor

Thanks for the contributions. Can you fix the CI errors?

@zhengy001 force-pushed the zyang_dev branch 2 times, most recently from 5c616a5 to 2cffa70, on November 27, 2024 13:10
@zhengy001
Contributor Author

> Thanks for the contributions. Can you fix the CI errors?

How do I trigger the CI?

@zhengy001
Contributor Author

> lm_head.weight is used directly in many places; however, vLLM changes it to qweight for GGUF. This would be an issue.

Pass lm_head to LogitsProcessor and check the weight inside
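
To illustrate the idea, here is a minimal sketch of a logits step that receives the head object itself and decides inside which attribute to read. It is not the code from this PR and not the vLLM or sglang API: `DenseHead`, `QuantizedHead`, `resolve_head_weight`, and `compute_logits` are hypothetical names, plain Python lists stand in for tensors, and the scale-based dequantization is a stand-in for real GGUF handling.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class DenseHead:
    """Hypothetical unquantized head: exposes `weight` (vocab x hidden)."""
    weight: Matrix


@dataclass
class QuantizedHead:
    """Hypothetical quantized head: exposes `qweight` and a scale, no `weight`."""
    qweight: List[List[int]]
    scale: float

    def dequantize(self) -> Matrix:
        # Toy dequantization (assumption): multiply each stored integer by the scale.
        return [[q * self.scale for q in row] for row in self.qweight]


def resolve_head_weight(lm_head) -> Matrix:
    # The caller passes the whole head, so the attribute check lives in one place
    # instead of every call site reading `lm_head.weight` directly.
    if hasattr(lm_head, "weight"):
        return lm_head.weight
    if hasattr(lm_head, "qweight"):
        return lm_head.dequantize()
    raise AttributeError("lm_head has neither 'weight' nor 'qweight'")


def compute_logits(hidden: List[float], lm_head) -> List[float]:
    # One logit per vocabulary row: dot product of the hidden state with that row.
    weight = resolve_head_weight(lm_head)
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight]


dense = DenseHead(weight=[[1.0, 0.0], [0.0, 2.0]])
quantized = QuantizedHead(qweight=[[2, 0], [0, 4]], scale=0.5)
dense_logits = compute_logits([3.0, 4.0], dense)
quantized_logits = compute_logits([3.0, 4.0], quantized)
```

Both heads produce the same logits here because the toy quantized weights dequantize to the dense ones; the point is only that callers no longer need to know which attribute exists.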

@merrymercy
Contributor

merrymercy commented Nov 27, 2024

@zhengy001 CI won't be triggered for you automatically because you are a first-time contributor. You can send a random typo fix PR and I can merge that for you so your future commits can trigger CI automatically.

@merrymercy
Contributor

@zhengy001 Can you fix the CI errors?

@zhengy001
Contributor Author

> @zhengy001 Can you fix the CI errors?

@merrymercy Sure, working on it.

test/srt/run_suite.py (review thread resolved; outdated)
test/srt/test_gguf.py (review thread resolved)
python/sglang/srt/server_args.py (review thread resolved; outdated)
python/sglang/srt/layers/vocab_parallel_embedding.py (review thread resolved; outdated)
@merrymercy
Contributor

#2269 adds you as a new contributor so your future commits will trigger CI automatically

@zhengy001
Contributor Author

> #2269 adds you as a new contributor so your future commits will trigger CI automatically

@merrymercy :)

# With tie_word_embeddings, we can skip lm_head.weight
# The weight might appear unnecessarily in the files if the model is
# processed with quantization, LoRA, fine-tuning, etc.
if self.config.tie_word_embeddings and "lm_head.weight" in name:
@zhengy001
Contributor Author

There won't be "lm_head.weight" if self.config.tie_word_embeddings is True
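
For context, the guard under discussion can be sketched as a standalone filter over checkpoint entries. This is an illustrative sketch only: `filter_checkpoint_weights` is a hypothetical helper, and the entry names and placeholder values are made up for the example rather than taken from a real checkpoint.

```python
from typing import Any, Iterable, Iterator, Tuple


def filter_checkpoint_weights(
    named_weights: Iterable[Tuple[str, Any]],
    tie_word_embeddings: bool,
) -> Iterator[Tuple[str, Any]]:
    """Yield checkpoint entries, dropping lm_head.weight when embeddings are tied."""
    for name, tensor in named_weights:
        # With tied embeddings the output head reuses the input embedding,
        # so a stray lm_head.weight entry in the file can be ignored.
        if tie_word_embeddings and "lm_head.weight" in name:
            continue
        yield name, tensor


# Hypothetical checkpoint contents (names and values are placeholders).
entries = [
    ("model.embed_tokens.weight", "E"),
    ("model.layers.0.mlp.up_proj.weight", "W"),
    ("lm_head.weight", "H"),
]
tied_names = [n for n, _ in filter_checkpoint_weights(entries, tie_word_embeddings=True)]
untied_names = [n for n, _ in filter_checkpoint_weights(entries, tie_word_embeddings=False)]
```

If the file never contains `lm_head.weight` for a tied model, as the comment above states, the guard simply never fires; it only matters for files that carry the redundant entry.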

outputs = engine.generate(prompt, sampling_params)["text"]
engine.shutdown()

self.assertEqual(outputs, " it. I have a lot of work")
@zhengy001
Contributor Author

I compared the result with vLLM's output. Please suggest a better way if there is one.

@merrymercy merrymercy enabled auto-merge (squash) November 30, 2024 07:47
@merrymercy merrymercy disabled auto-merge November 30, 2024 08:44
@merrymercy merrymercy merged commit 883c955 into sgl-project:main Nov 30, 2024
10 of 13 checks passed
merrymercy added a commit that referenced this pull request Dec 1, 2024