[Model] Added GLM-4 series hf format model support vllm==0.6.4 #10561

Merged
merged 10 commits into from
Nov 28, 2024

Conversation

sixsixcoder
Contributor

@sixsixcoder sixsixcoder commented Nov 22, 2024

Overview

This PR adds support for the GLM-4 series text models, targeting vllm==0.6.4. It is separate from the GLM-4v support in #9242.
The code is written against transformers==4.46.0.

FIX #5306

Usage

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 8192, 1
model_name = "THUDM/glm-4-9b-chat-hf"
# A single-turn conversation in the chat-message format.
prompt = [{"role": "user", "content": "Hello, my name is"}]

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
llm = LLM(
    model=model_name,
    tensor_parallel_size=tp_size,
    max_model_len=max_model_len,
    trust_remote_code=True,
    enforce_eager=True,
)
# Token IDs at which generation should stop.
stop_token_ids = [151329, 151336, 151338]
sampling_params = SamplingParams(temperature=0.95, max_tokens=1024, stop_token_ids=stop_token_ids)

# Render the conversation to a prompt string using the model's chat template.
inputs = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
outputs = llm.generate(prompts=inputs, sampling_params=sampling_params)

# Print the text of the first completion for the first prompt.
print(outputs[0].outputs[0].text)
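
To make the role of `stop_token_ids` in the snippet above more concrete, here is a minimal, library-free sketch of what "stop at a token ID" means. The helper name `truncate_at_stop` and the sample token IDs (other than the three stop IDs) are invented for illustration. This is not vLLM's implementation, which checks for stop tokens while decoding rather than afterwards.

```python
from typing import Iterable, List, Set

# Stop IDs copied from the usage snippet above.
STOP_TOKEN_IDS: Set[int] = {151329, 151336, 151338}


def truncate_at_stop(token_ids: Iterable[int], stop_ids: Set[int]) -> List[int]:
    """Return the tokens that precede the first stop token (hypothetical helper)."""
    kept: List[int] = []
    for tok in token_ids:
        if tok in stop_ids:
            break  # generation would end at this point
        kept.append(tok)
    return kept


if __name__ == "__main__":
    generated = [9703, 11, 1879, 151336, 42]  # made-up token IDs
    print(truncate_at_stop(generated, STOP_TOKEN_IDS))
```

In this sketch the stop token itself is dropped from the result; whether a real engine includes it in the returned text is a separate configuration question that the thread does not discuss.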

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

Collaborator

@Isotr0py Isotr0py left a comment

Some initial comments.

Since the HF-format GLM is highly similar to Llama, I would prefer to inherit from the existing Llama implementation to reduce code complexity. :)

vllm/model_executor/models/glm.py (outdated review threads, resolved)
@Isotr0py Isotr0py changed the title [Model] Added GLM-4 series model support vllm==0.6.4 [Model] Added GLM-4 series hf format model support vllm==0.6.4 Nov 22, 2024
@sixsixcoder
Contributor Author

Thank you for your reply. I have updated the code according to your suggestion. Can it be merged?

Collaborator

@Isotr0py Isotr0py left a comment

Compared with the Llama implementation, I think we can remove more duplicated code and simplify the model implementation by inheriting from Llama.

vllm/model_executor/models/glm.py (outdated review threads, resolved)
sixsixcoder and others added 2 commits November 27, 2024 09:48
Co-authored-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
@Isotr0py
Collaborator

Isotr0py commented Nov 27, 2024

Hey @sixsixcoder, I just refactored your GLM implementation to inherit directly from LlamaForCausalLM, just like Phi-3, so that we can significantly reduce duplicated code and support features like PP and LoRA without extra PRs. PTAL!

(Please correct me if there are any issues with the model implementation, since you are the model vendor and more familiar with GLM than I am.) :)
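
The refactor described above relies on a common pattern: subclass an existing model class, let the parent build everything, and then patch only the parts that differ. The sketch below illustrates that pattern with toy stand-in classes. Every name here (`ToyConfig`, `ToyAttention`, `ToyBaseForCausalLM`, `ToyDerivedForCausalLM`) and both "differences" are invented for illustration and are not taken from the PR or from vLLM.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyConfig:
    """Stand-in for a model config; the field names are invented."""
    num_layers: int = 2
    head_dim: int = 64


class ToyAttention:
    def __init__(self, config: ToyConfig) -> None:
        # Parent behaviour: rotate the full head dimension, use an output bias.
        self.rotary_dim = config.head_dim
        self.use_output_bias = True


class ToyBaseForCausalLM:
    """Plays the role of the shared parent implementation."""

    def __init__(self, config: ToyConfig) -> None:
        self.layers: List[ToyAttention] = [
            ToyAttention(config) for _ in range(config.num_layers)
        ]

    def describe(self) -> List[Tuple[int, bool]]:
        return [(layer.rotary_dim, layer.use_output_bias) for layer in self.layers]


class ToyDerivedForCausalLM(ToyBaseForCausalLM):
    """Reuses the parent wholesale and patches only what differs."""

    def __init__(self, config: ToyConfig) -> None:
        super().__init__(config)
        for layer in self.layers:
            layer.rotary_dim //= 2         # invented difference
            layer.use_output_bias = False  # invented difference


if __name__ == "__main__":
    print(ToyBaseForCausalLM(ToyConfig()).describe())
    print(ToyDerivedForCausalLM(ToyConfig()).describe())
```

The appeal of this design is that anything the parent class already supports is inherited without further work, which matches the motivation given in the comment for supporting PP and LoRA without extra PRs.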

@sixsixcoder
Contributor Author

> Hey @sixsixcoder, I just refactored your GLM implementation to inherit directly from LlamaForCausalLM, just like Phi-3, so that we can significantly reduce duplicated code and support features like PP and LoRA without extra PRs. PTAL!
>
> (Please correct me if there are any issues with the model implementation, since you are the model vendor and more familiar with GLM than I am.) :)

Yes, you are right. Thank you for helping to reorganize the code. I hope it can be merged as soon as possible. This work will enable vllm to support GLM-4 and GLM-Edge series text models.

Signed-off-by: Isotr0py <2037008807@qq.com>
@mergify mergify bot added the documentation Improvements or additions to documentation label Nov 28, 2024
@Isotr0py
Collaborator

@sixsixcoder Can you merge from the main branch so that we can get some of the failing CI fixed?

Member

@DarkLight1337 DarkLight1337 left a comment

Looks good, have you tested this model?

Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
@Isotr0py
Collaborator

Yes, it works with both tp=2 and pp=2.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 28, 2024 06:06
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 28, 2024
@sixsixcoder
Contributor Author

> @sixsixcoder Can you merge from the main branch so that we can get some of the failing CI fixed?

I have merged from the main branch, but CI is still failing. What should I do?

@DarkLight1337
Member

I think the basic models test failure is from this PR.

Signed-off-by: Isotr0py <2037008807@qq.com>
@Isotr0py
Collaborator

Oh, I just noticed that this model was introduced in transformers >= 4.46.0, so I have added a test skip for it.
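
A version-gated test skip of the kind mentioned above can be sketched with the standard library alone. The helpers `parse_version` and `should_skip` are hypothetical names, and the thread does not show the mechanism the PR actually uses. The parsing here is deliberately simplistic: it treats pre-release suffixes as the plain release. `packaging.version.Version` would be the more robust choice in real code.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '4.46.0' into (4, 46, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # e.g. 'dev0' or '0rc1' ends the numeric prefix
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)  # pad so '4.46' compares equal to '4.46.0'
    return tuple(parts)


def should_skip(installed: str, minimum: str = "4.46.0") -> bool:
    """True when the installed version is older than the required minimum."""
    return parse_version(installed) < parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("4.45.2", "4.46.0", "4.47.0.dev0"):
        print(candidate, should_skip(candidate))
```

In a pytest suite, a boolean like this could be passed as the condition to `pytest.mark.skipif` together with a `reason` string, so that the GLM test is skipped on environments with an older transformers release.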

@DarkLight1337 DarkLight1337 merged commit 5fc5ce0 into vllm-project:main Nov 28, 2024
47 checks passed
afeldman-nm pushed a commit to neuralmagic/vllm that referenced this pull request Dec 2, 2024
…project#10561)

Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
…project#10561)

Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024
…project#10561)

Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Labels
documentation Improvements or additions to documentation ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[New Model] GLM-4-9B-Chat
3 participants