[Model] Added GLM-4 series hf format model support vllm==0.6.4 #10561

sixsixcoder · 2024-11-22T02:17:27Z

Overview

This update adds support for GLM-4 series text models to vllm==0.6.4. It is separate from the GLM-4v support in #9242.
This code is written against transformers==4.46.0.

FIX #5306

Usage

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Maximum context length and tensor-parallel degree (number of GPUs)
max_model_len, tp_size = 8192, 1
model_name = "THUDM/glm-4-9b-chat-hf"
prompt = [{"role": "user", "content": "Hello, my name is"}]

# The tokenizer is only used to render the chat template into a prompt string
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
llm = LLM(
    model=model_name,
    tensor_parallel_size=tp_size,
    max_model_len=max_model_len,
    trust_remote_code=True,
    enforce_eager=True,  # run in eager mode, skipping CUDA graph capture
)
# Special-token IDs at which GLM-4 generation should stop
stop_token_ids = [151329, 151336, 151338]
sampling_params = SamplingParams(temperature=0.95, max_tokens=1024, stop_token_ids=stop_token_ids)

# Render the conversation as text and append the assistant-turn prefix
inputs = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
outputs = llm.generate(prompts=inputs, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)
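Conceptually, passing `stop_token_ids` ends a sequence as soon as any of those IDs is sampled. The sketch below mimics that on a plain list of token IDs. `truncate_at_stop` is a hypothetical helper, not vLLM's implementation; the non-stop token IDs in the example are made up, and this sketch drops the stop token itself, which may differ from what vLLM reports in its output.

```python
from typing import Iterable, List, Set

# Stop-token IDs copied from the usage example above
STOP_TOKEN_IDS = {151329, 151336, 151338}


def truncate_at_stop(token_ids: Iterable[int], stop_ids: Set[int] = STOP_TOKEN_IDS) -> List[int]:
    """Return the tokens produced before the first stop ID (hypothetical helper).

    The stop token itself is dropped in this sketch.
    """
    kept: List[int] = []
    for tok in token_ids:
        if tok in stop_ids:
            break  # generation would end here
        kept.append(tok)
    return kept


# Made-up token IDs: the sequence ends at 151336 even though more tokens follow
print(truncate_at_stop([9703, 11, 856, 151336, 42]))  # [9703, 11, 856]
```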

github-actions · 2024-11-22T02:17:39Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add the ready label to the PR.
Enable auto-merge.

🚀

Isotr0py

Some initial comments.

Since the HF-format GLM is very similar to Llama, I'd prefer to inherit from the existing Llama implementation to reduce code complexity. :)
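A minimal sketch of the inheritance idea, in plain Python rather than vLLM: the subclass reuses the parent constructor and overrides only what differs. The class names are hypothetical, and the two GLM differences used here (a bias on the QKV projection and rotary embeddings applied to only half of each head) are assumptions for illustration, not a statement of what the merged code does.

```python
class LlamaStyleAttention:
    """Hypothetical stand-in for a Llama-style attention block's configuration."""

    def __init__(self, hidden_size: int, num_heads: int,
                 qkv_bias: bool = False, rotary_fraction: float = 1.0) -> None:
        if hidden_size % num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.qkv_bias = qkv_bias
        # Number of channels per head that receive rotary position embedding
        self.rotary_dim = int(self.head_dim * rotary_fraction)


class GlmStyleAttention(LlamaStyleAttention):
    """Reuses the parent entirely; only the (assumed) differing settings change."""

    def __init__(self, hidden_size: int, num_heads: int) -> None:
        super().__init__(hidden_size, num_heads, qkv_bias=True, rotary_fraction=0.5)


llama = LlamaStyleAttention(4096, 32)
glm = GlmStyleAttention(4096, 32)
print(llama.rotary_dim, glm.rotary_dim, glm.qkv_bias)  # 128 64 True
```

Any later fix to the parent class is picked up by the subclass automatically, which is where the reduction in code complexity comes from.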

vllm/model_executor/models/glm.py

sixsixcoder · 2024-11-26T08:03:44Z

Thank you for your reply. I have updated the code according to your suggestion. Can it be merged?

Isotr0py

Compared with the Llama implementation, I think we can remove more duplicated code and simplify the model implementation by inheriting from Llama.

vllm/model_executor/models/glm.py

Co-authored-by: Isotr0py <2037008807@qq.com>

Signed-off-by: Isotr0py <2037008807@qq.com>

Isotr0py · 2024-11-27T15:00:32Z

Hey @sixsixcoder, I just refactored your GLM implementation to inherit directly from LlamaForCausalLM, just like Phi-3, so that we can significantly reduce duplicated code and support features like pipeline parallelism (PP) and LoRA without extra PRs. PTAL!

(Please correct me if there are any issues with the model implementation, since you are the model vendor and more familiar with GLM than I am.) :)
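One reason inheritance brings PP support for free is that the parent model already decides which decoder layers live on each pipeline stage; a subclass that patches layers after `super().__init__()` only has to skip layers owned by other stages. The sketch below illustrates that pattern with hypothetical names (`MissingLayer`, `partition_layers`, `PatchedModel`). The even split with the remainder on the last stage is one simple policy and may not match vLLM's own partitioning.

```python
from typing import List, Tuple, Union


class MissingLayer:
    """Hypothetical placeholder for a layer that lives on another pipeline stage."""


class DecoderLayer:
    def __init__(self, index: int) -> None:
        self.index = index
        self.rotary_dim = 128  # illustrative value


def partition_layers(num_layers: int, pp_rank: int, pp_size: int) -> Tuple[int, int]:
    """Contiguous [start, end) layer range for one stage; the last stage takes the remainder."""
    per_stage = num_layers // pp_size
    start = pp_rank * per_stage
    end = num_layers if pp_rank == pp_size - 1 else start + per_stage
    return start, end


class BaseModel:
    def __init__(self, num_layers: int, pp_rank: int, pp_size: int) -> None:
        start, end = partition_layers(num_layers, pp_rank, pp_size)
        self.layers: List[Union[DecoderLayer, MissingLayer]] = [
            DecoderLayer(i) if start <= i < end else MissingLayer()
            for i in range(num_layers)
        ]


class PatchedModel(BaseModel):
    """Reuses the parent constructor, then adjusts only the layers held locally."""

    def __init__(self, num_layers: int, pp_rank: int, pp_size: int) -> None:
        super().__init__(num_layers, pp_rank, pp_size)
        for layer in self.layers:
            if isinstance(layer, MissingLayer):
                continue  # owned by another stage; nothing to patch here
            layer.rotary_dim //= 2


model = PatchedModel(num_layers=5, pp_rank=1, pp_size=2)
print([getattr(layer, "rotary_dim", None) for layer in model.layers])
# [None, None, 64, 64, 64]
```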

sixsixcoder · 2024-11-28T01:58:16Z

Yes, you are right. Thank you for helping to reorganize the code; I hope it can be merged as soon as possible. This work will enable vLLM to support the GLM-4 and GLM-Edge series text models.

Signed-off-by: Isotr0py <2037008807@qq.com>

Isotr0py · 2024-11-28T05:55:10Z

@sixsixcoder Can you merge from the main branch so that some of the failing CI checks get fixed?

vllm/model_executor/models/glm.py

DarkLight1337

Looks good. Have you tested this model?

Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

Isotr0py · 2024-11-28T06:01:46Z

Yes, it works with both tp=2 and pp=2.
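For context on what running with tp=2 requires: tensor parallelism splits attention heads across ranks, so the head counts must divide evenly. The helper below is a hypothetical sketch of that arithmetic with illustrative head counts; the rule of replicating KV heads when there are fewer KV heads than ranks is an assumed policy for grouped-query attention, not a quote of vLLM's code.

```python
from typing import Tuple


def heads_per_rank(num_heads: int, num_kv_heads: int, tp_size: int) -> Tuple[int, int]:
    """Return (query heads, KV heads) held by each tensor-parallel rank (sketch)."""
    if num_heads % tp_size != 0:
        raise ValueError(f"{num_heads} attention heads cannot be split across {tp_size} ranks")
    if num_kv_heads >= tp_size:
        if num_kv_heads % tp_size != 0:
            raise ValueError(f"{num_kv_heads} KV heads cannot be split across {tp_size} ranks")
        kv_per_rank = num_kv_heads // tp_size
    else:
        if tp_size % num_kv_heads != 0:
            raise ValueError(f"{tp_size} ranks cannot evenly replicate {num_kv_heads} KV heads")
        kv_per_rank = 1  # each KV head is replicated on several ranks
    return num_heads // tp_size, kv_per_rank


print(heads_per_rank(32, 2, 2))  # (16, 1)
print(heads_per_rank(32, 2, 4))  # (8, 1)
```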

sixsixcoder · 2024-11-28T09:40:35Z

I have merged from the main branch, but the CI is still failing. What should I do?

DarkLight1337 · 2024-11-28T09:42:39Z

I think the basic-models test failure is caused by this PR.

Signed-off-by: Isotr0py <2037008807@qq.com>

Isotr0py · 2024-11-28T09:47:21Z

Oh, I just noticed this model was introduced in transformers 4.46.0, so I've added a test skip for older versions.
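A minimal sketch of gating a test on the installed transformers version using only the standard library. The 4.46.0 threshold comes from the comment above; the helper and test names are hypothetical, and the version parsing is deliberately simple (suffixes such as `.dev0` or `rc1` are ignored), so a real test suite would more likely use `packaging.version`.

```python
import unittest
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

MIN_TRANSFORMERS = (4, 46, 0)  # first release with HF-format GLM, per the comment above


def parse_release(v: str) -> Tuple[int, ...]:
    """Parse the leading numeric release segments, padded to three: "4.46" -> (4, 46, 0)."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as "dev0" or "0rc1"
        parts.append(int(piece))
    return tuple(parts + [0] * (3 - len(parts)))


def transformers_supports_glm() -> bool:
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False  # transformers is not installed at all
    return parse_release(installed) >= MIN_TRANSFORMERS


class GlmRegistryTest(unittest.TestCase):
    @unittest.skipIf(not transformers_supports_glm(), "requires transformers >= 4.46.0")
    def test_glm_config_importable(self) -> None:
        from transformers import GlmConfig  # only imported when the version check passes
        self.assertIsNotNone(GlmConfig)
```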

…project#10561) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>

…project#10561) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

Added GLM-4 series model support vllm==0.6.4

4f84afe

Isotr0py reviewed Nov 22, 2024

vllm/model_executor/models/glm.py Outdated

vllm/model_executor/models/glm.py Outdated

Isotr0py changed the title ~~[Model] Added GLM-4 series model support vllm==0.6.4~~ [Model] Added GLM-4 series hf format model support vllm==0.6.4 Nov 22, 2024

sixsixcoder added 3 commits November 25, 2024 17:30

inherit LlamaMLP and LlamaAttention

b4ee606

Merge remote-tracking branch 'upstream/main'

c0a06d4

inherit LlamaMLP and LlamaAttention

8624149

sixsixcoder mentioned this pull request Nov 26, 2024

ValueError: Model architectures ['GlmForCausalLM'] are not supported for now THUDM/GLM-4#662

Closed

Isotr0py reviewed Nov 26, 2024

vllm/model_executor/models/glm.py Outdated

vllm/model_executor/models/glm.py Outdated

vllm/model_executor/models/glm.py Outdated

sixsixcoder and others added 2 commits November 27, 2024 09:48

Update vllm/model_executor/models/glm.py

5028019

Co-authored-by: Isotr0py <2037008807@qq.com>

cleanup GLM implement

8f21112

Signed-off-by: Isotr0py <2037008807@qq.com>

add registry test

7e57962

Signed-off-by: Isotr0py <2037008807@qq.com>

Isotr0py requested review from DarkLight1337 and ywang96 as code owners November 28, 2024 05:52

mergify bot added the documentation Improvements or additions to documentation label Nov 28, 2024

DarkLight1337 reviewed Nov 28, 2024

vllm/model_executor/models/glm.py Outdated

DarkLight1337 approved these changes Nov 28, 2024

Update vllm/model_executor/models/glm.py

cb127dc

Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

DarkLight1337 enabled auto-merge (squash) November 28, 2024 06:06

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 28, 2024

Merge branch 'vllm-project:main' into glm-4

4155357

skip glm init in test CI

b44ca79

Signed-off-by: Isotr0py <2037008807@qq.com>

DarkLight1337 merged commit 5fc5ce0 into vllm-project:main Nov 28, 2024
47 checks passed
