[Model] Enable Inference Support for the New Baichuan-M1 Model #12251

Open
wants to merge 1 commit into base: main
Conversation

@rainkert rainkert commented Jan 21, 2025

This pull request adds support for the Baichuan-M1 model to the vLLM framework.

HuggingFace pages:
https://huggingface.co/baichuan-inc/Baichuan-M1-14B-Base
https://huggingface.co/baichuan-inc/Baichuan-M1-14B-Instruct

Baichuan-M1 (the M stands for medicine) is a medically enhanced general-purpose large model, designed to deliver strong performance in healthcare applications while maintaining solid general capabilities. This update ensures that vLLM can seamlessly handle inference for Baichuan-M1, providing both compatibility and good performance across a wide range of natural language processing tasks, especially in the medical domain.
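For reference, a minimal offline-inference sketch using vLLM's Python API, assuming the Instruct checkpoint name from the HuggingFace links above and that the model's custom modeling code requires trust_remote_code:

from vllm import LLM, SamplingParams

# Checkpoint name taken from the HuggingFace links above; trust_remote_code
# is assumed to be needed because the model ships custom modeling code.
llm = LLM(model="baichuan-inc/Baichuan-M1-14B-Instruct", trust_remote_code=True)

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
prompts = ["What are the common symptoms of iron-deficiency anemia?"]

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)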


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of the following:

  • Add the ready label to the PR
  • Enable auto-merge

🚀


mergify bot commented Jan 21, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @rainkert.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added documentation Improvements or additions to documentation needs-rebase labels Jan 21, 2025
@mergify mergify bot removed the needs-rebase label Jan 21, 2025
@jeejeelee jeejeelee added the new model Requests to new models label Jan 21, 2025
@jameswu2014

LGTM

@rainkert
Author

@youkaichao @zhuohan123 @DarkLight1337 @WoosukKwon
We will be releasing our model on Hugging Face on January 24th (the day after tomorrow), but you can review the code beforehand to identify any issues so we can address them in advance.

Signed-off-by: dangshunya <dangshunya@baichuan-inc.com>
@rainkert
Author

ping @youkaichao @DarkLight1337 @njhill @comaniac @zhuohan123 @WoosukKwon @alexm-redhat
We've released our new model today; please review this PR and merge it as soon as possible.

@DarkLight1337
Member

DarkLight1337 commented Jan 24, 2025

The model itself LGTM, but I'm not so sure about the custom KV cache. Is anyone else familiar with this part of the code?

@DarkLight1337 DarkLight1337 mentioned this pull request Jan 24, 2025
@simon-mo
Collaborator

Regarding the SWA, can we minimize the code change for now by adopting #10584? Meanwhile, we will work on refactoring the memory manager in #11382 by @heheda12345.

@rainkert
Author

rainkert commented Jan 25, 2025

Regarding the SWA, can we minimize the code change for now by adopting #10584? Meanwhile, we will work on refactoring the memory manager in #11382 by @heheda12345.

Because the KV cache used by the ordinary layers and the SWA layers is inconsistent (we have 2 KV heads in normal attention but 8 KV heads in SWA), we cannot simply treat them the same way as in #10584; instead, we need to calculate the memory usage separately.
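To make the mismatch concrete, here is a small back-of-the-envelope sketch. The head dimension and dtype size are illustrative assumptions, not values from the model config; only the 2-vs-8 KV head split comes from this comment:

# Illustrative numbers only: HEAD_DIM and DTYPE_BYTES are assumptions;
# the 2-vs-8 KV head split is the one described in this comment.
HEAD_DIM = 128    # assumed head dimension
DTYPE_BYTES = 2   # fp16 / bf16

def kv_bytes_per_token(num_kv_heads: int) -> int:
    # Factor of 2 accounts for both the K and the V cache.
    return 2 * num_kv_heads * HEAD_DIM * DTYPE_BYTES

normal_layer = kv_bytes_per_token(num_kv_heads=2)  # 1024 bytes per token
swa_layer = kv_bytes_per_token(num_kv_heads=8)     # 4096 bytes per token

# A single uniform per-layer block size would over- or under-allocate one of
# the two groups, so memory usage has to be accounted per layer group.
print(normal_layer, swa_layer)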

@heheda12345
Collaborator

For the vLLM v1 engine, you can support normal attention layers with different hidden sizes by extending this function:

def get_kv_cache_config(vllm_config: VllmConfig, kv_cache_spec: KVCacheSpec,

Then you can try #10584 in v1 to support the mix of normal attention and SWA.
If that works, we can raise an error asking users to enable the vLLM v1 engine when running this model.
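As a rough illustration of the kind of per-group accounting such an extension would need, here is a simplified sketch. The class and function names below are hypothetical stand-ins, not the actual vLLM v1 data structures, and the layer counts in the usage comment are placeholders rather than the real Baichuan-M1 config:

from dataclasses import dataclass

@dataclass(frozen=True)
class LayerKVSpec:
    # Hypothetical, simplified stand-in for a per-layer KV cache spec.
    num_kv_heads: int
    head_size: int
    dtype_bytes: int
    block_size: int  # tokens per KV cache block

    @property
    def bytes_per_block(self) -> int:
        # K and V caches -> factor of 2.
        return 2 * self.num_kv_heads * self.head_size * self.dtype_bytes * self.block_size

def allocate_kv_cache(specs: dict[str, LayerKVSpec],
                      available_bytes: int) -> dict[str, int]:
    """Compute how many blocks fit when every layer must hold the same
    number of blocks, even though per-layer block footprints differ."""
    total_bytes_per_block = sum(s.bytes_per_block for s in specs.values())
    num_blocks = available_bytes // total_bytes_per_block
    return {name: num_blocks * s.bytes_per_block for name, s in specs.items()}

# Placeholder usage: a full-attention layer with 2 KV heads and an SWA layer
# with 8 KV heads, head size 128, fp16, 16-token blocks (all illustrative).
specs = {
    "full_attn_layer": LayerKVSpec(2, 128, 2, 16),
    "swa_layer": LayerKVSpec(8, 128, 2, 16),
}
print(allocate_kv_cache(specs, available_bytes=8 * 1024**3))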

@simon-mo simon-mo mentioned this pull request Jan 27, 2025