[core] gemma2 full context length support #10584

youkaichao · 2024-11-22T23:57:14Z

The scheduler now treats Gemma 2 as a model without a sliding window; the sliding window is only used in the attention computation.
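A minimal sketch of this split, not the code from this PR. All function names are hypothetical, and the choice of which layers are local (even-indexed here) is an assumption made for illustration. The scheduler-side helper allocates KV-cache blocks for every token, while the per-layer mask applies the window only where a layer uses one.

```python
from typing import List, Optional


def layer_window(layer_idx: int, sliding_window: int) -> Optional[int]:
    """Return the window size for a layer, or None for full (global) attention.

    Assumption for illustration: even layers are local, odd layers are global.
    """
    return sliding_window if layer_idx % 2 == 0 else None


def attention_mask(seq_len: int, window: Optional[int]) -> List[List[bool]]:
    """Causal mask; with a window, a query sees only its last `window` keys."""
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal constraint
            if window is not None:
                visible = visible and (q - k) < window  # sliding-window constraint
            row.append(visible)
        mask.append(row)
    return mask


def blocks_needed(seq_len: int, block_size: int) -> int:
    """Scheduler view: no sliding window, so every token keeps its KV block."""
    return -(-seq_len // block_size)  # ceiling division
```

The point of the sketch is that `blocks_needed` never looks at the window: only the mask used inside the attention computation does.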

FIX #6220
FIX #8580

Signed-off-by: youkaichao <youkaichao@gmail.com>


youkaichao · 2024-11-23T00:02:40Z

fixes #9517

WoosukKwon

Just a heads up: the paged attention kernel we use for the xFormers backend doesn't support sliding window attention. This PR will introduce a slight correctness bug in the xformers backend.

youkaichao · 2024-11-23T01:36:20Z

> Just a heads up: the paged attention kernel we use for the xFormers backend doesn't support sliding window attention. This PR will introduce a slight correctness bug in the xformers backend.

For xformers, we keep the original behavior of capping the max model length.
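A rough sketch of what that capping amounts to, under stated assumptions: the function and parameter names below are hypothetical and are not vLLM's actual API, and the numbers in the usage lines are only example values.

```python
from typing import Optional


def effective_max_model_len(
    max_position_embeddings: int,
    sliding_window: Optional[int],
    backend_supports_sliding_window: bool,
) -> int:
    """Pick the usable context length for a given attention backend.

    Hypothetical helper: if the backend cannot apply a sliding window in its
    kernel, fall back to capping the context at the window size so results
    stay correct. Otherwise the full context length is usable.
    """
    if sliding_window is not None and not backend_supports_sliding_window:
        return min(max_position_embeddings, sliding_window)
    return max_position_embeddings


# Example values only: a backend without sliding-window support is capped.
capped = effective_max_model_len(8192, 4096, backend_supports_sliding_window=False)
full = effective_max_model_len(8192, 4096, backend_supports_sliding_window=True)
```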

noamgat · 2024-11-23T20:05:28Z

This looks great! Which attention backend do you recommend for gemma 2 now?

youkaichao · 2024-11-24T00:30:21Z

@noamgat the default one (flash attention) should work.

azsh1725 · 2024-12-09T17:31:16Z

Hi! Thanks for this fix.

Can you please tell me when the release with this fix is planned?

azsh1725 · 2025-01-07T18:49:52Z

> Hi! Thanks for this fix.
>
> Can you please tell me when the release with this fix is planned?

For those interested, the release with this fix is version 0.6.5.

youkaichao added 5 commits November 22, 2024 15:20

fix alternating sliding window

f0401d5

Signed-off-by: youkaichao <youkaichao@gmail.com>

add tests

7c4700d

Signed-off-by: youkaichao <youkaichao@gmail.com>

add tests

c846ff7

Signed-off-by: youkaichao <youkaichao@gmail.com>

add comments

cab0770

Signed-off-by: youkaichao <youkaichao@gmail.com>

add comments

615020a

Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao requested a review from WoosukKwon November 22, 2024 23:59

WoosukKwon approved these changes Nov 23, 2024

WoosukKwon reviewed Nov 23, 2024

youkaichao added 3 commits November 22, 2024 17:29

restore old behavior for xformers

5f8c223

Signed-off-by: youkaichao <youkaichao@gmail.com>

skip tests

553069a

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix xformers

2fd08d4

Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao enabled auto-merge (squash) November 23, 2024 01:35

github-actions bot added the ready label Nov 23, 2024

youkaichao disabled auto-merge November 23, 2024 04:13

youkaichao merged commit 4aba6e3 into vllm-project:main Nov 23, 2024
63 of 68 checks passed

youkaichao deleted the fix_gemma2 branch November 23, 2024 04:13

patrickvonplaten mentioned this pull request Nov 23, 2024

Interleaving sliding window for Ministral-8B-Instruct-2410 #10591

Merged

yxchng mentioned this pull request Nov 28, 2024

[Installation]: vLLM build from source errors #8532

Closed

1 task

youkaichao mentioned this pull request Dec 12, 2024

[Usage]: Can we extend the context length of gemma2 model or other models? #10548

Open

1 task

sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024

[core] gemma2 full context length support (vllm-project#10584)

d32ce32

Signed-off-by: youkaichao <youkaichao@gmail.com>

simon-mo mentioned this pull request Jan 25, 2025

[Model] Enable Inference Support for the New Baichuan-M1 Model #12251

Open
