
[Prototype][WIP] Prefix Cache Aware Scheduling for V0 #9862

Closed
wants to merge 13 commits into from


rickyyx
Contributor

@rickyyx rickyyx commented Oct 31, 2024

FIX #7883 in V0


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock them, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@rickyyx
Contributor Author

rickyyx commented Oct 31, 2024

Main changes, please take a look:

  • block manager
  • scheduler
  • sequence

Please ignore the test code for now. cc @comaniac

@rickyyx rickyyx changed the title Prefix Cache Aware Scheduling for V0 [WIP] Prefix Cache Aware Scheduling for V0 Oct 31, 2024
@comaniac comaniac self-assigned this Oct 31, 2024
@rickyyx rickyyx marked this pull request as ready for review November 5, 2024 01:01

mergify bot commented Nov 5, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @rickyyx please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 5, 2024
@rickyyx
Contributor Author

rickyyx commented Nov 5, 2024

Still in draft, but running CI for some early correctness signals.

@rickyyx rickyyx requested a review from KuntaiDu as a code owner November 6, 2024 22:57
@rickyyx rickyyx changed the title [WIP] Prefix Cache Aware Scheduling for V0 [Prototype][WIP] Prefix Cache Aware Scheduling for V0 Nov 7, 2024
@rickyyx
Contributor Author

rickyyx commented Nov 27, 2024

Merged in #10128

Remaining items in this PR not being merged:

  • Inject the block hash into the block at block creation rather than having the block own it (already done in V1)
  • Remove token ids from the block if the hash is not computed by the block itself (already done in V1)
  • LoRA support
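
To make the first two items concrete, here is a minimal sketch of the idea, not the vLLM implementation. The names `Block`, `compute_block_hash`, and `allocate_blocks` are hypothetical and exist only for illustration. The assumption is that the allocator computes a chained hash over each full block's tokens and injects it when the block is created, so the block itself never needs to hold token ids.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


def compute_block_hash(prev_hash: Optional[int], token_ids: Sequence[int]) -> int:
    """Hypothetical helper: chain the previous block's hash with this block's tokens."""
    return hash((prev_hash, tuple(token_ids)))


@dataclass(frozen=True)
class Block:
    """Hypothetical block: holds only the injected hash, never the token ids."""
    block_id: int
    content_hash: Optional[int]  # None while the block is not full


def allocate_blocks(token_ids: Sequence[int], block_size: int) -> List[Block]:
    """Split tokens into blocks; the allocator, not the block, computes the hash."""
    blocks: List[Block] = []
    prev_hash: Optional[int] = None
    for block_id, start in enumerate(range(0, len(token_ids), block_size)):
        chunk = token_ids[start:start + block_size]
        if len(chunk) == block_size:
            # Full block: hash depends on its tokens and on every block before it.
            prev_hash = compute_block_hash(prev_hash, chunk)
            blocks.append(Block(block_id, prev_hash))
        else:
            # Partial trailing block: not hashable yet, so not cacheable.
            blocks.append(Block(block_id, None))
    return blocks
```

Under these assumptions, two sequences that share a full-block prefix get the same hash for that block, which is what a prefix cache lookup would key on, while the block objects stay free of token data.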

@rickyyx rickyyx closed this Nov 27, 2024
Development

Successfully merging this pull request may close these issues.

[Performance]: Prefix-caching aware scheduling