
[RFC]: Replaceable Scheduler #7123

Closed
NadavShmayo opened this issue Aug 4, 2024 · 4 comments

Comments

@NadavShmayo
Contributor

NadavShmayo commented Aug 4, 2024

Motivation.

The default scheduler works well for the basic use case of serving with maximum throughput.
There are still use cases in which other metrics take priority over maximum throughput, for example maintaining fairness between different users.

I specifically have an application that uses vLLM and tries to maintain fairness between requests from its different users.
By making the scheduler component more abstract and replaceable (perhaps also pluginable), we can support such use cases without having to change the core scheduler logic for each of them.

Proposed Change.

I propose 2 different solutions. The first may be harder to implement, but allows anyone to implement any scheduling logic they wish without changing any other core logic. The second is simpler to implement, but doesn't allow full control over the scheduling logic.

Solution 1 - Scheduler plugins

This solution requires defining an abstract base class for the scheduler, and allowing users to pass the path to their desired scheduler implementation as a CLI argument (or an environment variable). A rough sketch is shown below.
This idea could also serve as the basis for scheduler plugins - meaning anyone could implement their own scheduler as a package separate from core vLLM, which allows for great extensibility and modularity.
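
To make the idea concrete, here is a minimal sketch of what the abstract base class and the loading mechanism could look like. The `BaseScheduler` class, the `load_scheduler_class` helper, the `VLLM_SCHEDULER_CLASS` environment variable, and the `module:ClassName` format are all hypothetical and only meant to illustrate the proposal:

```python
# Illustrative sketch only: the class, helper, and environment variable
# names below are hypothetical, not existing vLLM APIs.
import importlib
import os
from abc import ABC, abstractmethod
from typing import Optional


class BaseScheduler(ABC):
    """Interface a custom scheduler implementation would have to satisfy."""

    @abstractmethod
    def add_seq_group(self, seq_group) -> None:
        """Register a new request (SequenceGroup) with the scheduler."""

    @abstractmethod
    def schedule(self):
        """Decide which sequence groups run in the next engine step."""

    @abstractmethod
    def abort_seq_group(self, request_id: str) -> None:
        """Drop a request from the scheduler's queues."""


def load_scheduler_class(path: Optional[str] = None) -> type:
    """Resolve a scheduler class from a 'module.path:ClassName' string,
    taken from a CLI argument or the (hypothetical) VLLM_SCHEDULER_CLASS
    environment variable, falling back to the built-in scheduler."""
    path = path or os.environ.get("VLLM_SCHEDULER_CLASS")
    if not path:
        from vllm.core.scheduler import Scheduler  # current default scheduler
        return Scheduler
    module_name, _, class_name = path.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

An application could then ship its own fair-share scheduler as a separate package and select it with something like a (hypothetical) `--scheduler-class my_pkg.scheduling:FairShareScheduler` argument, without touching vLLM's core.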

Solution 2 - Support voluntary preemption hooks

This solution is less flexible but should still allow support for most scheduling logic.
It means the Scheduler class would expose public methods for preempting/suspending and resuming a SequenceGroup, and the API server could then add routes exposing these methods (a rough sketch follows below).
This way, applications wrapping vLLM could implement their own complex scheduling logic, for example giving each user its fair share of scheduling, or any other desired policy.
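
As a rough illustration of the proposed hooks, the sketch below adds two public methods to a simplified scheduler; the method names and queue attributes are illustrative and do not reflect the actual vLLM internals:

```python
# Hypothetical sketch of the voluntary preemption hooks; not the real
# vLLM Scheduler, just an illustration of the proposed public surface.
class Scheduler:
    def __init__(self):
        self.running = []    # sequence groups currently being scheduled
        self.suspended = []  # sequence groups voluntarily preempted
        self.waiting = []    # sequence groups not yet scheduled

    def preempt_seq_group(self, request_id: str) -> bool:
        """Voluntarily suspend a running request so others can be scheduled."""
        for seq_group in self.running:
            if seq_group.request_id == request_id:
                self.running.remove(seq_group)
                self.suspended.append(seq_group)
                return True
        return False

    def resume_seq_group(self, request_id: str) -> bool:
        """Move a previously suspended request back into the waiting queue."""
        for seq_group in self.suspended:
            if seq_group.request_id == request_id:
                self.suspended.remove(seq_group)
                self.waiting.append(seq_group)
                return True
        return False
```

The API server could then expose these as routes (e.g. `POST /v1/requests/{request_id}/preempt` and a matching resume route, names purely illustrative), letting the wrapping application decide when each user's requests run.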

Feedback Period.

No response

CC List.

No response

Any Other Things.

Just to make it clear, I'll be happy to implement this, but I want to hear some feedback before I go ahead and implement it.

@njhill
Member

njhill commented Aug 6, 2024

FYI @apatke @saurabhjha1

@apatke
Contributor

apatke commented Aug 7, 2024

Regarding Solution 2, PTAL at #6077 and let us know if you have any feedback


github-actions bot commented Nov 6, 2024

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions bot added the stale label Nov 6, 2024

github-actions bot commented Dec 6, 2024

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

github-actions bot closed this as not planned Dec 6, 2024