How are the different scheduling policies implemented in the system? #9

Closed
wiluen opened this issue Aug 12, 2024 · 1 comment

Comments

@wiluen

wiluen commented Aug 12, 2024

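Thanks for sharing this excellent work! I have one question:
The shared code appears to be evaluated through simulation. For a real system, did you collect the inference performance of a batch of queries under different execution orders in advance (to emulate different scheduling outcomes)? Or did you implement this prediction-based scheduler in a real LLM serving system?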

@James-QiuHaoran
Owner

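We have a simple SJF (shortest-job-first) implementation on vLLM that replaces the FCFS scheduler, but it has not been released yet. In the meantime, you can use vLLM's priority scheduling feature and set each request's priority to the inverse of its sequence length.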
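To make the suggestion above concrete, here is a minimal, self-contained sketch of SJF expressed as priority scheduling. It does not call vLLM and is not the unreleased implementation mentioned above. The names `Request`, `sjf_order`, and `mean_waiting_time` are hypothetical, the lengths are made-up illustration values, and the sketch assumes a queue in which a smaller priority value is served first (so the predicted length itself is used as the key). If the scheduler you use treats larger values as more urgent, invert the key accordingly.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    # Assumption: a smaller priority value is served first.
    priority: int                      # predicted output length (illustrative)
    arrival: int                       # tie-breaker: earlier arrival wins
    name: str = field(compare=False)   # not used for ordering


def sjf_order(predicted_lengths):
    """Return request names in the order a priority queue keyed on
    predicted length would serve them.

    predicted_lengths: list of (name, predicted_length) in arrival order.
    """
    heap = [
        Request(length, arrival, name)
        for arrival, (name, length) in enumerate(predicted_lengths)
    ]
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap).name)
    return order


def mean_waiting_time(order, lengths):
    """Average time each request waits before it starts, assuming requests
    run one at a time and take `lengths[name]` time units each."""
    elapsed = 0
    total_wait = 0
    for name in order:
        total_wait += elapsed
        elapsed += lengths[name]
    return total_wait / len(order)


# Hypothetical workload: three requests listed in arrival order.
workload = [("a", 100), ("b", 10), ("c", 50)]
lengths = dict(workload)

fcfs = [name for name, _ in workload]   # serve in arrival order
sjf = sjf_order(workload)               # serve shortest predicted job first

print("FCFS order:", fcfs, "mean wait:", mean_waiting_time(fcfs, lengths))
print("SJF order: ", sjf, "mean wait:", mean_waiting_time(sjf, lengths))
```

In this toy workload the long request `a` arrives first, so FCFS makes the two shorter requests wait behind it, while the priority queue serves `b` and `c` first. A real serving system batches requests and only has predicted, not actual, lengths, so this one-at-a-time model is only meant to show how the priority key is chosen.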
