add fine-grained device scheduling proposal #322
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##             main     #322      +/-   ##
==========================================
+ Coverage   64.53%   64.85%   +0.31%
==========================================
  Files         113      116       +3
  Lines       11165    11451     +286
==========================================
+ Hits         7205     7426     +221
- Misses       3385     3440      +55
- Partials      575      585      +10
```
Flags with carried forward coverage won't be shown.
Continue to review full report at Codecov.
That's a great proposal, thanks for the work!
I left some questions in my comments; please help answer them, thanks.
Let's focus on GPUs in this proposal first. More details would be appreciated.
Signed-off-by: yangzhang <bupt_cozy@126.com>
Signed-off-by: Joseph <joseph.t.lee@outlook.com>
refactor fine-grained device scheduling
/lgtm
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: eahydra, hormes, jasonliu747. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
* add schedule-device-in-card-level.md
  Signed-off-by: yangzhang <bupt_cozy@126.com>
* refactor fine-grained device scheduling
  Signed-off-by: Joseph <joseph.t.lee@outlook.com>
If the user knows exactly, or can roughly estimate, the specific memory consumption of the workload, they can request GPU memory through `koordinator.sh/gpu-memory`. All details can be seen below.
Besides, when a dimension's value is greater than 100, it means the Pod needs multiple devices. Currently, the value is only allowed if it is divisible by 100.
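To make the request format concrete, here is a rough sketch in Go of a container asking for two whole GPUs plus an explicit amount of GPU memory. Only the resource names come from the proposal; the container name, image, and quantities are made-up examples.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	c := corev1.Container{
		Name:  "cuda-worker",             // hypothetical name
		Image: "nvidia/cuda:11.8.0-base", // hypothetical image
		Resources: corev1.ResourceRequirements{
			Limits: corev1.ResourceList{
				// 200 means two whole GPUs; values above 100 must be divisible by 100.
				"koordinator.sh/gpu-core": resource.MustParse("200"),
				// Explicit GPU memory, usable when the workload's consumption
				// is known or can be roughly estimated.
				"koordinator.sh/gpu-memory": resource.MustParse("16Gi"),
			},
		},
	}
	core := c.Resources.Limits["koordinator.sh/gpu-core"]
	mem := c.Resources.Limits["koordinator.sh/gpu-memory"]
	fmt.Println("gpu-core:", core.String(), "gpu-memory:", mem.String())
}
```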
If the value of a container's `gpu-core` is greater than 100 and not divisible by 100 (e.g. 101), will the pod be rejected?
Yes, it should be rejected.
Is it rejected by the webhook, or by the scheduler?
`koordinator.sh/gpu-core` should be something like 25, 51, 77, 100, 200, or 300; otherwise the pod will be rejected by the scheduler in the PreFilter step.
If a pod is rejected by the scheduler, it will go into the Pending phase, and the scheduler will keep retrying to schedule it. I think this retry may be useless and may increase the load on the scheduler. Would it be better to reject the pod in the webhook?
It would be better to reject the pod in the webhook.
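For illustration, the validation rule discussed in this thread could look like the sketch below. `validateGPUCore` is a hypothetical helper name; the actual check would live in the scheduler's PreFilter plugin or in the webhook, as debated above.

```go
package main

import "fmt"

// validateGPUCore enforces the rule described above: a request may be a
// fraction of one GPU (<= 100) or a whole number of GPUs (a multiple of 100).
func validateGPUCore(gpuCore int64) error {
	switch {
	case gpuCore <= 0:
		return fmt.Errorf("koordinator.sh/gpu-core must be positive, got %d", gpuCore)
	case gpuCore <= 100:
		return nil // e.g. 25, 51, 77, 100
	case gpuCore%100 == 0:
		return nil // e.g. 200, 300: multiple whole GPUs
	default:
		// e.g. 101: more than one GPU, but not a whole number of GPUs
		return fmt.Errorf("koordinator.sh/gpu-core %d is greater than 100 but not divisible by 100", gpuCore)
	}
}

func main() {
	for _, v := range []int64{25, 100, 101, 200} {
		fmt.Println(v, validateGPUCore(v))
	}
}
```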
Why do we need `koordinator.sh/gpu-memory-ratio` and `koordinator.sh/gpu-memory`?
When a user applies for 0.5/0.25 of a GPU, they do not know the exact total memory bytes per GPU and only want to use half or a quarter of the memory, so they can request the GPU memory with `koordinator.sh/gpu-memory-ratio`.
When the scheduler assigns the Pod to a concrete node, it will translate `koordinator.sh/gpu-memory-ratio` to `koordinator.sh/gpu-memory` by the formula ***allocatedMemory = totalMemoryOf(GPU) * `koordinator.sh/gpu-memory-ratio`***, so that GPU isolation can work.
> scheduler will translate the `koordinator.sh/gpu-memory-ratio` to `koordinator.sh/gpu-memory`
Does this mean that the scheduler will call kube-apiserver to update the pod's spec?
It will update the container resources.
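A minimal sketch of that translation, assuming `koordinator.sh/gpu-memory-ratio` is expressed as a percentage of a single GPU (100 == one full card), consistent with the `gpu-core` convention. The helper name and the /100 scaling are assumptions for illustration, not the actual implementation.

```go
package main

import "fmt"

// ratioToMemory is a hypothetical helper implementing the formula above:
// allocatedMemory = totalMemoryOf(GPU) * gpu-memory-ratio / 100,
// assuming memoryRatio is a percentage (100 == one full GPU).
func ratioToMemory(gpuTotalMemoryBytes, memoryRatio int64) int64 {
	return gpuTotalMemoryBytes * memoryRatio / 100
}

func main() {
	const gib = int64(1) << 30
	// Half of a 16Gi GPU resolves to an 8Gi koordinator.sh/gpu-memory value,
	// which the scheduler then writes back to the container's resources.
	fmt.Println(ratioToMemory(16*gib, 50)/gib, "Gi") // prints: 8 Gi
}
```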
```
memory: "8Gi"
```

##### Apply `koordinator.sh/gpu-core` and `koordinator.sh/gpu-memory` separately
If a container only requests `gpu-core` and `gpu-memory`, will the amount of `gpu-memory-ratio` resources for the node in the scheduler cache be incorrect? Because `gpu-memory-ratio` resources may not be assumed.
We will translate `gpu-memory-ratio` based on `gpu-memory` on the concrete node.
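A hedged sketch of that reverse direction: once the concrete node (and therefore the per-GPU memory size) is known, an explicit `gpu-memory` request can be converted back to a `gpu-memory-ratio` so the scheduler cache stays consistent across all dimensions. The helper name and the round-up choice are assumptions.

```go
package main

import "fmt"

// memoryToRatio is a hypothetical helper: it derives the percentage of one
// GPU's memory that an explicit gpu-memory request occupies on the assigned
// node, rounding up so the scheduler cache never under-counts the reserved share.
func memoryToRatio(gpuTotalMemoryBytes, requestedMemoryBytes int64) int64 {
	return (requestedMemoryBytes*100 + gpuTotalMemoryBytes - 1) / gpuTotalMemoryBytes
}

func main() {
	const gib = int64(1) << 30
	fmt.Println(memoryToRatio(16*gib, 8*gib))  // 8Gi on a 16Gi card -> 50
	fmt.Println(memoryToRatio(16*gib, 10*gib)) // 10Gi on a 16Gi card -> 63 (rounded up)
}
```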
As we know, GPU scheduling on the kube-scheduler side is no different from other scalar resources. The concrete device-level assignment is done by the kubelet and the GPU device plugin, which generate the container's GPU env.
Our design has no conflict with the above process. Our device reporter will report koordinator GPU resources for the kubelet to update node resources.
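As a rough illustration of the reported quantities (not the actual reporting code; only the resource names are taken from the proposal), a node with N identical GPUs would expose N*100 `gpu-core`, N*100 `gpu-memory-ratio`, and the summed per-card memory:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// koordinatorGPUResources is a hypothetical helper showing the shape of the
// node-level extended resources a device reporter could publish for a node
// with gpuCount identical GPUs of perGPUMemoryBytes each.
func koordinatorGPUResources(gpuCount, perGPUMemoryBytes int64) corev1.ResourceList {
	return corev1.ResourceList{
		"koordinator.sh/gpu-core":         *resource.NewQuantity(gpuCount*100, resource.DecimalSI),
		"koordinator.sh/gpu-memory-ratio": *resource.NewQuantity(gpuCount*100, resource.DecimalSI),
		"koordinator.sh/gpu-memory":       *resource.NewQuantity(gpuCount*perGPUMemoryBytes, resource.BinarySI),
	}
}

func main() {
	const gib = int64(1) << 30
	for name, qty := range koordinatorGPUResources(8, 16*gib) { // 8 x 16Gi GPUs
		fmt.Println(name, qty.String())
	}
}
```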
> our device reporter will report koordinator GPU resources for kubelet updating node resources

How does the device reporter report koordinator GPU resources to the kubelet? Does this mean that the device reporter still implements a device plugin for koordinator GPU resources?
Resolved after reading #410.
@jasonliu747 @zwzhang0107 Hello, I have figured out some of these questions after reading #410. However, I still have some doubts about the scheduler:
Thanks for any reply!
@caohe If it's OK for you, let's discuss this on WeChat or DingTalk. WDYT?
@jasonliu747 Sure, happy to discuss this on WeChat or DingTalk.
@caohe You can find our DingTalk QR code in the README. Please PM me once you join the group; below is my DingTalk avatar. Thanks.
Signed-off-by: yangzhang bupt_cozy@126.com
Ⅰ. Describe what this PR does
add schedule-device-in-card-level.md
Ⅱ. Does this pull request fix one issue?
Ⅲ. Describe how to verify it
Ⅳ. Special notes for reviews
Ⅴ. Checklist
make test