Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sheduler): take resource quota into consideration during pod group enqueue procedure #1100

Conversation

YuxiJin-tobeyjin
Copy link
Contributor

Try to solve: #1014

  1. Take resourceQuota in the namespace into consideration during podGroup enqueue procedure

For example you can set your resource quota as follows, then your podGroup will stay in pending state if the resource request plus usage is more than quota

spec:
  hard:
    requests.cpu: "32"
    requests.memory: 32Gi
    requests.nvidia.com/gpu: "10"
  1. Add a flag to control this feature, default is “false“, which means disable

@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: YuxiJin-tobeyjin
To complete the pull request process, please assign kevin-wangzefeng
You can assign the PR to them by writing /assign @kevin-wangzefeng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 19, 2020
…up enqueue procedure

Signed-off-by: jinyuxi <jinyuxi@cambricon.com>
@YuxiJin-tobeyjin YuxiJin-tobeyjin force-pushed the yuxi/feat-take-resource-quota-into-consideration-during-pod-group-enqueue-procedure branch from 88cdb06 to 685ee39 Compare October 19, 2020 07:23
@YuxiJin-tobeyjin
Copy link
Contributor Author

/assign @kevin-wangzefeng

@Thor-wl
Copy link
Contributor

Thor-wl commented Oct 19, 2020

@william-wang @k82cn

if _, ok := resourceMap[job.Namespace]; ok {
klog.V(3).Infof("For job <%s/%s>, ResourceQuota: %t", job.Namespace, job.Name, minReq.LessEqual(resourceMap[job.Namespace]))
if minReq.LessEqual(resourceMap[job.Namespace]) {
resourceMap[job.Namespace].Sub(minReq)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l think its better to put idle.Sub(minReq); inqueue = true here so that the following else clause can be removed :)

Copy link
Contributor Author

@YuxiJin-tobeyjin YuxiJin-tobeyjin Oct 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually idle.Sub(minReq);inqueue = true doesn't belong to this if branch;

If we put it here, idle.Sub will not be calculated under circumstances when there is no resourceQuota in the namespace but ssn.JobEnqueueable(job) && minReq.LessEqual(idle) is satisfied.

@@ -114,7 +118,23 @@ func (enqueue *Action) Execute(ssn *framework.Session) {
inqueue = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And l think even though job.PodGroup.Spec.MinResources == nil, resourceQuota should also be checked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering minResources is nil, the pods of this podgroup won't consume any resoureQuota, so we don't need it to be checked? WDYT

@stale
Copy link

stale bot commented Dec 25, 2020

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 25, 2020
@Thor-wl
Copy link
Contributor

Thor-wl commented Dec 31, 2020

Please resolve the conflictions

@stale stale bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 31, 2020
@Thor-wl
Copy link
Contributor

Thor-wl commented Feb 9, 2021

Please fix the conflictions and bugs

@YuxiJin-tobeyjin
Copy link
Contributor Author

YuxiJin-tobeyjin commented Feb 9, 2021

Please fix the conflictions and bugs

Okay, I will try to handle the conflicts. Would you please give me some clues about bugs,tks.

@shinytang6
Copy link
Member

Please fix the conflictions and bugs

Okay, I will try to handle the conflicts. Would you please give me some clues about bugs,tks.

Can y please resolve the conflicts, l think the original code is almost ready

@Thor-wl Thor-wl closed this Feb 23, 2021
@Thor-wl Thor-wl reopened this Feb 23, 2021
@YuxiJin-tobeyjin
Copy link
Contributor Author

Since enqueue structure has been changed in master and there are some logic defects in this design, we will give a new proposal with a plugin realization in the near future.

@merryzhou

This PR can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants