-
Notifications
You must be signed in to change notification settings - Fork 39.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalidate resource requirements on extended resources with only request set #57170
Conversation
/sig-scheduling |
/release-note-non |
/assign @vishh @ConnorDoyle |
pkg/apis/core/helper/helpers.go
Outdated
@@ -167,6 +167,7 @@ var overcommitBlacklist = sets.NewString(string(core.ResourceNvidiaGPU)) | |||
func IsOvercommitAllowed(name core.ResourceName) bool { | |||
return IsDefaultNamespaceResource(name) && | |||
!IsHugePageResourceName(name) && | |||
!IsExtendedResourceName(name) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This predicate is technically correct but redundant since extended resources are by definition disjoint from default namespace resources. Probably we only need one of the two, but it would be helpful to add a comment saying why this was correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, comment makes more sense over redundant line of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Removed.
if quantity.Cmp(limitQuantity) != 0 && !helper.IsOvercommitAllowed(resourceName) { | ||
allErrs = append(allErrs, field.Invalid(reqPath, quantity.String(), fmt.Sprintf("must be equal to %s limit", resourceName))) | ||
} else if quantity.Cmp(limitQuantity) > 0 { | ||
allErrs = append(allErrs, field.Invalid(reqPath, quantity.String(), fmt.Sprintf("must be less than or equal to %s limit", resourceName))) | ||
} | ||
} else if resourceName == core.ResourceNvidiaGPU { | ||
} else !helper.IsOvercommitAllowed(resourceName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! 👍 This was missed in the first extended resources pass.
However, this is a breaking API change right? I don’t think we can do this and maintain API compatibility with 1.8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ConnorDoyle do you have any suggestions on how we should roll out this API change? Can we first update the extended resource doc to document this side effect for 1.8 and 1.9 and mention our plan on invalidating this kind of resource spec in 1.10, and then have the validation logic take effect in 1.10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should consider not making this change.
The semantics of resource requests are: a minimum amount provided to a container. There's no indication implied that bursting is allowed for arbitrary resources, although we allow it for cpu and memory and that is documented. Also, we do already disallow limit ≠ request in validation.
Could you outline the specific user benefit of the validation change? Then we can weigh it against the cost.
2da34de
to
d62b12e
Compare
Would like to get some opinions from the scheduling folks on this PR. In short, for extended resources introduced in 1.8, we clearly document that "Extended resources are only supported as integer resources and cannot be overcommitted" in its doc https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#extended-resources, but the current ValidateResourceRequirements check actually allows that only request field is set for extended resources and the current extended resource doc also implies that this kind of resource spec is allowed: "Extended resources cannot be overcommitted, so request and limit must be equal if both are present in a container spec." In this PR, I am thinking to change ValidateResourceRequirements logic to return an error for such resource spec to make it clear that extended resource doesn't allow overcommit, but this may break some existing pod specs that are formatted this way. Would like to get some opinions on whether we should leave the current validation logic as is and rely on the ER doc to communicate the fact that overcommit is not supported on ER, or we should consider to change the validation logic in a future release but document that plan now. |
Copying hidden reply from above:
|
@jiayingz Scheduler uses "Requests" for scheduling and does not control overcommitment. It is up to the node/kubelet to allow a pod to use more resources, including ER, than requested. If kubelet does not allow overcommitment of ER, that will probably be enough for preventing pods from using more resources than requested. |
I am unable to visualize a issue that we may hit if we skip this validation check. In CPU and memory case, it is required to put upper cap, before hand, using |
Related issue: #57276 |
Thanks a lot for the comments! With #57276 fixed, we can probably leave the validation part as is. Will leave this PR open for a few days and close it if no objection is raised. |
if quantity.Cmp(limitQuantity) != 0 && !helper.IsOvercommitAllowed(resourceName) { | ||
allErrs = append(allErrs, field.Invalid(reqPath, quantity.String(), fmt.Sprintf("must be equal to %s limit", resourceName))) | ||
} else if quantity.Cmp(limitQuantity) > 0 { | ||
allErrs = append(allErrs, field.Invalid(reqPath, quantity.String(), fmt.Sprintf("must be less than or equal to %s limit", resourceName))) | ||
} | ||
} else if resourceName == core.ResourceNvidiaGPU { | ||
} else if !helper.IsOvercommitAllowed(resourceName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we some unit tests for this?
As @jiayingz mentioned, extended resources are not expected to allow for overcommit to begin with until we have better Resource APIs that allow for expressing overcommit capabilities offered at the node level. I can't think of a use case for overcommit at the cluster level. |
And as far as device plugin goes, since it does not support overcommit, it should require request to equal limits if it considers requests for allocations. |
@vishh We tried to keep the API same and since in current behavior.
Thats fine too. Its just that we were not sure initially about changing the API behavior. Your point about a very small user base of ER, makes sense though. Please confirm if above three points is the way forward. |
absence of limit typically means give me everything available on the node. |
Apologies for jumping the gun on the other PR. I assumed that we wouldn’t tolerate an incompatible API change, but after all should have waited for discussion to settle before proceeding. Given that @vishh prefers to make the API change, the new plan makes sense. The compelling case for me is a future scenario where we have a way to specify scheduling properties of each resource. This would make the overcommit-ibility of a resource distinguishable from the use site in container specs and less implicit. |
After reading @vishh's comments and given that backward compatibility is not a major concern (given the small number of users), this change LGTM. /lgtm |
Automatic merge from submit-queue (batch tested with PRs 57591, 57369). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Revert back #57278 **What this PR does / why we need it**: This PR reverts back to behavior of scanning Limits. **Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: Related # #57276 #57170 **Special notes for your reviewer**: **Release note**: ```release-note None ``` /sig node /cc @vishh @ConnorDoyle @jiayingz
/assign @liggitt |
Would love to get this landed. I ran into #57276 myself again (!!!) since the other patch was reverted. |
/assign @thockin |
/approve |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bsalamat, jiayingz, liggitt Associated issue: #57276 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue (batch tested with PRs 57037, 57170). If you want to cherry-pick this change to another branch, please follow the instructions here. |
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #57276
Special notes for your reviewer:
Release note: