Add support for all-or-nothing scale-up strategy #6821
Conversation
Can you add some test cases?
			return stopAllOrNothingScaleUp(podEquivalenceGroups, skippedNodeGroups, nodeGroups)
		}
	}

	// Execute scale up.
	klog.V(1).Infof("Final scale-up plan: %v", scaleUpInfos)
	aErr, failedNodeGroups := o.scaleUpExecutor.ExecuteScaleUps(scaleUpInfos, nodeInfos, now)
We should also add an allOrNothing variable here, and use the AtomicIncreaseSize() method if allOrNothing=true.
Done
@@ -79,9 +80,9 @@ func (o *WrapperOrchestrator) ScaleUp(
	}

	if o.scaleUpRegularPods {
-		return o.podsOrchestrator.ScaleUp(regularPods, nodes, daemonSets, nodeInfos)
+		return o.podsOrchestrator.ScaleUp(regularPods, nodes, daemonSets, nodeInfos, allOrNothing)
I think for podsOrchestrator allOrNothing should be false.
It is false and set in static_autoscaler.go; I'm just passing the variable here.
I wanted this to be set to whatever is the default for the autoscaler, and the custom orchestrators to override it only as needed (like atomic scale-up for ProvisioningRequests will).
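For reference, a minimal sketch of that default path (not the exact upstream diff; the variable names from static_autoscaler.go are assumed):

	// Hedged sketch: the regular scale-up path in static_autoscaler.go passes
	// allOrNothing=false, so only custom orchestrators (e.g. atomic scale-up for
	// ProvisioningRequests) opt into the all-or-nothing behaviour.
	scaleUpStatus, typedErr := a.scaleUpOrchestrator.ScaleUp(
		unschedulablePodsToHelp, readyNodes, daemonsets, nodeInfosForGroups,
		false, // allOrNothing: keep the default behaviour that allows partial scale-ups.
	)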
Force-pushed from 0fd691d to accc515.
/lgtm
	if allOrNothing && bestOption.NodeGroup.MaxSize() < newNodes {
		klog.V(1).Infof("Can only create a new node group with max %d nodes, need %d nodes", bestOption.NodeGroup.MaxSize(), newNodes)
		return stopAllOrNothingScaleUp(podEquivalenceGroups, skippedNodeGroups, nodeGroups)
	}
Is this needed? Will it not be checked in 155-156 and ComputeExpansionOption?
- In 155-156 we only exclude cases where there's at least one pod that can't be scheduled on nodes belonging to this group at all. If all pods are compatible with that node group, we'll just get a higher node count (possibly higher than the group's max size, but we can still balance across similar groups later).
- In computeExpansionOption we only cap the node count for a special case (groups that scale from 0 to max).

We don't check the group's max size in all cases in either of those places because of balancing across similar groups, which we only do just before executing the scale-up.
So without this check, for a scale-up not affected by the above two cases, we could create a node group and then decide not to scale it up.
I don't think MaxSize() is checked before balancing node groups below, because the amount over max can be distributed to the similar node groups.
But why check newNodes against the max size of just one group, if we're balancing between groups below (and verifying that the total capacity is big enough) anyway?
But why check newNodes against the max size of just one group, if we're balancing between groups below (and verifying that the total capacity is big enough) anyway?
This is checked only for an autoprovisioned group, just before it's created. I think we currently don't have information about how many similar groups there'll be. I suspect estimating this may require changes on the cloud provider side (for example, if creating a group in one zone implicitly creates groups in other zones with identical taints/labels/resources).
We agreed with @kisieland offline that it's acceptable to initially limit autoprovisioning to the max size of one group, and to leave an open issue for handling this better in cases where an autoprovisioned group will have non-obvious similar groups.
Do I understand correctly that the final total capacity check would still catch this (because the balancing processor also caps all the requests to MaxSize()), but this lets us avoid an unnecessary node group creation in that case?
In any case, this seems okay. As a nit, maybe leave a TODO for the issue when you have it.
Do I understand correctly that the final total capacity check would still catch this (because the balancing processor also caps all the requests to MaxSize()), but this lets us avoid an unnecessary node group creation in that case?
Yes, that's exactly the case.
As a nit, maybe leave a TODO for the issue when you have it.
Will do.
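For illustration, a hedged sketch of the kind of final total-capacity check being discussed here (the helper and variable names appear elsewhere in this PR; the ScaleUpInfo fields are assumed):

	// After balancing across similar node groups, verify the combined increase
	// still covers every pod; otherwise give up instead of scaling up partially.
	if allOrNothing {
		totalCapacity := 0
		for _, info := range scaleUpInfos {
			totalCapacity += info.NewSize - info.CurrentSize
		}
		if totalCapacity < newNodes {
			return stopAllOrNothingScaleUp(podEquivalenceGroups, skippedNodeGroups, nodeGroups)
		}
	}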
Please add some test cases
All of my new test cases are in the other PR: #6824
I can add some here as well to have more granularity.
Added a test case which fails if allOrNothing is false. More extensive test cases remain in #6824.
	var err error
	if allOrNothing {
		err = info.Group.AtomicIncreaseSize(increase)
	} else {
		err = info.Group.IncreaseSize(increase)
	}
	if err != nil {
Also, this should fall back to IncreaseSize(increase) if AtomicIncreaseSize(increase) is not implemented.
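Roughly along these lines, sketched under the assumption that providers signal an unimplemented method with the cloudprovider.ErrNotImplemented sentinel:

	var err error
	if allOrNothing {
		// Try the atomic resize first, then fall back to a regular resize if the
		// cloud provider doesn't implement it (sentinel error assumed).
		err = info.Group.AtomicIncreaseSize(increase)
		if err == cloudprovider.ErrNotImplemented {
			err = info.Group.IncreaseSize(increase)
		}
	} else {
		err = info.Group.IncreaseSize(increase)
	}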
Done
Force-pushed from accc515 to 60ffbf1.
Resolved (outdated) review comment on cluster-autoscaler/provisioningrequest/orchestrator/orchestrator.go.
Force-pushed from 5524ddd to 4a9d939.
@@ -89,6 +89,7 @@ func (o *ScaleUpOrchestrator) ScaleUp(
	nodes []*apiv1.Node,
	daemonSets []*appsv1.DaemonSet,
	nodeInfos map[string]*schedulerframework.NodeInfo,
+	allOrNothing bool, // Either request enough capacity for all unschedulablePods, or don't request it at all.
nit: Could you also copy the comment to the Orchestrator interface (e.g. in the follow-up PR)?
Will do in follow-up PR
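A hedged sketch of what copying the comment to the interface could look like (interface shape and return types are assumed from the surrounding code; other methods omitted):

	type Orchestrator interface {
		// ScaleUp tries to scale the cluster up so that the given pods can be scheduled.
		ScaleUp(
			unschedulablePods []*apiv1.Pod,
			nodes []*apiv1.Node,
			daemonSets []*appsv1.DaemonSet,
			nodeInfos map[string]*schedulerframework.NodeInfo,
			// Either request enough capacity for all unschedulablePods, or don't request it at all.
			allOrNothing bool,
		) (*status.ScaleUpStatus, errors.AutoscalerError)
		// ...
	}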
	// Can't execute a scale-up that will accommodate all pods, so nothing is considered schedulable.
	klog.V(1).Info("Not attempting scale-up due to all-or-nothing strategy: not all pods would be accommodated")
	markedEquivalenceGroups := markAllGroupsAsUnschedulable(podEquivalenceGroups, AllOrNothingReason)
	return buildNoOptionsAvailableStatus(markedEquivalenceGroups, skippedNodeGroups, nodeGroups), nil
Could you use the helper function for the 2 other ScaleUpNoOptionsAvailable returns from this method?
This can be fixed in the follow-up PR as well if you prefer
Will do!
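For context, a hedged sketch of how the "no options available" exits could share the helpers shown above (parameter types assumed; the real stopAllOrNothingScaleUp may differ):

	func stopAllOrNothingScaleUp(egs []*equivalence.PodGroup, skipped map[string]status.Reasons, ngs []cloudprovider.NodeGroup) (*status.ScaleUpStatus, errors.AutoscalerError) {
		// Mark every equivalence group as unschedulable and return the shared
		// "no options available" status instead of building it inline.
		klog.V(1).Info("Not attempting scale-up due to all-or-nothing strategy: not all pods would be accommodated")
		marked := markAllGroupsAsUnschedulable(egs, AllOrNothingReason)
		return buildNoOptionsAvailableStatus(marked, skipped, ngs), nil
	}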
Left 3 nits, otherwise LGTM! Feel free to unhold if you don't agree with the nits or if you prefer to address them in the follow-up PR.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: aleksandra-malinowska, towca
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Stuff left to address in follow-up PR:
/unhold
* Add support for all-or-nothing scale-up strategy
* Review fixes
/cherry-pick cluster-autoscaler-release-1.30
@yaroslava-serdiuk: #6821 failed to apply on top of branch "cluster-autoscaler-release-1.30":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
* Add support for all-or-nothing scale-up strategy
* Review fixes
/cherry-pick cluster-autoscaler-release-1.30
@yaroslava-serdiuk: new pull request created: #7015
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Implements the all-or-nothing scaling strategy. This is needed to support atomic-scale-up.kubernetes.io ProvisioningRequests: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/provisioning-request.md

Issue: #6815
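As a hedged illustration of the intended split (names assumed, not actual call sites): the regular pod path keeps allOrNothing=false, while an atomic ProvisioningRequest scale-up passes true.

	// Default path: partial scale-ups are allowed.
	scaleUpStatus, err := orchestrator.ScaleUp(regularPods, nodes, daemonSets, nodeInfos, false)
	// Atomic ProvisioningRequest path: either capacity for all pods is requested, or none at all.
	scaleUpStatus, err = orchestrator.ScaleUp(provReqPods, nodes, daemonSets, nodeInfos, true)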
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/cc @yaroslava-serdiuk @towca