Convert scale-down checks to drainability rules #6164

Merged (9 commits) into kubernetes:master on Oct 11, 2023

Conversation

@artemvmin (Contributor) commented Sep 30, 2023:

What type of PR is this?

/kind feature

What this PR does / why we need it:

Convert replicated pod, kube-system, not-safe-to-evict annotation, local storage, and pdb scale-down checks to drainability rules.

Related: #6135

Does this PR introduce a user-facing change?

None
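
For context, the drainability rule shape that the scale-down checks are converted into looks roughly like the following minimal sketch, inferred from the diff excerpts quoted in the review threads below. The package layout is an assumption; Drainable's signature matches the quoted snippets.

```go
// Sketch of the rule interface implied by the snippets in this PR.
package rules

import (
	apiv1 "k8s.io/api/core/v1"

	"k8s.io/autoscaler/cluster-autoscaler/simulator/drainability"
)

// Rule decides whether a pod allows, blocks, or is skipped during a
// simulated node drain. Returning an undefined status defers the
// decision to the remaining rules.
type Rule interface {
	Drainable(drainCtx *drainability.DrainContext, pod *apiv1.Pod) drainability.Status
}
```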

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 30, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 30, 2023
@artemvmin (Contributor Author):

/assign x13n

@artemvmin changed the title from "Scale down drainability" to "Convert scale-down checks to scalability rules" on Sep 30, 2023
@artemvmin changed the title from "Convert scale-down checks to scalability rules" to "Convert scale-down checks to drainability rules" on Sep 30, 2023
@artemvmin force-pushed the scale-down-drainability branch 2 times, most recently from 00ba312 to 1775757, on September 30, 2023 07:26

```go
// Drainable decides what to do with replicated pods on node drain.
func (Rule) Drainable(drainCtx *drainability.DrainContext, pod *apiv1.Pod) drainability.Status {
	if drain.IsPodLongTerminating(pod, drainCtx.Timestamp) {
```
@x13n (Member):

Instead of having checks like this which are not relevant to the specific rule (why does replicated rule check for things like whether the pod is long terminating? Or - below - why does it check whether safe-to-evict annotation is added?), I'd just make sure the rules are used in the right order. Otherwise it may be hard to reason about them and even harder to make changes since now the same check appears in many different rules.

@artemvmin (Contributor Author):

The original logic only performs checks (replicated, safe-to-evict, system, etc.) if the pod is not long terminating. Therefore, all of these rules have to check this and exit with nil (i.e. UndefinedStatus) if the pod is not worth considering. Indenting error flow is the recommended style pattern: https://google.github.io/styleguide/go/decisions#indent-error-flow

All checks that return non-nil (i.e. BlockingStatus) are correctly distributed among rules.

Do you have ideas for how to simplify this?
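
For reference, the early-exit pattern being debated looks roughly like this sketch extending the snippet above (imports as in the interface sketch earlier; the NewUndefinedStatus constructor name comes from later snippets in this thread):

```go
// Sketch: each rule first bails out with an undefined status for pods
// it should not consider, keeping the rule-specific checks at minimal
// indentation, per the style guide linked above.
func (Rule) Drainable(drainCtx *drainability.DrainContext, pod *apiv1.Pod) drainability.Status {
	if drain.IsPodLongTerminating(pod, drainCtx.Timestamp) {
		// Not this rule's concern; defer to the remaining rules.
		return drainability.NewUndefinedStatus()
	}
	// ... replicated-pod checks that may return a blocking status ...
	return drainability.NewUndefinedStatus()
}
```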

@x13n (Member):

My thinking was to have a Rule for handling long terminating pods and just skipping them. This is exactly what happens with mirror pods already - they could be combined into a single rule to reduce logic-to-boilerplate ratio - or kept as separate super-simple rules.

@artemvmin (Contributor Author):

This is a great idea. Done.

Unlike the mirror pods, we can't use skip directly, as these pods are usually not skipped. I added a field to the drainability.Status object to interrupt remaining rule checks without causing a skip condition.
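
The field referred to here shows up in a later diff hunk. Reconstructed roughly as a sketch (the field comment is truncated in the hunk, so its completion and the OutcomeType name are assumptions; the field was ultimately removed again, see the thread below):

```go
// OutcomeType enumerates rule outcomes; the name is an assumption.
type OutcomeType int

// Status is the result of evaluating a drainability rule for a pod
// (sketch reconstructed from the hunk quoted later in this thread).
type Status struct {
	// Outcome of the evaluation, e.g. UndefinedOutcome, DrainOk or
	// SkipDrain.
	Outcome OutcomeType
	// BlockingReason contains the reason why the pod blocks the drain.
	BlockingReason drain.BlockingPodReason
	// Error contains an optional error message.
	Error error
	// Interrupted means that the Rule returning the status exited early
	// and remaining rules should not be evaluated (completion of the
	// truncated comment is an assumption).
	Interrupted bool
}
```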

(Resolved review thread on cluster-autoscaler/simulator/drainability/context.go, marked outdated.)
```go
	replicated.New(),
	system.New(),
	notsafetoevict.New(),
	localstorage.New(),
```
@x13n (Member):

localstorage doesn't make sense to be used at all when DeleteOptions.SkipNodesWithLocalStorage is false. Can DeleteOptions be passed here to determine which rules are used instead?

Similarly, system is only useful when SkipNodesWithSystemPods is true.

@artemvmin (Contributor Author):

That trades a tiny amount of computation for coupling between rules. With the current implementation, the rules framework just registers known rules and doesn't have to have any understanding of rule conditions. Putting any amount of validation higher up in the drainability rules framework is a bad idea for modularity.

One option is to add an Enabled(drainCtx, deleteOptions) method to the rules interface that will be run when generating rules. However, I would argue the overhead isn't worth it. A single Drainable() function that returns early is minimal computation and keeps all logic in a single place.

@x13n (Member):

rules.Default already has to pass enablement flags to specific rules - it's just that now the rules need to handle that state themselves, which makes each one more complex - some rules now have to handle a case in which they behave as if they didn't exist at all. It's not really about saving compute time - you're right that's negligible - but rather about keeping the rules logic focused on what they're supposed to be checking. I'd expect something else (now it is rules.Default) to collect all rules that should be used.

That being said, I don't have a strong opinion on this, so leaving it up to you.

@artemvmin (Contributor Author):

I implemented this in the latest commit. I'm still not convinced.

What I like about this implementation is that it attempts to keep DeleteOptions at the framework level and not pass them down to the rules. Unfortunately, we still need to pass the legacy custom controller option to the replicated rule until the option is deprecated, which nullifies this argument.

What I dislike about it is that it adds complexity to the framework without much benefit. While not having a rule is conceptually better than the enabled field within select rules, the field is actually well contained within a rule and uses minimal compute to exit early. I can see this evolving into more business logic living in the DefaultRules construction and that turning into spaghetti code. Simple registration and delegation for all business logic is a simpler model.

Please take a look and let me know what you think.

Another alternative is to add an Enabled() field to every rule, as previously suggested, but this is the worst of both worlds (all options still passed to the rules + more boilerplate).

@x13n (Member):

As mentioned before, I don't have a strong preference here - current version looks good to me, but adding enabled field within specific rules is also acceptable.
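
The settled approach keeps DeleteOptions at the framework level and selects rules during construction. A rough sketch of what that could look like: the exact signature, option struct, and rule constructors are assumptions based on the comments above, and per the discussion the replicated rule still receives the legacy custom-controller option directly.

```go
// Default assembles the drainability rules to run, honoring scale-down
// options at construction time rather than inside each rule. Sketch
// only, under the assumptions stated above.
func Default(deleteOptions options.NodeDeleteOptions) []Rule {
	defaults := []Rule{
		mirror.New(),
		longterminating.New(),
		// Legacy option that still has to reach the rule itself
		// until it is deprecated.
		replicated.New(deleteOptions.SkipNodesWithCustomControllerPods),
		notsafetoevict.New(),
		pdb.New(),
	}
	if deleteOptions.SkipNodesWithSystemPods {
		defaults = append(defaults, system.New())
	}
	if deleteOptions.SkipNodesWithLocalStorage {
		defaults = append(defaults, localstorage.New())
	}
	return defaults
}
```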

@artemvmin force-pushed the scale-down-drainability branch 3 times, most recently from 68c7829 to 26afed5, on October 4, 2023 01:19
```diff
@@ -48,6 +48,23 @@ type Status struct {
 	BlockingReason drain.BlockingPodReason
 	// Error contains an optional error message.
 	Error error
+	// Interrupted means that the Rule returning the status exited early and that
```
@x13n (Member):

This has the same semantics as Outcome != UndefinedOutcome. Can you clarify why you think it is needed?

@artemvmin (Contributor Author):

It was necessary to deal with the branched logic, but it's no longer needed with the below refactor.

```diff
@@ -74,188 +70,44 @@ const (
 	UnexpectedError
 )

-// GetPodsForDeletionOnNodeDrain returns pods that should be deleted on node drain as well as some extra information
-// about possibly problematic pods (unreplicated and DaemonSets).
+// GetPodsForDeletionOnNodeDrain returns pods that should be deleted on node
```
@x13n (Member):

Can't this function just be removed at this point?

@artemvmin (Contributor Author):

I didn't do this because it's a public function and may have downstream dependencies. But since you think it's a good idea, I merged the logic and migrated the tests.

@artemvmin force-pushed the scale-down-drainability branch 3 times, most recently from d0d4456 to b3e3eeb, on October 7, 2023 00:25
```diff
 			pods = append(pods, podInfo.Pod)
-		case drainability.DrainOk:
+		case drainability.UndefinedOutcome, drainability.DrainOk:
 			if drain.IsPodLongTerminating(pod, timestamp) {
```
@x13n (Member):

Whether to skip a pod or not should be determined by rules, here you are effectively overriding the decision to treat both UndefinedOutcome and DrainOk as SkipDrain for long terminating pods. Why?

@artemvmin (Contributor Author) commented Oct 9, 2023:

No, SkipDrain doesn't get added to the drainable pods list (notice how there is no switch case for SkipDrain). UndefinedOutcome used to pass pods through to GetPodsForDeletionOnNodeDrain, but these pods are now integrated into the drainability checks. UndefinedOutcome and DrainOk are treated identically after the refactor.

@x13n (Member):

Apologies if I wasn't clear. I didn't mean to treat DrainOk/UndefinedOutcome and SkipDrain the same here, just that the special case for long terminating pods effectively causes DrainOk to be treated as if it was SkipDrain. Since now longterminating rule simply returns SkipDrain, the whole special casing is really a no-op.

@artemvmin (Contributor Author):

Got it. You want this condition removed. Agreed that this can be delegated to the longterminating rule.
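
With the special case removed, the consuming switch reduces to something like this sketch (outcome names are the ones quoted in the snippet above; other cases and surrounding variables follow that excerpt):

```go
// Sketch: after the refactor, UndefinedOutcome and DrainOk are handled
// identically, and skipping is fully owned by the rules.
switch status.Outcome {
case drainability.UndefinedOutcome, drainability.DrainOk:
	// The pod is drainable; no long-terminating special case here
	// anymore, since the longterminating rule returns SkipDrain itself.
	pods = append(pods, podInfo.Pod)
case drainability.SkipDrain:
	// Deliberately empty: skipped pods are never added to the
	// drainable list, which is why SkipDrain has no append above.
}
```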

```go
			return drainability.NewBlockedStatus(drain.MinReplicasReached, fmt.Errorf("replication controller for %s/%s has too few replicas spec: %d min: %d", pod.Namespace, pod.Name, rc.Spec.Replicas, r.minReplicaCount))
		}
	} else if pod_util.IsDaemonSetPod(pod) {
		if refKind == "DaemonSet" {
```
@x13n (Member):

This should be !=. Might be worth adding a unit test that would catch this...

@artemvmin (Contributor Author):

Nice catch. Added tests.
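
A test along the lines suggested would pin down the owner-kind comparison. The sketch below is illustrative only: the podWithOwner helper, the New constructor, and the expected outcomes are assumptions, based on the reading that only genuine DaemonSet owners are reference-checked, so flipping == and != fails one of the two cases.

```go
// Sketch of a table-driven test that would catch the ==/!= mix-up in
// the DaemonSet owner-kind check. Helper names and expected outcomes
// are assumptions, as noted above.
func TestDrainableDaemonSetOwnerKind(t *testing.T) {
	for name, tc := range map[string]struct {
		ownerKind string
		want      drainability.Status
	}{
		// Owner is a genuine DaemonSet that is absent from the
		// listers: the reference check should block the drain.
		"missing DaemonSet owner": {
			ownerKind: "DaemonSet",
			want:      drainability.NewBlockedStatus(drain.ControllerNotFound, nil),
		},
		// DaemonSet-like pod with a custom owner kind: there is no
		// lister to check against, so the rule stays undefined.
		"custom DaemonSet-like owner": {
			ownerKind: "CustomDaemonSet",
			want:      drainability.NewUndefinedStatus(),
		},
	} {
		t.Run(name, func(t *testing.T) {
			pod := podWithOwner(tc.ownerKind) // assumed test helper
			got := New().Drainable(&drainability.DrainContext{}, pod)
			if got.Outcome != tc.want.Outcome {
				t.Errorf("Drainable().Outcome = %v, want %v", got.Outcome, tc.want.Outcome)
			}
		})
	}
}
```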

```go
limitations under the License.
*/

package customcontroller
```
@x13n (Member):

nit: Would package replicacount better reflect the logic here?

@artemvmin (Contributor Author):

Done.

```go
// Drainable decides what to do with long terminating pods on node drain.
func (r *Rule) Drainable(drainCtx *drainability.DrainContext, pod *apiv1.Pod) drainability.Status {
	if drain.IsPodLongTerminating(pod, drainCtx.Timestamp) {
		return drainability.NewDrainableStatus()
```
@x13n (Member):

This should be SkipDrain to be consistent with previous logic.

@artemvmin (Contributor Author):

Ah, you're right. The continue doesn't add it to the list. Fixed.
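
The corrected rule then becomes, per this exchange (a sketch; NewSkipStatus matches the test expectation quoted further below):

```go
// Drainable decides what to do with long terminating pods on node drain.
// Corrected per the review: such pods are skipped rather than marked
// drainable, matching the old behavior where `continue` never added
// them to the drainable list.
func (r *Rule) Drainable(drainCtx *drainability.DrainContext, pod *apiv1.Pod) drainability.Status {
	if drain.IsPodLongTerminating(pod, drainCtx.Timestamp) {
		return drainability.NewSkipStatus()
	}
	return drainability.NewUndefinedStatus()
}
```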

```go
ObjectMeta: metav1.ObjectMeta{
	Name:              "bar",
	Namespace:         "default",
	DeletionTimestamp: &metav1.Time{Time: testTime.Add(-2 * time.Duration(extendedGracePeriod) * time.Second)},
```
@x13n (Member):

The original test case had DeletionTimestamp extendedGracePeriod/2 seconds ago, not 2*extendedGracePeriod seconds ago and so wasn't considered long terminating - it should result in drainability.NewUndefinedStatus().

@artemvmin (Contributor Author):

That was a test that I believe was incorrectly constructed and that I fixed during the refactor. We want to test that the extended grace period specified in the spec is respected.

@artemvmin (Contributor Author):

Added the test case you described in addition to the existing one.

(Resolved review thread on cluster-autoscaler/utils/drain/drain.go.)
@artemvmin force-pushed the scale-down-drainability branch 3 times, most recently from 6de45a3 to 6c71099, on October 9, 2023 17:04
```go
	},
	want: drainability.NewSkipStatus(),
},
"long terminating pod with expired grace period": {
```
@x13n (Member):

nit: this one actually doesn't have "expired" grace period - it is extended grace period that didn't expire yet. "expired" better matches the previous test case actually.

@artemvmin (Contributor Author):

Okay, I understand now. We want to skip pods whose grace period has expired. Also added another case for zero extended duration.
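
Taken together, the test table discussed here ends up with cases roughly like the following sketch (the podTerminatingFor helper is an assumption, the zero-extended-duration case mentioned above is omitted, and the expectations follow the thread):

```go
// Sketch of the long-terminating rule test cases settled on above.
// podTerminatingFor is an assumed helper returning a pod whose
// DeletionTimestamp lies the given duration before testTime and whose
// spec carries the given termination grace period in seconds.
for name, tc := range map[string]struct {
	pod  *apiv1.Pod
	want drainability.Status
}{
	// Deleted two extended grace periods ago: long terminating, skip.
	"long terminating pod with expired grace period": {
		pod:  podTerminatingFor(2*time.Duration(extendedGracePeriod)*time.Second, extendedGracePeriod),
		want: drainability.NewSkipStatus(),
	},
	// Deleted half an extended grace period ago: not yet long
	// terminating, so the rule stays undefined.
	"long terminating pod within extended grace period": {
		pod:  podTerminatingFor(time.Duration(extendedGracePeriod)*time.Second/2, extendedGracePeriod),
		want: drainability.NewUndefinedStatus(),
	},
} {
	t.Run(name, func(t *testing.T) {
		got := New().Drainable(&drainability.DrainContext{Timestamp: testTime}, tc.pod)
		if got.Outcome != tc.want.Outcome {
			t.Errorf("Drainable().Outcome = %v, want %v", got.Outcome, tc.want.Outcome)
		}
	})
}
```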

@x13n (Member) left a review:

Thanks! Looks good to me now.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 11, 2023
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: artemvmin, x13n

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 11, 2023
@k8s-ci-robot k8s-ci-robot merged commit 133fdc7 into kubernetes:master Oct 11, 2023
6 checks passed
Labels
approved - Indicates a PR has been approved by an approver from all required OWNERS files.
area/cluster-autoscaler
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/feature - Categorizes issue or PR as related to a new feature.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
size/XXL - Denotes a PR that changes 1000+ lines, ignoring generated files.

3 participants