Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider dropping MachineHealthCheck conditions validations #9762

Closed
rikatz opened this issue Nov 22, 2023 · 6 comments · Fixed by #9774
Closed

Consider dropping MachineHealthCheck conditions validations #9762

rikatz opened this issue Nov 22, 2023 · 6 comments · Fixed by #9774
Labels
area/machinehealthcheck Issues or PRs related to machinehealthchecks kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@rikatz
Copy link

rikatz commented Nov 22, 2023

What would you like to be added (User Story)?

As an operator, I want to use Machine Health Check to detect when my node was deleted, but without relying on any other additional conditions.

Detailed Description

As today cloud providers can detect if a node was deleted from the underlying infrastructure (eg. vsphere-cloud-provider) and delete the node from Kubernetes cluster (kubectl delete node), and as Machine Health Check today also checks if the node is present, otherwise it reconciles, this is enough for a basic cloud health check and no further condition should be required

This way, the request is to consider dropping the "existence" of conditions on a MachineHealthCheck and make MHC controller be able to reconcile NodeNotFound errors only

Anything else you would like to add?

No response

Label(s) to be applied

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 22, 2023
@enxebre
Copy link
Member

enxebre commented Nov 22, 2023

This way, the request is to consider dropping the "existence" of conditions on a MachineHealthCheck and make MHC controller be able to reconcile NodeNotFound errors only

We already support the missing node scenario


Are you suggesting something different?

@rikatz
Copy link
Author

rikatz commented Nov 22, 2023

The issue is not the missing node scenario :) Is that validation webhook forces me to set conditions on MHC, even if I just want to check the missing node scenario and nothing else.

So today I need to set some random condition in https://doc.crds.dev/github.com/kubernetes-sigs/cluster-api/cluster.x-k8s.io/MachineHealthCheck/v1beta1@v1.6.0-beta.0#spec-unhealthyConditions just because I don't care about unhealthy Conditions, but just with node deleted scenario

@sbueringer
Copy link
Member

sbueringer commented Nov 22, 2023

I'm +1 to drop the validation:

// +kubebuilder:validation:MinItems=1
which basically will make unhealthyConditions an optional field (or at least allow to set it to an empty array).

I don't see a reason why it should be mandatory, but maybe I'm missing something.

@fabriziopandini Opinions?

Note: We should also check for validations in other places, like e.g. this one for ClusterClass / Cluster.spec.topology...:

if len(m.Spec.UnhealthyConditions) == 0 {
allErrs = append(allErrs, field.Forbidden(
fldPath.Child("unhealthyConditions"),
"must have at least one entry",
))
}

@sbueringer sbueringer added the area/machinehealthcheck Issues or PRs related to machinehealthchecks label Nov 22, 2023
@enxebre
Copy link
Member

enxebre commented Nov 22, 2023

The issue is not the missing node scenario :) Is that validation webhook forces me to set conditions on MHC, even if I just want to check the missing node scenario and nothing else.

Ah, understood. I'm +1 as well for specific conditions not being a requirement, I have seen the use case a few times where the api consumer needs to set a "fake" condition just to satisfy the req.

@fabriziopandini
Copy link
Member

I'm generally ok, but it would be great if we find some way to surface that MHC remediates if the node goes away
e.g I'm not sure this is documented in our book or in the description of our API fields

@enxebre
Copy link
Member

enxebre commented Nov 23, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/machinehealthcheck Issues or PRs related to machinehealthchecks kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants