✨ Add MachineSetReady condition to MachineDeployment #9262

Merged

Conversation

muraee
Contributor

@muraee muraee commented Aug 22, 2023

What this PR does / why we need it:

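When a Machine fails, the error is surfaced on the owning MachineSet but not on the MachineDeployment. For example, when the AWSMachine (InfraMachine) fails to be created or provisioned, the MachineSet propagates that error into its MachinesReady condition, which is included in the Ready condition summary: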

status:
  conditions:
  - lastTransitionTime: "2023-07-28T09:13:11Z"
    message: 1 of 2 completed
    reason: InstanceProvisionFailed @ /capi-quickstart-md-0-56858f78bxjck8s-7jskm
    severity: Error
    status: "False"
    type: Ready
  - lastTransitionTime: "2023-07-28T09:13:11Z"
    message: 1 of 2 completed
    reason: InstanceProvisionFailed @ /capi-quickstart-md-0-56858f78bxjck8s-7jskm
    severity: Error
    status: "False"
    type: MachinesReady

However, looking only at the MachineDeployment, it is not clear what is failing:

status:
  conditions:
  - lastTransitionTime: "2023-07-28T09:13:02Z"
    message: Minimum availability requires 2 replicas, current 0 available
    reason: WaitingForAvailableMachines
    severity: Warning
    status: "False"
    type: Ready

This PR adds a new condition, MachineSetReady, to MachineDeployment. It mirrors the Ready condition of the underlying MachineSet so that the MachineDeployment reports meaningful errors.
Given the example above, the new condition looks like this:

status:
  conditions:
  - lastTransitionTime: "2023-07-28T09:13:02Z"
    message: 1 of 2 completed
    reason: InstanceProvisionFailed @ /capi-quickstart-md-0-56858f78bxjck8s-7jskm
    severity: Error
    status: "False"
    type: MachineSetReady
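
To make the mirroring idea concrete, here is a minimal, self-contained Go sketch. It does not use the real Cluster API packages: the `Condition` struct, `findCondition`, and `mirrorCondition` are hypothetical stand-ins invented for illustration, and the actual change relies on the project's own `conditions` helpers.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for the Cluster API
// condition type; it only carries the fields shown in the YAML above.
type Condition struct {
	Type     string
	Status   string
	Severity string
	Reason   string
	Message  string
}

// findCondition returns a pointer to the condition of the given type,
// or nil if the slice does not contain one.
func findCondition(conds []Condition, t string) *Condition {
	for i := range conds {
		if conds[i].Type == t {
			return &conds[i]
		}
	}
	return nil
}

// mirrorCondition copies the source condition of type sourceType into a new
// condition of type targetType, keeping status, severity, reason and message.
// The boolean is false when the source has no such condition.
func mirrorCondition(source []Condition, sourceType, targetType string) (Condition, bool) {
	c := findCondition(source, sourceType)
	if c == nil {
		return Condition{}, false
	}
	mirrored := *c // copy, so the source slice is not modified
	mirrored.Type = targetType
	return mirrored, true
}

func main() {
	// Conditions as they might appear on the MachineSet (assumed values).
	machineSetConditions := []Condition{{
		Type:     "Ready",
		Status:   "False",
		Severity: "Error",
		Reason:   "InstanceProvisionFailed",
		Message:  "1 of 2 completed",
	}}

	c, ok := mirrorCondition(machineSetConditions, "Ready", "MachineSetReady")
	fmt.Println(ok, c.Type, c.Status, c.Reason, c.Message)
}
```

The point of the sketch is only that the MachineDeployment condition carries the same status, severity, reason and message as the MachineSet's Ready condition, under a different type name.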

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3486

/area machinedeployment

@k8s-ci-robot k8s-ci-robot added area/machinedeployment Issues or PRs related to machinedeployments cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 22, 2023
@k8s-ci-robot
Contributor

Welcome @muraee!

It looks like this is your first PR to kubernetes-sigs/cluster-api 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 22, 2023
@k8s-ci-robot
Contributor

Hi @muraee. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Aug 22, 2023
@muraee muraee changed the title ✨ Add MachineSetReady condition to MchineDeployment ✨ Add MachineSetReady condition to MachineDeployment Aug 22, 2023
@enxebre
Member

enxebre commented Aug 23, 2023

Thanks @muraee! Can you please provide concrete examples in the PR description of use cases where this condition adds value?

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 23, 2023
@@ -228,6 +228,14 @@ const (
// machines required (i.e. Spec.Replicas-MaxUnavailable when MachineDeploymentStrategyType = RollingUpdate) are up and running for at least minReadySeconds.
MachineDeploymentAvailableCondition ConditionType = "Available"

// MachineSetReadyCondition reports a summary of current status of the MachineSet owned by the MachineDeployment.
MachineSetReadyCondition ConditionType = "MachineSetReady"
Member

Should this be included in the MD conditions.SetSummary?

Contributor Author

If I am not mistaken, once the minimum number of available replicas is reached, the MachineDeployment currently reports Ready as True even if one of the machines is failing.
If we include this new condition in the summary, the MachineDeployment will no longer be Ready whenever any machine fails.

Do we want that?

Member

I think you're right, let's keep the scope minimal and avoid changing the existing semantics.
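
The trade-off discussed in this thread can be illustrated with a small Go sketch. This is a deliberately simplified model, not the real `conditions.SetSummary` logic: the `summarize` function and its "all included conditions must be True" rule are assumptions made for illustration only.

```go
package main

import "fmt"

// Condition is a hypothetical, reduced condition type with only the
// fields needed to illustrate the summary behavior.
type Condition struct {
	Type   string
	Status string
}

// summarize returns "True" only if every condition whose type is listed in
// includeTypes has status "True". Conditions not listed are ignored, which
// models leaving a condition out of the Ready summary.
func summarize(conds []Condition, includeTypes ...string) string {
	included := make(map[string]bool, len(includeTypes))
	for _, t := range includeTypes {
		included[t] = true
	}
	for _, c := range conds {
		if included[c.Type] && c.Status != "True" {
			return "False"
		}
	}
	return "True"
}

func main() {
	// Minimum availability is met, but one machine in the MachineSet is failing.
	conds := []Condition{
		{Type: "Available", Status: "True"},
		{Type: "MachineSetReady", Status: "False"},
	}

	// Existing semantics: MachineSetReady is not part of the summary.
	fmt.Println("Ready (excluded):", summarize(conds, "Available"))

	// Alternative raised in review: include MachineSetReady in the summary.
	fmt.Println("Ready (included):", summarize(conds, "Available", "MachineSetReady"))
}
```

In this model, excluding `MachineSetReady` keeps Ready at `True` once availability is met, while including it would flip Ready to `False` on any machine failure, which is the semantic change the reviewers chose to avoid.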

@muraee muraee force-pushed the propogate-ms-condition-to-md branch from 6560132 to 03545ca Compare August 23, 2023 10:32
@muraee muraee force-pushed the propogate-ms-condition-to-md branch from 03545ca to 4bc8a1f Compare August 23, 2023 13:35
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 23, 2023
@enxebre
Member

enxebre commented Aug 23, 2023

Something in the tests doesn't like it:



2/4703 Tests Failed.

sigs.k8s.io/cluster-api/internal/util/ssa: TestPatch/Test_patch_with_Machine (0s)
=== RUN   TestPatch/Test_patch_with_Machine
    testing.go:1471: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
--- FAIL: TestPatch/Test_patch_with_Machine (0.11s)

lgtm otherwise

@muraee
Contributor Author

muraee commented Aug 23, 2023

/test pull-cluster-api-test-main

@muraee
Contributor Author

muraee commented Aug 23, 2023

The test passed; it seems it was just a flake.

conditions.WithFallbackValue(false, clusterv1.WaitingForMachineSetFallbackReason, clusterv1.ConditionSeverityInfo, ""),
)
} else {
conditions.MarkFalse(md, clusterv1.MachineSetReadyCondition, clusterv1.WaitingForMachineSetFallbackReason, clusterv1.ConditionSeverityInfo, "MachineSet not found")
Member

In what scenarios would newMS be nil?

Contributor Author

Before I added this check, one of the e2e tests panicked with a nil pointer dereference.
If I am not wrong, that happened while the MD was being deleted.
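
The nil guard described here can be sketched in isolation as follows. The types and the `machineSetReadyFor` function are hypothetical, and the reason string `"WaitingForMachineSet"` is an assumed value for the `WaitingForMachineSetFallbackReason` constant seen in the diff; only the "MachineSet not found" message is taken from the snippet above.

```go
package main

import "fmt"

// Condition and MachineSet are hypothetical, reduced types used only to
// illustrate the nil guard; they are not the Cluster API types.
type Condition struct {
	Type     string
	Status   string
	Severity string
	Reason   string
	Message  string
}

type MachineSet struct {
	Conditions []Condition
}

// machineSetReadyFor derives the MachineSetReady condition for a
// MachineDeployment from its current MachineSet, which may be nil
// (for example while the MachineDeployment is being deleted).
func machineSetReadyFor(newMS *MachineSet) Condition {
	// Fallback used when there is nothing to mirror. The reason value is
	// an assumption, see the note above the code block.
	fallback := Condition{
		Type:     "MachineSetReady",
		Status:   "False",
		Severity: "Info",
		Reason:   "WaitingForMachineSet",
	}

	// Guard: without this check, reading newMS.Conditions would panic.
	if newMS == nil {
		fallback.Message = "MachineSet not found"
		return fallback
	}

	// Mirror the MachineSet's Ready condition when it exists.
	for _, c := range newMS.Conditions {
		if c.Type == "Ready" {
			c.Type = "MachineSetReady" // c is a copy; the MachineSet is untouched
			return c
		}
	}

	// The MachineSet exists but has not reported Ready yet.
	return fallback
}

func main() {
	fmt.Printf("%+v\n", machineSetReadyFor(nil))
	fmt.Printf("%+v\n", machineSetReadyFor(&MachineSet{}))
}
```

The sketch separates the two fallback cases: a missing MachineSet gets an explicit message, while an existing MachineSet without a Ready condition falls back with an empty message.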

@enxebre
Member

enxebre commented Aug 24, 2023

I think this is a good step towards providing better UX for MD consumers. Can we think of any additional scenarios where the condition would provide value, e.g. during a rolling upgrade?

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 24, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 565398f47e778f22b1fd10c7fe4f12a44c74abd8

Member

@vincepri vincepri left a comment

/approve

/hold
@sbueringer any thoughts here?

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 29, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 29, 2023
@sbueringer
Member

sbueringer commented Dec 12, 2023

Looks good. We obviously ignore all the potentially existing old MachineSets, but I assume this is intended.

I'm not sure whether it is confusing to refer to "the" MachineSet in MachineSetReady when a MachineDeployment can have many more than one MachineSet.

/lgtm

Not sure if Fabrizio has an opinion on it.

@sbueringer
Copy link
Member

@fabriziopandini ^^

@muraee
Contributor Author

muraee commented Jan 16, 2024

Hey @fabriziopandini did you have a chance to review this?

Member

@vincepri vincepri left a comment

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 6, 2024
@vincepri
Member

vincepri commented Feb 6, 2024

@fabriziopandini if you have time to review, we can follow-up to address potential comments

@k8s-ci-robot k8s-ci-robot merged commit d8a3aaf into kubernetes-sigs:main Feb 6, 2024
25 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.7 milestone Feb 6, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/machinedeployment Issues or PRs related to machinedeployments cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

6 participants