Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ MachineDeployment - Available, Progressing and Ready Conditions #4174

Closed
wants to merge 1 commit into from

Conversation

Arvinderpal
Copy link
Contributor

@Arvinderpal Arvinderpal commented Feb 10, 2021

What this PR does / why we need it:

This brings the use of Conditions to MachineDeployment. More specifically, it brings the Available, Progressing and Failure Conditions. The code borrows heavenly from the Deployment/ReplicaSet approach for conditions.

Fixes # #3486

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 10, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign justinsb after the PR has been reviewed.
You can assign the PR to them by writing /assign @justinsb in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

Hi @Arvinderpal. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 10, 2021
@Arvinderpal Arvinderpal force-pushed the md-conditions branch 3 times, most recently from 43b696f to 034ffc7 Compare February 13, 2021 22:31
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 13, 2021
@Arvinderpal Arvinderpal changed the title WIP: ✨ MachineDeployment - Available, Progressing and Failure Conditions ✨ MachineDeployment - Available, Progressing and Failure Conditions Feb 13, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 13, 2021
@vincepri
Copy link
Member

/ok-to-test

Comment on lines 193 to 200
// Progressing means the MachineDeployment is progressing. Progress for a MachineDeployment is
// considered when a new machine set is created or adopted, and when new machine scale
// up or old machine scale down. Progress is not estimated for paused deployments or
// when progressDeadlineSeconds is not specified.
MachineDeploymentProgressing ConditionType = "MachineDeploymentProgressing"
// MachineFailure is added in a MachineDeployment when one of its machines fails to be created
// or deleted.
MachineDeploymentMachineFailure ConditionType = "MachineDeploymentMachineFailure"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two conditions seem a bit out of place compared to others, given that they could be a reason for Ready or similar, rather than on their own. @fabriziopandini what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to give some context.
The code is modeled after the Deployment/ReplicaSet code

Additionally, the much like the kubectl rollout status command, the clusterctl rollout status will monitor the Progressing condition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, although to be consistent with the rest of CAPI conditions, we might want to change things up a little bit. I'd see having a Ready condition on MachineDeployment that not only aggregates the status of each Machine in a MachineDeployment, but can tell users if there is a failure or a scaling operation is in progress. This would be a Reason for the condition to be False

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Vince we should aim to better consistency, and in this case, we should revisit things in order to have all the conditions using a positive polarity.

Please note there is also the option to propose an amendment to the current condition proposal, so we can discuss pros and cons and address all the related concerns e.g. how do we compute summary in case of conditions with mixed polarity? what are the required changes to util/conditions in order to help users in shifting to the new model?

Copy link
Contributor Author

@Arvinderpal Arvinderpal Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vincepri I have created a MachineDeploymentReadyCondition which encapsulates failures in the underlying MachineSets. The MachineDeploymentProgressing IMO is essential. Again to be in line with how Deployment/RS work and in particular since MD/MS has been modeled around that, the Progressing seems to be an important condition. It's not clear to me how just a Ready condition can capture progression by itself. For example, NewMSAvailableReason and MachineSetUpdatedReason are Reasons for progressing condition. How do they get incorporated into Ready? Also, clusterctl rollout status essentially watches the Progressing condition and spits out updates as they happen. Again, not sure how this command would be impacted or even be useful.

@fabriziopandini Positive polarity only seems too restrictive to me. Also, I see various places where conditions.MarkFalse() is used already.

Copy link
Member

@richardcase richardcase Mar 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on ensuring we have the Ready condition. This is for consistency and making this kstatus compatible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Progressing a way to describe that a rolling upgrade is happening, should we be more explicit here? Thinking about the rest of the system as well, should we do something similar in KCP and express better when a resource is performing a rolling operation?

Before merging and proceeding with any of these, I'd like to find some consistency and apply it everywhere.

Comment on lines 202 to 220
// Reasons for deployment conditions
//
// Progressing:
// NewMSAvailableReason is added in a deployment when its newest machine set is made available
NewMSAvailableReason = "NewMachineSetAvailable"
// TimedOutReason is added in a deployment when its newest machine set fails to show any progress
// within the given deadline (progressDeadlineSeconds).
TimedOutReason = "ProgressDeadlineExceeded"
// MachineSetUpdatedReason is added in a deployment when one of its machine sets is updated as part
// of the rollout process.
MachineSetUpdatedReason = "MachineSetUpdated"
//
// Available:

// MinimumMachinesAvailable is added in a deployment when it has its minimum machines required available.
MinimumMachinesAvailable = "MinimumMachinesAvailable"
// MinimumMachinesUnavailable is added in a deployment when it doesn't have the minimum required machines
// available.
MinimumMachinesUnavailable = "MinimumMachinesUnavailable"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is it possible to map reasons and conditions, so it will be easier to understand the expected workflow/state diagram for each condition?

Comment on lines +561 to +562
// If the previous condition has been a successful rollout then we shouldn't try to
// estimate any progress. Scenario:
Copy link
Member

@fabriziopandini fabriziopandini Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general recommendation, we should avoid to determine the current status on the basis of the previous one, but instead, we should always reconcile with the current state of things.

@fabriziopandini
Copy link
Member

/milestone v0.4.0

@k8s-ci-robot k8s-ci-robot added this to the v0.4.0 milestone Mar 5, 2021
@vincepri
Copy link
Member

vincepri commented Mar 8, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 8, 2021
@Arvinderpal Arvinderpal changed the title ✨ MachineDeployment - Available, Progressing and Failure Conditions ✨ MachineDeployment - Available, Progressing and Ready Conditions Mar 9, 2021
@k8s-ci-robot
Copy link
Contributor

@Arvinderpal: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-cluster-api-build-main adb1bfc link /test pull-cluster-api-build-main
pull-cluster-api-apidiff-main adb1bfc link /test pull-cluster-api-apidiff-main
pull-cluster-api-make-main adb1bfc link /test pull-cluster-api-make-main
pull-cluster-api-e2e-main adb1bfc link /test pull-cluster-api-e2e-main
pull-cluster-api-test-main adb1bfc link /test pull-cluster-api-test-main
pull-cluster-api-verify adb1bfc link /test pull-cluster-api-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 6, 2021
@k8s-ci-robot
Copy link
Contributor

@Arvinderpal: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@enxebre
Copy link
Member

enxebre commented Apr 19, 2021

hey @Arvinderpal are you still tackling this work?

@Arvinderpal
Copy link
Contributor Author

@enxebre Sorry this fell through the cracks. Unless there is particular urgency, I would like to come back to this after a couple of weeks.

Also on a related note: #3153 will require that we remove the use of Status.Phase. That should also be incorporated into this PR.

@enxebre
Copy link
Member

enxebre commented Apr 21, 2021

Thanks for the update @Arvinderpal.
@vincepri @fabriziopandini @CecileRobertMichon @sbueringer @JoelSpeed when is the plan to cut the v1alpha4 release?
/area api
/kind api-change

@k8s-ci-robot k8s-ci-robot added area/api Issues or PRs related to the APIs kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Apr 21, 2021
@sbueringer
Copy link
Member

@enxebre I think the last time it was mentioned in the ClusterAPI Meeting we were talking about some time in June. But not sure if there are more concrete plans now.

@Arvinderpal
Copy link
Contributor Author

@vincepri @fabriziopandini @enxebre I'm closing this in favor of #4625
I will add conditions to MachineDeployment as smaller PRs. The tasks are now listed in this issue: #3486
I hope this will make the overall review process easier and faster.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Issues or PRs related to the APIs cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants