🌱 Log reason for MachineDeployment rollouts / MachineSet creations #10688

Merged: 1 commit merged into kubernetes-sigs:main from pr-md-rollout-log on May 30, 2024

Conversation

sbueringer
Member

Signed-off-by: Stefan Büringer <buringerst@vmware.com>

What this PR does / why we need it:
Follow-up to PR #10628, this time for MachineDeployments.

Initial rollout:

I0528 09:14:11.164279      17 machinedeployment_sync.go:175] "Creating new MachineSet, no MachineSets exist for the MachineDeployment" controller="machinedeployment" controllerGroup="cluster.x-k8s.io" controllerKind="MachineDeployment" MachineDeployment="default/capi-quickstart-md-0" namespace="default" name="capi-quickstart-md-0" reconcileID="acc1a765-03f6-4e2d-a815-148efc8e5b36" Cluster="default/capi-quickstart" MachineSet="default/capi-quickstart-md-0-fqhwx"

Rollout because existing MachineSets don't match

I0528 09:14:42.812951      17 machinedeployment_sync.go:175] "Creating new MachineSet, couldn't find MachineSet matching MachineDeployment spec template: MachineSet capi-quickstart-md-0-fqhwx: diff: &v1beta1.MachineTemplateSpec{\n    ObjectMeta: {},\n    Spec: v1beta1.MachineSpec{\n      ClusterName:       \"capi-quickstart\",\n      Bootstrap:         {ConfigRef: &{Kind: \"KubeadmConfigTemplate\", Namespace: \"default\", Name: \"capi-quickstart-md-0\", APIVersion: \"bootstrap.cluster.x-k8s.io\", ...}},\n      InfrastructureRef: {Kind: \"DockerMachineTemplate\", Namespace: \"default\", Name: \"capi-quickstart-md-0\", APIVersion: \"infrastructure.cluster.x-k8s.io\", ...},\n-     Version:           &\"v1.25.0\",\n+     Version:           &\"v1.26.0\",\n      ProviderID:        nil,\n      FailureDomain:     nil,\n      ... // 3 identical fields\n    },\n  }" controller="machinedeployment" controllerGroup="cluster.x-k8s.io" controllerKind="MachineDeployment" MachineDeployment="default/capi-quickstart-md-0" namespace="default" name="capi-quickstart-md-0" reconcileID="7d8b34ad-1777-4f0b-a8df-edec28eae702" Cluster="default/capi-quickstart" MachineSet="default/capi-quickstart-md-0-b4w2w"

The same message, formatted for readability (only \n replaced with a newline):

Creating new MachineSet, couldn't find MachineSet matching MachineDeployment spec template: MachineSet capi-quickstart-md-0-fqhwx: diff: &v1beta1.MachineTemplateSpec{
    ObjectMeta: {},
    Spec: v1beta1.MachineSpec{
      ClusterName:       \"capi-quickstart\",
      Bootstrap:         {ConfigRef: &{Kind: \"KubeadmConfigTemplate\", Namespace: \"default\", Name: \"capi-quickstart-md-0\", APIVersion: \"bootstrap.cluster.x-k8s.io\", ...}},
      InfrastructureRef: {Kind: \"DockerMachineTemplate\", Namespace: \"default\", Name: \"capi-quickstart-md-0\", APIVersion: \"infrastructure.cluster.x-k8s.io\", ...},
-     Version:           &\"v1.25.0\",
+     Version:           &\"v1.26.0\",
      ProviderID:        nil,
      FailureDomain:     nil,
      ... // 3 identical fields
    },
  }
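The formatting step described above (replacing only the escaped `\n` sequences) can be sketched in Go as follows. This is a minimal illustration, not part of the PR; `formatLogMessage` is a hypothetical helper name, and the sample input is shortened from the log line above.

```go
package main

import (
	"fmt"
	"strings"
)

// formatLogMessage (hypothetical helper) replaces the literal two-character
// sequence `\n` with a real newline. Every other escape, such as `\"`, is
// left untouched, mirroring the "only \n replaced" note above.
func formatLogMessage(msg string) string {
	return strings.ReplaceAll(msg, `\n`, "\n")
}

func main() {
	// Shortened sample of the escaped diff as it appears in the log line.
	raw := `diff: &v1beta1.MachineTemplateSpec{\n    ObjectMeta: {},\n  }`
	fmt.Println(formatLogMessage(raw))
}
```

If the escaped quotes should be resolved as well, a different approach would be needed; this sketch deliberately leaves them as they are.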

Rollout because of RolloutAfter

I0528 09:17:20.864224      17 machinedeployment_sync.go:175] "Creating new MachineSet, RolloutAfter on MachineDeployment set to 2024-05-28T09:15:00Z, no MachineSet has been created afterwards" controller="machinedeployment" controllerGroup="cluster.x-k8s.io" controllerKind="MachineDeployment" MachineDeployment="default/capi-quickstart-md-0" namespace="default" name="capi-quickstart-md-0" reconcileID="4418d85d-8524-4a02-b4f5-e8083ab19056" Cluster="default/capi-quickstart" MachineSet="default/capi-quickstart-md-0-bd78w"

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Related to #8186

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area PR is missing an area label size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 28, 2024
@sbueringer
Member Author

/assign @fabriziopandini @chrischdi @vincepri

@sbueringer sbueringer added the area/machinedeployment Issues or PRs related to machinedeployments label May 28, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-area PR is missing an area label label May 28, 2024
return false

// If RolloutAfter is not set, pick the first matching MachineSet.
if deployment.Spec.RolloutAfter == nil {
Member

To clarify: for these three cases (451, 456, 464), could we return early within the if equal { check if we wanted?

Member Author

Yup

if ms == nil {
return false
if len(matchingMachineSets) == 0 {
return nil, fmt.Sprintf("couldn't find MachineSet matching MachineDeployment spec template: %s", strings.Join(diffs, ",")), nil
Member

Why did you choose to pass this around instead of logging it right here?

Member Author

@sbueringer sbueringer May 28, 2024

Because this func is used in multiple places. I wanted to make sure we log this decision/reason only at the point where we actually create a new MachineSet.

Member Author

@sbueringer sbueringer May 29, 2024

Considering this one: #10688 (review)

Logging it only when we create a new MachineSet means that we can do it on log level 0.

If we logged it every time we run through this code, we would have to move it toward trace level, because it would be very verbose otherwise. But at log levels >= 3 it is a lot less useful, because nobody sees it by default.

Member

@enxebre enxebre May 29, 2024

I see, the reasoning and outcome make sense to me. The implementation suggests to me that we have some tangled debt in these functions. Maybe this could be an error type that can be checked, though that wouldn't help much...
Anyway, we can always follow up and iterate to keep untangling. Thanks!

Member Author

Agree overall. I think an error might not be the best fit, because it's not really an error; it's the reasoning behind a decision we made.

@sbueringer
Member Author

/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-upgrade-1-30-1-31-main

@sbueringer
Member Author

/test pull-cluster-api-e2e-main

Member

@fabriziopandini fabriziopandini left a comment

/lgtm

The only concern is that in case of diffs we log the difference between current and all the existing MS, which can become verbose.
But considering we keep revision history to 1 by default, the trade-off between improved observability and verbosity makes sense to me.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 29, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 44bc72a6743fb71125f06e7d338051f876f44c31

@sbueringer
Member Author

sbueringer commented May 29, 2024

> /lgtm
>
> The only concern is that in case of diffs we log the difference between current and all the existing MS, which can become verbose. But considering we keep revision history to 1 by default, the trade-off between improved observability and verbosity makes sense to me.

Yup that's a good point. I was considering only logging "n" MachineSets with the highest revisions. But then I thought:

  • revisionHistoryLimit is 1 by default
  • revision history will be dropped
  • we only log this when we create a new MachineSet (which should not be very often)
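For reference, the alternative that was considered and set aside (logging diffs only for the "n" MachineSets with the highest revisions) could be sketched like this. It is purely illustrative: `machineSetDiff` and `topNByRevision` are hypothetical names, and storing the revision as a plain integer is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// machineSetDiff is a hypothetical pairing of a MachineSet with its diff
// against the MachineDeployment template.
type machineSetDiff struct {
	name     string
	revision int
	diff     string
}

// topNByRevision returns at most n entries with the highest revisions,
// ordered from highest to lowest. The input slice is not modified.
func topNByRevision(diffs []machineSetDiff, n int) []machineSetDiff {
	sorted := append([]machineSetDiff(nil), diffs...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].revision > sorted[j].revision
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	diffs := []machineSetDiff{
		{name: "md-0-aaaaa", revision: 1, diff: "Version: v1.24.0 -> v1.26.0"},
		{name: "md-0-fqhwx", revision: 2, diff: "Version: v1.25.0 -> v1.26.0"},
	}
	for _, d := range topNByRevision(diffs, 1) {
		fmt.Printf("MachineSet %s (revision %d): %s\n", d.name, d.revision, d.diff)
	}
}
```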

@enxebre
Member

enxebre commented May 29, 2024

/lgtm

Member

@vincepri vincepri left a comment

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 30, 2024
@k8s-ci-robot k8s-ci-robot merged commit 2d5b9ae into kubernetes-sigs:main May 30, 2024
28 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.8 milestone May 30, 2024
@sbueringer sbueringer deleted the pr-md-rollout-log branch May 31, 2024 09:11
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/machinedeployment Issues or PRs related to machinedeployments cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
6 participants