🐛 topology controller should avoid unnecessary rollouts during upgrades #8628

Merged

Conversation

ykakarap
Contributor

@ykakarap ykakarap commented May 10, 2023

What this PR does / why we need it:

This PR modifies the topology controller so that it holds off on pushing any changes to a ControlPlane/MachineDeployment that is pending an upgrade.

If a ControlPlane or MD is pending an upgrade, it will be rolled out once it picks up the upgrade. It is therefore better to also hold back other changes to these objects that could cause rollouts in the meantime, to avoid unnecessary machine churn.
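
As a rough illustration of the idea described above, the following sketch keeps an object's current state while an upgrade is pending and only applies the desired state once it is not. The `managedObject` type, its fields, and the `specToApply` function are hypothetical names made up for this example; they are not the types or functions used in the PR.

```go
package main

import "fmt"

// managedObject is a hypothetical, simplified stand-in for a ControlPlane or
// MachineDeployment. The real Cluster API types carry far more fields.
type managedObject struct {
	Version  string // Kubernetes version of the object
	Template string // stands in for any field whose change could cause a rollout
}

// specToApply returns what would be pushed to the object. While an upgrade is
// pending, the current state is kept so that no extra rollout is triggered;
// otherwise the desired state is applied. This is an assumption about the
// general approach, not the PR's actual implementation.
func specToApply(current, desired managedObject, pendingUpgrade bool) managedObject {
	if pendingUpgrade {
		return current
	}
	return desired
}

func main() {
	current := managedObject{Version: "v1.26.4", Template: "template-a"}
	desired := managedObject{Version: "v1.27.1", Template: "template-b"}

	held := specToApply(current, desired, true)
	fmt.Println(held.Version, held.Template) // v1.26.4 template-a

	applied := specToApply(current, desired, false)
	fmt.Println(applied.Version, applied.Template) // v1.27.1 template-b
}
```

With this shape, the version bump and the other changes reach the object together, so the machines are replaced once rather than twice.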

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #8656
Fixes #8695

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 10, 2023
@ykakarap ykakarap changed the title ⚠️ [DO_NOT_REVIEW][EXPIREMENT] topology controller should avoid double rollouts during upgrades ⚠️ [DO_NOT_REVIEW][EXPERIMENT] topology controller should avoid double rollouts during upgrades May 10, 2023
@ykakarap ykakarap changed the title ⚠️ [DO_NOT_REVIEW][EXPERIMENT] topology controller should avoid double rollouts during upgrades ⚠️ [DO_NOT_REVIEW][EXPERIMENT] topology controller should avoid unnecessary rollouts during upgrades May 14, 2023
@ykakarap
Contributor Author

/test pull-cluster-api-e2e-informing-dualstack-ipv6-main

@ykakarap ykakarap force-pushed the pr-double-rollout_topology branch from 14ee7f0 to 6b2f3a1 Compare May 15, 2023 17:54
@ykakarap ykakarap changed the title ⚠️ [DO_NOT_REVIEW][EXPERIMENT] topology controller should avoid unnecessary rollouts during upgrades 🐛 topology controller should avoid unnecessary rollouts during upgrades May 15, 2023
@ykakarap ykakarap force-pushed the pr-double-rollout_topology branch from b15064d to 2b9f1f7 Compare May 16, 2023 01:51
Contributor

@killianmuldoon killianmuldoon left a comment

/area clusterclass

@k8s-ci-robot k8s-ci-robot added the area/clusterclass Issues or PRs related to clusterclass label May 17, 2023
Member

@sbueringer sbueringer left a comment

Hm. I wonder if it would be better to define "upgrade is held back" independent of the pending stuff.
Or we have to re-evaluate what pending exactly means and align to what we need.

In my opinion holding back an upgrade simply means:

  • For control plane: Cluster.topology.version != desiredCP.version
  • For MD: Cluster.topology.version != desiredMD.version

Maybe it's as easy as that?
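
A minimal sketch of the check proposed in this comment, assuming versions are compared as plain strings. The function name `isUpgradeHeldBack` and its parameters are hypothetical and chosen only for this example; the merged code may define the condition differently.

```go
package main

import "fmt"

// isUpgradeHeldBack reports whether an upgrade is being held back for a
// control plane or MachineDeployment, following the rule suggested in the
// review: the version in Cluster.topology differs from the version computed
// for the desired object. Both arguments are plain version strings here,
// which is a simplification made for this sketch.
func isUpgradeHeldBack(topologyVersion, desiredVersion string) bool {
	return topologyVersion != desiredVersion
}

func main() {
	// The topology asks for v1.27.1 but the desired object is still at v1.26.4.
	fmt.Println(isUpgradeHeldBack("v1.27.1", "v1.26.4")) // true

	// The desired object has already picked up the topology version.
	fmt.Println(isUpgradeHeldBack("v1.27.1", "v1.27.1")) // false
}
```

The same predicate would serve both bullets above, since the control plane and MD cases differ only in which desired object supplies the second version.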

internal/controllers/topology/cluster/conditions.go Outdated Show resolved Hide resolved
internal/controllers/topology/cluster/desired_state.go Outdated Show resolved Hide resolved
internal/controllers/topology/cluster/reconcile_state.go Outdated Show resolved Hide resolved
@ykakarap ykakarap changed the title 🐛 topology controller should avoid unnecessary rollouts during upgrades 🐛 [WIP] topology controller should avoid unnecessary rollouts during upgrades May 23, 2023
@ykakarap ykakarap force-pushed the pr-double-rollout_topology branch from 37d855a to a53b2c5 Compare May 24, 2023 07:22
Member

@sbueringer sbueringer left a comment

Good progress :)

Let's pair a bit on it. I think we have some good ideas on how we can optimize this further

Let's rebase on top of #8658

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2023
Member

@fabriziopandini fabriziopandini left a comment

Made a first pass while the PR is still WIP, and it already looks good.
Just a couple of nits from my side.

internal/controllers/topology/cluster/desired_state.go Outdated Show resolved Hide resolved
@ykakarap ykakarap force-pushed the pr-double-rollout_topology branch from a53b2c5 to db2140a Compare May 30, 2023 04:25
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 30, 2023
@ykakarap
Contributor Author

/test ?

@k8s-ci-robot
Contributor

@ykakarap: The following commands are available to trigger required jobs:

  • /test pull-cluster-api-build-main
  • /test pull-cluster-api-e2e-main
  • /test pull-cluster-api-test-main
  • /test pull-cluster-api-test-mink8s-main
  • /test pull-cluster-api-verify-main

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-apidiff-main
  • /test pull-cluster-api-e2e-full-dualstack-ipv6-main
  • /test pull-cluster-api-e2e-full-main
  • /test pull-cluster-api-e2e-informing-ipv6-main
  • /test pull-cluster-api-e2e-informing-main
  • /test pull-cluster-api-e2e-scale-main-experimental
  • /test pull-cluster-api-e2e-workload-upgrade-1-27-latest-main

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-apidiff-main
  • pull-cluster-api-build-main
  • pull-cluster-api-e2e-informing-ipv6-main
  • pull-cluster-api-e2e-informing-main
  • pull-cluster-api-e2e-main
  • pull-cluster-api-test-main
  • pull-cluster-api-test-mink8s-main
  • pull-cluster-api-verify-main

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ykakarap
Contributor Author

/test pull-cluster-api-e2e-full-main

@ykakarap
Contributor Author

/retest

Member

@sbueringer sbueringer left a comment

Only nits, looks great!

@k8s-ci-robot k8s-ci-robot removed the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 2, 2023
@ykakarap ykakarap changed the title 🐛 [WIP] topology controller should avoid unnecessary rollouts during upgrades 🐛 topology controller should avoid unnecessary rollouts during upgrades Jun 2, 2023
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 2, 2023
@ykakarap
Contributor Author

ykakarap commented Jun 2, 2023

/test pull-cluster-api-e2e-full-main

@ykakarap ykakarap force-pushed the pr-double-rollout_topology branch from 5fb211a to 13d456d Compare June 2, 2023 05:32
@ykakarap
Contributor Author

ykakarap commented Jun 2, 2023

/retest

@sbueringer
Member

Thx, last nit from my side: #8628 (comment)

@fabriziopandini
Member

pending latest comment from @sbueringer, lgtm from my side

@ykakarap ykakarap force-pushed the pr-double-rollout_topology branch from 13d456d to 0be8e6e Compare June 5, 2023 16:07
@k8s-ci-robot
Contributor

k8s-ci-robot commented Jun 5, 2023

@ykakarap: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-e2e-informing-dualstack-ipv6-main | 14ee7f0 | link | false | /test pull-cluster-api-e2e-informing-dualstack-ipv6-main |
| pull-cluster-api-e2e-full-dualstack-ipv6-main | 6b2f3a1 | link | false | /test pull-cluster-api-e2e-full-dualstack-ipv6-main |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sbueringer
Member

Thank you!

/lgtm
/approve

Let's hold until we have some idea what's going on regarding #8786 (just to keep debugging as easy as possible. This PR could introduce another potential error source)

@sbueringer
Member

/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jun 5, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: a6c0f5bb594609962f3252e4e33d09f8d5f40933

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 5, 2023
@ykakarap
Contributor Author

ykakarap commented Jun 5, 2023

/retest

@sbueringer
Member

Updated the status of #8786 (comment). I think we're good with merging this PR. The issue is something in KCP. So unrelated to this PR

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 9, 2023
@k8s-ci-robot k8s-ci-robot merged commit 0ba7f45 into kubernetes-sigs:main Jun 9, 2023
10 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.5 milestone Jun 9, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/clusterclass Issues or PRs related to clusterclass cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
5 participants