⚠️ KCP: block upgrade to versions with old registry, improve registry handling #7856

Merged: 1 commit, merged Jan 9, 2023

sbueringer
Member

@sbueringer sbueringer commented Jan 5, 2023

Signed-off-by: Stefan Büringer buringerst@vmware.com

What this PR does / why we need it:

This PR results in the following change of behavior:

  1. newly triggered upgrades to versions with the old registry will be blocked by the KCP webhook
    • Only if imageRepository is not set in KCP, and only for target versions >= v1.22.0 for which a newer kubeadm patch version using the new registry exists.
    • The goal is to steer users who leave imageRepository defaulting to us towards the new registry.
  2. pre-existing clusters with versions using the old registry will work again

Because (1) only blocks cases that were already broken with v1.3.0 and v1.2.8, I guess this is technically not a breaking change.

Notes:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Part of #7833

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 5, 2023
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 5, 2023
@sbueringer
Member Author

/cherry-pick release-1.3

@k8s-infra-cherrypick-robot

@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.3 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member Author

/cherry-pick release-1.2

@k8s-infra-cherrypick-robot

@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.2 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.2

@sbueringer sbueringer force-pushed the pr-kcp-registry-fix branch 2 times, most recently from 5a6c699 to 0ec1648 Compare January 5, 2023 14:37
@sbueringer sbueringer changed the title from "⚠️ KCP: block upgrade to versions with old registry, improve registry" to "⚠️ KCP: block upgrade to versions with old registry, improve registry handling" Jan 5, 2023
@sbueringer
Member Author

/test pull-cluster-api-e2e-full-main
/test pull-cluster-api-e2e-workload-upgrade-1-25-latest-main

@sbueringer
Member Author

sbueringer commented Jan 5, 2023

@CecileRobertMichon @killianmuldoon @fabriziopandini @ykakarap Please take a look when you have some time.

Unfortunately, I missed the most important part in yesterday's office hours: if the user delegates the management of the registry to us / kubeadm (by not setting imageRepository), we will now block upgrades to versions that would use the old registry.

Those are exactly the cases that are failing with v1.3.0 and >= v1.2.8.

Additionally, this PR improves KCP so that it can handle clusters that are already on one of those versions.

I'm not under any pressure to get it out, but it might be nice to ship this PR with the patch releases next Tuesday (to avoid more users running into the registry issue). WDYT?

@sbueringer
Member Author

cc @kubernetes-sigs/cluster-api-release-team

Contributor

@ykakarap ykakarap left a comment

/lgtm

Would be nice to get the fix in before the next patch releases.
Hopefully we can get it merged on Monday so that we have enough signal before we cut the releases on Tuesday.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 9, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 79ff466df1b2fefdf27eaaba5c0979243d4989d7

@ykakarap
Contributor

ykakarap commented Jan 9, 2023

How about a similar validation in Cluster topology? With this fix, if a user of a managed Cluster updates the version to an undesired version, the change will pass Cluster topology validation, but the topology reconciler will keep failing because KCP rejects the change.
It might be tricky to get this validation into ClusterTopology, as the control plane need not be KCP. Therefore, the validation should only be applied if the control plane is KCP and the target version is an undesirable version.

We need to find a balance between good UX for the user and how much of the KCP validation we bring into Cluster topology.

@sbueringer
Member Author

How about a similar validation in Cluster topology? With this fix, if a user of a managed Cluster updates the version to an undesired version, the change will pass Cluster topology validation, but the topology reconciler will keep failing because KCP rejects the change.
It might be tricky to get this validation into ClusterTopology, as the control plane need not be KCP. Therefore, the validation should only be applied if the control plane is KCP and the target version is an undesirable version.

To be honest, I wouldn't bring any of this validation into Cluster topology.

The validation is essentially:

  // Block if imageRepository is not set (i.e. the default registry should be used),
  // the version changed (i.e. we have an upgrade),
  // the version is >= v1.22.0 and < v1.26.0,
  // and the default registry of the new Kubernetes/kubeadm version is the old default registry.
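The four conditions above can be sketched as a standalone Go function. This is a minimal sketch, not the code merged in this PR: `shouldBlockUpgrade`, `defaultRegistryFor` and the tiny `version` type are hypothetical names, the registry names assume the old default is `k8s.gcr.io` and the new one is `registry.k8s.io`, and the per-minor cut-over patch versions are taken from the list of already-fixed versions later in this comment (v1.22.17, v1.23.15, v1.24.9, v1.25.0).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Assumed registry names: kubeadm's old default and the new community registry.
const (
	oldDefaultRegistry = "k8s.gcr.io"
	newDefaultRegistry = "registry.k8s.io"
)

// firstPatchOnNewRegistry maps a v1 minor version to the first patch release whose
// kubeadm is assumed to default to the new registry (versions listed in this thread).
var firstPatchOnNewRegistry = map[int]int{22: 17, 23: 15, 24: 9}

// version is a hypothetical, minimal major.minor.patch type; real code would use a
// semver library. Pre-release and build suffixes are not supported.
type version struct{ major, minor, patch int }

func parseVersion(s string) (version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("invalid version %q", s)
	}
	var n [3]int
	for i, p := range parts {
		num, err := strconv.Atoi(p)
		if err != nil {
			return version{}, fmt.Errorf("invalid version %q: %w", s, err)
		}
		n[i] = num
	}
	return version{n[0], n[1], n[2]}, nil
}

// less reports whether v sorts before o.
func (v version) less(o version) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	if v.minor != o.minor {
		return v.minor < o.minor
	}
	return v.patch < o.patch
}

// defaultRegistryFor returns the registry kubeadm is assumed to default to for v.
func defaultRegistryFor(v version) string {
	if v.major > 1 || v.minor >= 25 {
		return newDefaultRegistry
	}
	if first, ok := firstPatchOnNewRegistry[v.minor]; ok && v.patch >= first {
		return newDefaultRegistry
	}
	return oldDefaultRegistry
}

// shouldBlockUpgrade (hypothetical) applies the four conditions from the comment above.
func shouldBlockUpgrade(imageRepository, fromVersion, toVersion string) (bool, error) {
	from, err := parseVersion(fromVersion)
	if err != nil {
		return false, err
	}
	to, err := parseVersion(toVersion)
	if err != nil {
		return false, err
	}
	lower, upper := version{1, 22, 0}, version{1, 26, 0}
	return imageRepository == "" && // the default registry should be used
		from != to && // the version changed
		!to.less(lower) && to.less(upper) && // >= v1.22.0 and < v1.26.0
		defaultRegistryFor(to) == oldDefaultRegistry, nil // still on the old default registry
}

func main() {
	cases := []struct{ repo, from, to string }{
		{"", "v1.22.9", "v1.23.10"},                    // blocked: old registry
		{"", "v1.22.9", "v1.23.15"},                    // allowed: new registry
		{"my.registry.example", "v1.22.9", "v1.23.10"}, // allowed: imageRepository set
		{"", "v1.23.10", "v1.23.10"},                   // allowed: no version change
	}
	for _, c := range cases {
		blocked, err := shouldBlockUpgrade(c.repo, c.from, c.to)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s (imageRepository=%q): blocked=%v\n", c.from, c.to, c.repo, blocked)
	}
}
```

Keeping the registry lookup in its own function mirrors the last condition: the check never hard-codes "bad" versions, it only asks which registry the target version would default to.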

This means we would have to get all this information into the Cluster webhook. Most notably, we would have to:

  • check if the control plane is a KubeadmControlPlane
  • figure out whether imageRepository is not set (this means running through the desired state calculation, including patches, ...)
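To illustrate what the Cluster webhook would need for these two points, here is a hypothetical Go sketch operating on a generic JSON-decoded object. `controlPlaneRef`, `isKubeadmControlPlane` and `imageRepositoryUnset` are illustrative names, not Cluster API functions, and the sketch assumes the desired KCP object (after ClusterClass patches) has already been computed, which is exactly the expensive part mentioned above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// controlPlaneRef is a hypothetical, trimmed-down stand-in for the Cluster's
// spec.controlPlaneRef (only apiVersion and kind are needed here).
type controlPlaneRef struct {
	APIVersion string
	Kind       string
}

// isKubeadmControlPlane reports whether the Cluster's control plane is a KCP.
func isKubeadmControlPlane(ref controlPlaneRef) bool {
	return ref.Kind == "KubeadmControlPlane" &&
		strings.HasPrefix(ref.APIVersion, "controlplane.cluster.x-k8s.io/")
}

// imageRepositoryUnset reports whether spec.kubeadmConfigSpec.clusterConfiguration.imageRepository
// is missing or empty in a generic (JSON-decoded) control plane object. It assumes the object
// is the fully computed desired state, i.e. ClusterClass patches have already been applied.
func imageRepositoryUnset(obj map[string]interface{}) bool {
	cur := obj
	for _, key := range []string{"spec", "kubeadmConfigSpec", "clusterConfiguration"} {
		next, ok := cur[key].(map[string]interface{})
		if !ok {
			return true // an absent parent means the field is unset
		}
		cur = next
	}
	repo, _ := cur["imageRepository"].(string)
	return repo == ""
}

func main() {
	ref := controlPlaneRef{APIVersion: "controlplane.cluster.x-k8s.io/v1beta1", Kind: "KubeadmControlPlane"}
	desired := []byte(`{"spec":{"kubeadmConfigSpec":{"clusterConfiguration":{}}}}`)

	var obj map[string]interface{}
	if err := json.Unmarshal(desired, &obj); err != nil {
		panic(err)
	}
	// Only in this case would the version-based check need to run.
	fmt.Println("needs registry check:", isKubeadmControlPlane(ref) && imageRepositoryUnset(obj))
}
```

The lookup itself is cheap; the hard part is producing `desired`, which in the real webhook would require running the topology's desired-state computation.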

I think we can, and have to, live with the UX that the reconciliation fails. It's better than nothing, and for all newer patch versions the validation is not needed anyway (i.e. >= v1.22.17, >= v1.23.15, >= v1.24.9, >= v1.25.0).

We also have much more validation in our KCP, MD, ... webhooks that is not covered by our Cluster webhook. The only way to address this generically is to run the entire reconcile with SSA dry-run calls in the webhook, but I don't think we want to do that.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 9, 2023
KCP: block upgrade to versions with old registry, improve registry handling

Signed-off-by: Stefan Büringer buringerst@vmware.com
@sbueringer
Member Author

Squashed

@sbueringer
Member Author

/cherry-pick release-1.3

@k8s-infra-cherrypick-robot

@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.3 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.3

@sbueringer
Member Author

/cherry-pick release-1.2

@k8s-infra-cherrypick-robot

@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.2 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.2

@fabriziopandini
Member

/lgtm
/approve

Thanks again for the great research work behind this PR

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 9, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: b6501fb74ae9a15d5923f9dbf6ad316d412fe3a9

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 9, 2023
@k8s-ci-robot k8s-ci-robot merged commit 6c5ee02 into kubernetes-sigs:main Jan 9, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.4 milestone Jan 9, 2023
@k8s-infra-cherrypick-robot

@sbueringer: new pull request created: #7871

In response to this:

/cherry-pick release-1.3

@k8s-infra-cherrypick-robot

@sbueringer: #7856 failed to apply on top of branch "release-1.2":

Applying: KCP: block upgrade to versions with old registry, improve registry handling
Using index info to reconstruct a base tree...
M	bootstrap/kubeadm/api/v1beta1/kubeadm_types.go
M	bootstrap/kubeadm/config/crd/bases/bootstrap.cluster.x-k8s.io_kubeadmconfigs.yaml
M	bootstrap/kubeadm/config/crd/bases/bootstrap.cluster.x-k8s.io_kubeadmconfigtemplates.yaml
M	controlplane/kubeadm/api/v1beta1/kubeadm_control_plane_types.go
M	controlplane/kubeadm/api/v1beta1/kubeadm_control_plane_webhook.go
M	controlplane/kubeadm/api/v1beta1/kubeadm_control_plane_webhook_test.go
M	controlplane/kubeadm/config/crd/bases/controlplane.cluster.x-k8s.io_kubeadmcontrolplanes.yaml
M	controlplane/kubeadm/config/crd/bases/controlplane.cluster.x-k8s.io_kubeadmcontrolplanetemplates.yaml
M	controlplane/kubeadm/internal/workload_cluster.go
M	controlplane/kubeadm/internal/workload_cluster_coredns.go
M	controlplane/kubeadm/internal/workload_cluster_coredns_test.go
M	test/framework/daemonset_helpers.go
M	test/infrastructure/docker/internal/docker/machine.go
Falling back to patching base and 3-way merge...
Auto-merging test/infrastructure/docker/internal/docker/machine.go
CONFLICT (content): Merge conflict in test/infrastructure/docker/internal/docker/machine.go
Auto-merging test/framework/daemonset_helpers.go
Auto-merging controlplane/kubeadm/internal/workload_cluster_coredns_test.go
Auto-merging controlplane/kubeadm/internal/workload_cluster_coredns.go
Auto-merging controlplane/kubeadm/internal/workload_cluster.go
CONFLICT (content): Merge conflict in controlplane/kubeadm/internal/workload_cluster.go
Auto-merging controlplane/kubeadm/config/crd/bases/controlplane.cluster.x-k8s.io_kubeadmcontrolplanetemplates.yaml
Auto-merging controlplane/kubeadm/config/crd/bases/controlplane.cluster.x-k8s.io_kubeadmcontrolplanes.yaml
Auto-merging controlplane/kubeadm/api/v1beta1/kubeadm_control_plane_webhook_test.go
Auto-merging controlplane/kubeadm/api/v1beta1/kubeadm_control_plane_webhook.go
Auto-merging controlplane/kubeadm/api/v1beta1/kubeadm_control_plane_types.go
Auto-merging bootstrap/kubeadm/config/crd/bases/bootstrap.cluster.x-k8s.io_kubeadmconfigtemplates.yaml
Auto-merging bootstrap/kubeadm/config/crd/bases/bootstrap.cluster.x-k8s.io_kubeadmconfigs.yaml
Auto-merging bootstrap/kubeadm/api/v1beta1/kubeadm_types.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 KCP: block upgrade to versions with old registry, improve registry handling
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-1.2

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

6 participants