clusterctl move takes ownership of all fields #7473

Closed
sbueringer opened this issue Oct 31, 2022 · 8 comments · Fixed by #7504
Labels
  • area/clusterctl: Issues or PRs related to clusterctl
  • kind/bug: Categorizes issue or PR as related to a bug.
  • kind/release-blocking: Issues or PRs that need to be closed before the next CAPI release
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@sbueringer
Member

sbueringer commented Oct 31, 2022

As the title says, when using clusterctl move, clusterctl owns all fields after the move.

Steps to reproduce

  • Create a mgmt cluster (Cluster A)
  • Deploy a workload cluster (Cluster B)
  • Make the workload cluster (Cluster B) a mgmt cluster
  • clusterctl move from Cluster A to Cluster B
$ kubectl --kubeconfig self-hosted.kubeconfig get cluster docker-cluster-self-hosted --show-managed-fields -o yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  managedFields:
  - apiVersion: cluster.x-k8s.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:cluster.x-k8s.io/cluster-name: {}
          f:topology.cluster.x-k8s.io/owned: {}
      f:spec:
        f:controlPlaneRef: {}
        f:infrastructureRef: {}
    manager: capi-topology
    operation: Apply
    time: "2022-10-31T18:03:17Z"
  - apiVersion: cluster.x-k8s.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:finalizers:
          .: {}
          v:"cluster.cluster.x-k8s.io": {}
        f:labels:
          .: {}
          f:cluster.x-k8s.io/cluster-name: {}
          f:topology.cluster.x-k8s.io/owned: {}
      f:spec:
        .: {}
        f:clusterNetwork:
          .: {}
          f:pods:
            .: {}
            f:cidrBlocks: {}
          f:serviceDomain: {}
          f:services:
            .: {}
            f:cidrBlocks: {}
        f:controlPlaneEndpoint:
          .: {}
          f:host: {}
          f:port: {}
        f:controlPlaneRef: {}
        f:infrastructureRef: {}
        f:paused: {}
...
    manager: clusterctl
    operation: Update
    time: "2022-10-31T18:03:16Z"
  name: docker-cluster-self-hosted

The consequences of this issue are:

  • From now on, kubectl apply and other clients (e.g. the Cluster topology controller) won't be able to remove fields without first explicitly taking ownership of them by setting them to a different value.
  • Similarly, kubectl apply --server-side=true won't be able to change field values unless --force-conflicts is used (see the sketch after this list).
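
To make the second point concrete, here is a rough sketch in Go (not part of the original issue; it assumes a controller-runtime client c and uses a placeholder namespace) of a server-side apply under a different field manager, e.g. the topology controller's capi-topology, that tries to change a field now owned by the clusterctl manager:

package demo

import (
    "context"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// applyAsTopologyController tries to change spec.paused, which the managedFields
// above show is owned by the "clusterctl" manager (operation: Update). Because the
// applied value differs from the current one, the server-side apply is rejected
// with a conflict unless the applier forces ownership.
func applyAsTopologyController(ctx context.Context, c client.Client) error {
    u := &unstructured.Unstructured{Object: map[string]interface{}{
        "apiVersion": "cluster.x-k8s.io/v1beta1",
        "kind":       "Cluster",
        "metadata": map[string]interface{}{
            "name":      "docker-cluster-self-hosted",
            "namespace": "default", // placeholder namespace
        },
        "spec": map[string]interface{}{
            "paused": true,
        },
    }}

    err := c.Patch(ctx, u, client.Apply, client.FieldOwner("capi-topology"))
    if apierrors.IsConflict(err) {
        // The apply fails. The applier must either drop the field from its
        // configuration or take ownership explicitly with client.ForceOwnership,
        // the programmatic equivalent of kubectl apply --server-side --force-conflicts.
    }
    return err
}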

I think we have to fix this. My suggestion is to modify clusterctl move so that the managed fields are not changed during the move.

/kind bug

@k8s-ci-robot added the kind/bug label Oct 31, 2022
@sbueringer added the release-blocking label and removed the kind/bug label Oct 31, 2022
@k8s-ci-robot added the needs-triage label Oct 31, 2022
@sbueringer
Member Author

/triage accepted

/area clusterctl

@k8s-ci-robot added the triage/accepted and area/clusterctl labels and removed the needs-triage label Oct 31, 2022
@sbueringer
Member Author

@sbueringer added the kind/bug label Oct 31, 2022
@ykakarap
Contributor

ykakarap commented Nov 2, 2022

/assign

I can work on this. I will sync with @fabriziopandini on this.

@sbueringer added this to the v1.3 milestone Nov 3, 2022
@sbueringer
Member Author

sbueringer commented Nov 3, 2022

Yup, please sync with Fabrizio. He also wanted to talk to Vince about it.

Added to the milestone for now so we can track it, but basically TBD.

@sbueringer
Member Author

FYI, some prior art on how to write managed fields:

// patchTopologyManagedFields patches the managed fields of obj if parts of it are owned by the topology controller.
// This is necessary to ensure the managed fields created by the topology controller are still present and thus to
// prevent unnecessary machine rollouts. Without patching the managed fields, clusterctl would be the owner of the fields
// which would lead to co-ownership and also additional machine rollouts.
func patchTopologyManagedFields(ctx context.Context, oldManagedFields []metav1.ManagedFieldsEntry, obj *unstructured.Unstructured, cTo client.Client) error {
    var containsTopologyManagedFields bool
    // Check if the object was owned by the topology controller.
    for _, m := range oldManagedFields {
        if m.Operation == metav1.ManagedFieldsOperationApply &&
            m.Manager == structuredmerge.TopologyManagerName &&
            m.Subresource == "" {
            containsTopologyManagedFields = true
            break
        }
    }
    // Return early if the object was not owned by the topology controller.
    if !containsTopologyManagedFields {
        return nil
    }

    base := obj.DeepCopy()
    obj.SetManagedFields(oldManagedFields)
    if err := cTo.Patch(ctx, obj, client.MergeFrom(base)); err != nil {
        return errors.Wrapf(err, "error patching managed fields %q %s/%s",
            obj.GroupVersionKind(), obj.GetNamespace(), obj.GetName())
    }
    return nil
}

(we later dropped this code again)

@sbueringer
Member Author

sbueringer commented Nov 3, 2022

@chrischdi Do you remember why we dropped that code?

// patchTopologyManagedFields patches the managed fields of obj if parts of it are owned by the topology controller.
// This is necessary to ensure the managed fields created by the topology controller are still present and thus to
// prevent unnecessary machine rollouts. Without patching the managed fields, clusterctl would be the owner of the fields
// which would lead to co-ownership and also additional machine rollouts.

(xref: #6710)

I guess because we maybe didn't need it for this anymore?

  • prevent unnecessary machine rollouts

But I think this is still the case:

  • Without patching the managed fields, clusterctl would be the owner of the fields which would lead to co-ownership

It would just be good to know, so that we don't add it again and then drop it again because we missed something.

Independent of that, I think the new implementation should just preserve all managed fields 100% for all resources so that a clusterctl move doesn't change the managed fields at all.

@chrischdi
Member

If I'm right:

  • We needed it when we had the PoC, which parsed and read the managedFields.
  • Now that we use dry-run SSA, this is no longer required, and if I'm right, changes to managedFields only (i.e. ownership) are not considered a change that results in a template rollover (see the sketch below).
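
A rough sketch of that idea in Go (this is not the actual CAPI helper, just an illustration assuming a controller-runtime client c): the desired apply configuration is applied with a dry run and only the resulting spec is compared, so differences that are limited to metadata.managedFields never register as a change:

package demo

import (
    "context"

    "k8s.io/apimachinery/pkg/api/equality"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// wouldChangeSpec returns true if server-side applying `desired` (the applier's
// intent, not an object read back from the API server) would change the spec of
// `current`. The comparison deliberately ignores metadata, so ownership-only
// differences in managedFields do not count as a change and therefore do not
// trigger a template rotation.
func wouldChangeSpec(ctx context.Context, c client.Client, current, desired *unstructured.Unstructured) (bool, error) {
    dryRun := desired.DeepCopy()
    if err := c.Patch(ctx, dryRun, client.Apply,
        client.DryRunAll, client.FieldOwner("capi-topology")); err != nil {
        return false, err
    }
    // dryRun now holds the object as the API server would persist it.
    return !equality.Semantic.DeepEqual(current.Object["spec"], dryRun.Object["spec"]), nil
}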

@fabriziopandini
Member

Given that this is something that impacts everyone who does a bootstrap pivot, I think we should implement a minimal fix in clusterctl (roughly the same as the example linked above) and cherry-pick it to the 1.2 branch.

Then, while working on SSA for label/annotation propagation, we can consider whether to add a more sophisticated fix.

@sbueringer added the kind/release-blocking label and removed the release-blocking label Feb 27, 2023