Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 [WIP] Update MachineSet and MachineDeployment ownerRef versions on reconcile #7509

Conversation

killianmuldoon
Copy link
Contributor

@killianmuldoon killianmuldoon commented Nov 7, 2022

Signed-off-by: killianmuldoon kmuldoon@vmware.com

Currently if the apiVersion of an ownerReference doesn't match any version that is served by the Kubernetes apiServer garbage collection of resources based on ownerReferences fails. Ref: kubernetes/kubernetes#96650

When Cluster API stops serving deprecated APIVersions - i.e. v1alpha3 and v1alpha4 when they're removed - we want to ensure garbage collection continues to work as expected. This change ensures that these ownerReferences are kept up to date on each reconcile for a subset of resources.

Specifically this changes ensures updates for the apiVersion in OwnerReferences of:

  • Machines owned by MachineSets
  • MachineSets owned by MachineDeployments
  • MachineDeployments owned by Clusters

Fixes #7224

Signed-off-by: killianmuldoon <kmuldoon@vmware.com>
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 7, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cecilerobertmichon for approval by writing /assign @cecilerobertmichon in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 7, 2022
@srm09
Copy link
Contributor

srm09 commented Nov 7, 2022

This is an interesting one, and I think there might be some cases of this in the CAPV codebase. The gist here is we should always update the OwnerRefs to use the latest API version served by the api server.

@sbueringer
Copy link
Member

This is an interesting one, and I think there might be some cases of this in the CAPV codebase. The gist here is we should always update the OwnerRefs to use the latest API version served by the api server.

Yup exactly!

In general every controller which sets ownerReferences in a CRD is responsible to keep them up-to-date. Otherwise there's a chance garbage collection is broken when apiVersions are removed.

@killianmuldoon Would you mind surfacing this general topic in the office hours and pointing folks to the issue? Just that everyone has a chance to be aware of it.

if metav1.GetControllerOf(ms).APIVersion != md.APIVersion {
err := r.updateControllerRef(ctx, md, ms)
if err != nil {
log.Error(err, "Failed to update Machine controller reference.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should return the error in this case.

In general this request should work anyway, right?

@@ -165,9 +165,9 @@ func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (_ ctrl.Re
return result, err
}

func patchMachineDeployment(ctx context.Context, patchHelper *patch.Helper, d *clusterv1.MachineDeployment, options ...patch.Option) error {
func patchMachineDeployment(ctx context.Context, patchHelper *patch.Helper, md *clusterv1.MachineDeployment, options ...patch.Option) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would highly recommend to split out a general m=>md rename into a separate PR next time.

Just helps to de-obfuscate the actual changes in this PR and keeps everything nicely separated

(just resolve for this PR)

if metav1.GetControllerOf(machine).APIVersion != machineSet.APIVersion {
err := r.updateControllerReference(ctx, machineSet, machine)
if err != nil {
log.Error(err, "Failed to update Machine controller reference.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also return the error here

@@ -278,6 +279,15 @@ func (r *Reconciler) reconcile(ctx context.Context, cluster *clusterv1.Cluster,
r.recorder.Eventf(machineSet, corev1.EventTypeNormal, "SuccessfulAdopt", "Adopted Machine %q", machine.Name)
}

// Ensure that the version of the controllerReference is up-to-date.
if metav1.GetControllerOf(machine).APIVersion != machineSet.APIVersion {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are sure the machineSet.APIVersion is set? (afaik per default it's not in CR / client-go)

I think we should use the same source of truth as updateControllerReference (which is machineSetKind)

continue
}

// Ensure that the version of the controllerReference is up-to-date.
if metav1.GetControllerOf(ms).APIVersion != md.APIVersion {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are sure the md.APIVersion is set? (afaik per default it's not in CR / client-go)

I think we should use the same source of truth as updateControllerReference (which is machineDeploymentKind)

// adoptOrphan sets the MachineSet as a controller OwnerReference to the Machine.
func (r *Reconciler) adoptOrphan(ctx context.Context, machineSet *clusterv1.MachineSet, machine *clusterv1.Machine) error {
// updateControllerReference sets the MachineSet as a controller OwnerReference to the Machine.
func (r *Reconciler) updateControllerReference(ctx context.Context, machineSet *clusterv1.MachineSet, machine *clusterv1.Machine) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's be consistent between MD and MS and pick either updateControllerRef or updateControllerReference

@@ -437,6 +437,20 @@ func HasOwner(refList []metav1.OwnerReference, apiVersion string, kinds []string
return false
}

// HasOwnerWithSameVersion checks if any of the references in the passed list match the given group from apiVersion and one of the given kinds.
func HasOwnerWithSameVersion(refList []metav1.OwnerReference, apiVersion string, kinds []string) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's relatively trivial but let's ~ copy&paste the unit tests of HasOwner

@sbueringer
Copy link
Member

Overall I really like the simplicity of the change, nice work!

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 18, 2022
@k8s-ci-robot
Copy link
Contributor

@killianmuldoon: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Copy link
Member

@killianmuldoon

Q: How does this PR relate now to #7575?

Is #7575 fixing all kinds of ownerRef reconcile issues including that the version is correct and the current PR is then fixing the version issue for MS/MD?

@killianmuldoon
Copy link
Contributor Author

killianmuldoon commented Nov 24, 2022

Q: How does this PR relate now to #7575?

Should be considered entirely unrelated.

@fabriziopandini fabriziopandini added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 11, 2023
Copy link
Contributor Author

@killianmuldoon killianmuldoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/close

@k8s-ci-robot
Copy link
Contributor

@killianmuldoon: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CAPI controllers should keep ownerRefs on the current apiVersion
5 participants