Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Wait for all descendants when deleting a cluster #1650

Conversation

ncdc
Copy link
Contributor

@ncdc ncdc commented Oct 24, 2019

What this PR does / why we need it:

Instead of waiting for all direct descendants to be deleted and then
allowing cluster deletion to proceed (by removing the finalizer), wait
for all descendants (both direct and indirect) to be removed before
allowing cluster deletion to proceed. There can be race conditions where
there are indirect descendants (machines belonging to a machine set)
that still exist, and they need the cluster to remain so they can be
deleted properly.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1643

Alternative to #1644

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 24, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ncdc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 24, 2019
@ncdc
Copy link
Contributor Author

ncdc commented Oct 24, 2019

If y'all are 👍 on this, I'll see about adding some unit tests

}

// empty returns true if all the lists are empty.
func (c *clusterDescendants) empty() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isEmpty?

}

// listAllClusterDescendants returns a list of all MachineDeployments, MachineSets, and Machines for the cluster.
func (r *ClusterReconciler) listAllClusterDescendants(ctx context.Context, cluster *clusterv1.Cluster) (clusterDescendants, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (r *ClusterReconciler) listAllClusterDescendants(ctx context.Context, cluster *clusterv1.Cluster) (clusterDescendants, error) {
func (r *ClusterReconciler) listDescendants(ctx context.Context, cluster *clusterv1.Cluster) (clusterDescendants, error) {

Given that this is the cluster reconciler, cluster can probably be omitted from the name

return descendants, nil
}

func (r *ClusterReconciler) extractDirectClusterDescendants(input clusterDescendants, cluster *clusterv1.Cluster) ([]runtime.Object, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (r *ClusterReconciler) extractDirectClusterDescendants(input clusterDescendants, cluster *clusterv1.Cluster) ([]runtime.Object, error) {
func (r *ClusterReconciler) filterOwnedDescendants(input clusterDescendants, cluster *clusterv1.Cluster) ([]runtime.Object, error) {

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add a godoc for this method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a method on clusterDescendants?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably yeah

logger := r.Log.WithValues("cluster", cluster.Name, "namespace", cluster.Namespace)

// Split machines into control plane and worker machines so we make sure we delete control plane machines last
controlPlaneMachines, machines := splitMachineList(&input.machines)

var children []runtime.Object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var children []runtime.Object
var ownedDescendants []runtime.Object

@ncdc
Copy link
Contributor Author

ncdc commented Oct 24, 2019

WDYT of the overall approach? @detiber @liztio @vincepri @xrmzju?

@vincepri
Copy link
Member

I'm +1, seems pretty straightforward, the review I did is mostly nits. The logic looks good

return descendants, nil
}

func (r *ClusterReconciler) extractDirectClusterDescendants(input clusterDescendants, cluster *clusterv1.Cluster) ([]runtime.Object, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a method on clusterDescendants?

logger := r.Log.WithValues("cluster", cluster.Name, "namespace", cluster.Namespace)

// Split machines into control plane and worker machines so we make sure we delete control plane machines last
controlPlaneMachines, machines := splitMachineList(&input.machines)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe split these within clusterDescendants as well instead of here?

}

func (r *ClusterReconciler) extractDirectClusterDescendants(input clusterDescendants, cluster *clusterv1.Cluster) ([]runtime.Object, error) {
logger := r.Log.WithValues("cluster", cluster.Name, "namespace", cluster.Namespace)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much do y'all care about logging an error if we couldn't create the meta.Accessor?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine removing it

func (r *ClusterReconciler) listChildren(ctx context.Context, cluster *clusterv1.Cluster) ([]runtime.Object, error) {
logger := r.Log.WithValues("cluster", cluster.Name, "namespace", cluster.Namespace)
type clusterDescendants struct {
machineDeployments clusterv1.MachineDeploymentList
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these lists be pointers or is it ok for them not to be?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't have to be afaict

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, worse comes to worse it becomes a performance concern down the line, but that would require some really large clusters anyway.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 24, 2019
@ncdc ncdc force-pushed the delete-cluster-wait-for-all-descendants branch from 91b48cd to fb02873 Compare October 25, 2019 14:08
@ncdc ncdc changed the title [WIP] 🐛 Wait for all descendants when deleting a cluster 🐛 Wait for all descendants when deleting a cluster Oct 25, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 25, 2019
@ncdc ncdc added this to the v0.3.0 milestone Oct 25, 2019
Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit, other than that changes LGTM.

Seems you need to run make modules, if that doesn't change any file locally, try using Go 1.12

workerMachines clusterv1.MachineList
}

// isEmpty returns true if all the lists are isEmpty.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// isEmpty returns true if all the lists are isEmpty.
// isEmpty returns true if all the lists are empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, my find and replace was too aggressive.

@ncdc
Copy link
Contributor Author

ncdc commented Oct 25, 2019

Yeah I'm working on the modules. There's no diff. Hence my new commit to figure out what it's complaining about. And I'm using 1.12.9.

@ncdc ncdc force-pushed the delete-cluster-wait-for-all-descendants branch 3 times, most recently from 5d6c608 to 63388cc Compare October 25, 2019 15:24
@ncdc
Copy link
Contributor Author

ncdc commented Oct 25, 2019

/hold

Would like @xrmzju to review

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 25, 2019
@xrmzju
Copy link
Member

xrmzju commented Oct 27, 2019

LGTM expect a little log suggestion

controllers/cluster_controller.go Outdated Show resolved Hide resolved
@ncdc
Copy link
Contributor Author

ncdc commented Oct 28, 2019

If this looks ok, I will squash. @vincepri

@vincepri
Copy link
Member

go for it, lgtm

Instead of waiting for all direct descendants to be deleted and then
allowing cluster deletion to proceed (by removing the finalizer), wait
for all descendants (both direct and indirect) to be removed before
allowing cluster deletion to proceed. There can be race conditions where
there are indirect descendants (machines belonging to a machine set)
that still exist, and they need the cluster to remain so they can be
deleted properly.

Signed-off-by: Andy Goldstein <goldsteina@vmware.com>
@ncdc ncdc force-pushed the delete-cluster-wait-for-all-descendants branch from 485cbbc to a193a17 Compare October 28, 2019 14:43
Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold cancel
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Oct 28, 2019
@k8s-ci-robot k8s-ci-robot merged commit 3a9008d into kubernetes-sigs:master Oct 28, 2019
@ncdc ncdc deleted the delete-cluster-wait-for-all-descendants branch December 3, 2019 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

machine remains undeleted while cluster has been deleted
5 participants