🐛 Wait for all descendants when deleting a cluster #1650
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ncdc. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
If y'all are 👍 on this, I'll see about adding some unit tests
controllers/cluster_controller.go (Outdated)

    // empty returns true if all the lists are empty.
    func (c *clusterDescendants) empty() bool {
isEmpty?
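A toy version of the struct with the suggested rename; the string slices here are stand-ins for the real clusterv1 list types, so this is a sketch rather than the actual code:

```go
package main

import "fmt"

// clusterDescendants is a simplified stand-in for the real struct, which
// holds clusterv1.MachineDeploymentList, MachineSetList, and MachineList.
type clusterDescendants struct {
	machineDeployments []string
	machineSets        []string
	machines           []string
}

// isEmpty returns true if all the lists are empty (the rename suggested above).
func (c *clusterDescendants) isEmpty() bool {
	return len(c.machineDeployments) == 0 &&
		len(c.machineSets) == 0 &&
		len(c.machines) == 0
}

func main() {
	d := clusterDescendants{}
	fmt.Println(d.isEmpty()) // true

	d.machines = append(d.machines, "machine-0")
	fmt.Println(d.isEmpty()) // false
}
```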
controllers/cluster_controller.go (Outdated)

    // listAllClusterDescendants returns a list of all MachineDeployments, MachineSets, and Machines for the cluster.
    func (r *ClusterReconciler) listAllClusterDescendants(ctx context.Context, cluster *clusterv1.Cluster) (clusterDescendants, error) {
Suggested change:

    - func (r *ClusterReconciler) listAllClusterDescendants(ctx context.Context, cluster *clusterv1.Cluster) (clusterDescendants, error) {
    + func (r *ClusterReconciler) listDescendants(ctx context.Context, cluster *clusterv1.Cluster) (clusterDescendants, error) {
Given that this is the cluster reconciler, "cluster" can probably be omitted from the name.
controllers/cluster_controller.go (Outdated)

        return descendants, nil
    }

    func (r *ClusterReconciler) extractDirectClusterDescendants(input clusterDescendants, cluster *clusterv1.Cluster) ([]runtime.Object, error) {
Suggested change:

    - func (r *ClusterReconciler) extractDirectClusterDescendants(input clusterDescendants, cluster *clusterv1.Cluster) ([]runtime.Object, error) {
    + func (r *ClusterReconciler) filterOwnedDescendants(input clusterDescendants, cluster *clusterv1.Cluster) ([]runtime.Object, error) {
?
Let's also add a godoc for this method
Should this be a method on clusterDescendants?
probably yeah
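Moving the filtering onto clusterDescendants, as agreed here, might look roughly like this sketch; object, ownerUID, and the UID strings are hypothetical stand-ins for the real runtime.Object and metav1.OwnerReference plumbing:

```go
package main

import "fmt"

// object is a stand-in for a runtime.Object whose metadata carries an
// owner reference UID; the real code walks metav1.OwnerReferences.
type object struct {
	name     string
	ownerUID string
}

type clusterDescendants struct {
	objects []object
}

// filterOwnedDescendants as a method on clusterDescendants, per the
// suggestion: keep only objects directly owned by the given cluster UID.
func (c clusterDescendants) filterOwnedDescendants(clusterUID string) []object {
	var owned []object
	for _, o := range c.objects {
		if o.ownerUID == clusterUID {
			owned = append(owned, o)
		}
	}
	return owned
}

func main() {
	d := clusterDescendants{objects: []object{
		{name: "md-1", ownerUID: "cluster-uid"},
		{name: "machine-1", ownerUID: "machineset-uid"},
	}}
	fmt.Println(len(d.filterOwnedDescendants("cluster-uid"))) // 1
}
```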
controllers/cluster_controller.go (Outdated)

    logger := r.Log.WithValues("cluster", cluster.Name, "namespace", cluster.Namespace)

    // Split machines into control plane and worker machines so we make sure we delete control plane machines last
    controlPlaneMachines, machines := splitMachineList(&input.machines)

    var children []runtime.Object
Suggested change:

    - var children []runtime.Object
    + var ownedDescendants []runtime.Object
I'm +1, seems pretty straightforward; the review I did is mostly nits. The logic looks good.
controllers/cluster_controller.go (Outdated)

    logger := r.Log.WithValues("cluster", cluster.Name, "namespace", cluster.Namespace)

    // Split machines into control plane and worker machines so we make sure we delete control plane machines last
    controlPlaneMachines, machines := splitMachineList(&input.machines)
Maybe split these within clusterDescendants as well instead of here?
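Splitting inside the descendants type, as suggested, might look like this sketch; the machine struct and its controlPlane flag are hypothetical stand-ins for clusterv1.Machine and its control-plane label check:

```go
package main

import "fmt"

// machine is a stand-in for clusterv1.Machine; the real code inspects the
// control-plane label rather than a boolean field.
type machine struct {
	name         string
	controlPlane bool
}

type clusterDescendants struct {
	machines []machine
}

// split separates control plane machines from workers so callers can delete
// the control plane last (a hypothetical in-type version of splitMachineList).
func (c clusterDescendants) split() (controlPlane, workers []machine) {
	for _, m := range c.machines {
		if m.controlPlane {
			controlPlane = append(controlPlane, m)
		} else {
			workers = append(workers, m)
		}
	}
	return controlPlane, workers
}

func main() {
	d := clusterDescendants{machines: []machine{
		{name: "cp-0", controlPlane: true},
		{name: "worker-0", controlPlane: false},
	}}
	cp, w := d.split()
	fmt.Println(len(cp), len(w)) // 1 1
}
```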
controllers/cluster_controller.go (Outdated)

    func (r *ClusterReconciler) extractDirectClusterDescendants(input clusterDescendants, cluster *clusterv1.Cluster) ([]runtime.Object, error) {
        logger := r.Log.WithValues("cluster", cluster.Name, "namespace", cluster.Namespace)
How much do y'all care about logging an error if we couldn't create the meta.Accessor?
I'm fine removing it
controllers/cluster_controller.go (Outdated)

    - func (r *ClusterReconciler) listChildren(ctx context.Context, cluster *clusterv1.Cluster) ([]runtime.Object, error) {
    -     logger := r.Log.WithValues("cluster", cluster.Name, "namespace", cluster.Namespace)
    + type clusterDescendants struct {
    +     machineDeployments clusterv1.MachineDeploymentList
Should these lists be pointers or is it ok for them not to be?
They don't have to be afaict
+1; worst case it becomes a performance concern down the line, but that would require some really large clusters anyway.
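The value-field choice agreed on here can be illustrated with a toy struct: the zero value is immediately usable, whereas pointer fields would need nil checks. The machineList type is a stand-in for the clusterv1 list types:

```go
package main

import "fmt"

// machineList stands in for clusterv1.MachineList.
type machineList struct {
	items []string
}

// With value (non-pointer) fields, the zero value of clusterDescendants is
// immediately usable; the trade-off is copying the lists if the struct is
// ever passed by value, which only matters for very large clusters.
type clusterDescendants struct {
	machineDeployments machineList
}

func main() {
	var d clusterDescendants // zero value, no initialization needed
	fmt.Println(len(d.machineDeployments.items)) // 0, no nil-pointer risk
}
```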
Force-pushed from 91b48cd to fb02873.
One nit, other than that changes LGTM.
Seems you need to run make modules; if that doesn't change any file locally, try using Go 1.12.
controllers/cluster_controller.go (Outdated)

        workerMachines clusterv1.MachineList
    }

    // isEmpty returns true if all the lists are isEmpty.
Suggested change:

    - // isEmpty returns true if all the lists are isEmpty.
    + // isEmpty returns true if all the lists are empty.
Thanks, my find and replace was too aggressive.
Yeah, I'm working on the modules. There's no diff, hence my new commit to figure out what it's complaining about. And I'm using 1.12.9.
Force-pushed from 5d6c608 to 63388cc.
/hold
Would like @xrmzju to review
LGTM except a little log suggestion
If this looks ok, I will squash. @vincepri
go for it, lgtm
Instead of waiting for all direct descendants to be deleted and then allowing cluster deletion to proceed (by removing the finalizer), wait for all descendants (both direct and indirect) to be removed before allowing cluster deletion to proceed. There can be race conditions where there are indirect descendants (machines belonging to a machine set) that still exist, and they need the cluster to remain so they can be deleted properly.
Signed-off-by: Andy Goldstein <goldsteina@vmware.com>
Force-pushed from 485cbbc to a193a17.
/hold cancel
/lgtm
What this PR does / why we need it:
Instead of waiting for all direct descendants to be deleted and then
allowing cluster deletion to proceed (by removing the finalizer), wait
for all descendants (both direct and indirect) to be removed before
allowing cluster deletion to proceed. There can be race conditions where
there are indirect descendants (machines belonging to a machine set)
that still exist, and they need the cluster to remain so they can be
deleted properly.
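The behavior described above can be sketched as follows; cluster, descendants, reconcileDelete, and the finalizer name are all simplified stand-ins for the real reconciler types, not its actual API:

```go
package main

import "fmt"

// cluster and descendants are simplified stand-ins for the real types; the
// finalizer name below is a placeholder, not the actual cluster-api constant.
type cluster struct {
	finalizers []string
}

type descendants struct {
	remaining int
}

func (d descendants) isEmpty() bool { return d.remaining == 0 }

// reconcileDelete sketches the fixed flow: only remove the finalizer (and so
// allow deletion to proceed) once no descendants remain, direct or indirect.
func reconcileDelete(c *cluster, d descendants) bool {
	if !d.isEmpty() {
		// Requeue: descendants still exist and need the cluster around
		// so they can be deleted properly.
		return false
	}
	c.finalizers = nil
	return true
}

func main() {
	c := &cluster{finalizers: []string{"cluster-finalizer"}}
	fmt.Println(reconcileDelete(c, descendants{remaining: 2})) // false
	fmt.Println(reconcileDelete(c, descendants{remaining: 0})) // true
}
```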
Which issue(s) this PR fixes: Fixes #1643
Alternative to #1644