
Implement clusterctl delete cluster #406

Merged
merged 1 commit on Jul 17, 2018

Conversation

@spew (Contributor) commented Jun 26, 2018

What this PR does / why we need it:
This PR implements the command clusterctl delete cluster.

Release note:

Add a `delete cluster` command to `clusterctl`. 

@kubernetes/kube-deploy-reviewers

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on Jun 26, 2018
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: spew

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved (indicates a PR has been approved by an approver from all required OWNERS files) and size/L (denotes a PR that changes 100-499 lines, ignoring generated files) labels on Jun 26, 2018
@spew force-pushed the clusterctl-delete branch 2 times, most recently from 27bfd42 to 05b9e2c on June 26, 2018 at 22:54
@spew (Contributor, Author) commented Jun 26, 2018

/assign @roberthbailey @k4leung4 @mkjelland

defer closeClient(externalClient, "external")

glog.Info("Applying Cluster API stack to external cluster")
err = d.applyClusterAPIStack(externalClient)
Contributor:

if err := d.apply...(); err != nil { ... }

Contributor Author:

Fixed all instances of this -- I need to burn this pattern into my brain :P I also forgot that you had moved the Create(...) method to this style in another PR.
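
For reference, a minimal sketch of the pattern being asked for here, applied to this step of the deployer (the ClusterClient interface, the method names, and the wrapped error message are stand-ins that follow the surrounding diff, not the exact upstream definitions):

package deployer

import (
	"fmt"

	"github.com/golang/glog"
)

// ClusterClient and ClusterDeployer are trimmed stand-ins for the types in
// the surrounding diff; only what the pattern needs is shown.
type ClusterClient interface{}

type ClusterDeployer struct{}

func (d *ClusterDeployer) applyClusterAPIStack(client ClusterClient) error {
	return nil // placeholder for the real apply logic
}

func (d *ClusterDeployer) applyStackToExternalCluster(externalClient ClusterClient) error {
	glog.Info("Applying Cluster API stack to external cluster")
	// Scoping err to the if-statement keeps the error handling next to the
	// call it checks and prevents err from leaking into later steps.
	if err := d.applyClusterAPIStack(externalClient); err != nil {
		return fmt.Errorf("unable to apply cluster api stack to external cluster: %v", err)
	}
	return nil
}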

}

glog.Info("Deleting Cluster API Provider Components from internal cluster")
err = internalClient.Delete(d.providerComponents)
Contributor:

if err := ...; err != nil { ... }

Contributor Author:

Changed

}

glog.Info("Copying objects from internal cluster to external cluster")
err = pivot(internalClient, externalClient)
Contributor:

if err := ...; err != nil { ... }

Contributor Author:

Changed

}

glog.Info("Deleting objects from external cluster")
err = deleteObjects(externalClient)
Contributor:

if err := ...; err != nil { ... }

Contributor Author:

Changed

if err != nil {
return nil, fmt.Errorf("unable to get internal cluster kubeconfig: %v", err)
}

err = d.writeKubeconfig(internalKubeconfig)
err = d.writeKubeconfig(internalKubeconfig, kubeconfigOutput)
Contributor:

if err := ...; err != nil { ... }

Contributor Author:

Changed

errors = append(errors, err.Error())
}
glog.Infof("Deleting machine sets")
err = client.DeleteMachineSetObjects()
Contributor:

if err := ...; err != nil { ... }

Contributor Author:

Changed

func deleteObjects(client ClusterClient) error {
var errors []string
glog.Infof("Deleting machine deployments")
err := client.DeleteMachineDeploymentObjects()
Contributor:

if err := ...; err != nil { ... }

Contributor Author:

Changed

errors = append(errors, err.Error())
}
glog.Infof("Deleting clusters")
err = client.DeleteClusterObjects()
Contributor:

if err := ...; err != nil { ... }

Contributor Author:

Changed
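
For reference, a minimal sketch of the aggregate-and-continue shape that deleteObjects takes in this diff (the ClusterClient interface is trimmed to the four calls reviewed above, and the final joined error message is an assumption):

package deleter

import (
	"fmt"
	"strings"

	"github.com/golang/glog"
)

// ClusterClient is a trimmed stand-in for the client used in the diff.
type ClusterClient interface {
	DeleteMachineDeploymentObjects() error
	DeleteMachineSetObjects() error
	DeleteMachineObjects() error
	DeleteClusterObjects() error
}

// deleteObjects keeps going after individual failures and reports them all,
// so one failed "delete all" call does not hide the others.
func deleteObjects(client ClusterClient) error {
	var errors []string
	glog.Infof("Deleting machine deployments")
	if err := client.DeleteMachineDeploymentObjects(); err != nil {
		errors = append(errors, err.Error())
	}
	glog.Infof("Deleting machine sets")
	if err := client.DeleteMachineSetObjects(); err != nil {
		errors = append(errors, err.Error())
	}
	glog.Infof("Deleting machines")
	if err := client.DeleteMachineObjects(); err != nil {
		errors = append(errors, err.Error())
	}
	glog.Infof("Deleting clusters")
	if err := client.DeleteClusterObjects(); err != nil {
		errors = append(errors, err.Error())
	}
	if len(errors) > 0 {
		return fmt.Errorf("error(s) encountered deleting objects: [%v]", strings.Join(errors, ", "))
	}
	return nil
}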

@@ -433,3 +528,10 @@ func containsMasterRole(roles []clustercommon.MachineRole) bool {
}
return false
}

func closeClient(client ClusterClient, name string) {
err := client.Close()
Contributor:

if err := ...; err != nil { ... }

Contributor Author:

Changed
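
For reference, a minimal sketch of the closeClient helper under review (the interface is trimmed to Close(), and the exact log wording is an assumption):

package deleter

import "github.com/golang/glog"

// clusterCloser is a stand-in for the ClusterClient used in the diff;
// only the Close method matters for this helper.
type clusterCloser interface {
	Close() error
}

// closeClient logs rather than returns the error, since a failed close
// should not mask the result of the operation that preceded it.
func closeClient(client clusterCloser, name string) {
	if err := client.Close(); err != nil {
		glog.Errorf("could not close %v client: %v", name, err)
	}
}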

co.CleanupExternalCluster)
return d.Create(c, m, pcsFactory)
err = d.Create(c, m, pd, co.KubeconfigOutput, pcsFactory)
Contributor:

If you aren't going to log this error, just do return d.Create(...) like it was before.

Contributor Author:

Whoops, not sure why I did this.

@roberthbailey (Contributor)

I added a first round of comments, but don't block merging on me.

@k4leung4 (Contributor)

lgtm


func (d *ClusterDeployer) Delete(internalClient ClusterClient) error {
glog.Info("Creating external cluster")
externalClient, cleanupExternalCluster, err := d.createExternalCluster()
Contributor:

What happens if the external cluster already exists?

Contributor Author (@spew), Jun 27, 2018:

I believe this will fail -- much like the CreateCluster command. However, I'll test this manually to make sure we understand the behavior and report back in this comment.

Contributor Author (@spew), Jun 27, 2018:

I just tried this and this is the behavior:

$ minikube start
Starting local Kubernetes v1.10.0 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
Loading cached images from config file.

$ clusterctl delete cluster -p provider-components.yaml
I0627 14:17:22.206083   51664 clusterdeployer.go:194] Creating external cluster
F0627 14:27:56.001433   51664 delete_cluster.go:47] could not create external cluster: could not create external control plane: error running command 'minikube start --bootstrapper=kubeadm': exit status 1

Given that we do nothing special in clusterctl create cluster if there is an existing minikube cluster, I'd prefer to punt on the problem, as it is an existing issue and not a change in the architecture.

Contributor:

Can we skip the external cluster creation if the minikube cluster already exists (pretty much the minikube status result)? Not needed in this PR, but an issue + TODO would do.

Contributor Author:

I created a less prescriptive issue -- I think there are some other possible solutions. Here it is: #413
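
For illustration only, one possible shape of the pre-check discussed above; it is not part of this PR and assumes that minikube status exits non-zero when no cluster is running:

package external

import (
	"os/exec"

	"github.com/golang/glog"
)

// minikubeRunning is a sketch of the suggested check: skip creating the
// external cluster when one already exists. It assumes `minikube status`
// exits non-zero when no cluster is running, which may need refinement.
func minikubeRunning() bool {
	if err := exec.Command("minikube", "status").Run(); err != nil {
		glog.V(3).Infof("minikube status returned an error, assuming no existing cluster: %v", err)
		return false
	}
	return true
}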


glog.Info("Deleting Cluster API Provider Components from internal cluster")
if err = internalClient.Delete(d.providerComponents); err != nil {
glog.Infof("Error while shutting down provider components on internal cluster: %v", err)
Contributor:

nit: "shutting down" and deleting are different things, no?

Contributor Author:

I don't think so -- the underlying action in the clusterclient for this method is a kubectl delete, which according to this doc (thanks for the link btw!) attempts a graceful shutdown of the pods:

https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods

I could change the log to be more accurate but a bit less user friendly as follows:

Error while executing kubectl delete on provider components on internal cluster: %v

What do you think?

Contributor Author:

What about "error while removing provider components from internal cluster: %v"

Contributor:

+1

Contributor Author:

Will update with the new messaging.


glog.Info("Deleting objects from external cluster")
if err = deleteObjects(externalClient); err != nil {
return fmt.Errorf("unable to finish deleting objects in external cluster, resources may have been leaked: %v", err)
Contributor:

This is a pretty scary log line. Can you provide some guidance here as to what resources might have leaked?

Contributor Author:

It's hard to build a generalized line of text here because the underlying resources are sort of provider specific. For example, in GCP when the cluster deletion fails it means that you may have 'leaked' a firewall rule. However, that would likely not be true for a vsphere implementation.

Do you have any ideas on how to improve the messaging?

Contributor:

I don't have a good solution right now, but perhaps we could make a generic statement ("VMs, firewall rules, LB rules etc") for now?

Contributor Author (@spew), Jun 29, 2018:

I'm not sure that is actually clearer, because the resources in question are extremely specific to a given provider. For example, 'firewall rule' is a GCP concept; in AWS there are 'security groups'. 'VMs' might be safe, but in AWS land those would be called instances. As for load balancers, AWS alone has at least 3 different types with different APIs (application, network, and classic).

I see two options:

  1. Add another method to the ProviderDeployer interface. This is the interface that is specific to the provider / cloud. This method could be something like GetDeleteMachineErrorMessage(...) string, and there would be one each for Clusters, Machines, MachineSets, and MachineDeployments. This method would return a provider-specific message about the kinds of resources that can be leaked.

  2. Change the message to this, substituting 'google' or 'aws' or the like for the provider name:

fmt.Errorf("unable to finish deleting objects in external cluster, the associated resources specific to the %v provider may have been leaked: %v", providerName, err)

Do you think we should implement either of these or leave things how they are?
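
For concreteness, a sketch of option 2 above; providerName is a hypothetical parameter that would have to be threaded through to this call site:

package deleter

import "fmt"

// Sketch of option 2: include the provider name so the "leaked resources"
// warning at least points at the right cloud. providerName is a hypothetical
// parameter, not something the current code already has.
func leakedResourcesError(providerName string, err error) error {
	return fmt.Errorf("unable to finish deleting objects in external cluster, "+
		"the associated resources specific to the %v provider may have been leaked: %v",
		providerName, err)
}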

errors = append(errors, err.Error())
}
glog.Infof("Deleting machines")
if err := client.DeleteMachineObjects(); err != nil {
Contributor:

Is this safe considering some machines will be deleted by deletion of MachineSets?

Contributor Author:

It's safe in the sense that we are simply calling the 'delete all' method in the cluster API. It is up to the cluster API to properly handle machines that already have a deletion in progress due to the fact that their parent object has been deleted.

Contributor:

It's still not great because attempting to delete a machine twice (or three times) could spit out a couple of error logs, which would be misleading unless the user knows exactly how this code works, and exactly how MachineSets work. I think here we should delete all Machines that are not in a MachineSet.

Contributor Author (@spew), Jun 29, 2018:

There is no actual deletion going on here, there is just a call to the cluster-api "delete all" methods, i.e. delete all machines, delete all machine sets, delete all machine deployments. Those are asynchronous methods that don't actually do the deletion -- the deletion itself is eventually reconciled by the controllers. In this case, the underlying deletion for machines & machine sets is done by the machine controller.

I'm not convinced there actually is a problem or that the cluster API would do the wrong thing. Rather than put a lot of complicated deletion logic in clusterctl it seems better to have the controllers do the right thing.

Contributor Author (@spew), Jun 29, 2018:

I synced with @k4leung4 and he confirmed that there is not an issue here. There are a couple of scenarios / points that we discussed:

  1. Deleting existing Machines? DeleteCollection(...) (the actual delete method being used) does not return an error when there are already objects in the process of being deleted (i.e. marked as being deleted). We are not passing in any machines; we are simply calling a "delete all" method.
  2. Concurrent Deletes? We are calling the DeleteCollection(...) method and passing in the propagation policy DeletePropagationForeground. This means the delete call will block until all child objects (i.e. machines that belong to a machine set) are deleted. This comes from the comments for DeletePropagationForeground. I think what that actually means is that they are marked for deletion in etcd. I copied the doc below.
  3. Deleting Machines before MachineSets: Even if we did things in the wrong order and called DeleteCollection on Machines, and that method actually deleted machines associated with a MachineSet (I've not yet drilled into whether that would even happen), all that would occur is that the controller would attempt to recreate the machine, and then when DeleteCollection was called on the machine sets the machines would be deleted again.

Here is the DeletePropagationForeground doc for reference:

	// The object exists in the key-value store until the garbage collector
	// deletes all the dependents whose ownerReference.blockOwnerDeletion=true
	// from the key-value store.  API sever will put the "foregroundDeletion"
	// finalizer on the object, and sets its deletionTimestamp.  This policy is
	// cascading, i.e., the dependents will be deleted with Foreground.
	DeletePropagationForeground DeletionPropagation = "Foreground"
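
For reference, a minimal sketch of a "delete all" call that requests foreground cascading deletion using the apimachinery types quoted above; the machineCollectionDeleter interface is a hypothetical stand-in for the typed client method clusterctl calls, not the real client:

package deleter

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// machineCollectionDeleter is a hypothetical stand-in for the typed client's
// DeleteCollection method on Machines (and, analogously, MachineSets and
// MachineDeployments).
type machineCollectionDeleter interface {
	DeleteCollection(opts *metav1.DeleteOptions, listOpts metav1.ListOptions) error
}

// deleteAllMachines asks the API server to delete every Machine with
// foreground cascading: the parent stays in the key-value store until
// dependents with ownerReference.blockOwnerDeletion=true are gone.
func deleteAllMachines(c machineCollectionDeleter) error {
	propagation := metav1.DeletePropagationForeground
	if err := c.DeleteCollection(
		&metav1.DeleteOptions{PropagationPolicy: &propagation},
		metav1.ListOptions{},
	); err != nil {
		return fmt.Errorf("error deleting machine objects: %v", err)
	}
	return nil
}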

@spew (Contributor, Author) commented Jul 2, 2018

Updated the PR with a new error message as per comment above

@roberthbailey assigned karan and unassigned roberthbailey on Jul 13, 2018
@roberthbailey (Contributor)

/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Jul 17, 2018
@k8s-ci-robot merged commit babe910 into kubernetes-sigs:master on Jul 17, 2018
jayunit100 pushed a commit to jayunit100/cluster-api that referenced this pull request on Jan 31, 2020 (…-fix: default vmfolder set for cloudconfig)