Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixes 'kfctl delete fails when no platform or minikube is specified' #2810

Merged
merged 5 commits into from
Mar 28, 2019

Conversation

kkasravi
Copy link
Contributor

@kkasravi kkasravi commented Mar 26, 2019

fixes #2809


This change is Reviewable

@kkasravi
Copy link
Contributor Author

/assign @jlewi

@kkasravi
Copy link
Contributor Author

/assign @gabrielwen

@@ -288,6 +288,50 @@ func (ksApp *ksApp) deleteGlobalResources(config *rest.Config) error {
}

func (ksApp *ksApp) Delete(resources kftypes.ResourceEnum) error {
if ksApp.Spec.Platform != kftypes.GCP {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't you invoking this code on GCP? If you are deleting K8s resources I would expect that to also work on GCP.

On GCP I think we want to delete K8s resources before deleting the cluster because this triggers some cleanup.

e.g. deleting ingress should allow the GCP resources to be cleaned up; whereas if we don't delete the K8s resource and just delete the cluster they won't get cleaned up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what kind of GCP resources? currently we skip deleting resources for the following: https://github.com/kubeflow/kubeflow/blob/master/testing/test_deploy_app.py#L443-L450

currently these codes won't work if user doesn't point KUBECONFIG to the correct context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gabriel is right, i think but i’ll verify. I’ll also add an e2e test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this code is deleting all the K8s resources that we created as part of deploying Kubeflow.

So why wouldn't we also delete those resources in the case of GCP platform?

At worst its an unnecessary operation because we will end up deleting the cluster.

However, in some cases if we don't delete the K8s resource and trigger corresponding deletion handlers in K8s we don't properly clean up all resources.

The logic here:
https://github.com/kubeflow/kubeflow/blob/master/testing/test_deploy_app.py#L443-L450

Is indicating that certain resources like http-maps aren't being properly deleted.

We never create those resources explicitly. Those resources are created by the ingress that we create.

When we delete Kubeflow we shouldn't have to explicitly delete those resources (url-map, etc...) The fact that we do is a work around for the fact that those resources aren't being adequately garbage collected when the owning resource (GKE cluster is deleted).

I think if we explicitly delete the K8s resources as opposed to just deleting that cluster helps avoid these types of resource leaks.

Also from a design perspective, its better if we minimize the number of "if ${platform} then ...".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jlewi +1

I also realized that kfctl delete k8s would not delete the cluster, so k8 cleanup needs to occur.

On GCP I think we want to delete K8s resources before deleting the cluster because this triggers some cleanup.

I'll invert the order of operations - (package managers delete -> platforms delete)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jlewi

For kfctl delete platform - it seems like this should behave the same as kfctl delete all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence; what do you recommend?

What happens if there is a bug that prevents

kfctl delete k8s from succeeding?

In that case we might want to fall back to just running kfctl delete platform.

So maybe the behavior could be

kfctl delete all - Invoke k8s delete and abort on error

kfctl delete platform - try to invoke k8s delete but continue to platform if k8s delete throws an error.

Thoughts?

@kkasravi kkasravi force-pushed the kfctl_deletefix branch 2 times, most recently from 068f530 to de0e7bd Compare March 27, 2019 04:41
if err := platform(); err != nil {
fallthrough
case kftypes.PLATFORM:
if err := k8s(); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Add a comment explaining why we want to call K8s as part of platform deletion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Contributor Author

@kkasravi kkasravi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @ellis-bigelow, @jlewi, and @kunmingg)


bootstrap/pkg/kfapp/coordinator/coordinator.go, line 458 at r2 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

nit: Add a comment explaining why we want to call K8s as part of platform deletion.

done


bootstrap/pkg/kfapp/ksonnet/ksonnet.go, line 291 at r1 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

I'm on the fence; what do you recommend?

What happens if there is a bug that prevents

kfctl delete k8s from succeeding?

In that case we might want to fall back to just running kfctl delete platform.

So maybe the behavior could be

kfctl delete all - Invoke k8s delete and abort on error

kfctl delete platform - try to invoke k8s delete but continue to platform if k8s delete throws an error.

Thoughts?

Ok, implemented.

@kkasravi kkasravi force-pushed the kfctl_deletefix branch 2 times, most recently from dc11592 to 89de9c3 Compare March 27, 2019 19:08
if err := platform(); err != nil {
return err
if err := k8s(); err != nil {
return fmt.Errorf("error while deleting k8 resources, aborting deleting the platform. Error %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you use new KfError interface when adding new error message?

example:
https://github.com/kubeflow/kubeflow/blob/master/bootstrap/pkg/kfapp/gcp/gcp.go#L351

Copy link
Contributor Author

@kkasravi kkasravi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @ellis-bigelow, @gabrielwen, @jlewi, and @kunmingg)


bootstrap/pkg/kfapp/coordinator/coordinator.go, line 457 at r4 (raw file):

Previously, gabrielwen (Hung-Ting Wen) wrote…

could you use new KfError interface when adding new error message?

example:
https://github.com/kubeflow/kubeflow/blob/master/bootstrap/pkg/kfapp/gcp/gcp.go#L351

sure, did you want all errors in coordinator.go converted?

Copy link
Contributor

@gabrielwen gabrielwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @ellis-bigelow, @jlewi, @kkasravi, and @kunmingg)


bootstrap/pkg/kfapp/coordinator/coordinator.go, line 457 at r4 (raw file):

Previously, kkasravi (Kam Kasravi) wrote…

sure, did you want all errors in coordinator.go converted?

just those newly added, will make up a different PR for all errors.

Copy link
Contributor Author

@kkasravi kkasravi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @ellis-bigelow, @gabrielwen, @jlewi, and @kunmingg)


bootstrap/pkg/kfapp/coordinator/coordinator.go, line 457 at r4 (raw file):

Previously, gabrielwen (Hung-Ting Wen) wrote…

just those newly added, will make up a different PR for all errors.

ok, done. Might be nice to have a more terse form eg:

  return kfapis.Internalf("error while deleting k8 resources, aborting deleting the platform. Error %v", err)

where

func Internalf(format string, a ...interface{}) *kfapis.KfError {
	return &kfapis.KfError{
		Code:    int(kfapis.INTERNAL_ERROR),
		Message: fmt.Sprintf(format, a...),
	}
}

Copy link
Contributor

@jlewi jlewi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @ellis-bigelow, @gabrielwen, and @kunmingg)


bootstrap/pkg/kfapp/coordinator/coordinator.go, line 458 at r2 (raw file):

Previously, kkasravi (Kam Kasravi) wrote…

done

Do you need to sync? I'm not seeing the comment.yyyy

Copy link
Contributor

@jlewi jlewi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @ellis-bigelow, @gabrielwen, @kkasravi, and @kunmingg)

a discussion (no related file):
On my end this looks good to go except it looks like you might need to sync.


@kkasravi
Copy link
Contributor Author

Sorry - doing it now

Copy link
Contributor Author

@kkasravi kkasravi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @ellis-bigelow, @gabrielwen, @jlewi, and @kunmingg)

a discussion (no related file):

Previously, jlewi (Jeremy Lewi) wrote…

On my end this looks good to go except it looks like you might need to sync.

Done.



bootstrap/pkg/kfapp/coordinator/coordinator.go, line 458 at r2 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

Do you need to sync? I'm not seeing the comment.yyyy

see lines 457, 471

@jlewi
Copy link
Contributor

jlewi commented Mar 28, 2019

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit db542c3 into kubeflow:master Mar 28, 2019
lluunn pushed a commit to lluunn/kubeflow that referenced this pull request Mar 28, 2019
…ubeflow#2810)

* call package manager, then platform in delete. Allow ksonnet to delete k8s

* fix logic around deletion of ALL and PLATFORM

* compilation error

* convert to KfError on error returns

* update on code comments, make sure all errors are returned as KfError
@kkasravi kkasravi deleted the kfctl_deletefix branch June 4, 2019 13:19
saffaalvi pushed a commit to StatCan/kubeflow that referenced this pull request Feb 11, 2021
…ubeflow#2810)

* call package manager, then platform in delete. Allow ksonnet to delete k8s

* fix logic around deletion of ALL and PLATFORM

* compilation error

* convert to KfError on error returns

* update on code comments, make sure all errors are returned as KfError
saffaalvi pushed a commit to StatCan/kubeflow that referenced this pull request Feb 12, 2021
…ubeflow#2810)

* call package manager, then platform in delete. Allow ksonnet to delete k8s

* fix logic around deletion of ALL and PLATFORM

* compilation error

* convert to KfError on error returns

* update on code comments, make sure all errors are returned as KfError
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kfctl delete fails when no platform or minikube is specified
5 participants