-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixes 'kfctl delete fails when no platform or minikube is specified' #2810
Conversation
/assign @jlewi |
/assign @gabrielwen |
@@ -288,6 +288,50 @@ func (ksApp *ksApp) deleteGlobalResources(config *rest.Config) error { | |||
} | |||
|
|||
func (ksApp *ksApp) Delete(resources kftypes.ResourceEnum) error { | |||
if ksApp.Spec.Platform != kftypes.GCP { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't you invoking this code on GCP? If you are deleting K8s resources I would expect that to also work on GCP.
On GCP I think we want to delete K8s resources before deleting the cluster because this triggers some cleanup.
e.g. deleting ingress should allow the GCP resources to be cleaned up; whereas if we don't delete the K8s resource and just delete the cluster they won't get cleaned up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what kind of GCP resources? currently we skip deleting resources for the following: https://github.com/kubeflow/kubeflow/blob/master/testing/test_deploy_app.py#L443-L450
currently these codes won't work if user doesn't point KUBECONFIG to the correct context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gabriel is right, i think but i’ll verify. I’ll also add an e2e test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this code is deleting all the K8s resources that we created as part of deploying Kubeflow.
So why wouldn't we also delete those resources in the case of GCP platform?
At worst its an unnecessary operation because we will end up deleting the cluster.
However, in some cases if we don't delete the K8s resource and trigger corresponding deletion handlers in K8s we don't properly clean up all resources.
The logic here:
https://github.com/kubeflow/kubeflow/blob/master/testing/test_deploy_app.py#L443-L450
Is indicating that certain resources like http-maps aren't being properly deleted.
We never create those resources explicitly. Those resources are created by the ingress that we create.
When we delete Kubeflow we shouldn't have to explicitly delete those resources (url-map, etc...) The fact that we do is a work around for the fact that those resources aren't being adequately garbage collected when the owning resource (GKE cluster is deleted).
I think if we explicitly delete the K8s resources as opposed to just deleting that cluster helps avoid these types of resource leaks.
Also from a design perspective, its better if we minimize the number of "if ${platform} then ...".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jlewi +1
I also realized that kfctl delete k8s
would not delete the cluster, so k8 cleanup needs to occur.
On GCP I think we want to delete K8s resources before deleting the cluster because this triggers some cleanup.
I'll invert the order of operations - (package managers delete -> platforms delete)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For kfctl delete platform
- it seems like this should behave the same as kfctl delete all
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on the fence; what do you recommend?
What happens if there is a bug that prevents
kfctl delete k8s
from succeeding?
In that case we might want to fall back to just running kfctl delete platform
.
So maybe the behavior could be
kfctl delete all
- Invoke k8s delete and abort on error
kfctl delete platform
- try to invoke k8s delete but continue to platform if k8s delete throws an error.
Thoughts?
068f530
to
de0e7bd
Compare
if err := platform(); err != nil { | ||
fallthrough | ||
case kftypes.PLATFORM: | ||
if err := k8s(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Add a comment explaining why we want to call K8s as part of platform deletion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @ellis-bigelow, @jlewi, and @kunmingg)
bootstrap/pkg/kfapp/coordinator/coordinator.go, line 458 at r2 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
nit: Add a comment explaining why we want to call K8s as part of platform deletion.
done
bootstrap/pkg/kfapp/ksonnet/ksonnet.go, line 291 at r1 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
I'm on the fence; what do you recommend?
What happens if there is a bug that prevents
kfctl delete k8s
from succeeding?In that case we might want to fall back to just running
kfctl delete platform
.So maybe the behavior could be
kfctl delete all
- Invoke k8s delete and abort on error
kfctl delete platform
- try to invoke k8s delete but continue to platform if k8s delete throws an error.Thoughts?
Ok, implemented.
dc11592
to
89de9c3
Compare
if err := platform(); err != nil { | ||
return err | ||
if err := k8s(); err != nil { | ||
return fmt.Errorf("error while deleting k8 resources, aborting deleting the platform. Error %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you use new KfError
interface when adding new error message?
example:
https://github.com/kubeflow/kubeflow/blob/master/bootstrap/pkg/kfapp/gcp/gcp.go#L351
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @ellis-bigelow, @gabrielwen, @jlewi, and @kunmingg)
bootstrap/pkg/kfapp/coordinator/coordinator.go, line 457 at r4 (raw file):
Previously, gabrielwen (Hung-Ting Wen) wrote…
could you use new
KfError
interface when adding new error message?example:
https://github.com/kubeflow/kubeflow/blob/master/bootstrap/pkg/kfapp/gcp/gcp.go#L351
sure, did you want all errors in coordinator.go converted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @ellis-bigelow, @jlewi, @kkasravi, and @kunmingg)
bootstrap/pkg/kfapp/coordinator/coordinator.go, line 457 at r4 (raw file):
Previously, kkasravi (Kam Kasravi) wrote…
sure, did you want all errors in coordinator.go converted?
just those newly added, will make up a different PR for all errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @ellis-bigelow, @gabrielwen, @jlewi, and @kunmingg)
bootstrap/pkg/kfapp/coordinator/coordinator.go, line 457 at r4 (raw file):
Previously, gabrielwen (Hung-Ting Wen) wrote…
just those newly added, will make up a different PR for all errors.
ok, done. Might be nice to have a more terse form eg:
return kfapis.Internalf("error while deleting k8 resources, aborting deleting the platform. Error %v", err)
where
func Internalf(format string, a ...interface{}) *kfapis.KfError {
return &kfapis.KfError{
Code: int(kfapis.INTERNAL_ERROR),
Message: fmt.Sprintf(format, a...),
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @ellis-bigelow, @gabrielwen, and @kunmingg)
bootstrap/pkg/kfapp/coordinator/coordinator.go, line 458 at r2 (raw file):
Previously, kkasravi (Kam Kasravi) wrote…
done
Do you need to sync? I'm not seeing the comment.yyyy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @ellis-bigelow, @gabrielwen, @kkasravi, and @kunmingg)
a discussion (no related file):
On my end this looks good to go except it looks like you might need to sync.
Sorry - doing it now |
6b753ff
to
6d76542
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @ellis-bigelow, @gabrielwen, @jlewi, and @kunmingg)
a discussion (no related file):
Previously, jlewi (Jeremy Lewi) wrote…
On my end this looks good to go except it looks like you might need to sync.
Done.
bootstrap/pkg/kfapp/coordinator/coordinator.go, line 458 at r2 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
Do you need to sync? I'm not seeing the comment.yyyy
see lines 457, 471
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jlewi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…ubeflow#2810) * call package manager, then platform in delete. Allow ksonnet to delete k8s * fix logic around deletion of ALL and PLATFORM * compilation error * convert to KfError on error returns * update on code comments, make sure all errors are returned as KfError
…ubeflow#2810) * call package manager, then platform in delete. Allow ksonnet to delete k8s * fix logic around deletion of ALL and PLATFORM * compilation error * convert to KfError on error returns * update on code comments, make sure all errors are returned as KfError
…ubeflow#2810) * call package manager, then platform in delete. Allow ksonnet to delete k8s * fix logic around deletion of ALL and PLATFORM * compilation error * convert to KfError on error returns * update on code comments, make sure all errors are returned as KfError
fixes #2809
This change is