
fix(eks): k8s resources accidentally deleted due to logical ID change #12053

Merged: 1 commit merged into master on Dec 13, 2020

Conversation

@eladb (Contributor) commented on Dec 13, 2020

The `KubernetesManifest` construct used `kubectl apply` for both CREATE and UPDATE operations. This meant that if, for example, two manifests had resources with the same Kubernetes name (`metadata.name`), creating the second manifest would not fail but would silently overwrite the first resource's definition.

As a consequence, if the logical ID of a `KubernetesManifest` resource was changed (without a change in the physical name), CloudFormation would perform a replacement, which involves a CREATE of the new resource followed by a DELETE of the old one. Since the CREATE operation was implemented through `apply`, it succeeded as a no-op, but the subsequent DELETE then removed the live resource. The net result: the resource was deleted.
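The failure sequence can be sketched with a toy model (hypothetical names, and a dict standing in for the cluster; the real handler shells out to `kubectl` against a live cluster):

```python
# Toy model of the CloudFormation replacement sequence with apply-based CREATE.
cluster = {}  # fake cluster state: k8s object name -> manifest

def apply(name, manifest):
    # `kubectl apply` semantics: create-or-update (upsert), never fails on conflict
    cluster[name] = manifest

def delete(name):
    cluster.pop(name, None)

# Initial deploy: logical ID X1, physical name "my-config"
apply("my-config", {"data": "v1"})   # CREATE via apply

# Logical ID changes X1 -> X2, physical name unchanged:
# CFN replaces the resource: CREATE the new one, then DELETE the old one.
apply("my-config", {"data": "v1"})   # CREATE via apply: silent no-op
delete("my-config")                  # DELETE of the "old" resource

print("my-config" in cluster)        # False: the live resource is gone
```

The upsert semantics of `apply` are exactly what hides the conflict: the second CREATE reports success, so CloudFormation proceeds to the cleanup DELETE.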

The solution is to use `kubectl create --save-config` instead of `kubectl apply` for CREATE operations, which yields the desired CREATE semantics.

Now, if a `KubernetesManifest` resource is defined with a Kubernetes object name that already exists, the CREATE operation fails as expected. The logical-ID-change scenario (resource replacement) also issues a CREATE operation first, which fails.

To change the logical ID of a `KubernetesManifest` resource, users will have to either delete the old resource first or change its physical name as well.
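A minimal sketch of the new CREATE semantics, again using a dict as a stand-in cluster (illustrative, not the actual handler code):

```python
# Toy model of create-based CREATE: fail if the object already exists.
cluster = {}

def create_save_config(name, manifest):
    # `kubectl create --save-config` semantics: error on an existing object
    if name in cluster:
        raise RuntimeError(f"Error from server (AlreadyExists): {name}")
    cluster[name] = manifest

create_save_config("my-config", {"data": "v1"})   # initial CREATE succeeds

# Replacement (logical ID change, same physical name): the CREATE phase
# now fails, CFN rolls back, and the original resource survives.
try:
    create_save_config("my-config", {"data": "v1"})
except RuntimeError as e:
    print("CREATE failed:", e)

print("my-config" in cluster)   # True: resource preserved
```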

Since this is quite hard to test automatically (due to multi-phase deployments and failure modes), it was tested manually:

  1. Defined a manifest with logical name X1 and physical name Y1 -> CREATE was issued.
  2. Changed the logical name to X2 (physical name remains Y1) -> the update failed because CFN issues a CREATE operation first (#10397).
  3. Also changed the physical name to Y2 -> the deploy succeeded; the new resource was created and the old resource pruned.

This fixes #10397


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@eladb eladb added the pr-linter/exempt-test The PR linter will not require test changes label Dec 13, 2020
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Dec 13, 2020
@eladb eladb requested a review from iliapolo December 13, 2020 14:36
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Dec 13, 2020
@aws-cdk-automation (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: e3ea1c1
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify (mergify bot) commented on Dec 13, 2020

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 019852e into master Dec 13, 2020
@mergify mergify bot deleted the benisrae/eks-k8s-create branch December 13, 2020 19:57
eladb pushed a commit that referenced this pull request Dec 14, 2020
The change in #12053 introduced a regression which causes failures in creating new clusters. Since we changed the KubernetesManifest resource to use `kubectl create` in CREATE operations, the attempt to create the `aws-auth` config map is failing because this config map is already created by the cluster.

This change adds an `override` to `KubernetesManifest` which will cause CREATE to be performed using `apply` instead, which practically allows overriding/adopting existing K8s resources.
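The follow-up's CREATE dispatch could look roughly like this sketch (the flag name `override` comes from the commit message above; the handler shape and helper names are assumptions):

```python
# Illustrative CREATE dispatch with an override escape hatch.
def on_create(name, manifest, cluster, override=False):
    if override:
        # apply semantics (create-or-update): adopt objects the cluster
        # already created, such as the aws-auth config map
        cluster[name] = manifest
    elif name in cluster:
        # create --save-config semantics: conflict is an error
        raise RuntimeError(f"AlreadyExists: {name}")
    else:
        cluster[name] = manifest

# The cluster pre-creates aws-auth; override lets the manifest adopt it.
cluster = {"aws-auth": {"managed": True}}
on_create("aws-auth", {"mapRoles": []}, cluster, override=True)  # succeeds
```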
mergify bot pushed a commit that referenced this pull request Dec 14, 2020
…12068)

flochaz pushed a commit to flochaz/aws-cdk that referenced this pull request Jan 5, 2021
…aws#12053)

flochaz pushed a commit to flochaz/aws-cdk that referenced this pull request Jan 5, 2021
…ws#12068)

Labels

  • @aws-cdk/aws-eks: Related to Amazon Elastic Kubernetes Service
  • contribution/core: This is a PR that came from AWS.
  • pr-linter/exempt-test: The PR linter will not require test changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[aws-eks] Replacement of KubernetesManifest resources results in a deletion, without creation
3 participants