fix(eks): k8s resources accidentally deleted due to logical ID change #12053
Merged
Conversation
The `KubernetesManifest` construct used `kubectl apply` for both CREATE and UPDATE operations. This means that if, for example, two manifests had resources with the same k8s name (`metadata.name`), creating the second manifest would not fail; it would silently overwrite the existing resource definition.

As a consequence, if the logical ID of a `KubernetesManifest` resource was changed (without a change in the physical name), CFN would perform a replacement, which involves a CREATE of the new resource followed by a DELETE of the old one. Since the CREATE operation was implemented through `apply`, it succeeded as a no-op, and the subsequent DELETE then removed the resource. The net result was that the resource was deleted.

The solution is to use `kubectl create --save-config` instead of `kubectl apply` for CREATE operations. This yields the desired CREATE semantics (duh!). Now, if a `KubernetesManifest` resource is defined with a k8s object name that already exists, the CREATE operation fails as expected. The logical ID change scenario (resource replacement) also issues a CREATE operation first, which fails. To change the logical ID of a `KubernetesManifest` resource, users will have to either delete the old resource or change its physical name.

Since this is quite hard to test (due to multi-phase deployments and failure modes), this was tested manually:

1. Defined a manifest with logical name X1 and physical name Y1 -> CREATE was issued.
2. Changed the logical name to X2 (physical name remains Y1) -> the update failed because CFN issues a CREATE operation first (#10397).
3. Changed the physical name to Y2 as well -> the deploy succeeded, the new resource was created, and the old resource was pruned.

This fixes #10397
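For illustration only, here is a minimal CDK sketch of the scenario described above. The construct id, cluster configuration, and `metadata.name` values are hypothetical; the point is that the construct id drives the logical ID while `metadata.name` is the physical k8s object name, so renaming only the construct triggers a CloudFormation create-then-delete replacement.

```ts
import * as cdk from '@aws-cdk/core';
import * as eks from '@aws-cdk/aws-eks';

class ManifestStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string) {
    super(scope, id);

    const cluster = new eks.Cluster(this, 'Cluster', {
      version: eks.KubernetesVersion.V1_18,
    });

    // The construct id 'ConfigMapX1' determines the logical ID; the physical
    // name is the k8s object name 'my-config' (metadata.name).
    new eks.KubernetesManifest(this, 'ConfigMapX1', {
      cluster,
      manifest: [{
        apiVersion: 'v1',
        kind: 'ConfigMap',
        metadata: { name: 'my-config', namespace: 'default' },
        data: { key: 'value' },
      }],
    });

    // Renaming the construct id to 'ConfigMapX2' (with metadata.name unchanged)
    // makes CloudFormation replace the resource: CREATE the new logical resource
    // first, then DELETE the old one. Before this fix, the CREATE (kubectl apply)
    // was a silent no-op and the DELETE removed 'my-config'; after the fix, the
    // CREATE (kubectl create --save-config) fails instead of deleting data.
  }
}

const app = new cdk.App();
new ManifestStack(app, 'ManifestStack');
```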
eladb added the pr-linter/exempt-test (The PR linter will not require test changes) label on Dec 13, 2020
github-actions bot added the @aws-cdk/aws-eks (Related to Amazon Elastic Kubernetes Service) label on Dec 13, 2020
iliapolo approved these changes on Dec 13, 2020
Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).
eladb pushed a commit that referenced this pull request on Dec 14, 2020
The change in #12053 introduced a regression which causes failures in creating new clusters. Since we changed the KubernetesManifest resource to use `kubectl create` in CREATE operations, the attempt to create the `aws-auth` config map is failing because this config map is already created by the cluster. This change adds an `override` to `KubernetesManifest` which will cause CREATE to be performed using `apply` instead, which practically allows overriding/adopting existing K8s resources.
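A hedged sketch of how that escape hatch could be used to adopt the pre-existing `aws-auth` config map. The `overwrite` option name is an assumption based on the commit description (which calls it an "override") and may differ from the final API; the helper name and config map contents are hypothetical.

```ts
import { Construct } from '@aws-cdk/core';
import * as eks from '@aws-cdk/aws-eks';

// Hypothetical helper: defines the aws-auth config map, which the cluster has
// already created, and opts in to adopting it rather than failing on CREATE.
function adoptAwsAuth(scope: Construct, cluster: eks.Cluster): eks.KubernetesManifest {
  return new eks.KubernetesManifest(scope, 'AwsAuth', {
    cluster,
    // Assumed option name: CREATE falls back to `kubectl apply`, so an existing
    // k8s object with the same name is overridden/adopted instead of erroring.
    overwrite: true,
    manifest: [{
      apiVersion: 'v1',
      kind: 'ConfigMap',
      metadata: { name: 'aws-auth', namespace: 'kube-system' },
      data: {},
    }],
  });
}
```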
mergify bot pushed a commit that referenced this pull request on Dec 14, 2020
…12068)

The change in #12053 introduced a regression which causes failures in creating new clusters. Since we changed the KubernetesManifest resource to use `kubectl create` in CREATE operations, the attempt to create the `aws-auth` config map is failing because this config map is already created by the cluster. This change adds an `override` to `KubernetesManifest` which will cause CREATE to be performed using `apply` instead, which practically allows overriding/adopting existing K8s resources.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
flochaz pushed a commit to flochaz/aws-cdk that referenced this pull request on Jan 5, 2021
…aws#12053)

The `KubernetesManifest` construct used `kubectl apply` for both CREATE and UPDATE operations. This means that if, for example, two manifests had resources with the same k8s name (`metadata.name`), creating the second manifest would not fail; it would silently overwrite the existing resource definition.

As a consequence, if the logical ID of a `KubernetesManifest` resource was changed (without a change in the physical name), CFN would perform a replacement, which involves a CREATE of the new resource followed by a DELETE of the old one. Since the CREATE operation was implemented through `apply`, it succeeded as a no-op, and the subsequent DELETE then removed the resource. The net result was that the resource was deleted.

The solution is to use `kubectl create --save-config` instead of `kubectl apply` for CREATE operations. This yields the desired CREATE semantics (duh!). Now, if a `KubernetesManifest` resource is defined with a k8s object name that already exists, the CREATE operation fails as expected. The logical ID change scenario (resource replacement) also issues a CREATE operation first, which fails. To change the logical ID of a `KubernetesManifest` resource, users will have to either delete the old resource or change its physical name.

Since this is quite hard to test (due to multi-phase deployments and failure modes), this was tested manually:

1. Defined a manifest with logical name X1 and physical name Y1 -> CREATE was issued.
2. Changed the logical name to X2 (physical name remains Y1) -> the update failed because CFN issues a CREATE operation first (aws#10397).
3. Changed the physical name to Y2 as well -> the deploy succeeded, the new resource was created, and the old resource was pruned.

This fixes aws#10397

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
flochaz pushed a commit to flochaz/aws-cdk that referenced this pull request on Jan 5, 2021
…ws#12068)

The change in aws#12053 introduced a regression which causes failures in creating new clusters. Since we changed the KubernetesManifest resource to use `kubectl create` in CREATE operations, the attempt to create the `aws-auth` config map is failing because this config map is already created by the cluster. This change adds an `override` to `KubernetesManifest` which will cause CREATE to be performed using `apply` instead, which practically allows overriding/adopting existing K8s resources.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*