
[EKS] [bug]: CloudFormation Version upgrades can deadlock your stack #497

Closed
vincentheet opened this issue Sep 26, 2019 · 8 comments
Labels
EKS Amazon Elastic Kubernetes Service Proposed Community submitted issue

Comments

@vincentheet

Tell us about your request
The current CloudFormation support for EKS via the AWS::EKS::Cluster resource does not behave as expected with respect to the Version property.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Scenario 1
The default behaviour of the CloudFormation EKSCluster resource is to create a cluster with the latest available Kubernetes version when no Version property is specified. When you later explicitly specify that version in a newer CloudFormation template, the stack update fails on the EKSCluster resource with the error "No updates are to be performed." The error itself is technically correct, but it means we are unable to pin down the version in a newer CloudFormation template.
CF Templates: https://gist.github.com/vincentheet/e826e39d0c47cdb79310866cccce2acd

Scenario 2
If you initially create an EKSCluster with the Version property set to 1.11 and later want to update the cluster to 1.12 with a new CloudFormation template, the stack can end up in an erroneous / deadlocked state if another resource in the new template causes the whole stack to roll back. When the EKSCluster resource is successfully upgraded from 1.11 to 1.12 but another resource in the same stack fails to update, CloudFormation then tries to roll back the EKSCluster. The rollback on the EKSCluster fails with the error "Update failed because of Kubernetes version is required". Since a version rollback is not supported by EKS, the stack ends up in an error state. When you then try to roll out a fixed / correct template, the EKSCluster update fails because the cluster is already on the new version, with the error: "The following resources failed to update: EksCluster"
CF Templates: https://gist.github.com/vincentheet/f4047c3bb1461d9f05430cea1b74d681

Suggested solution
When CloudFormation requests an EKSCluster resource to update its version, please verify whether the cluster is already on the requested version. For example, if the cluster is already on 1.12, ignore the update request and report success to CloudFormation instead of an error. This would allow the other resources in the same CloudFormation stack to be updated.
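The idempotency check suggested above could be sketched as follows. This is purely illustrative and not the actual CloudFormation handler code; the function name and the boto3 wiring shown in the comment are assumptions.

```python
# Hypothetical sketch of the suggested idempotency check. The function name
# is illustrative, not part of any AWS API.

def is_noop_version_update(current_version: str, requested_version: str) -> bool:
    """Return True when the cluster already runs the requested version,
    so the update can be reported as a success instead of an error."""
    return current_version == requested_version

# In a real handler the current version would come from the EKS
# DescribeCluster API, roughly:
#   import boto3
#   eks = boto3.client("eks")
#   current = eks.describe_cluster(name="ekscluster")["cluster"]["version"]
#   if is_noop_version_update(current, requested):
#       ...  # report success to CloudFormation without calling UpdateClusterVersion
```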

Are you currently working around this issue?
We removed every CloudFormation resource we could from the stack that contains the AWS::EKS::Cluster resource, to minimize the chance of the stack ending up in a deadlocked state.

Additional context
Please read the comments in this thread as well: #115

@omerfsen

This is really mandatory

@danieljamesscott

This is pretty much the same problem that happens with ElasticSearch. You can manually upgrade the cluster, but then your CF stack is out of sync, and cannot be modified.

aws-cloudformation/cloudformation-coverage-roadmap#125

And you can also get stuck with RDS when AutoMinorVersionUpgrade is enabled. If an automatic upgrade is applied, you cannot update your CF stack until you update the version in the template to match what is currently running.

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-rds-database-instance.html#cfn-rds-dbinstance-autominorversionupgrade

@tabern tabern added the EKS Amazon Elastic Kubernetes Service label Sep 26, 2019
@tabern tabern added this to Researching in containers-roadmap Sep 26, 2019
@hugoprudente

Another case that can happen is the following:

The cluster is upgraded from version 1.11 to 1.12 due to "[EKS]: Kubernetes v1.11 End of Support".

CloudFormation will then drift, so naturally, after a drift detection operation, we update the template:

  EksCluster:
    Type: AWS::EKS::Cluster
    Properties:
      Name: 'ekscluster'
      Version: '1.12'
      ...

Then an error is returned by CloudFormation with the following message:

Update failed because of Unsupported Kubernetes minor version update from 1.12 to 1.12 (Service: AmazonEKS; Status Code: 400; Error Code: InvalidParameterException; Request ID: 375f6f63-6d85-11e9-a10b-313b75e44d30)  ekscluster

This behaviour is the same as calling the UpdateClusterVersion API directly or via the aws-cli:

$ aws eks update-cluster-version --name cluster-13 --kubernetes-version 1.13
An error occurred (InvalidParameterException) when calling the UpdateClusterVersion operation: Unsupported Kubernetes minor version update from 1.13 to 1.13
$ echo $?
255

@whereisaaron

eksctl-io/eksctl#778

@mikestef9
Contributor

mikestef9 commented Mar 25, 2020

We are working on an update to the EKS CFN behavior to address this problem.

We are planning to perform a describe call using our DescribeCluster API in EKS CFN to get the actual Kubernetes minor version. If the minor version in the template is missing, the same, or lower than the actual version, we will treat the update as a no-op and report success to CFN, thereby not failing the update of the other resources in the stack.

This means there will be a difference in behavior between EKS CFN and the EKS API, but we think this is the simplest approach to solve the issue.
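The decision rule described in this comment ("missing, same, or lesser minor version is a no-op") could be sketched as follows. This is an illustrative reconstruction, not the actual EKS CFN handler code; the function names are assumptions.

```python
# Illustrative sketch of the described no-op rule: a missing, equal, or
# lower minor version in the template is treated as a success without
# calling UpdateClusterVersion. Function names are hypothetical.

from typing import Optional


def minor(version: str) -> tuple:
    # "1.12" -> (1, 12) so versions compare numerically, not lexically
    major_s, minor_s = version.split(".")[:2]
    return (int(major_s), int(minor_s))


def update_is_noop(template_version: Optional[str], actual_version: str) -> bool:
    if template_version is None:  # Version property omitted from the template
        return True
    return minor(template_version) <= minor(actual_version)
```

Note that a plain string comparison would mis-order versions like "1.9" and "1.10", which is why the minor version is compared numerically.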

@mikestef9
Contributor

Hey all, this issue has been resolved using the approach as described above.

containers-roadmap automation moved this from Researching to Just Shipped May 20, 2020
@mr-robot-in

Hey all, this issue has been resolved using the approach as described above.

Is it resolved? I am getting: Received response status [FAILED] from custom resource. Message returned: Unsupported Kubernetes minor version update from 1.21 to 1.20 Logs:
I upgraded the cluster, but there is an issue with a nodegroup, so the CloudFormation parent stack is in UPDATE_ROLLBACK_FAILED status and the awscdkawseksClusterResourceProviderNestedStackawscdkawseksClusterResource nested stack is in UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS state.

@adriantaut

Following, as I'm facing the same issue as @mr-robot-in.


9 participants