eks: renaming the cluster would trigger rollback due to not authorized to delete the cluster #29282

Closed
pahud opened this issue Feb 27, 2024 · 6 comments · Fixed by #29283
Assignees
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service bug This issue is a bug. effort/medium Medium work item – several days of effort p1

Comments

@pahud
Contributor

pahud commented Feb 27, 2024

Describe the bug

Given the code:

    const cluster = new eks.Cluster(this, 'demo-eks-cluster', {
      vpc,
      clusterName: 'foo',
      defaultCapacity: 0,
      version: eks.KubernetesVersion.V1_29,
      kubectlLayer: new KubectlLayer(this, 'kubectlLayer'),
    });

If we change the clusterName, it triggers a replacement, which creates a new cluster and then deletes the existing one. However, the delete fails with a not-authorized error, so the deployment rolls back.

11:12:03 AM | DELETE_FAILED        | AWS::CloudFormation::CustomResource | demo-eks-cluster/R...e/Resource/Default
Received response status [FAILED] from custom resource. Message returned: User: arn:aws:sts::<deducted>:assumed-role/dummy-stack1-demoeksclusterCreationRoleD556FC0C-eSVJAmlypMdd/AWSCDK.EKSCluster.Delete.ec88927b-3c8e-4b8f-bd7b-94445b11de48 is not authorized to perform: eks:DeleteCluster on resource: arn:aws:eks:us-east-1:<deducted>:cluster/foo

Logs: /aws/lambda/dummy-stack1-awscdkawseksCl-OnEventHandler42BEBAE0-F7RpMPmuSPA5

at throwDefaultError (/var/runtime/node_modules/@aws-sdk/node_modules/@smithy/smithy-client/dist-cjs/default-error-handler.js:8:22)
at /var/runtime/node_modules/@aws-sdk/node_modules/@smithy/smithy-client/dist-cjs/default-error-handler.js:18:39
at de_DeleteClusterCommandError (/var/runtime/node_modules/@aws-sdk/client-eks/dist-cjs/protocols/Aws_restJson1.js:1526:20)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async /var/runtime/node_modules/@aws-sdk/node_modules/@smithy/middleware-serde/dist-cjs/deserializerMiddleware.js:7:24

I am not sure whether updating the clusterName should trigger a replacement, but we probably need to add the relevant permissions to the cluster resource handler.

Expected Behavior

Updating the clusterName should not fail. Preferably it would be an in-place update, but if replacement is necessary, it should not fail and roll back.

Current Behavior

The update fails and rolls back.

Reproduction Steps

as described above

Possible Solution

  1. Let's test whether we can simply trigger an in-place update rather than a replacement.
  2. If replacement is necessary, add eks:DeleteCluster on the cluster resource to the custom resource handler role.

Additional Information/Context

No response

CDK CLI Version

v2.130.0

Framework Version

No response

Node.js Version

v18.16.0

OS

mac os x

Language

TypeScript

Language Version

No response

Other information

No response

@pahud pahud added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Feb 27, 2024
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Feb 27, 2024
@pahud
Contributor Author

pahud commented Feb 27, 2024

internal tracking: D120674266

@pahud pahud added the p1 label Feb 27, 2024
@pahud pahud self-assigned this Feb 27, 2024
@pahud pahud added effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Feb 27, 2024
@pahud
Contributor Author

pahud commented Feb 27, 2024

From CFN's perspective, updating the cluster Name property does trigger a replacement:

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-eks-cluster.html#cfn-eks-cluster-name

So I think it makes sense here.

In fact, the cluster resource handler role does have eks:DeleteCluster:

    creationRole.addToPolicy(new iam.PolicyStatement({
      actions: [
        'eks:CreateCluster',
        'eks:DescribeCluster',
        'eks:DescribeUpdate',
        'eks:DeleteCluster',
        'eks:UpdateClusterVersion',
        'eks:UpdateClusterConfig',
        'eks:CreateFargateProfile',
        'eks:TagResource',
        'eks:UntagResource',
      ],
      resources: resourceArns,
    }));

However, the resource ARN in this policy has changed to the new cluster's ARN, so deleting the old cluster is not authorized.
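
The mismatch can be illustrated with a small, dependency-free model. This is only a sketch: `eksClusterArn`, `isAllowed`, the `Statement` shape, and the account ID are hypothetical names made up for illustration, and the matching logic is a deliberately simplified stand-in for real IAM evaluation (no wildcards, no conditions). It assumes the policy is regenerated for the new cluster name only, which is what the error above suggests.

```typescript
// Simplified model of a resource-scoped IAM statement. NOT the CDK or IAM implementation.
interface Statement {
  actions: string[];
  resources: string[];
}

// Build an EKS cluster ARN from its parts (hypothetical helper).
function eksClusterArn(region: string, account: string, name: string): string {
  return `arn:aws:eks:${region}:${account}:cluster/${name}`;
}

// True only when the statement lists both the action and the exact resource ARN.
// Real IAM also supports wildcards and conditions; this sketch ignores them.
function isAllowed(stmt: Statement, action: string, resource: string): boolean {
  return stmt.actions.indexOf(action) !== -1 && stmt.resources.indexOf(resource) !== -1;
}

const region = 'us-east-1';
const account = '111122223333'; // placeholder account ID

// After renaming 'foo' to 'bar', the policy is scoped to 'bar' only.
const policyAfterRename: Statement = {
  actions: ['eks:CreateCluster', 'eks:DeleteCluster'],
  resources: [eksClusterArn(region, account, 'bar')],
};

const oldArn = eksClusterArn(region, account, 'foo');
const newArn = eksClusterArn(region, account, 'bar');

// The role may delete the new cluster, but not the old one being replaced.
console.log(isAllowed(policyAfterRename, 'eks:DeleteCluster', newArn)); // true
console.log(isAllowed(policyAfterRename, 'eks:DeleteCluster', oldArn)); // false
```

In this model the action is present in the policy both before and after the rename; only the resource scope moves, which matches the observation that the role "does have" eks:DeleteCluster yet the call is still rejected.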

@pahud
Contributor Author

pahud commented Feb 27, 2024

OK, I guess the best solution is to add a note in the docs describing the following workaround, as it doesn't make sense to allow the cluster admin role to perform eks:DeleteCluster on *.

    const cluster = new eks.Cluster(this, 'demo-eks-cluster', {
      vpc,
      clusterName: 'foo', // will rename to 'bar'
      defaultCapacity: 0,
      version: eks.KubernetesVersion.V1_29,
      kubectlLayer: new KubectlLayer(this, 'kubectlLayer'),
    });

    // allow the cluster admin role to delete the 'foo' cluster
    cluster.adminRole.addToPolicy(new iam.PolicyStatement({
      actions: ['eks:DeleteCluster'],
      resources: [
        Stack.of(this).formatArn({ service: 'eks', resource: 'cluster', resourceName: 'foo' }),
      ],
    }));
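
For reference, here is a rough sketch of what the extra statement should resolve to once the stack's partition, region, and account are substituted. The `tempDeleteStatement` helper and the `IamStatement` type are hypothetical and exist only for this illustration; the real workaround goes through `cluster.adminRole.addToPolicy` as shown above, and the synthesized template would contain unresolved tokens rather than literal values. The account ID is a placeholder.

```typescript
// Shape of a single IAM policy statement as it appears in policy JSON.
interface IamStatement {
  Effect: 'Allow';
  Action: string[];
  Resource: string[];
}

// Hypothetical helper: build the temporary statement that lets the role
// delete the cluster under its OLD name during the replacement.
function tempDeleteStatement(
  partition: string,
  region: string,
  account: string,
  oldClusterName: string,
): IamStatement {
  return {
    Effect: 'Allow',
    Action: ['eks:DeleteCluster'],
    // Scoped to the one old cluster, rather than '*'.
    Resource: [`arn:${partition}:eks:${region}:${account}:cluster/${oldClusterName}`],
  };
}

const statement = tempDeleteStatement('aws', 'us-east-1', '111122223333', 'foo');
console.log(JSON.stringify(statement, null, 2));
```

Scoping the statement to the single old cluster ARN keeps the grant narrow, and since it is only needed for the rename, it can presumably be removed once the replacement has completed.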

@pahud
Contributor Author

pahud commented Feb 27, 2024

related to #24174

@pahud
Contributor Author

pahud commented Feb 27, 2024

closing with #29283

@pahud pahud closed this as completed Feb 27, 2024

mergify bot pushed a commit that referenced this issue Feb 27, 2024
### Issue # (if applicable)

As described in #29282, renaming the cluster requires an additional temporary IAM policy. This PR proposes a doc update to clarify this.

Closes #29282 #24174

### Reason for this change

To address this use case.

### Description of changes



### Description of how you validated changes



### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*