eks: fail to create eks nodegroup in cn-north-1 #24696
Comments
Hi, let me clarify this first.
I found the root cause here:
In China, this should be |
@pahud I just created the EKS cluster, not the nodegroup or the nodegroup role. This is my CDK code:
I deployed the stack in cn-north-1, but the stack eventually rolled back. When I checked the CloudFormation stack error, it reported a nested stack creation failure, so I looked at the error message from the nested stack and found the following log:
I am sure the 'AmazonElasticContainerRegistryPublicReadOnly' policy is an AWS managed policy that is only available in the global (aws) partition; I cannot find this IAM policy in the China regions. Could you please help check whether it is the same root cause?
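As a brief aside, here is a minimal illustration (a sketch, not the construct's internal code) of why that managed policy reference breaks in the China partition: CDK renders AWS managed policy names against the deploying partition, so the same policy name resolves to an `arn:aws-cn:...` ARN that does not exist there.

```ts
import * as iam from 'aws-cdk-lib/aws-iam';

// Illustration only: the policy name is rendered against the deploying partition.
const ecrPublicReadOnly = iam.ManagedPolicy.fromAwsManagedPolicyName(
  'AmazonElasticContainerRegistryPublicReadOnly',
);
// In us-east-1 this resolves to:
//   arn:aws:iam::aws:policy/AmazonElasticContainerRegistryPublicReadOnly
// In cn-north-1 it resolves to:
//   arn:aws-cn:iam::aws:policy/AmazonElasticContainerRegistryPublicReadOnly
// which fails, because ECR Public (and this managed policy) is not published
// in the aws-cn partition.
```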
Yes, I can't deploy even this to cn-north-1:

```ts
import {
  App, Stack, StackProps,
  aws_eks as eks,
  aws_ec2 as ec2,
} from 'aws-cdk-lib';
import { KubectlV25Layer as KubectlLayer } from '@aws-cdk/lambda-layer-kubectl-v25';

// inside the Stack constructor:
const vpc = ec2.Vpc.fromLookup(this, 'Vpc', { isDefault: true });
const cluster = new eks.Cluster(this, 'Cluster', {
  vpc,
  version: eks.KubernetesVersion.V1_25,
  kubectlLayer: new KubectlLayer(this, 'KubectlLayer'),
});
```

The error message is just as you described above:
Looks like EKS is expecting the ec2 service principal name to be `ec2.amazonaws.com`.
@Bruce-Lu674 I created #24743 for the missing `AmazonElasticContainerRegistryPublicReadOnly` policy in the China regions.
@pahud Many thanks for your help.
@Bruce-Lu674 The relevant team is working on it. I don't have an ETA at this moment, but I will update here when I see the issue is fixed (hopefully very soon). By the way, are you able to successfully deploy EKS with CDK 2.65 in cn-north-1?
@pahud Yes, I can deploy the EKS cluster via CDK v2.65 in cn-north-1.
Hi @Bruce-Lu674, are you able to deploy the cluster AND a nodegroup with CDK v2.65.0 in cn-north-1, like this?

```ts
const cluster = new eks.Cluster(this, 'Cluster', {
  vpc,
  version: eks.KubernetesVersion.V1_24,
  defaultCapacity: 0,
  kubectlLayer,
});

const ng = cluster.addNodegroupCapacity('NG', {
  desiredSize: 2,
});
```
Hi Pahud @pahud, yes, I can create the EKS cluster via v2.65 and v2.66, but without the Nodegroup resource.
Here is my Python code:
|
@Bruce-Lu674 Unfortunately I can't even successfully deploy the cluster. I'll keep diving deep for the root cause. By the way, do you have an account on the cdk.dev Slack? Can you ping me there so we can discuss more details directly?
Hi, I am on CDK version 2.74.0, and this is still an issue. Thanks.
We are still working with internal teams to fix this, but unfortunately there is no ETA at this moment. I'll share an update if there is any. EKS in CN has 2 additional issues as well, and we probably need to fix them before deployments with the latest CDK can succeed.
…Cloud regions (#25215)

Reopening this PR because #25170 was closed by accident. As ECR Public is not available in the China regions and GovCloud, the `AmazonElasticContainerRegistryPublicReadOnly` IAM managed policy is not available in those affected regions and should not be attached to the role. This PR implements a CfnCondition to determine whether ECR Public is available, based on the `Aws.Partition` of the deploying region, and conditionally attaches `AmazonElasticContainerRegistryPublicReadOnly` to the kubectl-provider handler role.

This PR has been tested in the following regions:
- [x] cn-north-1 *
- [x] cn-northwest-1 *
- [x] us-east-1

\* I can confirm the role is created correctly in the CN regions, but due to
- #24358
- #24696

the cluster and nodegroup are still failing to create in CN.

Closes #24743, #24808, #25178
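For readers who want to apply the same idea in their own stacks before upgrading, here is a minimal sketch (not the PR's actual code) of attaching a managed policy only when the deploying partition supports it; the construct names `HasEcrPublic` and `KubectlHandlerRole` are placeholders invented for this illustration.

```ts
import { Aws, CfnCondition, Fn } from 'aws-cdk-lib';
import * as iam from 'aws-cdk-lib/aws-iam';

// Inside a Stack or Construct (`this`).
// ECR Public only exists in the standard 'aws' partition.
const hasEcrPublic = new CfnCondition(this, 'HasEcrPublic', {
  expression: Fn.conditionEquals(Aws.PARTITION, 'aws'),
});

const handlerRole = new iam.Role(this, 'KubectlHandlerRole', {
  assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'),
});

// Render the managed policy ARN only when the condition holds; otherwise
// emit AWS::NoValue so the property is dropped from the template.
const cfnRole = handlerRole.node.defaultChild as iam.CfnRole;
cfnRole.addPropertyOverride('ManagedPolicyArns', Fn.conditionIf(
  hasEcrPublic.logicalId,
  [`arn:${Aws.PARTITION}:iam::aws:policy/AmazonElasticContainerRegistryPublicReadOnly`],
  Aws.NO_VALUE,
));
```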
I can confirm we can successfully deploy an EKS cluster in China regions with escape hatches as below:

```ts
import { KubectlV26Layer as KubectlLayer } from '@aws-cdk/lambda-layer-kubectl-v26';

const cluster = new eks.Cluster(scope, 'EksCluster', {
  vpc,
  version: eks.KubernetesVersion.V1_26,
  kubectlLayer: new KubectlLayer(scope, 'KubectlLayer'),
  defaultCapacity: 2,
});

// override the service principal for the default nodegroup
overrideServicePrincipal(cluster.defaultNodegroup?.role.node.defaultChild as iam.CfnRole);

const ng = cluster.addNodegroupCapacity('NG', {
  desiredSize: 2,
});

// override the service principal for the additional nodegroup
overrideServicePrincipal(ng.role.node.defaultChild as iam.CfnRole);

function overrideServicePrincipal(role: iam.CfnRole) {
  role.addPropertyOverride(
    'AssumeRolePolicyDocument.Statement.0.Principal.Service',
    ['ec2.amazonaws.com', 'ec2.amazonaws.com.cn'],
  );
}
```

```
% kubectl get no
NAME                                          STATUS   ROLES    AGE     VERSION
ip-10-0-140-206.cn-north-1.compute.internal   Ready    <none>   2m34s   v1.26.2-eks-a59e1f0
ip-10-0-141-57.cn-north-1.compute.internal    Ready    <none>   2m20s   v1.26.2-eks-a59e1f0
ip-10-0-174-210.cn-north-1.compute.internal   Ready    <none>   2m34s   v1.26.2-eks-a59e1f0
```

This is a temporary fix for this issue from the CDK side.
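Note that the override lists both `ec2.amazonaws.com` and `ec2.amazonaws.com.cn`, presumably so the node role's trust policy satisfies whichever principal the EKS API validates in the China partition.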
Hello @pahud, we are still encountering the error below when using the latest CDK version to create EKS and corresponding resources like Helm charts, etc. I tested cdk 2.65.0, which looks good; however, it's hard for us to use that CDK version considering other factors. So do we have an ETA or a workaround for this issue?
2023-05-19 14:12:02 UTC+0800 HandlerServiceRoleFCDC14AE |
@justin007755 This bug should have been fixed in #25215. Please install the latest AWS CDK and let me know if it works for you.
Describe the bug
Hi, folks
I met a problem when using the AWS Python CDK to create an EKS cluster. Please find the information below:
My local env:
(.venv) [ec2-user@ip-10-0-1-73 python-cdk]$ cdk --version
2.67.0 (build b6f7f39)
(.venv) [ec2-user@ip-10-0-1-73 python-cdk]$ python3 --version
Python 3.7.10
(.venv) [ec2-user@ip-10-0-1-73 python-cdk]$ cat /proc/version
Linux version 5.10.144-127.601.amzn2.x86_64 (mockbuild@ip-10-0-44-229) (gcc10-gcc (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1), GNU ld version 2.35-21.amzn2.0.1) #1 SMP Thu Sep 29 01:11:59 UTC 2022
Here is the core code:
If I manually create the node role, the CDK deploys successfully, but when I remove the node_role parameter, like this:
the following error message is thrown:
But I think the principal [ec2.amazonaws.com.cn] is correct in the cn-north-1 region.
Could you please help check this problem?
Expected Behavior
When I do not specify the node role in the method, I expect the CDK to automatically create the node role.
Method doc : https://docs.aws.amazon.com/cdk/api/v1/python/aws_cdk.aws_eks/Cluster.html#aws_cdk.aws_eks.Cluster.add_nodegroup_capacity
Current Behavior
In the cn-north-1 region, CDK fails to create the node role.
I checked the principal in another EC2 role of mine; the [ec2.amazonaws.com.cn] configuration is correct there.
It seems that CDK cannot recognize this principal.
Reproduction Steps
Refer to the CDK code above: when node_role is removed, creation fails in the cn-north-1 region.
Possible Solution
Manually create the node role and hard-code it in the CDK code.
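A rough sketch of that workaround (written in TypeScript to match the earlier snippets, although the original report uses Python; it assumes a `cluster` defined as in the examples above and reuses the escape hatch from the comment thread):

```ts
import * as iam from 'aws-cdk-lib/aws-iam';

// Inside the Stack, with `cluster` already defined.
// Pre-create the node role with the standard EKS worker policies so CDK
// does not generate one with a principal that EKS rejects in cn-north-1.
const nodeRole = new iam.Role(this, 'NodeRole', {
  assumedBy: new iam.ServicePrincipal('ec2.amazonaws.com'),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonEKSWorkerNodePolicy'),
    iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonEKS_CNI_Policy'),
    iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonEC2ContainerRegistryReadOnly'),
  ],
});

// Hard-code the trust policy principals at the CloudFormation level,
// the same escape hatch shown in the comment thread.
(nodeRole.node.defaultChild as iam.CfnRole).addPropertyOverride(
  'AssumeRolePolicyDocument.Statement.0.Principal.Service',
  ['ec2.amazonaws.com', 'ec2.amazonaws.com.cn'],
);

// Pass the pre-created role instead of letting CDK generate one.
cluster.addNodegroupCapacity('NG', {
  desiredSize: 2,
  nodeRole,
});
```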
Additional Information/Context
No response
CDK CLI Version
2.67.0
Framework Version
No response
Node.js Version
v16.18.0
OS
Amazon Linux2
Language
Python
Language Version
3.7.10
Other information
No response