
(eks): Support isolated VPCs #12171

Open
2 tasks
iliapolo opened this issue Dec 20, 2020 · 17 comments
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service effort/small Small work item – less than a day of effort feature-request A feature should be added or improved. p1 service-api This issue is due to a problem in a service API

Comments

@iliapolo
Contributor

iliapolo commented Dec 20, 2020

Provisioning clusters inside an isolated VPC (i.e., no internet access) is not currently supported.
This is because the Lambda functions that operate the cluster need to invoke the EKS service API, which does not offer a VPC endpoint.

See aws/containers-roadmap#298

Use Case

We've seen users mentioning their environment uses an isolated VPC.

Other

Adding some information here to possibly facilitate alternative approaches.

If you have a proxy setup, you can inject proxy information to the handlers via custom environment variables.

const proxy = "https://proxy.mycompany.com:8080/";
new eks.Cluster(this, 'Cluster', {
  // ...other cluster props

  kubectlEnvironment: {
    HTTPS_PROXY: proxy,
  },

  clusterHandlerEnvironment: {
    HTTPS_PROXY: proxy,
  },
});
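Since the same proxy usually has to be reachable from both the kubectl handler and the cluster handler, one option is to build the environment map once and pass it to both props. The sketch below is a hypothetical helper (`buildProxyEnvironment` is not part of aws-cdk); setting both upper- and lowercase variable names is an assumption about the tools running inside the handlers, and the proxy URL and `noProxy` host are placeholders.

```typescript
// Hypothetical helper, not part of aws-cdk: builds one environment map that can be
// passed to both `kubectlEnvironment` and `clusterHandlerEnvironment`.
type HandlerEnvironment = Record<string, string>;

interface ProxyOptions {
  /** Proxy URL, e.g. "https://proxy.mycompany.com:8080/" (placeholder). */
  readonly proxyUrl: string;
  /** Optional hosts or domains that should bypass the proxy (placeholders). */
  readonly noProxy?: readonly string[];
}

function buildProxyEnvironment(opts: ProxyOptions): HandlerEnvironment {
  // Assumption: set both casings, because different tools read different spellings.
  const env: HandlerEnvironment = {
    HTTPS_PROXY: opts.proxyUrl,
    https_proxy: opts.proxyUrl,
  };
  if (opts.noProxy !== undefined && opts.noProxy.length > 0) {
    const joined = opts.noProxy.join(",");
    env["NO_PROXY"] = joined;
    env["no_proxy"] = joined;
  }
  return env;
}

// Usage: compute the map once, then reuse it for both handler props, e.g.
//   kubectlEnvironment: proxyEnv, clusterHandlerEnvironment: proxyEnv
const proxyEnv = buildProxyEnvironment({
  proxyUrl: "https://proxy.mycompany.com:8080/",
  noProxy: ["internal.example.com"],
});
console.log(proxyEnv);
```

Sharing a single object keeps the two handlers from drifting apart, which matters because both of them make AWS API calls.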

Also, following is a list of AWS services that our Lambda handlers interact with in order to operate the cluster. All of these services offer a VPC endpoint except for EKS.

  • Lambda
  • Step Functions
  • CloudFormation
  • STS
  • S3
  • EKS
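The list above can be turned into a small planning helper. The sketch below is hypothetical (the `HANDLER_DEPENDENCIES` table and `requiredEndpoints` function are not part of aws-cdk); it assumes the usual `com.amazonaws.<region>.<service>` endpoint service names (Step Functions uses `states`) and records EKS as having no endpoint, as described in this issue.

```typescript
// Hypothetical planning helper, not part of aws-cdk. It encodes the list above:
// the AWS services the EKS handlers call and the VPC endpoint each would need.
type EndpointKind = "Interface" | "Gateway" | "None";

interface HandlerDependency {
  readonly service: string;
  readonly kind: EndpointKind;
  // Last segment of the "com.amazonaws.<region>.<service>" endpoint service name;
  // undefined when the service has no VPC endpoint.
  readonly endpointSuffix?: string;
}

const HANDLER_DEPENDENCIES: readonly HandlerDependency[] = [
  { service: "Lambda", kind: "Interface", endpointSuffix: "lambda" },
  { service: "Step Functions", kind: "Interface", endpointSuffix: "states" },
  { service: "CloudFormation", kind: "Interface", endpointSuffix: "cloudformation" },
  { service: "STS", kind: "Interface", endpointSuffix: "sts" },
  // S3 is commonly reached through a gateway endpoint.
  { service: "S3", kind: "Gateway", endpointSuffix: "s3" },
  // No VPC endpoint for EKS, as described in this issue.
  { service: "EKS", kind: "None" },
];

interface EndpointPlan {
  readonly endpoints: string[]; // full endpoint service names to create
  readonly unreachable: string[]; // services an isolated VPC cannot reach
}

function requiredEndpoints(region: string): EndpointPlan {
  const endpoints: string[] = [];
  const unreachable: string[] = [];
  for (const dep of HANDLER_DEPENDENCIES) {
    if (dep.endpointSuffix === undefined) {
      unreachable.push(dep.service);
    } else {
      endpoints.push(`com.amazonaws.${region}.${dep.endpointSuffix}`);
    }
  }
  return { endpoints, unreachable };
}

const plan = requiredEndpoints("us-east-1");
console.log(plan.endpoints);
console.log("No VPC endpoint available for:", plan.unreachable.join(", "));
```

A non-empty `unreachable` list is exactly the blocker this issue tracks: as long as it contains EKS, endpoints alone cannot make an isolated VPC work.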

Related: #10036

Once EKS does offer a VPC endpoint, it would be nice to automatically provision the necessary endpoints when we detect that the VPC has no internet access (no internet gateway or NAT).
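The detection described above could be approximated by inspecting each subnet's route table for a default route. This is a hypothetical sketch over a simplified route model (`Route`, `SubnetRouting`, and `needsVpcEndpoints` are illustrative names, not CDK or EC2 SDK types); it only looks at the IPv4 default route and only recognizes internet gateway (`igw-`) and NAT gateway (`nat-`) targets, so egress through something like a transit gateway would be reported as isolated.

```typescript
// Hypothetical, simplified route model; not a CDK or EC2 SDK type.
interface Route {
  readonly destinationCidr: string; // e.g. "0.0.0.0/0"
  readonly targetId: string; // e.g. "local", "igw-...", "nat-..."
}

interface SubnetRouting {
  readonly subnetId: string;
  readonly routes: readonly Route[];
}

// Heuristic: a subnet has outbound internet access if its IPv4 default route
// targets an internet gateway or a NAT gateway. Other egress paths (for example
// a transit gateway to a shared egress VPC) are not recognized.
function hasInternetEgress(subnet: SubnetRouting): boolean {
  return subnet.routes.some(
    (route) =>
      route.destinationCidr === "0.0.0.0/0" &&
      (route.targetId.startsWith("igw-") || route.targetId.startsWith("nat-")),
  );
}

// If any subnet the handlers may be placed in lacks egress, a handler running
// there could not reach public AWS endpoints, so VPC endpoints would be needed.
function needsVpcEndpoints(subnets: readonly SubnetRouting[]): boolean {
  return subnets.some((subnet) => !hasInternetEgress(subnet));
}

const local: Route = { destinationCidr: "10.0.0.0/16", targetId: "local" };
const isolated: SubnetRouting = { subnetId: "subnet-a", routes: [local] };
const privateWithNat: SubnetRouting = {
  subnetId: "subnet-b",
  routes: [local, { destinationCidr: "0.0.0.0/0", targetId: "nat-0123456789abcdef0" }],
};

console.log(needsVpcEndpoints([privateWithNat])); // false
console.log(needsVpcEndpoints([privateWithNat, isolated])); // true
```

Erring toward "isolated" is the conservative choice here: provisioning an unneeded endpoint costs money, while missing one breaks the deployment.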

  • 👋 I may be able to implement this feature request
  • ⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

@iliapolo iliapolo added p2 feature-request A feature should be added or improved. @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service effort/small Small work item – less than a day of effort blocked Work is blocked on this issue for this codebase. Other labels or comments may indicate why. labels Dec 20, 2020
mergify bot pushed a commit that referenced this issue Dec 21, 2020
Allow all our Lambda handlers to be provisioned inside the cluster VPC.

The `KubectlProvider` handlers were already placed inside the VPC if they could be; the missing piece was the `ClusterHandler`. This is now possible via the `placeClusterHandlerInVpc` (names are welcome) property.

Default value remains `false` because if the VPC happens to be isolated (i.e., no outbound internet access) this would break the deployment. (See #12171)

Closes #9509 

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
flochaz pushed a commit to flochaz/aws-cdk that referenced this issue Jan 5, 2021
@BowlesCR

In my scenario, my "isolated" subnets aren't really isolated from the internet, since I use a TGW to route traffic via an egress network. If you ask for private subnets with natGateways=0, CDK insists you call them isolated. If you call them isolated, you can't put EKS on them.

Is there a workaround to this, or could there be some sort of "I know what I'm doing" override added?

@iliapolo
Contributor Author

iliapolo commented Mar 1, 2021

@BowlesCR

If you call them isolated, you can't put EKS on them.

If they are not actually isolated, you should be able to use them. Are you getting some kind of error?

This issue refers only to truly isolated subnets that have no internet access.

@BowlesCR

BowlesCR commented Mar 1, 2021

Yes:
jsii.errors.JSIIError: There are no 'Private' subnet groups in this VPC. Available types: Isolated,Public

aws_eks.Cluster() appears to be inferring the subnet type from the endpoint_access param, and aws_eks.EndpointAccess doesn't have an enum for ISOLATED.

I think I just found the correct way to do this, which is to leave endpoint_access = PRIVATE but manually specify vpc_subnets = vpc.isolated_subnets

As I'm thinking about this more... I think my complaint is more properly lodged with the "natGateways=0 requires ISOLATED" logic. I fear that calling them isolated will lead someone to make a poor assumption about their (lack of) internet access down the road.

@iliapolo
Contributor Author

iliapolo commented Mar 1, 2021

I think my complaint is more properly lodged with the natGateways=0 requires ISOLATED logic

I agree about that. Might be worth opening a separate issue for the ec2 package. I'm still a little fuzzy though on the error you get from the EKS construct.

appears to be inferring the subnet type from the endpoint_access param

It doesn't really do that. The only logic pertaining to subnets is that we try to select the private subnets from the configured VPC, but we actually treat ISOLATED as PRIVATE there.

Would help if you could attach the full stack trace and/or code snippet.

@BowlesCR

BowlesCR commented Mar 1, 2021

Sure thing
Stacktrace (file paths lightly sanitized):

jsii.errors.JavaScriptError: 
  Error: There are no 'Private' subnet groups in this VPC. Available types: Isolated,Public
      at Vpc.selectSubnetObjectsByType (/tmp/jsii-kernel-Jsmxfu/node_modules/@aws-cdk/aws-ec2/lib/vpc.js:206:19)
      at Vpc.selectSubnetObjects (/tmp/jsii-kernel-Jsmxfu/node_modules/@aws-cdk/aws-ec2/lib/vpc.js:172:28)
      at Vpc.selectSubnets (/tmp/jsii-kernel-Jsmxfu/node_modules/@aws-cdk/aws-ec2/lib/vpc.js:59:30)
      at /tmp/jsii-kernel-Jsmxfu/node_modules/@aws-cdk/aws-eks/lib/cluster.js:265:77
      at Array.map (<anonymous>)
      at new Cluster (/tmp/jsii-kernel-Jsmxfu/node_modules/@aws-cdk/aws-eks/lib/cluster.js:265:59)
      at /tmp/tmpyat3mqfa/lib/program.js:2720:58
      at Kernel._wrapSandboxCode (/tmp/tmpyat3mqfa/lib/program.js:3148:24)
      at Kernel._create (/tmp/tmpyat3mqfa/lib/program.js:2720:34)
      at Kernel.create (/tmp/tmpyat3mqfa/lib/program.js:2461:29)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./CDK-Infrastructure/app.py", line 382, in <module>
    main()
  File "./CDK-Infrastructure/app.py", line 338, in main
    eks_stack = EksStack(
  File "./CDK-Infrastructure/.env/lib64/python3.9/site-packages/jsii/_runtime.py", line 83, in __call__
    inst = super().__call__(*args, **kwargs)
  File "./CDK-Infrastructure/cdk_infrastructure/cdk_eks/cdk_eks_stack.py", line 48, in __init__
    self.cluster = eks.Cluster(
  File "./CDK-Infrastructure/.env/lib64/python3.9/site-packages/jsii/_runtime.py", line 83, in __call__
    inst = super().__call__(*args, **kwargs)
  File "./CDK-Infrastructure/.env/lib64/python3.9/site-packages/aws_cdk/aws_eks/__init__.py", line 7895, in __init__
    jsii.create(Cluster, self, [scope, id, props])
  File "./CDK-Infrastructure/.env/lib64/python3.9/site-packages/jsii/_kernel/__init__.py", line 265, in create
    response = self.provider.create(
  File "./CDK-Infrastructure/.env/lib64/python3.9/site-packages/jsii/_kernel/providers/process.py", line 348, in create
    return self._process.send(request, CreateResponse)
  File "./CDK-Infrastructure/.env/lib64/python3.9/site-packages/jsii/_kernel/providers/process.py", line 330, in send
    raise JSIIError(resp.error) from JavaScriptError(resp.stack)
jsii.errors.JSIIError: There are no 'Private' subnet groups in this VPC. Available types: Isolated,Public
Subprocess exited with error 1

cdk_eks_stack.py Line 48:

        self.cluster = eks.Cluster(
            self,
            "Cluster",
            cluster_name=cluster_name,
            vpc=vpc,
            version=eks.KubernetesVersion.V1_18,
            default_capacity=0,
            endpoint_access=eks.EndpointAccess.PRIVATE,
            masters_role=adminRole,
            secrets_encryption_key=secrets_key,
            security_group=security_group,
            # vpc_subnets=vpc.isolated_subnets,
        )

@iliapolo
Contributor Author

iliapolo commented Mar 1, 2021

OK, I understand now. Yes, your solution is appropriate: without the vpc_subnets config we try to select PRIVATE subnets and fail. Thanks.
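The failure can be reproduced with a simplified, hypothetical model of the default subnet selection: when no subnets are specified, the cluster asks for the PRIVATE type, and asking for a type that has no subnet group in the VPC throws. This is not the actual aws-cdk code (`selectByKind` and `clusterSubnets` are illustrative names, and the real construct has more logic, such as how it treats ISOLATED subnets); it only shows why passing an explicit selection avoids the error.

```typescript
// Simplified, hypothetical model of default subnet selection; not the aws-cdk implementation.
type SubnetKind = "Public" | "Private" | "Isolated";

interface SubnetGroup {
  readonly kind: SubnetKind;
  readonly subnetIds: readonly string[];
}

// Returns the subnet IDs of every group of the requested kind, or throws if there is none.
function selectByKind(groups: readonly SubnetGroup[], kind: SubnetKind): string[] {
  const selected: string[] = [];
  for (const group of groups) {
    if (group.kind === kind) selected.push(...group.subnetIds);
  }
  if (selected.length === 0) {
    const available = Array.from(new Set(groups.map((g) => g.kind))).sort().join(",");
    throw new Error(
      `There are no '${kind}' subnet groups in this VPC. Available types: ${available}`,
    );
  }
  return selected;
}

// Without an explicit selection, fall back to the PRIVATE type, as described above.
function clusterSubnets(groups: readonly SubnetGroup[], explicit?: SubnetKind): string[] {
  return selectByKind(groups, explicit ?? "Private");
}

// A VPC created with natGateways: 0 has only Public and Isolated groups.
const groups: SubnetGroup[] = [
  { kind: "Public", subnetIds: ["subnet-pub-a", "subnet-pub-b"] },
  { kind: "Isolated", subnetIds: ["subnet-iso-a", "subnet-iso-b"] },
];

try {
  clusterSubnets(groups);
} catch (e) {
  console.log((e as Error).message);
}
console.log(clusterSubnets(groups, "Isolated"));
```

In the real construct, the explicit selection corresponds to the `vpc_subnets` / `vpcSubnets` property used in the workaround above.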

@BowlesCR

BowlesCR commented Mar 1, 2021

Excellent. Thank you for taking a look.

@ericzbeard ericzbeard added the feature/enhancement A new API to make things easier or more intuitive. A catch-all for general feature requests. label Apr 2, 2021
@iliapolo iliapolo removed their assignment Jun 27, 2021
mergify bot pushed a commit that referenced this issue Nov 5, 2021
…ll cluster handler functions (#17200)

## Summary

This PR is intended for CDK EKS users who require all traffic to be routed through a proxy. Currently if a user does not allow internet connections to the VPC without going through a proxy, then deploying an EKS cluster will result in a timeout error:

```sh
Received response status [FAILED] from custom resource. Message returned: Error: 2021-10-20T14:20:47.028Z d86e3ef4-45ce-4130-988f-c4663f7f8c80 Task timed out after 60.06 seconds
```
Fixes: #12469, SIM D29159517
Related to but does not resolve: #12171

## ⚙️ Changes

_Expand each list item for additional details._

<details>
<summary><strong>Corrected "Cluster Handler" docs to clarify that 2 lambdas are created (<code>onEventHandler</code>, <code>isCompleteHandler</code>)</strong></summary>
<br />

Our docs [currently describe the "Cluster Handler" as one Lambda function that interacts with the EKS API](https://docs.aws.amazon.com/cdk/api/latest/docs/aws-eks-readme.html#cluster-handler). However, this is not accurate. The "Cluster Handler" actually creates [two Lambdas](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-provider.ts#L69-L96) for the Custom Resource, `onEventHandler` and `isCompleteHandler`, and both interact with the AWS API.

</details>

<details>
<summary><strong>Passes the <code>clusterHandlerEnvironment</code> to both Cluster Handler Lambdas</strong></summary>
<br />

The `clusterHandlerEnvironment` is the [recommended method](https://docs.aws.amazon.com/cdk/api/latest/docs/aws-eks-readme.html#cluster-handler) of passing a proxy URL (e.g. `http_proxy: 'http://my-proxy.com:3128'`) to the Cluster Handler.

Currently the `clusterHandlerEnvironment` is only passed to the Cluster Handler's `onEventHandler` Lambda. [The `onEventHandler` was believed to be the only Cluster Handler Lambda that interacts with the AWS EKS API](#12469 (comment)), however this is not entirely true. Both the `onEventHandler` and `isCompleteHandler` call the AWS EKS API.

Following the execution process of `isCompleteHandler` when creating an EKS cluster:

1. [`index.isComplete()` (this is the Lambda handler)](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/index.ts#L48)
2. [`common.isComplete()`](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/common.ts#L59)
3. [`cluster.isCreateComplete()`](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/cluster.ts#L56)
4. [`cluster.isActive()`](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/cluster.ts#L196)
5. [Request to EKS API](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/cluster.ts#L198) (results in timeout because proxy is not used)

This change allows the user to pass proxy urls as environment variables to **both** Lambdas using `clusterHandlerEnvironment`.

</details>

<details>
<summary><strong>Renames the prop <code>onEventLayer</code> -> <code>proxyAgentLayer</code>, and provides the layer to both Cluster Handler Lambdas</strong></summary>
<br />

The proxy-agent layer is now used in both the `onEventHandler` and `isCompleteHandler` Lambdas in order to support proxy configurations. Because of this change, I've deprecated the original `onEventLayer` and created a new prop `proxyAgentLayer`, since we will now be passing this prop into more than just the `onEventHandler` Lambda.

The `onEventLayer` prop was introduced [a few weeks ago (sept 24)](#16657) so it should not impact many users (if any). The prop would only be used if the user wishes to bundle the layer themselves with a custom proxy agent. 

This prop follows the [same user customization we allow with the kubectl handler](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-eks.Cluster.html#kubectllayer). 

Another suitable name for this prop could have been `clusterHandlerLayer` but I chose `proxyAgentLayer` because it represents **what** the layer is used for, instead of describing **where** it's used. This also follows the convention of the pre-existing [`kubectlLayer` prop](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-eks.Cluster.html#kubectllayer).

</details>

<details>
<summary><strong>Adds the EKS cluster prop <code>clusterHandlerSecurityGroup</code></strong></summary>
<br />

If a proxy address is provided to the Cluster Handler Lambdas, but the proxy instance is not open to the world, then the dynamic IPs of the Cluster Handler Lambdas will be denied access. To solve this, I've implemented a new Cluster prop `clusterHandlerSecurityGroup`. This `clusterHandlerSecurityGroup` prop allows the user to pass a Security Group to both Lambda functions and the Custom Resource provider.

This is very similar to how we [already allow users to pass Security Groups to the Kubectl Handler](https://github.com/aws/aws-cdk/blob/7f194000697b85deb410ae0d7f7d4ac3c2654bcc/packages/%40aws-cdk/aws-eks/lib/kubectl-provider.ts#L83)

</details>

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this issue Feb 21, 2022
@ArtiomL

ArtiomL commented Apr 19, 2022

Hi,
Is there a way to disable the handler lambdas as part of the deployment?
And if not - would it make sense to add this as a selectable option?

@pahud
Contributor

pahud commented Feb 1, 2023

As of CDK 2.63.0, it's possible to deploy an EKS cluster with a private-only endpoint and a node group in PRIVATE_WITH_EGRESS subnets. Check out the sample below:

    const cluster = new eks.Cluster(this, 'Cluster', {
      vpc,
      version: eks.KubernetesVersion.V1_24,
      // private endpoint only
      endpointAccess: eks.EndpointAccess.PRIVATE,
      vpcSubnets: [
        { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
      ],
      // lambda handler with vpc access
      placeClusterHandlerInVpc: true,
      kubectlLayer: new KubectlLayer(this, 'KUbectlLayer'),
      defaultCapacity: 0,
    });
    // node group in private subnets with egress, so it can reach the internet without any VPC endpoints
    cluster.addNodegroupCapacity('NG', {
      subnets: vpc.selectSubnets({ subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS }),
    });

@peterwoodworth peterwoodworth removed the blocked Work is blocked on this issue for this codebase. Other labels or comments may indicate why. label May 11, 2023
@peterwoodworth peterwoodworth added service-api This issue is due to a problem in a service API and removed feature/enhancement A new API to make things easier or more intuitive. A catch-all for general feature requests. labels May 11, 2023
@emanserav

emanserav commented Jul 27, 2023

hi @pahud
can we get some details on why the EKS cluster cannot currently be deployed on isolated subnets?
What could be the issue(s) within the Lambda functions?
Why would the Lambda functions need internet access? (Shouldn't it be enough for them to reach the EKS cluster's private endpoint, plus any other endpoints needed inside the VPC?)
thank you

@emanserav

emanserav commented Jul 27, 2023

If I am using isolated subnets in the VPC and, instead of a NAT, we direct all outgoing traffic through a proxy, is there a way to pass this proxy setup to the EKS Cluster construct (or somehow to the Lambdas deployed as part of the Cluster construct)?
This is for the case where we don't try to understand why the Lambdas deployed by the EKS Cluster construct need internet access, and just set up the proxy for those Lambdas so they can reach the internet.

Here is the error I currently get when trying to create the EKS cluster. It happens in the Lambdas deployed as part of the EKS Cluster L2 construct when they try to update the Kubernetes cluster auth manifest (logical ID AwsAuthmanifest):
Received response status [FAILED] from custom resource. Message returned: Error: b'\nConnect timeout on endpoint URL: "https://sts.amazonaws.com/"\nUnable to connect to the server: getting credentials: exec: executable aws failed with exit code 255 (Client.Timeout exceeded while awaiting headers)\n' Logs:

(I am also wondering why the global STS endpoint https://sts.amazonaws.com/ is used: I already have a regional STS endpoint in the VPC, and I expected the Lambdas to reach that one.)

@pahud
Contributor

pahud commented Aug 16, 2023

hi @pahud can we get some details on why the EKS cluster cannot currently be deployed on Isolated Subnets ? What could be the issue(s) within the Lambda functions ? Why would the Lambda functions need Internet access ? (shouldn't be enough to reach out the EKS (and other endpoints as needed inside the VPC) cluster Endpoint (private) ?) thank you

Yes, technically the EKS cluster can be associated with isolated subnets, but the primary consideration is this: if your Lambda function is associated with isolated subnets, it can access the control plane but won't be able to access the EKS service API unless some private endpoints are enabled or http_proxy is configured. It's still unclear to us how to configure this correctly in CDK, so I would suggest associating PRIVATE_WITH_EGRESS subnets for vpcSubnets and making sure your Lambda function is not associated with isolated subnets without an appropriate http_proxy configuration.

vpcSubnets: [
        { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
      ],

@ClaudiusMZ

ClaudiusMZ commented Aug 24, 2023

If I am using the Isolated subnets in the VPC and we don't use a NAT but instead we direct all outgoing networking through a proxy, is there a way to pass this proxy setup to the EKS Cluster Construct (or somehow to the Lambdas deployed as part of the Cluster construct) ? This is just for the case where not looking or understanding on why the Lambdas deployed by the EKS Cluster construct will need Internet access and just setup the proxy for those Lambdas so they can reach the Internet.

here is the error I am currently getting when trying to create the EKS Cluster, the error happens on the Lambdas deployed as part of the EKS Cluster L2 Construct when trying to update the k8s cluster auth manifest (logical ID AwsAuthmanifest): Received response status [FAILED] from custom resource. Message returned: Error: b'\nConnect timeout on endpoint URL: "https://sts.amazonaws.com/"\nUnable to connect to the server: getting credentials: exec: executable aws failed with exit code 255 (Client.Timeout exceeded while awaiting headers)\n' Logs:

(I am also wondering why the STS global https://sts.amazonaws.com/ as I am already using an STS endpoint in the VPC by region, I was expecting this one to be tried to be reached by the Lambdas)


I ran into the same issue. This should help you:

  const cluster = new eks.Cluster(this, 'your-cluster', {
    clusterName:                '<your-cluster-name>',
    version:                    eks.KubernetesVersion.V1_27,
    kubectlLayer:               new KubectlV27Layer(this, 'kubectl-v27-layer'),
    endpointAccess:             eks.EndpointAccess.PRIVATE,
    vpc:                        vpc,
    vpcSubnets:                 [{ subnets: [<isolated_subnet_1>, <isolated_subnet_2>] }],
    ...
    placeClusterHandlerInVpc:   true,
    clusterHandlerEnvironment:  { AWS_STS_REGIONAL_ENDPOINTS: 'regional'},
    kubectlEnvironment:         { AWS_STS_REGIONAL_ENDPOINTS: 'regional'}
  });

Requires: a regional VPC endpoint for STS.
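The effect of `AWS_STS_REGIONAL_ENDPOINTS` on which STS hostname a handler contacts can be modeled roughly as follows. This is a deliberately simplified, hypothetical sketch (`stsEndpoint` is an illustrative name): real SDK and CLI behavior for the default `legacy` mode depends on the region and the tool version, and here `legacy` is reduced to "use the global endpoint", which matches the `https://sts.amazonaws.com/` timeout reported above. As we understand it, a regional STS interface endpoint with private DNS serves the regional hostname, which is why the regional setting matters in an isolated VPC.

```typescript
// Simplified, hypothetical model; not the SDK's actual resolution logic.
// "legacy" (treated here as the default) is reduced to "use the global endpoint".
type Env = Readonly<Record<string, string | undefined>>;

function stsEndpoint(region: string, env: Env): string {
  const mode = (env["AWS_STS_REGIONAL_ENDPOINTS"] ?? "legacy").toLowerCase();
  return mode === "regional"
    ? `https://sts.${region}.amazonaws.com`
    : "https://sts.amazonaws.com";
}

// Without the setting, the model targets the global hostname, which an isolated VPC
// with only a regional STS interface endpoint cannot reach.
console.log(stsEndpoint("eu-west-1", {}));
// With the setting used in the comment above, the regional hostname is targeted.
console.log(stsEndpoint("eu-west-1", { AWS_STS_REGIONAL_ENDPOINTS: "regional" }));
```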

@emanserav

emanserav commented Aug 27, 2023

Thank you @ClaudiusMZ for your input, but that didn't help.
Before your comment (using CDK 2.89.0), I had tried to configure my proxy (by setting the common proxy environment variables http_proxy, https_proxy, no_proxy) in these 2 fields:

  • clusterHandlerEnvironment
  • kubectlEnvironment

and I was getting:
"Received response status [FAILED] from custom resource. Message returned: Error: connect ETIMEDOUT <some PUBLIC_IP>:443 at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16) (RequestId: <req_ID>)"

When I saw your comment, I thought you might have a good point and that the handlers would go directly to the regional STS endpoint as expected. However, I got exactly the same error (in case it matters, CDK was now 2.93.0).

So I still suspect the handlers neither pick up the proxy nor reach the STS endpoints defined in the VPC directly (without the proxy), as you mentioned.

Thinking more about what I wrote above, I now have a guess at WHY internet access is needed (my original question), which brings us back to how to set up the proxy: the only way for the control plane to reach STS is through the internet, because the EKS control plane is AWS-managed.

And I will just add, for others who may bump into this: I am talking specifically about CDK in TypeScript. My gut tells me that the proxy may work with CDK in Python (who knows?!), since proxy setup in Node.js may not be as straightforward as in Python. Or is it, and I am still missing something very easy here?
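One possible reason the proxy variables seem to be ignored: as far as we know, Node.js's core `https` module does not read `https_proxy`/`no_proxy` on its own, so handler code has to apply them explicitly (which is what the proxy-agent layer mentioned in #17200 is for). The sketch below is a hypothetical helper showing the common, de facto convention for deciding whether a host should go through the proxy; `proxyFor` and the other names are illustrative, and real libraries differ in details such as port handling and wildcard rules.

```typescript
// Hypothetical helper illustrating the common (de facto, not formally standardized)
// proxy environment variable convention. Lowercase names take precedence here.
type Env = Readonly<Record<string, string | undefined>>;

function noProxyEntries(env: Env): string[] {
  const raw = env["no_proxy"] ?? env["NO_PROXY"] ?? "";
  return raw
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry.length > 0);
}

// True if `host` matches a no_proxy entry exactly or as a subdomain.
// An entry of "*" disables the proxy for every host.
function bypassesProxy(host: string, env: Env): boolean {
  const h = host.toLowerCase();
  return noProxyEntries(env).some((entry) => {
    if (entry === "*") return true;
    const domain = entry.startsWith(".") ? entry.slice(1) : entry;
    return h === domain || h.endsWith(`.${domain}`);
  });
}

// Returns the proxy URL to use for a request to `host`, or undefined for a direct connection.
function proxyFor(host: string, isHttps: boolean, env: Env): string | undefined {
  if (bypassesProxy(host, env)) return undefined;
  return isHttps
    ? (env["https_proxy"] ?? env["HTTPS_PROXY"])
    : (env["http_proxy"] ?? env["HTTP_PROXY"]);
}

// Usage with placeholder values:
const handlerEnv: Env = {
  https_proxy: "http://proxy.internal:3128",
  no_proxy: ".vpce.amazonaws.com,localhost",
};
console.log(proxyFor("sts.amazonaws.com", true, handlerEnv)); // http://proxy.internal:3128
console.log(proxyFor("localhost", true, handlerEnv)); // undefined
```

If a handler never performs a lookup like this, it connects directly, which would be consistent with an `ETIMEDOUT` to a public IP such as the one in the error above; this is a hypothesis, not a confirmed diagnosis.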

@github-actions github-actions bot added p1 and removed p2 labels Sep 24, 2023
@github-actions

This issue has received a significant amount of attention so we are automatically upgrading its priority. A member of the community will see the re-prioritization and provide an update on the issue.

@vishwanjalijadhav

vishwanjalijadhav commented Nov 4, 2023

We are also trying to create the EKS cluster through CDK in private subnets (the VPC has internet access via our enterprise proxy). We get the error below if we set `placeClusterHandlerInVpc` to true and set the proxy in the cluster handler environment:
| Custom::AWSCDK-EKS-Cluster | copitoekscluster1683824D
Received response status [FAILED] from custom resource. Message returned: Error: connect ETIMEDOUT 52.94.204.134:443
at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1555:16) (RequestId: b840dd7e-0ec4-41c9-add2-848bfd523de9)

The cluster instantiation code looks like this:

const eksCluster = new eks.FargateCluster(this, 'copito-eks-cluster', {
  version: eks.KubernetesVersion.V1_27,
  kubectlEnvironment: {
    https_proxy: 'enterprise proxy url',
  },
  clusterHandlerEnvironment: {
    https_proxy: 'enterprise proxy url',
  },
  clusterHandlerSecurityGroup: proxysecurityGroup,
  mastersRole: masterRole,
  clusterName: props?.clusterName,
  vpc: copitoDevVPC,
  endpointAccess: eks.EndpointAccess.PRIVATE,
  vpcSubnets: [{ subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS }],
  placeClusterHandlerInVpc: true,
  clusterLogging: [
    eks.ClusterLoggingTypes.API,
    eks.ClusterLoggingTypes.AUTHENTICATOR,
    eks.ClusterLoggingTypes.SCHEDULER,
    eks.ClusterLoggingTypes.CONTROLLER_MANAGER,
  ],
});

If we don't use `placeClusterHandlerInVpc` (i.e., we set it to false), we get the error below:

Exception: b'Unable to connect to the server: proxyconnect tcp: x509: certificate signed by unknown authority\n'
Traceback (most recent call last):
  File "/var/task/index.py", line 20, in handler
    return patch_handler(event, context)
  File "/var/task/patch/__init__.py", line 50, in patch_handler
    kubectl([ 'patch', resource_name, '-n', resource_namespace, '-p', patch_json, '--type', patch_type ])
  File "/var/task/patch/__init__.py", line 66, in kubectl
    raise Exception(output)

Our enterprise proxy instance shows successful connections to both the EKS and STS endpoints, so I believe the proxy itself is not the issue.

@caretak3r

I got this partially working by creating the VPC endpoints the Lambdas need to communicate properly, without modifying any security groups. I used what @ClaudiusMZ provided above:

place_cluster_handler_in_vpc=True,  # Place the cluster handler in the VPC
cluster_handler_environment={"AWS_STS_REGIONAL_ENDPOINTS": "regional"},
kubectl_environment={"AWS_STS_REGIONAL_ENDPOINTS": "regional"},

Our VPC has two fully private subnets: no NAT gateways, no IGWs, the EKS API endpoint is PRIVATE, and the only routes are local and a VPC gateway endpoint for S3. I created the following VPC endpoints:

(image: list of the VPC endpoints created)

It seems... excessive, but I went through the pain of figuring out what the Lambdas needed in order to finish setting up the cluster and provision Kubernetes manifests and some Helm charts. In this case, I have to host the charts and images in a private ECR in the same VPC. So far it looks possible to accomplish.

The CDK documentation leaves a lot to be desired in terms of what is being done behind the scenes in the L2 constructs I am using, but by digging into the typescript cdk codebase I was able to make sense of some of it.

10 participants