‼️ (iam): OpenIdConnectProvider defaults to first thumbprint instead of root CA thumbprint #8607

Closed
bleish opened this issue Jun 17, 2020 · 20 comments · Fixed by #22509 or #22802
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service @aws-cdk/aws-iam Related to AWS Identity and Access Management bug This issue is a bug. effort/medium Medium work item – several days of effort management/tracking Issues that track a subject or multiple issues p0

Comments

@bleish

bleish commented Jun 17, 2020

Please add your +1 👍 to let us know you have encountered this

Status: IN-PROGRESS

Overview:

The iam.OpenIdConnectProvider resource contains logic that dynamically fetches the certificate thumbprint required to create an OpenID Connect provider. However, as of now, it mistakenly fetches the leaf certificate of the provider, instead of the root one.

As long as the leaf certificate is valid, this doesn't have an impact on your applications. However, once the certificate is rotated, your application will fail to use the provider to authenticate against AWS services. Since leaf certificates are rotated frequently, you are in danger of disruption. This will probably manifest as Access Denied errors.

Workaround:

If you are using the iam.OpenIdConnectProvider construct in conjunction with an EKS cluster:

import * as iam from 'aws-cdk-lib/aws-iam';
import * as eks from 'aws-cdk-lib/aws-eks';

const cluster = new eks.Cluster(this, "EKSCluster", {
  version: eks.KubernetesVersion.V1_21,
});

new iam.OpenIdConnectProvider(this, "OIDCProvider", {
  url: cluster.clusterOpenIdConnectIssuerUrl,
});

Switch to use the OpenIdConnectProvider construct from the EKS library:

import * as eks from 'aws-cdk-lib/aws-eks';

const cluster = new eks.Cluster(this, "EKSCluster", {
  version: eks.KubernetesVersion.V1_21,
});

new eks.OpenIdConnectProvider(this, "OIDCProvider", {
  url: cluster.clusterOpenIdConnectIssuerUrl,
});

This works because eks.OpenIdConnectProvider hardcodes the correct thumbprint for EKS.

If you are using the iam.OpenIdConnectProvider in conjunction with other services, make sure you pass the thumbprint explicitly to the construct, instead of relying on its dynamic fetching capabilities.

import * as iam from 'aws-cdk-lib/aws-iam';

new iam.OpenIdConnectProvider(this, "OIDCProvider", {
  // `cluster` is the EKS cluster from the snippets above; for another provider, pass its issuer URL instead.
  url: cluster.clusterOpenIdConnectIssuerUrl,
  clientIds: ["sts.amazonaws.com"],
  thumbprints: ['<pass-thumbprint-here>'],
});

To obtain the correct thumbprint for your provider, follow these instructions.

Solution:

We are working on a fix to the iam.OpenIdConnectProvider construct so that it correctly fetches the root certificate thumbprint. See PR. Once it is merged, the fix will be available in the following CDK release, at which point a simple deployment will fix the issue in your environment.


Originally reported as

When deploying an OpenIdConnectProvider construct using the oidc issuer url retrieved from an EKS cluster (the domain is oidc.eks.us-west-2.amazonaws.com) and no value for the thumbprints property, the resulting auto-obtained thumbprint doesn't match the one I get from following the steps provided here.

Reproduction Steps

const cluster = new eks.Cluster(this, "EKSCluster");

new iam.OpenIdConnectProvider(this, "EKSOIDCProvider", {
    url: cluster.clusterOpenIdConnectIssuerUrl,
    clientIds: ["sts.amazonaws.com"]
});

Error Log

See Other for a related error.

Environment

  • CLI Version : 1.45.0
  • Framework Version:
  • Node.js Version: 14.4.0
  • OS : Windows 10 (10.0.18363 Build 18363)
  • Language (Version): TypeScript (3.7.2)

Other

If I try to deploy a cluster autoscaler to my EKS cluster using a service account role tied to that provider, the pod enters a CrashLoopBackOff state with the error message:

F0617 20:04:12.561996 1 aws_cloud_provider.go:376] Failed to create AWS Manager: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: OpenIDConnect provider's HTTPS certificate doesn't match configured thumbprint
status code: 400, request id: 00ec5e59-3672-4264-b5d4-c44f573ff50c

If I instead follow the guide to retrieve the correct thumbprint via openssl and provide that to the OpenIdConnectProvider construct, the cluster autoscaler successfully deploys.


This is 🐛 Bug Report

@bleish bleish added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 17, 2020
@SomayaB SomayaB added @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service @aws-cdk/aws-iam Related to AWS Identity and Access Management labels Jun 18, 2020
@rix0rrr rix0rrr assigned eladb and unassigned rix0rrr Jun 19, 2020
@sogos

sogos commented Jun 20, 2020

Same problem observed in CLI Version: 1.46.0

@eladb eladb added p1 and removed @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service labels Jun 22, 2020
@eladb
Contributor

eladb commented Jun 24, 2020

@bleish @sogos can you use cluster.addServiceAccount() instead of explicitly creating the OIDC provider (see docs)?

@eladb eladb added this to the EKS Developer Preview milestone Jun 24, 2020
@eladb eladb removed needs-triage This issue or PR still needs to be triaged. p1 labels Jun 24, 2020
@eladb eladb removed this from the EKS Developer Preview milestone Jun 24, 2020
@sogos

sogos commented Jun 24, 2020

Hello @eladb ,

That seems to work perfectly 👍

Tested with alb-ingress-controller:
https://gitlab.com/thibault.cordier1/cdk-eks-fargate/-/blob/master/lib/eks-stack.ts#L68

But I don't understand why the thumbprint on the identity provider is wrong.
I use the command:

openssl s_client -servername oidc.eks.eu-west-1.amazonaws.com -showcerts -connect oidc.eks.eu-west-1.amazonaws.com:443 | openssl x509  -fingerprint -noout
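
One possible reason the command above yields a different value: `openssl x509` reads only the first certificate from its input, which is the leaf, whereas later comments in this thread report that the last certificate printed by `-showcerts` is the one whose thumbprint matches. The following is a minimal TypeScript sketch of that idea, assuming the `-showcerts` output has been captured as a string. The function names are hypothetical and not part of the CDK or openssl.

```typescript
import { createHash } from "crypto";

// Extract every PEM certificate block from text such as the output of
// `openssl s_client -showcerts`, in the order they appear.
function extractPemCertificates(text: string): string[] {
  const pattern = /-----BEGIN CERTIFICATE-----[\s\S]*?-----END CERTIFICATE-----/g;
  return text.match(pattern) ?? [];
}

// A certificate thumbprint is the SHA-1 digest of the DER-encoded certificate,
// which is the base64 payload between the PEM markers.
function thumbprintOfPem(pem: string): string {
  const base64 = pem
    .replace(/-----(BEGIN|END) CERTIFICATE-----/g, "")
    .replace(/\s+/g, "");
  const der = Buffer.from(base64, "base64");
  return createHash("sha1").update(der).digest("hex");
}

// Hypothetical helper: thumbprint of the LAST certificate the server sent,
// rather than the first one (which is what `openssl x509` would pick up).
function lastCertificateThumbprint(showcertsOutput: string): string {
  const certs = extractPemCertificates(showcertsOutput);
  if (certs.length === 0) {
    throw new Error("no certificates found in input");
  }
  return thumbprintOfPem(certs[certs.length - 1]);
}
```

Whether the last certificate sent by the server is the one IAM expects depends on the provider, so the result should still be checked against the provider's documented thumbprint.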

@eladb
Contributor

eladb commented Jun 24, 2020

Glad to hear that addServiceAccount() worked for you. We need to investigate why the default behavior of the iam.OpenIdConnectProvider resource is unable to extract the correct thumbprint from EKS. The current implementation in @aws-cdk/aws-eks actually hard codes the thumbprint (ref).

@eladb
Contributor

eladb commented Jun 24, 2020

Our EKS code currently has the following comment:

For some reason EKS isn't validating the root certificate but an intermediate certificate
which is one level up in the tree. Because of this, a constant thumbprint value has to be
stated with this OpenID Connect provider. The certificate thumbprint is the same for all the regions.

thumbprints: [ '9e99a48a9960b14926bb7f3b02e22da2b0ab7280' ],

@eladb eladb changed the title OpenIdConnectProvider obtains incorrect thumbprint when an explicit one isn't provided OpenIdConnectProvider obtains incorrect thumbprint for EKS Jun 24, 2020
@eladb eladb assigned rix0rrr and unassigned eladb Jun 24, 2020
@eladb eladb added the p2 label Jun 24, 2020
@eladb
Contributor

eladb commented Jun 24, 2020

Routing to IAM

@bleish
Author

bleish commented Jun 24, 2020

@eladb @sogos

It looks like the lambda is using the tlsSocket.getPeerCertificate method to retrieve the intermediate certificate. By default, that only retrieves the first certificate in the chain. If you pass true to the method it'll return the full chain, linked by the issuerCertificate property. You can follow the chain until you reach the root CA, after which issuerCertificate becomes a circular reference (self-signed). I just tried this and was able to get the proper certificate and thumbprint. In my case (issuer host = oidc.eks.us-west-2.amazonaws.com) the chain was:

*.execute-api.us-west-2.amazonaws.com -> Amazon -> Amazon Root CA 1 -> Starfield Services Root Certificate Authority - G2

The last giving the expected thumbprint of 9e99a48a9960b14926bb7f3b02e22da2b0ab7280.

This is an issue beyond just retrieving the EKS issuer thumbprint. For any issuer, if the first certificate in the chain is self-signed, it'll be the root, but in any other case the intermediate is retrieved instead.

The Node.js TLS documentation for getPeerCertificate gives more info that could be useful in implementing this.
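
The chain walk described above can be sketched roughly as follows. This is only an illustration, not the code in the CDK's lambda: `ChainCertificate` is a hypothetical, trimmed-down stand-in for the object returned by `tlsSocket.getPeerCertificate(true)`, and `collectChain` and `rootOf` are made-up names.

```typescript
// Hypothetical minimal shape of a detailed peer certificate: each entry links
// to its issuer, and a self-signed root links back to itself.
interface ChainCertificate {
  subject: { CN?: string };
  fingerprint: string;
  issuerCertificate?: ChainCertificate;
}

// Follow issuerCertificate links from the leaf and collect each certificate
// once, stopping when the link is missing or points at one already seen
// (the circular reference of a self-signed certificate).
function collectChain(leaf: ChainCertificate): ChainCertificate[] {
  const chain: ChainCertificate[] = [];
  let current: ChainCertificate | undefined = leaf;
  while (current !== undefined && chain.indexOf(current) === -1) {
    chain.push(current);
    current = current.issuerCertificate;
  }
  return chain;
}

// The last collected entry is the self-signed end of the chain.
function rootOf(leaf: ChainCertificate): ChainCertificate {
  const chain = collectChain(leaf);
  return chain[chain.length - 1];
}
```

With a real socket, the leaf would come from `socket.getPeerCertificate(true)` after the TLS handshake completes.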

@eladb
Contributor

eladb commented Jun 24, 2020

@bleish wow thanks so much for the detailed explanation. Would you say the preferred behavior for the OIDC provider resource is to always use the root CA thumbprint?

@eladb eladb changed the title OpenIdConnectProvider obtains incorrect thumbprint for EKS OpenIdConnectProvider uses first thumbprint instead of root Jun 24, 2020
@eladb eladb changed the title OpenIdConnectProvider uses first thumbprint instead of root OpenIdConnectProvider defaults to first thumbprint instead of root CA thumbprint Jun 24, 2020
@bleish
Author

bleish commented Jun 24, 2020

@eladb
It seems so. According to their documentation:

When you create an OpenID Connect (OIDC) identity provider in IAM, you must supply a thumbprint. IAM requires the thumbprint for the root certificate authority (CA) that signed the certificate used by the external identity provider (IdP). The thumbprint is a signature for the CA's certificate that was used to issue the certificate for the OIDC-compatible IdP. When you create an IAM OIDC identity provider, you are trusting identities authenticated by that IdP to have access to your AWS account.

One gotcha I ran into when testing the getPeerCertificate(true) method was that the certificate chain contained one more certificate beyond the root CA. When following the openssl steps from the documentation above, I get a chain of 4 certificates, and the last certificate in that chain (Starfield Services Root Certificate Authority - G2) returns the expected 9e99a48a9960b14926bb7f3b02e22da2b0ab7280 thumbprint. However, using getPeerCertificate(true) I get one more certificate beyond that, but it has no Common Name value and is self-signed (so issuerCertificate is circular at that point). I don't know what this certificate is, as the openssl method didn't contain it. Maybe self-signed certs aren't included in the openssl chain? Someone more knowledgeable in this area than I am might understand what's going on.

@bleish
Author

bleish commented Jun 25, 2020

It looks like openssl s_client doesn't always return the full certificate chain. The documentation for the -showcerts flag on s_client states:

Displays the server certificate list as sent by the server: it only consists of certificates the server has sent (in the order the server has sent them). It is not a verified chain.

Using getPeerCertificate(true) does appear to return the full chain, so it seems the Open ID Connect Identity Provider doesn't necessarily expect the absolute root of the chain, but the last certificate provided by the server (Starfield Services Root Certificate Authority - G2 in this case). If that's true, then I'm not sure how, using the tls module, to ensure the correct thumbprint is being obtained.

@chrisjgray

My organization is also having this same issue. Can this get prioritized?

@comcalvi
Contributor

We're working on a fix for this issue right now. If you can't upgrade to the CDK version that contains the fix, you can work around this issue on an old version by changing any iam.OpenIdConnectProvider that uses an EKS Cluster's clusterOpenIdConnectIssuerUrl to the OpenIdConnectProvider from aws-eks, like so:

import * as eks from 'aws-cdk-lib/aws-eks';

const cluster = new eks.Cluster(this, "EKSCluster", {
  version: eks.KubernetesVersion.V1_21,
});

new eks.OpenIdConnectProvider(this, "OIDCProvider", {
  url: cluster.clusterOpenIdConnectIssuerUrl,
});

Or, if you prefer to use aws-iam, you can hardcode the thumbprint to that of the root certificate (which is exactly what the aws-eks module does), but be aware that EKS will eventually rotate the root certificate as well:

import * as iam from 'aws-cdk-lib/aws-iam';
import * as eks from 'aws-cdk-lib/aws-eks';

const cluster = new eks.Cluster(this, "EKSCluster", {
  version: eks.KubernetesVersion.V1_21,
});

new iam.OpenIdConnectProvider(this, "OIDCProvider", {
  url: cluster.clusterOpenIdConnectIssuerUrl,
  clientIds: ["sts.amazonaws.com"],
  thumbprints: ['9e99a48a9960b14926bb7f3b02e22da2b0ab7280'],
});

vinayak-kukreja added a commit to cdklabs/aws-cdk-notices that referenced this issue Oct 31, 2022
Signed-off-by: Vinayak Kukreja <vinakuk@amazon.com>
@mergify mergify bot closed this as completed in #22509 Nov 4, 2022
mergify bot pushed a commit that referenced this issue Nov 4, 2022
…rtificate (#22509)

Currently, the IAM OIDC Provider retrieves leaf certificates for a given URL. These certificates have a short validity period, which can cause an outage for customers who are not aware of when the certificate is going to expire. We have seen an [outage](#8607) in EKS due to this issue.

This change retrieves root certificates instead of leaf certificates. Root certificates are valid for much longer than leaf certificates. I am also adding validations for the certificate, and informing the customer during a new deployment if the retrieved certificate is going to expire within six months.

Fixes #8607

Signed-off-by: Vinayak Kukreja <vinakuk@amazon.com>

### All Submissions:

* [x] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md)

### Adding new Unconventional Dependencies:

* [ ] This PR adds new unconventional dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md/#adding-new-unconventional-dependencies)

### New Features

* [x] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/main/INTEGRATION_TESTS.md)?
	* [x] Did you use `yarn integ` to deploy the infrastructure and generate the snapshot (i.e. `yarn integ` without `--dry-run`)?

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions

github-actions bot commented Nov 4, 2022

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Naumel pushed a commit that referenced this issue Nov 4, 2022
…rtificate (#22509)
@iliapolo iliapolo pinned this issue Nov 6, 2022
@iliapolo iliapolo added p0 and removed p2 labels Nov 6, 2022
@iliapolo
Contributor

iliapolo commented Nov 6, 2022

Reopening. See #22802

@iliapolo iliapolo reopened this Nov 6, 2022
@iliapolo iliapolo changed the title OpenIdConnectProvider defaults to first thumbprint instead of root CA thumbprint ‼️ (iam): OpenIdConnectProvider defaults to first thumbprint instead of root CA thumbprint Nov 6, 2022
@iliapolo iliapolo added management/tracking Issues that track a subject or multiple issues @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service labels Nov 8, 2022
mergify bot pushed a commit to cdklabs/aws-cdk-notices that referenced this issue Nov 9, 2022
Adding a notice for IAM/EKS OIDC Issue where the OIDC provider currently is retrieving short lived leaf certificates instead of root certificates which have a longer expiration date.
@mergify mergify bot closed this as completed in #22802 Nov 12, 2022
mergify bot pushed a commit that referenced this issue Nov 12, 2022
…f root (#22802)

Currently, the implementation of the OIDC provider custom resource fetches the certificate chain using:

https://github.com/aws/aws-cdk/blob/9bde9f3149cbfa6e7b97204f54e7cef5c9127971/packages/%40aws-cdk/aws-iam/lib/oidc-provider/external.ts#L40

It then extracts the root certificate by detecting a circular reference in the `cert.issuerCertificate` property.

https://github.com/aws/aws-cdk/blob/9bde9f3149cbfa6e7b97204f54e7cef5c9127971/packages/%40aws-cdk/aws-iam/lib/oidc-provider/external.ts#L46

As it turns out, this results in the wrong certificate being extracted. I observed this while running an [EKS integration test](https://github.com/aws/aws-cdk/blob/main/packages/%40aws-cdk/aws-eks/test/integ.eks-service-account-sdk-call.ts).

The current certificate thumbprint that is extracted is: `AD7E1C28B064EF8F6003402014C3D0E3370EB58A`.
While the expected thumbprint is: `9E99A48A9960B14926BB7F3B02E22DA2B0AB7280`. (this is the value we used to [hardcode](https://github.com/aws/aws-cdk/blob/v2.50.0/packages/%40aws-cdk/aws-eks/lib/oidc-provider.ts#L49))

The [recommended way](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc_verify-thumbprint.html) for extracting the correct thumbprint according to AWS is using `openssl s_client -showcerts`. When I tried this, I did in fact see that the last certificate returned by this command has the correct thumbprint. 

During investigation, I noticed that `socket.getPeerCertificate(true)` returns an additional certificate, acting as the root one. This is aligned with the [comment](#8607 (comment)) made in the original issue. This additional certificate is not the correct one, and we should be using the one just before it in the chain. There is, however, no way to detect that "second to last" certificate in this way, because it doesn't resolve to a circular reference.

After some digging, I switched the implementation to use `socket.getPeerX509Certificate()`, a new method that only exists in Node16. This method skips over the incorrect certificate, and results in the correct thumbprint. 
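
As a rough illustration of the approach described above (not the actual code in this PR), the sketch below walks a certificate's issuer links to the end of the chain and converts the colon-separated, upper-case SHA-1 fingerprint that Node reports into the lower-case form used for the `thumbprints` property. `CertLike`, `lastInChain`, and `normalizeThumbprint` are hypothetical names; `CertLike` only assumes the `fingerprint` and `issuerCertificate` properties that Node's `X509Certificate` exposes.

```typescript
// Hypothetical structural subset of Node's X509Certificate: the chain ends
// where issuerCertificate is undefined.
interface CertLike {
  readonly fingerprint: string;
  readonly issuerCertificate?: CertLike;
}

// "9E:99:A4:..." (Node's format) -> "9e99a4..." (format used for thumbprints).
function normalizeThumbprint(fingerprint: string): string {
  return fingerprint.replace(/:/g, "").toLowerCase();
}

// Follow issuer links until there is no further issuer. The `seen` list is a
// guard against a self-referencing certificate, so the loop always ends.
function lastInChain(cert: CertLike): CertLike {
  const seen: CertLike[] = [cert];
  let current = cert;
  while (
    current.issuerCertificate !== undefined &&
    seen.indexOf(current.issuerCertificate) === -1
  ) {
    current = current.issuerCertificate;
    seen.push(current);
  }
  return current;
}

// Thumbprint of the last certificate reachable from the peer certificate.
function chainThumbprint(peer: CertLike): string {
  return normalizeThumbprint(lastInChain(peer).fingerprint);
}
```

On Node 16 or later, the value returned by `socket.getPeerX509Certificate()` could be passed as `peer`. Whether the last certificate in that chain is the one IAM expects should be verified against the provider's documented thumbprint.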

<img width="539" alt="Screen Shot 2022-11-06 at 10 04 51 PM" src="https://user-images.githubusercontent.com/1428812/200195623-6735377b-a82f-472f-884d-7bec450c32c6.png">

Fixes #8607

----

### All Submissions:

* [X] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md)

### Adding new Unconventional Dependencies:

* [ ] This PR adds new unconventional dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md/#adding-new-unconventional-dependencies)

### New Features

* [ ] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/main/INTEGRATION_TESTS.md)?
	* [ ] Did you use `yarn integ` to deploy the infrastructure and generate the snapshot (i.e. `yarn integ` without `--dry-run`)?

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*

iliapolo added a commit that referenced this issue Nov 14, 2022
…f root (#22802)
@drankard

If this is closed, shouldn't notice 8607 be removed? Or do I still have to acknowledge it even though it's fixed by a release?

@mrgrain
Contributor

mrgrain commented Dec 2, 2022

If this is closed, shouldn't notice 8607 be removed? Or do I still have to acknowledge it even though it's fixed by a release?

Apologies for this. We had an issue that wrongly showed the notice for non-affected versions. This should be fixed now.

@fadinasr

fadinasr commented Dec 8, 2022

@mrgrain I have CDK v2.51.1 (build 3d30cdb) and I still get the notice. I'm currently configuring a GitHub Actions OpenIdConnectProvider to set up a role for GitHub Actions.

@mrgrain
Contributor

mrgrain commented Dec 9, 2022

@fadinasr Thanks for letting me know. I can't reproduce this.
We do cache the notice data, and you might have had a version locally with the wrong version ranges. Maybe that's why. Although the cache expires rather quickly.