Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for AWS clients to use EKS OIDC web identity tokens when running in a cluster with IRSA #295

Conversation

VariableExp0rt
Copy link
Contributor

What type of PR is this?

/kind feature

Any specific area of the project related to this PR?

/area outputs

What this PR does / why we need it:

When running falcosidekick on an EKS cluster that is configured to use IAM Roles for Service Accounts (IRSA) as explained here, instead of using long term credentials, falcosidekick fails to initialise AWS outputs because it can't find valid credentials. Specifically in our case, we have disabled the ability for the pods to be able to hit the IMDSV2 to obtain node credentials too.

When IRSA is configured with a service account annotation, EKS injects two environment variables AWS_ROLE_NAME and AWS_WEB_IDENTITY_TOKEN_FILE into the container, pointing at an OIDC token file for which it can exchange to assume a role (taking into account any other conditions). Due to the way that this currently works, it is required to refresh these credentials, which is why I have added a new field for an STS WebIdentityProvider, and conditionally checked this, overwriting c.AWSSession where necessary. I've had an initial go at solving this problem, and am looking for some feedback on this PR (I'm a Go hobbyist at best!).

This is my first contribution to the project :) Hopefully I'm not too far off the mark with this, and that there is appetite to see this implemented in the kubernetes community.

Fixes #

Special notes for your reviewer:

@poiana
Copy link

poiana commented Dec 7, 2021

Welcome @VariableExp0rt! It looks like this is your first PR to falcosecurity/falcosidekick 🎉

@poiana poiana added the size/L label Dec 7, 2021
@poiana poiana requested review from cpanato and nibalizer December 7, 2021 10:35
@Issif Issif added this to the 2.25.0 milestone Dec 7, 2021
… running in a cluster

Signed-off-by: Liam Baker <liam.baker@starlingbank.com>
@VariableExp0rt VariableExp0rt force-pushed the feature/eks_oidc_iam_service_account_assume branch from 13d2180 to f7a528f Compare December 7, 2021 10:50
@Issif
Copy link
Member

Issif commented Dec 7, 2021

Hello,

Thank you for your PR, it's very valuable and that's a use-case we noticed too.

There's another PR about changing the auth method for access to AWS IAM, we need to check if they are compatible each other.

@VariableExp0rt
Copy link
Contributor Author

Hello,

Thank you for your PR, it's very valuable and that's a use-case we noticed too.

There's another PR about changing the auth method for access to AWS IAM, we need to check if they are compatible each other.

Absolutely, no problem at all!

@Issif
Copy link
Member

Issif commented Dec 7, 2021

Hello,
Thank you for your PR, it's very valuable and that's a use-case we noticed too.
There's another PR about changing the auth method for access to AWS IAM, we need to check if they are compatible each other.

Absolutely, no problem at all!

I reviewed the other PR quickly, and it appears, it does (and even more) same thing than your PR:
image

cc @jonahjon

@VariableExp0rt
Copy link
Contributor Author

Hello,
Thank you for your PR, it's very valuable and that's a use-case we noticed too.
There's another PR about changing the auth method for access to AWS IAM, we need to check if they are compatible each other.

Absolutely, no problem at all!

I reviewed the other PR quickly, and it appears, it does (and even more) same thing than your PR: image

cc @jonahjon

I don't think I'd be able to use the technique that the other PR adds, because our network policies in kubernetes don't allow us to query instance metadata (since we don't actually need it) :( (which is called out in the EKS security best practices here )

@VariableExp0rt
Copy link
Contributor Author

I'd be more than happy to add the metadata lookup to this PR, which then covers the same use cases? https://github.com/falcosecurity/falcosidekick/pull/220/files#diff-40835b8017b541aae1eb78a1396a0489b7a8b7ace95e03ec88967ba6f4534e0bR350

  1. Keys
  2. IRSA
  3. Metadata (default node role)

@VariableExp0rt
Copy link
Contributor Author

@Issif I am coming at this as a security engineer (maybe selfishly so), but we've explicitly decided not to make Instance Metadata available for applications running in pods to use because;

  1. If the App get's popped, it might be able to leverage the node role to perform privilege escalation if IMDSV2 is available
  2. Defaulting to the node role would require additions to the IAM policy associated with the instance profile, which would not be operating within the principle of least privilege

That was the motivation behind making this a config option.

@Issif
Copy link
Member

Issif commented Dec 7, 2021

Maybe I'm wrong, but metadata are only called for retrieving the region, I don't know if it's still relevant when we use

@Issif I am coming at this as a security engineer (maybe selfishly so), but we've explicitly decided not to make Instance Metadata available for applications running in pods to use because;

1. If the App get's popped, it might be able to leverage the node role to perform privilege escalation if IMDSV2 is available

2. Defaulting to the node role would require additions to the IAM policy associated with the instance profile, which would not be operating within the principle of least privilege

That was the motivation behind making this a config option.

I agree, but it seems metadata are only used for getting the region, so the auth with OIDC should work. The example I pasted shows the usage of eks annotation eg.

@VariableExp0rt
Copy link
Contributor Author

Maybe I'm wrong, but metadata are only called for retrieving the region, I don't know if it's still relevant when we use

@Issif I am coming at this as a security engineer (maybe selfishly so), but we've explicitly decided not to make Instance Metadata available for applications running in pods to use because;

1. If the App get's popped, it might be able to leverage the node role to perform privilege escalation if IMDSV2 is available

2. Defaulting to the node role would require additions to the IAM policy associated with the instance profile, which would not be operating within the principle of least privilege

That was the motivation behind making this a config option.

I agree, but it seems metadata are only used for getting the region, so the auth with OIDC should work. The example I pasted shows the usage of eks annotation eg.

The OIDC token should already contain all the information of the role it's trying to assume (the EKS annotation takes care of that) , I'm not sure I fully understand the need to get the region? ( I could be missing something)

In any case, we're not even able to query the metadata service to get the region (because network policies do no allow it). As I mentioned, I'd be happy to add that though as the last lookup for the credentials (in the event anyone would want to use it), even if we're not using it (preventative controls are in place anyway). Is the region not already a required field at the top level of the config?

@Issif
Copy link
Member

Issif commented Dec 7, 2021

You're right aws.region seems mandatory, I didn't remember that (and I wrote in readme that's optionnal, my bad). I reached @jonahjon on Slack to have his point of view too. Except the aws.region setting, both of your solutions manage credentials with OIDC through annotation.

If you feel confident, I'm ok you integrate in your PR, the 3 methods for credentials:

  • OIDC
  • IRSA
  • Access Keys

For IRSA and Keys, we can either set aws.region either use the Metadata method to retrieve the intel.

WDYT?

cc @jonahjon

@jonahjon
Copy link

jonahjon commented Dec 7, 2021

There are really 3 methods to get AWS access

  1. IRSA (Using OIDC)
  2. Keys
  3. Node Role

If this PR can work by checking for 1 > 2 > 3 before giving up and panicing, then it's accomplishing what I set out in my original PR.

In the example where IRSA is not included we don't want this change to have any impact on existing customers though, so making sure it's still compatible with method 2, and 3 are important.

@VariableExp0rt
Copy link
Contributor Author

Thanks @Issif and @jonahjon, I'll add the Metadata retrieval which will be the last thing it will try in order to configure credentials before panicing. As far as the bits in here https://github.com/falcosecurity/falcosidekick/pull/295/files#diff-40835b8017b541aae1eb78a1396a0489b7a8b7ace95e03ec88967ba6f4534e0bR181-R186 go, I saw somewhere in the AWS documentation that the refreshing of the token when it expires also needed to be handled, which is what what check does.

I don't have immediate access to an EKS cluster to test this though.

As long as you are both happy I'll try and get this done by tomorrow?

Thanks again @Issif and @jonahjon!

@Issif
Copy link
Member

Issif commented Dec 7, 2021

@VariableExp0rt thank you, and no rush, I've still some PR to review, and I will not be ready to release 2.25.0 before 2022 I think.

@jonahjon
Copy link

jonahjon commented Dec 7, 2021

Thanks @Issif and @jonahjon, I'll add the Metadata retrieval which will be the last thing it will try in order to configure credentials before panicing. As far as the bits in here https://github.com/falcosecurity/falcosidekick/pull/295/files#diff-40835b8017b541aae1eb78a1396a0489b7a8b7ace95e03ec88967ba6f4534e0bR181-R186 go, I saw somewhere in the AWS documentation that the refreshing of the token when it expires also needed to be handled, which is what what check does.

I don't have immediate access to an EKS cluster to test this though.

As long as you are both happy I'll try and get this done by tomorrow?

Thanks again @Issif and @jonahjon!

Awesome, the metadata will be a nice addition!

The kubelet handles IRSA token refresh when the TTL is past 80% duration, so the only scenario where this is needed is one without the Kubelet running. I believe prior to 1.12 manual refresh had to be handled by user though.

@VariableExp0rt
Copy link
Contributor Author

Thanks @Issif and @jonahjon, I'll add the Metadata retrieval which will be the last thing it will try in order to configure credentials before panicing. As far as the bits in here https://github.com/falcosecurity/falcosidekick/pull/295/files#diff-40835b8017b541aae1eb78a1396a0489b7a8b7ace95e03ec88967ba6f4534e0bR181-R186 go, I saw somewhere in the AWS documentation that the refreshing of the token when it expires also needed to be handled, which is what what check does.
I don't have immediate access to an EKS cluster to test this though.
As long as you are both happy I'll try and get this done by tomorrow?
Thanks again @Issif and @jonahjon!

Awesome, the metadata will be a nice addition!

The kubelet handles IRSA token refresh when the TTL is past 80% duration, so the only scenario where this is needed is one without the Kubelet running. I believe prior to 1.12 manual refresh had to be handled by user though.

Awesome, if the refresh is handled automatically I will remove those extra checks and streamline it a little more 👍

Signed-off-by: Liam Baker <liam.baker@starlingbank.com>
@poiana poiana added size/M and removed size/L labels Dec 7, 2021
Liam Baker added 4 commits December 7, 2021 17:34
Signed-off-by: Liam Baker <liam.baker@starlingbank.com>
Signed-off-by: Liam Baker <liam.baker@starlingbank.com>
Signed-off-by: Liam Baker <liam.baker@starlingbank.com>
Signed-off-by: Liam Baker <liam.baker@starlingbank.com>
@VariableExp0rt
Copy link
Contributor Author

@Issif I think I'm there with the logic, I've removed all the config bits, instead opting to keep this fairly opaque so that users don't have to change anything in their config files. The bit I missed actually was that with AWS_STS_REGIONAL_ENDPOINTS env var set to true everything is handled for you, otherwise the credential provider is configured. Now we catch both. In the next few days I'll try and get some unit tests in for this :) For clarity;

Checks in the following order;

  1. IRSA -
    1.1. Checks if regional endpoints env var is false, configures the session with the WebIdentityRoleProvider accordingly
  2. Runs through the normal credential provider chain
    2.1. Keys
    2.2. Files (shared config)
    2.3. IRSA with AWS_STS_REGIONAL_ENDPOINTS set to true
    2.4. Instance metadata

Let me know what you think of the ordering, which I can change if needed!

Liam Baker added 2 commits December 9, 2021 11:39
…oints false

Signed-off-by: Liam Baker <liam.baker@starlingbank.com>
Signed-off-by: Liam Baker <liam.baker@starlingbank.com>
@Issif
Copy link
Member

Issif commented Dec 10, 2021

@Issif I think I'm there with the logic, I've removed all the config bits, instead opting to keep this fairly opaque so that users don't have to change anything in their config files. The bit I missed actually was that with AWS_STS_REGIONAL_ENDPOINTS env var set to true everything is handled for you, otherwise the credential provider is configured. Now we catch both. In the next few days I'll try and get some unit tests in for this :) For clarity;

Checks in the following order;

1. IRSA -
   1.1. Checks if regional endpoints env var is false, configures the session with the WebIdentityRoleProvider accordingly

2. Runs through the normal credential provider chain
   2.1. Keys
   2.2. Files (shared config)
   2.3. IRSA with `AWS_STS_REGIONAL_ENDPOINTS` set to `true`
   2.4. Instance metadata

Let me know what you think of the ordering, which I can change if needed!

Seems good to me, thank you a lot. I don't have time for reviewing right now, I will next week.

Copy link
Member

@Issif Issif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Thank you a lot

@poiana
Copy link

poiana commented Dec 23, 2021

LGTM label has been added.

Git tree hash: 489e8623cf824c1e96912e3cfa5fb296b69edf5b

@poiana
Copy link

poiana commented Dec 23, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Issif, VariableExp0rt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit 10ffedc into falcosecurity:master Dec 23, 2021
@Issif Issif mentioned this pull request May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants