Restrict access to EC2 metadata #22826

Merged

Conversation

danwinship
Contributor

@danwinship danwinship commented May 13, 2019

Block access to the EC2 metadata service from (non-hostNetwork) pods.

The EC2 metadata IP is actually in the IPv4 link-local address range, so technically it's incorrect for us to be forwarding packets addressed to that IP range out of the SDN anyway. So in addition to the iptables rules, this also adjusts the OVS flows to drop all packets destined for that range. (The iptables rules are still needed to block output via egress-router and multus interfaces.)

So, there are three parts:

  • iptables rules to block access just to 169.254.169.254, for security
  • OVS flows to block access to all of 169.254.0.0/16, for correctness
  • RestrictedEndpointsAdmission changes to block creating services pointing to all of 169.254.0.0/16, for security and correctness
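
A rough sketch of what the first two mechanisms above could look like (the iptables chain, the br0 bridge name, and the priority value are illustrative assumptions, not necessarily the exact rules in this PR):

    # iptables: drop forwarded pod traffic addressed to the metadata IP
    iptables -A FORWARD -d 169.254.169.254/32 -j DROP

    # OVS: drop pod traffic destined for the IPv4 link-local range on the SDN bridge
    ovs-ofctl -O OpenFlow13 add-flow br0 "priority=300,ip,nw_dst=169.254.0.0/16,actions=drop"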

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress, size/XS, and approved labels May 13, 2019
@openshift-ci-robot openshift-ci-robot added the size/S label and removed the size/XS label May 13, 2019
@squeed
Contributor

squeed commented May 14, 2019

The ingress operator fails to come up, thus blocking all other operators that depend on routes:

2019-05-14T00:19:03.754Z ERROR operator.entrypoint ingress-operator/main.go:93 failed to create DNS manager {"error": "failed to create AWS DNS manager: couldn't get region from metadata: RequestError: send request failed\ncaused by: Get http://169.254.169.254/latest/meta-data/placement/availability-zone: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"}

@danwinship
Contributor Author

The ingress operator fails to come up, thus blocking all other operators that depend on routes:

Yes, I had tried to hack the SDN to special-case the openshift-ingress-operator namespace to work around this, but my first attempt got it wrong.

@cgwalters
Member

Looks like this is passing CI now. How do you foresee the "production" version of this being different?

One offhand comment: if the OVS drop action means just that, it might be nicer (if possible) to do the iptables equivalent of reject-with icmp-port-unreachable, so that a CI test could verify this without needing to wait for a timeout.
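
For illustration, the iptables form of that suggestion would look something like this (a sketch, not taken from this PR); the REJECT makes connections fail immediately instead of hanging until the client times out:

    iptables -A FORWARD -d 169.254.169.254/32 -j REJECT --reject-with icmp-port-unreachable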

@danwinship
Contributor Author

Huh. OK, so:

  1. We need to figure out what to do with the ingress operator. Either make it gratuitously hostNetwork, or have some other way to special-case it. Actually, it's probably not even "gratuitous"; 169.254.0.0/16 is the IPv4 link-local address range, so it's technically incorrect to be routing traffic for those IPs from the SDN onto the physical network anyway... It makes sense to say that only hostNetwork pods should be able to access that IP.
  2. This being just a quick hack to test things out, I didn't have to worry about "attackers"; I could assume that any pod trying to access the metadata IP was just going to try to access it directly. For a real version we need to think more about all possible avenues of attack like with the MCS patch.
  3. OVS can't do ICMP rejects, but if we end up implementing it in iptables instead, we could

@danwinship
Contributor Author

Filed openshift/cluster-ingress-operator#235 about making the ingress operator hostNetwork

@openshift-ci-robot openshift-ci-robot added the size/M label and removed the size/S label May 14, 2019
@danwinship danwinship changed the title from "WIP: restrict access to EC2 metadata" to "Restrict access to EC2 metadata" May 14, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress label May 14, 2019
@danwinship
Contributor Author

Updated, repushed, and updated the initial comment to reflect the new commit.

This will fail e2e-aws until cluster-ingress-operator is updated.

ironcladlou added a commit to ironcladlou/cluster-ingress-operator that referenced this pull request May 14, 2019
Access to ec2 metadata will soon be restricted
(openshift/origin#22826). Eliminate the ec2 metadata
dependency by discovering AWS region information from cluster config. This
commit uses the deprecated install config for metadata; once
openshift/installer#1725 merges, supported cluster
config will provide the region information and the code can be refactored.
ironcladlou added a commit to ironcladlou/cluster-ingress-operator that referenced this pull request May 14, 2019
Access to ec2 metadata will soon be restricted
(openshift/origin#22826). Eliminate the ec2 metadata
dependency by discovering AWS region information from cluster config. This
commit uses the deprecated install config for metadata; once
openshift/installer#1725 merges, supported cluster
config will provide the region information and the code can be refactored.
@ironcladlou
Contributor

openshift/cluster-ingress-operator#238 eliminates the ingress operator's dependency on ec2 metadata; hope this helps!

openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-ingress-operator that referenced this pull request May 15, 2019
Access to ec2 metadata will soon be restricted
(openshift/origin#22826). Eliminate the ec2 metadata
dependency by discovering AWS region information from cluster config. This
commit uses the deprecated install config for metadata; once
openshift/installer#1725 merges, supported cluster
config will provide the region information and the code can be refactored.
@knobunc
Contributor

knobunc commented May 15, 2019

/retest

@cgwalters
Member

(Not a network person but) seems sane to me!

@cgwalters
Member

(Also, CI is passing now)

@squeed
Contributor

squeed commented May 16, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm label May 16, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, squeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@squeed
Contributor

squeed commented May 16, 2019

/cherrypick release-4.1

@openshift-cherrypick-robot

@squeed: once the present PR merges, I will cherry-pick it on top of release-4.1 in a new PR and assign it to you.

In response to this:

/cherrypick release-4.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@squeed
Contributor

squeed commented May 16, 2019

From a quick glance at the CI failure, it seems that prometheus was OOMKilled...

@squeed
Contributor

squeed commented May 16, 2019

/retest

@openshift-cherrypick-robot

@squeed: new pull request created: #22849

In response to this:

/cherrypick release-4.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@danwinship danwinship deleted the anti-ec2-metadata-hackery branch May 29, 2019 13:23
@paleg

paleg commented Jun 10, 2019

Hey guys, is it possible to configure that behavior (enable for given pods) or disable it completely? I understand your security concerns, but I need to assign AWS IAM roles to pods via EC2 metadata (and I accept all possible risks); currently in 3.11 I use kiam/kube2iam, but with 4.1 that is not possible anymore. What is your recommendation for cases when a pod needs access to metadata? I can't use direct assume-role from pods as it involves application code changes.

@cgwalters
Member

Hey guys, is it possible to configure that behavior (enable for given pods)

Probably simplest is to skip those iptables rules for privileged pods.

@danwinship
Contributor Author

OK, so it looks like kube2iam and kiam work by having a hostNetwork pod that acts as a proxy, and then adding iptables rules so that when pods try to connect to the metadata IP, they reach the hostNetwork pod instead. The first half would still work under OpenShift (we don't block hostNetwork pods from accessing the metadata IP) but the second half does not. In OpenShift, the "client" pods would have to explicitly connect to the proxy pod rather than trying to connect to the metadata IP. I don't know if there are any kube2iam/kiam alternatives that work that way?
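
For context, the interception half is a node-level NAT rule of roughly this shape (paraphrased from the kube2iam README; the ingress interface and the 8181 agent port are deployment-specific, and <node-ip> stands in for the node's address):

    iptables -t nat -A PREROUTING -p tcp -d 169.254.169.254 --dport 80 \
        -i docker0 -j DNAT --to-destination <node-ip>:8181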

Probably simplest is to skip those iptables rules for privileged pods.

Making a pod privileged gives it a large amount of power that could be abused accidentally or intentionally; we shouldn't force administrators to mark pods privileged merely as an indicator that they want some other capability.

@paleg

paleg commented Jun 10, 2019

In OpenShift, the "client" pods would have to explicitly connect to the proxy pod rather than trying to connect to the metadata IP. I don't know if there are any kube2iam/kiam alternatives that work that way?

Connecting to 169.254.169.254 is the default behavior of the AWS SDK for automatically obtaining credentials. Connecting explicitly to a proxy pod requires application code changes (or patching the AWS SDK library) and makes no sense, as in that case it's easier to leverage the assume-role mechanism. Edit: before using sts:AssumeRole from a pod we still need to authenticate, i.e. get access/secret keys from somewhere (usually from metadata or hardcoded in the application/environment).

Making a pod privileged gives it a large amount of power that could be abused accidentally or intentionally

Totally agree here.

@paleg

paleg commented Jun 10, 2019

Ideally I'd prefer to be able to turn off this behavior with some configuration option (turned on by default), and then I can rely on kiam to restrict access to metadata (i.e. which pods can query which metadata paths).

@squeed
Contributor

squeed commented Jun 11, 2019

Perhaps we could rewrite requests from ordinary pods to another IP address that is normally blackholed, but configure kube2iam to listen on that IP?

@jkassis

jkassis commented Aug 13, 2020

Perhaps we could rewrite requests from ordinary pods to another IP address that is normally blackholed, but configure kube2iam to listen on that IP?

The only solution left on the table for kiam/kube2iam on OpenShift 4 is the one recommended by kube2iam (https://github.com/jtblin/kube2iam): hack the deployment configurations of all pods that need access to the AWS API to use the http_proxy and no_proxy environment variables. This makes a total mess of deployment configurations because the HOST_IP needs to be captured in an environment variable, etc.
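
Concretely, that workaround means every affected container ends up needing something like the following (a hypothetical sketch; 8181 assumes kube2iam's default agent port, and HOST_IP has to be injected into the pod, e.g. via the downward API):

    # assumes HOST_IP was populated from the downward API (status.hostIP)
    export http_proxy=http://${HOST_IP}:8181
    export no_proxy=<in-cluster hosts and CIDRs that must bypass the proxy>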

Furthermore, the no_proxy environment variable is not consistently supported, so my application is actually broken on OpenShift 4 out of the box. It will not work.

Furthermore, this firewall rule is blocking installation of linkerd: https://github.com/linkerd/linkerd2/issues/4851.

As a result of all of this I tried migrating to OVNKubernetes, but this firewall rule still breaks things there too.

I'm quite confused why the Amazon API IP address would get special attention here. What does the SDN do with other addresses in the link-local space? Whatever it does, it should probably just do the same thing for the Amazon API rather than raise the firewall.

Frankly, this PR looks like malice and unfortunately renders OpenShift useless for applications that require access to AWS cloud services outside of the dreaded service catalog, AFAICT.

@jkassis

jkassis commented Aug 13, 2020

danwinship:anti-ec2-metadata-hackery really?

@jkassis

jkassis commented Aug 13, 2020

Making a pod privileged gives it a large amount of power that could be abused accidentally or intentionally; we shouldn't force administrators to mark pods privileged merely as an indicator that they want some other capability.

agree.

@jkassis

jkassis commented Aug 13, 2020

@paleg @cgwalters did you get through this?

@danwinship
Contributor Author

I'm quite confused why the Amazon API IP address would get special attention here.

For the same reason kube2iam exists: because a pod that has unrestricted access to the metadata service can leverage that to get node-level privileges.
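
To make the risk concrete: with unrestricted access, any pod can fetch the node's IAM role credentials with two standard IMDSv1 requests (<role-name> is whatever the first request returns):

    curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
    curl http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>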

We are aware that the current situation is broken. We eventually want it to be possible to use kube2iam. Unfortunately, fixing this is not currently high on the priority list.

@cgwalters
Member

Making a pod privileged gives it a large amount of power that could be abused accidentally or intentionally; we shouldn't force administrators to mark pods privileged merely as an indicator that they want some other capability.

Sure, I could imagine that we add a pod security policy for this - I think that'd be something good to take to a Kubernetes upstream enhancement.

In the meantime though, I think the workaround is making kube2iam privileged (at least when deployed in OpenShift).

@danwinship
Contributor Author

Unfortunately, the problem isn't with kube2iam, it's with everything else; when any other pod attempts to reach the metadata service, we kill that connection before kube2iam has a chance to intercept it.
