Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

End-to-End IP transparency using Proxy Protocol #4128

Closed
donbowman opened this issue Aug 13, 2018 · 33 comments
Closed

End-to-End IP transparency using Proxy Protocol #4128

donbowman opened this issue Aug 13, 2018 · 33 comments
Labels
design proposal Needs design doc/proposal before implementation no stalebot Disables stalebot from closing an issue
Milestone

Comments

@donbowman
Copy link
Contributor

donbowman commented Aug 13, 2018

Issue Template

Title: End-to-End IP transparency using Proxy Protocol

Description:

[It is my intent to work on below, soliciting input, feedback, and assistance]

The general chain of events (in a GKE Kubernetes at least) model is that we have: Client->LoadBalancer->Ingress/Front->Sidecar->service as a sequence. The LoadBalancer does NAT, the Ingress is a proxy server, the istio-sidecar is a proxy server. This leads to 3 levels of address
translation.

Some services need to see the Origin IP to operate efficiently. Methods like X-Forwarded-For, REMOTE_ADDR, exist in HTTP. But, there is no general equivalent for other protocols, and these are unreliable.

General approaches like RFC7974 have been tried (this one fails for ipv6 or when there are more TCP options than average present).

The Kubernetes LoadBalancer exposes a externalTrafficPolicy: Local method to allow keeping the Origin IP. Note, this has a side-affect (see groups and issue ).

HAProxy implemented Proxy Protocol
as a general means around this problem.

In Envoy, we need to act as 3 roles:

  • Consumer. In AWS the cloud Load Balancer will natively speak Proxy Protocol, including placing
    a TLV extension (PP2_TYPE_AWS/PP2_SUBTYPE_AWS_VPCE_ID) on to indicate the VPC
  • Originator. Envoy as an Ingress of Front Proxy translates the IP, and should allow transparency
  • Sidecar. If we look at mmproxy, we can see a method to be completely transparent, implemented in cloudflare/mmproxy

If all 3 roles are enabled, we can implement services like Geo IP management in our services.

1 Phase (Consumer) is complete (except for the TLV parsing of AWS extensions).

In order to implement the Originator component, information needs to be placed into the socket after connect() and before any other reads/writes. In addition, we need to handle connection-pooling (since absent Envoy, a single connection could never have >1 source IP). Thus an inbound connection being
onward routed can only use a connection-pool connection that advertised the same origin IP.

Potentially a set of Routing matching fields are needed, to operate on the 'Origin' IP vs the 'translated'.

On the 'input side' of Envoy there are a set of filters, but there does not appear to be a single place to hook into all of the outbound (for all protocols since proxy procotol is not specific to http). The closest I can see is:

connection_impl.cc, ~L594

  // The local address can only be retrieved for IP connections. Other
  // types, such as UDS, don't have a notion of a local address.
  if (socket_->remoteAddress()->type() == Address::Type::Ip) {
    socket_->setLocalAddress(Address::addressFromFd(fd()), false);
  }
}

For the Sidecar (mmproxy), this would involve binding to a non-local IP, and an IP tables rule.
The wire protocol would be otherwise unchanged.

Comments?

a) the general thinking, usefuless of the feature?
b) the components?
c) pointers to where the code would be inserted/type of config needed/concerns/gotchas?

@donbowman
Copy link
Contributor Author

Somewhat similar requirement was articulated here #2659

@mattklein123 mattklein123 added the design proposal Needs design doc/proposal before implementation label Aug 14, 2018
@mattklein123
Copy link
Member

@donbowman this is a meaty topic and related to a few other open issues we have. I will respond hopefully later today or Thursday. cc @PiotrSikora and @ggreenway also.

@mattklein123
Copy link
Member

mattklein123 commented Aug 17, 2018

@donbowman in general I think this is a very useful area for development and there has already been a bunch of related work and thinking that we can build on (though nothing has been completed or merged).

  • General issue around upstream filters: filters: allow filter installation for upstream connections #173. @alanconway was working on this and I believe has some working code but I'm not sure of the current status. It would be great to get the core upstream filter changes merged.
  • Setting outbound client SNI based on incoming host/authority: setting outbound (envoy as client) SNI based on host/authority  #2690. The reason I bring up this issue is that it's a similar concept of needing a new type of connection pool in which a connection can be acquired that meets certain requirements (in that case SNI, in your case proxy proto). I postulated in that issue that I think this could be handled by a new type of connection pool implementation that potentially wraps other connection pools. This is also of interest in the TCP proxy case now that @zuercher has developed a generic TCP connection pool. It's nice that we can potentially solve this for both TCP and HTTP cases.
  • Setting upstream proxy proto on connections: Proxy proto header generation #1031. I guess perhaps a duplicate of this issue. If it is we can close that issue since I think this one now has more information.

Hopefully there is enough info in the linked issues to get started. FYI I'm out for the next month starting tomorrow. @ggreenway @PiotrSikora @zuercher should be able to help with continued iterations here. I would recommend putting together a small gdoc design document on this if you plan on following through and implementing. That way we can review and comment on something a bit more concrete. Thank you!

@donbowman
Copy link
Contributor Author

Thank you.
I have a PoC of outbound (transparency to upstream) working. It will need work in the connect pool as you say above to select one w/ the right property.

@donbowman
Copy link
Contributor Author

Work to add pre-bind & transparent was added previously
https://github.com/envoyproxy/envoy/pull/3682/files/3a23133c900983589cdb77f1de6eba7121445d6c
(e.g. SocketOptionFactory::buildIpTransparentOptions())

@donbowman
Copy link
Contributor Author

Also referenced in istio:
istio/istio#5679

@donbowman
Copy link
Contributor Author

And referenced in #2659

@alanconway
Copy link

alanconway commented Aug 29, 2018 via email

@klarose
Copy link
Contributor

klarose commented Sep 10, 2018

FYI I'll be taking over the PoC and polishing it up for Don. I'll try to get something along with a rough outline of the design in the next week or two.

@klarose
Copy link
Contributor

klarose commented Oct 2, 2018

I have the beginnings of a design up here: https://docs.google.com/document/d/1md08XUBfVG9FwPUZixhR3f77dFRCVGJz2359plhzXxo/edit?usp=sharing

If there's a template I should be conforming to, please let me know. I can adapt what I have to it.

The design is currently probably a bit more verbose than needed, but I felt that summarising the prototype and its findings would be worthwhile.

The meat of the proposal is at the end, in the Design section. To start the discussion, I have given a high level overview of the three components we'll be implementing and what we need them to do. The most complicated part, I think, is the connection pool, since it's where we'll actually be applying the transparency.

I'd like feedback on whether my proposed solution seems reasonable, and on any questions I've posed in the comments.

Thanks.

@klarose
Copy link
Contributor

klarose commented Oct 15, 2018

@mattklein123 would you be able to look over my design doc? If not, could you nominate someone to do so? :) It's still pretty rough, but I have sketched out the overall solution, and have begun to flesh out some of the more complication portions.

@mattklein123
Copy link
Member

@klarose sorry for the delay. I'm OOO for 2 weeks and not working much. I might have time to take a look but no guarantees. In the meantime if either @ggreenway or @alyssawilk could take a look that would be great, otherwise I will take a look when I get back.

@mattklein123 mattklein123 self-assigned this Oct 16, 2018
@mattklein123
Copy link
Member

@klarose sorry for the delay. I took an initial pass through the document. My main initial feedback is in how we think about composing the connection pools, and how we can make this work for SNI based transparent routing also which has been a long requested feature. I should be more responsive to doc comments now so we can go back and forth as needed. Also happy to discuss on the next community call live. cc @alyssawilk @ggreenway @PiotrSikora

@klarose
Copy link
Contributor

klarose commented Oct 30, 2018

@mattklein123 Thanks for taking a look! It looks like you almost got to the fun stuff -- namely the proposal. I'd really appreciate it if you could comment on that part -- the high level and detailed design. It's not quite complete yet, but I think it gives a good overview of what I plan on doing. If you think I'm on the right track there, I'll finish it off with the current strategy. Otherwise, I'll go back to the drawing board!

@mattklein123
Copy link
Member

@klarose sorry I stopped reading when I did. Yes it sounds like we are on the same page with the 2-level connection pool. My main comment is that I think we can pass LbContext into the CP stream creation function and everything will come together super cleanly across both TCP and HTTP. Assuming you agree this all sounds good to me at a high level.

@stale
Copy link

stale bot commented Dec 4, 2018

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Dec 4, 2018
@hzxuzhonghu
Copy link
Contributor

/remove stale

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Dec 4, 2018
@stale
Copy link

stale bot commented Jan 4, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Jan 4, 2019
@donbowman
Copy link
Contributor Author

/remove stale
the PR is still in process of being reviewed w/ my colleague @klarose

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Jan 4, 2019
@stale
Copy link

stale bot commented Feb 4, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Feb 4, 2019
@donbowman
Copy link
Contributor Author

the issue is still underway, PR #5670 is part of it.

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Feb 4, 2019
@stale
Copy link

stale bot commented Mar 6, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Mar 6, 2019
@klarose
Copy link
Contributor

klarose commented Mar 7, 2019

I'm still working on this; I've just been busy the last few weeks. I'm hoping to have the next PR up soon.

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Mar 7, 2019
@zengyuxing007
Copy link
Contributor

the envoy send proxy protocol, Has it been achieved? @klarose

@klarose
Copy link
Contributor

klarose commented Mar 11, 2019

@zengyuxing007 It has not been. I've been focusing on providing IP transparency to a local service. I'm not sure when i'll get to sending the proxy protocol. If you're volunteering, feel free to take that chunk over. It can certainly be done in parallel to my current tasks. :)

@stale
Copy link

stale bot commented Apr 10, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Apr 10, 2019
@klarose
Copy link
Contributor

klarose commented Apr 10, 2019

Still plugging away at this. Most recent related PR Is #6516

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Apr 10, 2019
@mattklein123 mattklein123 added the no stalebot Disables stalebot from closing an issue label Apr 10, 2019
@mattklein123 mattklein123 added this to the 1.11.0 milestone Apr 10, 2019
@andredasilvapinto
Copy link

I'm sorry but how is "add connection pool circuit breaker stats" related to providing support for L4 proxy protocol?

@klarose
Copy link
Contributor

klarose commented Apr 10, 2019

It's an artifact of the design we've chosen to achieve the goal of end to end ip transparency. We needed to add a capability that could lead to an unlimited number of connection pools, which in turn required that we add the ability to limit that number. Once we added that limit, we needed statistics for users to understand whether the limit is being hit.

Supporting L4 Proxy Protocol as an input and output is only one part of this issue. We still need to be able to spoof the IP at a sidecar in order to achieve our goal. That is what I have been working on.

At this point, Envoy supports:

  1. Spoofing the downstream source IP for http connection pools.
  2. Consuming Proxy Protocol to determine the downstream source IP
  3. Using the actual downstream source IP, as seen by Envoy, to determine the downstream source IP.

After the PR I linked is finished, I'll be working on Consuming the XFF to determine the downstream source IP. After that, I'll add the spoofing capability for generic TCP connection pools.

After that, I can start looking into originating proxy protocol, if somebody hasn't already gotten to it first.

At that point we can call this issue closed.

@andredasilvapinto
Copy link

andredasilvapinto commented Apr 10, 2019

OK. Thanks for the reply.

Personally I'm only interested in the very last part (originating the proxy protocol) with no side car. We were planning on using envoy L4 as our edge proxy but the lack of source ip visibility in the upstream hosts made us change our mind. The company I work at will actually pay for multiple nginx plus licenses just because of this particular feature (Proxy protocol support). I don't know how many other cases there are out there like ours (I know that at least Google was also looking for this feature in 2017 #1031 ), but I just wanted to let you know about it.

I would happily try to contribute but I'm by far not a C++ developer (last time I used it at any meaningful level was when I was in the university). I took a look at the plugin support a few months ago, but from what I saw, I believe this would need to touch code outside of that domain into core envoy.

In any case thanks for your contributions. I look forward to that feature set.

@klarose
Copy link
Contributor

klarose commented May 2, 2019

The PR for reading the source IP from the XFF is up: #6790

@mattklein123
Copy link
Member

I think this is done so I'm going to close this. Let's open smaller issues if there is further work to do.

@juanvasquezreyes
Copy link

juanvasquezreyes commented Jul 10, 2020

is there like an end to end example using tcp_proxy and how to set it up with the proxy protocol?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design proposal Needs design doc/proposal before implementation no stalebot Disables stalebot from closing an issue
Projects
None yet
Development

No branches or pull requests

8 participants