Moved PROXY protocol wrap to execute before the TLS wrap #3195
The proxy protocol documentation (https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt) does not explicitly state whether the proxy protocol should wrap TLS or TLS should wrap proxy. However, in multiple places it suggests that proxy is supposed to wrap TLS, since part of the protocol contains information about the underlying TLS state (ALPN, SNI header, etc.; see sections 2.2.1 and 2.2.2 for instance). Of course, this is only in version 2, and we currently handle only version 1, for which there is no guidance at all, so you can either infer that the intent is the same or not infer anything. Given the lack of any specification in the protocol, the vague hints (at least for v2, which can handle it) that PROXY is supposed to wrap TLS and not the other way around, and Vault's security focus, which encourages end-to-end TLS without unwrapping by a load balancer (often required by the industry standards our clients must adhere to), we implemented it with PROXY wrapping TLS. I don't think that's the wrong call, regardless of what Fabio does, and AFAIK it works perfectly fine with ELB (and probably other LBs) in TCP-proxying mode. Accordingly: I'm happy to have an option that allows flipping the order (with the current behavior as the default), but I won't merge a PR that does it unilaterally. |
ELB sends PROXYv1-wrapped TLS traffic as expected (when I tcpdump I see a plaintext PROXYv1 header), but the Vault server does not understand it. Interestingly So after this observation I read through some of the library code, and I think the TLS connection should read from the underlying PROXY-wrapped connection, so that the PROXY protocol header is consumed first. To do this, PROXY wrapping should occur before TLS wrapping (code-wise). I agree with your explanation that PROXY should wrap TLS, but isn't Vault's current implementation unwrapping in the wrong order? |
TLS wrapping is here: https://github.com/hashicorp/vault/blob/master/command/server.go#L454 - inside the NewListener function. Note how in the block below that is where the proxy protocol wrapping happens, so the proxy handler should be wrapping the tls listener.
What should be happening with Vault right now is that the LB gets a TCP connection with TLS data, then the LB tacks on the proxy stuff in front so you get PROXY(TLS(data)). Then the last wrapper of the listener in Vault is PROXY, which wraps the TLS listener, which wraps the underlying data -- the reverse of what the LB should be sending. If in your testing this is the ordering of the data that you see in your tcpdumps but to work Vault needs to have the ordering of the listeners changed, then I'm honestly a bit mystified. |
Currently Vault wraps the TLS listener with PROXY listener, so this line of go-proxyproto - which should read and consume PROXY header - will read from underlying TLS connection: |
@solmonk For that line to be reading the TLS connection data, the section above where it peeks at the data and tries to find a proxy proto header mismatch has to fail. If it finds a mismatch it exits out without attempting to read a header line. It could be buggy! But I'd find it odd if it was. Any chance you can share the tcpdump data you have so I can be looking at the same input data as you? Also, can you share your vault config at the same time so I can see what combination of options you set on the listener? |
My configuration looks like this: 443 receives direct traffic, and 8199 receives from ELB.
The tcpdump from the failing situation is below. I redacted the IP addresses, but I hope this still helps... Just calling
It seems the server just receives plaintext PROXY protocol header that starts with
Oops I missed that part! but AFAIK |
Peek shouldn't advance the reader...see https://golang.org/pkg/bufio/#Reader.Peek Any chance you'd be willing to share the actual cap files in a non public context, e.g. via email/keybase/etc. with me? |
Since it is a buffered reader, Peek actually reads from the io.Reader and fills the buffer. I emailed you the captured tcpdump. During the dump I also did some printf debugging and confirmed that here is the exact position of the error. |
@solmonk Peek reads from the reader, but if you look at the code in go-proxyproto, you'll see that it peeks until it knows that it's seeing a proxy header, then it does a read up until a new line to get the rest of it. Assuming that the proxy header is in the expected format (the information, then the newline), data after the newline should be the TLS data, which should not have been snarfed. |
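The two views of Peek in the comments above can be checked with a small standalone sketch. It uses only the Go standard library, not go-proxyproto; `splitHeader`, `sourceLeftAfterPeek`, and the sample header are hypothetical names and data made up for illustration. It shows that Peek does not advance the bufio.Reader's own position, but it does pull bytes out of the underlying reader, so anything after the header has to be read through the same bufio.Reader.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// splitHeader mimics the peek-then-read pattern discussed above: it peeks for
// the "PROXY " prefix, reads one header line if the prefix is present, and
// returns the header plus whatever bytes follow it.
func splitHeader(input string) (header, rest string, err error) {
	br := bufio.NewReader(strings.NewReader(input))
	prefix, err := br.Peek(6)
	if err == nil && string(prefix) == "PROXY " {
		line, err := br.ReadString('\n')
		if err != nil {
			return "", "", err
		}
		header = strings.TrimRight(line, "\r\n")
	}
	// The payload is read through br, because Peek may already have moved it
	// out of the underlying reader and into br's buffer.
	b, err := io.ReadAll(br)
	return header, string(b), err
}

// sourceLeftAfterPeek reports how many bytes remain in the underlying reader
// after a single Peek on a bufio.Reader wrapped around it.
func sourceLeftAfterPeek(input string) int {
	src := strings.NewReader(input)
	bufio.NewReader(src).Peek(6)
	return src.Len()
}

func main() {
	input := "PROXY TCP4 192.0.2.1 192.0.2.2 51000 8199\r\n<tls bytes>"
	header, rest, err := splitHeader(input)
	fmt.Printf("header=%q rest=%q err=%v\n", header, rest, err)
	fmt.Println("bytes left in source after Peek:", sourceLeftAfterPeek(input))
}
```

For an input this small, a single Peek drains the whole source into the buffer, which is why a consumer that bypasses the bufio.Reader would see nothing.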
Got the tcpdump. Nothing looks out of place initially when examining the trace, but will dig in. If you're happy building a branch of Vault I could make a branch that adds some debugging into go-proxyproto. That would help make it clear exactly what the library thinks is going on (e.g. does it think it's actually not PROXY coming in for some reason, which would then cause it to send the full underlying data to the TLS handler, which would then choke as you saw). |
I think my comments were a bit messy so to sum up my claim:
As I mentioned I was already doing some kind of debugging by adding some printfs to the code, so yes, I am happy building a branch :) |
Ah. I think I see what you're saying now: You're saying that when attempting a read on an underlying tls connection it triggers the TLS handshake. Since the proxy header is not yet fully read in, the handshake is attempted on the proxy data instead of the underlying TLS data. Is that your analysis? |
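The analysis above can be reproduced in isolation with the standard library alone. This is a minimal sketch, not Vault's code: `handshakeOnProxyHeader` is a hypothetical helper, the header is made-up sample data, and an in-memory `net.Pipe` stands in for the TCP connection. It hands a plaintext PROXY v1 header to a `tls.Server` connection, which is what effectively happens when the first read on the TLS connection triggers the handshake before the header has been consumed.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net"
	"time"
)

// handshakeOnProxyHeader feeds a plaintext PROXY v1 header to a TLS server
// connection and returns the resulting handshake error.
func handshakeOnProxyHeader(header string) error {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	go func() {
		client.Write([]byte(header))
		io.Copy(io.Discard, client) // drain anything the TLS side sends back
	}()

	server.SetDeadline(time.Now().Add(2 * time.Second)) // keep the sketch from hanging
	// No certificate is configured here; the handshake is expected to fail on
	// the first bytes, before a certificate would be needed.
	return tls.Server(server, &tls.Config{}).Handshake()
}

func main() {
	err := handshakeOnProxyHeader("PROXY TCP4 192.0.2.1 192.0.2.2 51000 8199\r\n")
	fmt.Println("handshake error:", err)
}
```

The handshake should return an error, since the bytes "PROXY ..." do not form a TLS record.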
Yes and I am pretty confident with it. |
OK so then it seems like there are two ways forward, which are not exclusive:
|
Quoted from the PROXY protocol documentation you linked. The PROXY header should be placed at the very beginning of the TCP connection regardless of version, which also means nothing (including TLS) can wrap PROXY. The doc (plus its example implementation) makes it clear enough that the header is the only difference between the two versions. I don't think the current behavior could be an option; instead I consider it a bug that has to be fixed. |
I was saying that your suggested behavior (TLS wrapping PROXY) could be an option, not the current behavior. |
Hmm, that's confusing, because my suggested behavior is the PROXY protocol wrapping TLS, yet in the code it is achieved by the TLS listener wrapping the PROXY listener (just as shown in this PR), because bytestream processing starts from the innermost listener. Maybe the word "wrap" being used in both contexts is causing the problem? |
Bytestream processing doesn't start from the innermost connection, it starts at the outermost. |
I think I gave another bad explanation :( The execution order is indeed that, but the outer connection always reads from data that the inner connection has already processed, unless we make some kind of specialized implementation as you said. I cannot think of any advantage of doing that over just flipping the order, though. |
I have two reasons I'm hesitant about just flipping the order:
Unfortunately the alternatives are pretty onerous. I'm going to bring this up internally with the team and we'll figure out how we want to proceed. |
OK, we're going to do it this way. Likely it won't break due to any change in the TLS libraries until at least Go 2, and we don't really want to implement a wrapping connection. Thanks! |
In my setup I use AWS ELB with PROXY protocol enabled as a TCP proxy that transparently passes SSL traffic to vault. However, when the vault server receives traffic it throws the following error:
It seems Vault currently does not work if both PROXY protocol and TLS are enabled. Based on what I saw in other successful implementations (https://github.com/fabiolb/fabio/blob/master/proxy/listen.go#L28), PROXY protocol listener wrapping should execute before TLS listener wrapping. I tested this change on my setup described above and the problem was fixed.
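The ordering described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the actual Vault or go-proxyproto code: `headerListener`, `headerConn`, `selfSigned`, and `roundTrip` are hypothetical names, the header is made-up sample data, the header is assumed to always be present, and `(*tls.Conn).NetConn` requires Go 1.18 or later. The raw listener is wrapped by the header-stripping listener first, and `tls.NewListener` wraps the result, so the TLS handshake only ever sees bytes after the header.

```go
package main

import (
	"bufio"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io"
	"math/big"
	"net"
	"strings"
	"time"
)

// headerListener and headerConn are hypothetical stand-ins for a PROXY-aware
// listener: Accept strips one PROXY v1 header line from each connection.
type headerListener struct{ net.Listener }

type headerConn struct {
	net.Conn
	br     *bufio.Reader
	header string
}

func (l headerListener) Accept() (net.Conn, error) {
	c, err := l.Listener.Accept()
	if err != nil {
		return nil, err
	}
	br := bufio.NewReader(c)
	line, err := br.ReadString('\n') // assumes a header is always sent
	if err != nil {
		c.Close()
		return nil, err
	}
	return &headerConn{Conn: c, br: br, header: strings.TrimRight(line, "\r\n")}, nil
}

// Read goes through br so bytes buffered while reading the header are kept.
func (c *headerConn) Read(p []byte) (int, error) { return c.br.Read(p) }

// selfSigned builds a throwaway certificate for the sketch.
func selfSigned() (tls.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return tls.Certificate{}, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{SerialNumber: big.NewInt(1), DNSNames: []string{"localhost"},
		NotBefore: now.Add(-time.Hour), NotAfter: now.Add(time.Hour)}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, err
}

// roundTrip sends PROXY header + TLS traffic over loopback and returns what
// the server side recovered: the header line and the decrypted payload.
func roundTrip() (string, string, error) {
	cert, err := selfSigned()
	if err != nil {
		return "", "", err
	}
	raw, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	defer raw.Close()
	// Header-stripping wrap first, TLS wrap second.
	ln := tls.NewListener(headerListener{raw}, &tls.Config{Certificates: []tls.Certificate{cert}})
	done := make(chan struct{})
	defer close(done)
	errc := make(chan error, 1)
	go func() { // plays the load balancer plus the TLS client behind it
		c, err := net.Dial("tcp", raw.Addr().String())
		if err != nil {
			errc <- err
			return
		}
		defer c.Close()
		fmt.Fprint(c, "PROXY TCP4 192.0.2.1 192.0.2.2 51000 8199\r\n")
		_, err = tls.Client(c, &tls.Config{InsecureSkipVerify: true}).Write([]byte("hello"))
		errc <- err
		<-done // keep the socket open until the server has read the payload
	}()
	conn, err := ln.Accept()
	if err != nil {
		return "", "", err
	}
	defer conn.Close()
	buf := make([]byte, 5)
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", "", err
	}
	if err := <-errc; err != nil {
		return "", "", err
	}
	return conn.(*tls.Conn).NetConn().(*headerConn).header, string(buf), nil
}

func main() {
	header, payload, err := roundTrip()
	fmt.Printf("header=%q payload=%q err=%v\n", header, payload, err)
}
```

Swapping the two wraps would hand the raw stream, header included, to the TLS handshake, which is the failure reported in this PR.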