Envoy does not adhere to HTTP/2 RFC 7540 #6767
Comments
@alyssawilk @PiotrSikora thoughts on this? I haven't read the relevant RFCs in detail to fully understand what is needed. |
Intersection of H2 specs and TLS handshake? I eagerly anticipate Piotr sorting this out :-P |
The gist is that browsers coalesce HTTP/2 connections pretty aggressively: when a browser opens connection to As long as all However, if One solution would be to send Another solution would be using HTTP/2 ORIGIN frame (RFC8336) to advertise allowed hostnames on a given listener/filter chain (but this requires a global list as well, and this extension is supported only by a few clients). |
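To make the coalescing behaviour described above concrete, here is a minimal sketch in Python of the kind of check a browser might apply before reusing an existing HTTP/2 connection for a different hostname. The function names (`cert_covers`, `may_coalesce`) and the exact rule (same IP plus a certificate SAN match) are simplifying assumptions for illustration, not any browser's actual implementation.

```python
def cert_covers(san_entries, hostname):
    """Return True if any SAN entry matches hostname.

    Assumption: a wildcard entry replaces exactly one leftmost label.
    """
    hostname = hostname.lower()
    for san in san_entries:
        san = san.lower()
        if san == hostname:
            return True
        if san.startswith("*."):
            head, _, tail = hostname.partition(".")
            if head and tail == san[2:]:
                return True
    return False


def may_coalesce(conn_ip, conn_sans, new_host, new_host_ips):
    """Simplified (hypothetical) reuse check: same IP and certificate covers the host."""
    return conn_ip in new_host_ips and cert_covers(conn_sans, new_host)
```

With a wildcard certificate and a shared IP, `may_coalesce` is true for every subdomain, which is why requests for one hostname can arrive on a connection that was opened for another.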
Is it possible to reprioritize this issue? We have a use case with thousands of services behind hundreds of FQDNs, all served by a set of identical Envoys using a wildcard TLS cert. We hit this exact issue when HTTP/2 is enabled, but not when we enforce HTTP/1.1. |
Here's the CVE for this vulnerability https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-11767 |
CC @envoyproxy/security-team |
I can think of 3 alternatives:
|
What is the plan for fixing https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-11767 in envoy? |
AFAIK there is no plan currently. Someone needs to own this issue and drive a resolution if they are passionate about fixing it. |
I hacked up a 421 response when the virtual host lookup fails and this works for a simple case that I tried. If this is a reasonable approach I'd need a bit of help to polish it up and figure out how to write tests for it. https://gist.github.com/jpeach/e01f5f752eed5ffd09ea1f18634d1fc5 |
I think I managed to find a workaround: on the Envoy instance I added an "envoy.lua" HTTP filter that checks whether the response code is 404 (the same code that is generated for a non-existent route) AND whether the "x-envoy-upstream-service-time" header is NOT present. The Lua code:
function envoy_on_response(response_handle)
  -- A 404 without the upstream timing header was generated by Envoy itself (no matching route).
  if response_handle:headers():get(":status") == "404" and response_handle:headers():get("x-envoy-upstream-service-time") == nil then
    response_handle:headers():replace(":status", "421")
  end
end
Example configuration on Envoy (fetched by LDS):
"http_filters": [
{
"name": "envoy.lua",
"typed_config": {
"@type": "type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua",
"inline_code": `
function envoy_on_response(response_handle)
if response_handle:headers():get(":status") == "404" and response_handle:headers():get("x-envoy-upstream-service-time") == nil then
response_handle:headers():replace(":status", "421")
end
end`
}
},
{
"name": "envoy.filters.http.router"
}
] |
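The decision the Lua filter above makes can be modelled outside Envoy as a small pure function. This is only a sketch for reasoning about the heuristic; `rewrite_status` is a hypothetical name, and headers are represented as a plain dict rather than Envoy's header map.

```python
def rewrite_status(headers):
    """Mimic the Lua workaround: turn Envoy-generated 404s into 421s.

    A 404 that lacks x-envoy-upstream-service-time is assumed to have been
    produced locally (no route matched) rather than by an upstream service.
    """
    out = dict(headers)
    if out.get(":status") == "404" and "x-envoy-upstream-service-time" not in out:
        out[":status"] = "421"
    return out
```

Note that this heuristic cannot tell a misdirected request from a request for a hostname that does not exist anywhere; both become 421.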
Nice! That's a bit cleaner than my equivalent 👍 |
Maybe a dumb question: if we have a scenario like
(from Piotr above) How can we distinguish from a browser coalescing the request from someone legitimately sending a request with Also, RFC 7540 is about HTTP/2. The above example can be done with HTTP 1 - do we expect a 421 still? |
I am exploring the relationship among SNI, the SANs in the cert, and the Host header in HTTP. |
We now use this Lua snippet:
function envoy_on_request(request_handle)
  local streamInfo = request_handle:streamInfo()
  -- Plain HTTP requests carry no SNI, so there is nothing to compare against.
  if streamInfo:requestedServerName() ~= "" then
    -- Wildcard server name: :authority must contain the suffix after the "*" (plain-text find, no patterns).
    if (string.sub(streamInfo:requestedServerName(), 1, 2) == "*." and not string.find(request_handle:headers():get(":authority"), string.sub(streamInfo:requestedServerName(), 2), 1, true)) then
      request_handle:respond({[":status"] = "421"}, "Misdirected Request")
    end
    -- Exact server name: :authority must match it exactly.
    if (string.sub(streamInfo:requestedServerName(), 1, 2) ~= "*." and streamInfo:requestedServerName() ~= request_handle:headers():get(":authority")) then
      request_handle:respond({[":status"] = "421"}, "Misdirected Request")
    end
  end
end
EDIT: Fixed for HTTP requests where requestedServerName is empty |
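The comparison in the snippet above can be sketched as a standalone Python function, which makes the edge cases easier to test. The names `strip_port` and `is_misdirected` are hypothetical, and two details are assumptions that go beyond the Lua version: the port is stripped from `:authority`, and the comparison is case-insensitive.

```python
def strip_port(authority):
    """Drop a trailing :port from an authority value, if one is present."""
    host, sep, port = authority.rpartition(":")
    return host if sep and port.isdigit() else authority


def is_misdirected(sni, authority):
    """Return True when a 421 would be sent for this SNI/:authority pair."""
    if not sni:
        return False  # no SNI (plain HTTP): nothing to compare
    sni = sni.lower()
    host = strip_port(authority).lower()
    if sni.startswith("*."):
        # Wildcard server name: the host must end with the suffix after "*".
        return not host.endswith(sni[1:])
    return sni != host
```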
(random passer by) @htuch This is an example of a very valuable and simple use case that needs Lua. |
@kyessenov ack. Presumably Wasm could also work here at the expense of a heavier weight everything. |
This is such an integral part of HTTP/2, that it should be fixed in Envoy core, instead of requiring users to implement workarounds in Lua/Wasm. |
I think the meta-point here is that Lua/Wasm are useful temporary patches until a release with a fix can be deployed. |
@PiotrSikora yeah I agree. @lambdai do you still want to work on it, or should we leave it open for grabs? |
(yes yes, inb4 why don't you make an MR ;-) ) |
It's not an envoy problem, but a wrong envoy config. The problem can be solved using standard means
|
@a-b-v you (probably) don't want to 421 unless it's a valid domain on a different SNI. Otherwise you should 404. |
IMHO, if such a request is received, then either a valid domain covered by the wildcard certificate exists on a different SNI, or it's an attack and I don't care what we answer. |
Anyway, if a client receives a 421, it must (MAY, according to the RFC) retry the request over a different connection. That connection will have the correct SNI, and the server will return 404 if no such domain exists. |
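The client-side behaviour described here (retry over a different connection after a 421) can be sketched as follows. The `send` callable is a hypothetical transport hook taking `(host, path, fresh_connection)` and returning a status code; it stands in for whatever HTTP client is actually used. Limiting the retry to a single attempt is a design assumption so that a server answering 421 on every connection cannot cause a loop.

```python
def fetch_with_421_retry(send, host, path):
    """Send a request; on 421, retry once over a fresh connection.

    `send` is a hypothetical hook: send(host, path, fresh_connection) -> int.
    """
    status = send(host, path, False)
    if status == 421:
        # The new connection carries the right SNI, so a real 404 can surface.
        status = send(host, path, True)
    return status
```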
We haven't had a ton of movement on this; are we still looking for an owner? |
Yes, needs an owner and a viable proposal for how to fix (without breaking something else). |
I think @lambdai's first option makes the most sense:
I'll probably need some help as I'm still onboarding to Envoy, but I'd be happy to drive this forward next release, especially since it looks like a CVE has been filed against this behavior. |
/assign @keithmattix |
I think the easiest solution is an option on the HCM to enforce matching SNI and |
Yep that's exactly what I'm planning to implement. Optional flag on HCM for those who want stricter :authority -> SNI checking. |
I think this will break traffic that would otherwise work entirely (even outside of HTTP/2 re-use), decrease connection pooling, and incorrectly lead clients to retry when they shouldn't, since the response was a legitimate 404 and not a misdirect (hopefully the browsers are smart enough not to infinite loop!). A flag to do so... sure. But I wouldn't turn it on in Istio, for example. |
@howardjohn what's the alternative? IIUC, each HCM would have to know about all of the other listeners/FCMs to distinguish between a true 404 vs. a retriable connection re-use error |
With a list of all hostnames that are currently supported, it would be easy with the current capabilities to only return a 421 when appropriate: just add all the "wrong here but supported" to another vhost that has a direct_response 421, and then have a wildcard vhost that returns 404. Or am I missing something in the details? There are a lot of comments between this issue and the Istio one; I may be missing something important. |
@ggreenway not sure I completely understand your point. So each filter chain (match) data structure would have a list of all of the hostnames within its listener so it can know if the request is valid? |
I think what @howardjohn is asking for is if a request for a known hostname but on the wrong filter chain arrives, respond with 421, and if a request for an unknown hostname arrives, respond with 404. Assuming that is correct, this can be configured by adding routes (via VirtualHost) in each HCM (across all filter chains) for all known hostnames that are not correct for the current HCM, and responding with a 421 from those routes. |
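The routing rule described above (known hostname on the wrong filter chain gets 421, unknown hostname gets 404) reduces to a three-way classification. The sketch below uses hypothetical names (`classify`, `local_hosts`, `all_known_hosts`) and plain sets of exact, lowercase hostnames; it ignores wildcard domains and is not Envoy's virtual host matching.

```python
def classify(host, local_hosts, all_known_hosts):
    """Decide what one HCM should do with a request for `host`.

    local_hosts: hostnames served by this filter chain.
    all_known_hosts: hostnames served by any filter chain.
    """
    host = host.lower()
    if host in local_hosts:
        return "route"  # normal routing
    if host in all_known_hosts:
        return "421"    # valid elsewhere: ask the client to retry on another connection
    return "404"        # unknown everywhere
```

The cost of this approach is the one raised earlier in the thread: every HCM needs the global hostname list.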
Ahh I understand now; yes I think that's plausible |
Hey everyone, glad to see some movement and excitement on this issue - that's great! As someone who is affected by this bug, this is my workaround: certs are single domains only. So connection pooling is already limited, since clients cannot pool across all the hostnames that are served by the same Envoy/Istio instance. The problem I ran into is that I had some certs with N domains and some with 1, and it was all intermixed on a single load balancer Envoy. You might be saying "well, why don't you use a wildcard then?" and I would say "I am using Let's Encrypt, which discourages wildcard certs and makes them harder to use." So I have N certs for N subdomains, all under .foo.bar.com, and that is how I like it (maybe, but with cert-manager tooling it certainly isn't oppressively hard to manage). |
Let's Encrypt rightfully has quotas on certs, and if you generate a lot of URLs, say for a sandbox environment ($username-$servicename.sandbox.our.domain), using a wildcard is actually what is recommended/needed to avoid blocking your 'real' certs... and that's where this breaks |
@keithmattix @lambdai @zhaohuabing @arkodg
I propose an idea that is currently implemented in the Higress fork of Envoy. Let's see if it's OK, I can submit this implementation as a PR:
This architecture is actually similar to the implementation in Nginx, where in Nginx, SNI is used first to find the server block to complete the TLS handshake, and then the |