Envoy does not adhere to HTTP/2 RFC 7540 #6767

trevorlinton · 2019-05-01T14:21:34Z

Title: Envoy does not adhere to HTTP/2 RFC 7540

Description:

RFC 7540 Sections 9.1.1 and 9.1.2 specify that when a request arriving over a re-used HTTP/2 connection is accidentally sent to a non-origin but authoritative server, a 421 response should be returned. This can happen when two servers share the same IP address and use SNI: one (e.g., a.example.com) with a wildcard certificate and another (b.example.com) with a non-wildcard certificate. Requests meant for b.example.com will accidentally be sent down the re-used HTTP/2 connection for a.example.com. In this situation a.example.com should send back a 421 to indicate that the request was destined for b.example.com. This forces browsers to establish a new connection, re-negotiate the SNI (and thus the backing server), and subsequently route to the correct origin.

[optional Relevant Links:]

https://tools.ietf.org/html/rfc7540#section-9.1.1 - section describing connection re-use
https://tools.ietf.org/html/rfc7540#section-9.1.2 - section describing misdirected response
https://bugs.chromium.org/p/chromium/issues/detail?id=954160#c5 - The bug was originally filed against Chromium however they indicated Istio was the issue.
Istio does not adhere to HTTP/2 RFC 7540 istio/istio#13589 - The bug was then filed against istio who indicated envoy was the issue.

mattklein123 · 2019-05-02T18:46:47Z

@alyssawilk @PiotrSikora thoughts on this? I haven't read the relevant RFCs in detail to fully understand what is needed.

alyssawilk · 2019-05-02T19:07:15Z

Intersection of H2 specs and TLS handshake? I eagerly anticipate Piotr sorting this out :-P

PiotrSikora · 2019-05-02T20:01:47Z

The gist is that browsers coalesce HTTP/2 connections pretty aggressively: when a browser opens connection to www.example.com and is presented with a certificate for *.example.com during TLS handshake, then it will re-use this connection for all requests to *.example.com, as long as the hostname resolves to the same IP (and some browsers don't even care about that).
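
A minimal sketch of the coalescing decision described above, written in Python purely for illustration. The function names and the `require_same_ip` switch are hypothetical, and real browsers apply additional checks that are not modelled here.

```python
def san_matches(san: str, hostname: str) -> bool:
    """Return True if a certificate SAN entry covers the hostname."""
    san = san.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if san.startswith("*."):
        # A wildcard covers exactly one left-most label:
        # *.example.com covers app.example.com but not a.b.example.com.
        label, sep, rest = hostname.partition(".")
        return bool(label) and bool(sep) and rest == san[2:]
    return san == hostname


def may_coalesce(cert_sans, conn_ip, new_host, resolved_ips, require_same_ip=True):
    """Decide whether an existing connection may be re-used for new_host.

    cert_sans:       SAN entries presented during the TLS handshake.
    conn_ip:         IP address the existing connection is attached to.
    resolved_ips:    IP addresses that new_host resolves to.
    require_same_ip: set to False to model clients that skip the IP check.
    """
    if not any(san_matches(san, new_host) for san in cert_sans):
        return False
    if require_same_ip:
        return conn_ip in resolved_ips
    return True
```

With a `*.example.com` certificate and a shared IP, `may_coalesce` returns True for app.example.com, which is exactly the situation in which a request can reach the wrong filter chain.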

As long as all *.example.com hostnames are served by the same listener/filter chain, this shouldn't be an issue in Envoy, since routing is happening on a per-request, and not per-connection basis (please correct me if I'm wrong).

However, if www.example.com (with *.example.com certificate) is served by one listener/filter chain, and app.example.com is served by another listener/filter chain, then we have an issue, since connections are latched to a single listener/filter chain for the lifetime of the connection (again, please correct me if I'm wrong), and if the connection to www.example.com is established first, then requests to app.example.com will be coalesced on the same connection, using configuration for www.example.com, and forwarded to the wrong backend.

One solution would be to send 421 Misdirected Request response to requests for hostnames that are not configured on a given listener/filter chain (but this wouldn't work if *.example.com is configured), or send 421 Misdirected Request response to requests for hostnames that are configured on other listeners/filter chains (but this requires a global list of all configured hostnames).
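
The second variant (421 only for hostnames configured elsewhere) could be sketched roughly as follows. This is an illustrative Python model, not Envoy code: `local_hosts` and `all_hosts` are hypothetical sets of exact hostnames, and wildcard entries are deliberately not handled.

```python
def host_from_authority(authority: str) -> str:
    """Lower-case the :authority value and drop a trailing :port, if any."""
    host, sep, port = authority.rpartition(":")
    if sep and host and port.isdigit():
        return host.lower()
    return authority.lower()


def misdirect_status(authority, local_hosts, all_hosts):
    """Return None to route normally, 421 if the hostname is configured
    on another listener/filter chain, or 404 if it is unknown everywhere."""
    host = host_from_authority(authority)
    if host in local_hosts:
        return None
    if host in all_hosts:
        # Known hostname, wrong connection: ask the client to reconnect.
        return 421
    return 404
```

The `all_hosts` argument is the "global list of all configured hostnames" mentioned above; without it, the 421 and 404 cases cannot be told apart.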

Another solution would be using HTTP/2 ORIGIN frame (RFC8336) to advertise allowed hostnames on a given listener/filter chain (but this requires a global list as well, and this extension is supported only by a few clients).

jcrowthe · 2019-08-20T21:43:07Z

Is it possible to reprioritize this issue? We have a use case where we have thousands of services behind hundreds of FQDNs that are served by a set of identical Envoys (all using a wildcard TLS cert). We see this exact issue when HTTP/2 is enabled, but not when enforcing usage of HTTP/1.1.

haf · 2020-04-15T07:10:26Z

Here's the CVE for this vulnerability https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-11767

htuch · 2020-04-15T14:14:02Z

CC @envoyproxy/security-team

jpeach · 2020-04-28T07:01:05Z

One solution would be to send 421 Misdirected Request response to requests for hostnames that are not configured on a given listener/filter chain (but this wouldn't work if *.example.com is configured), or send 421 Misdirected Request response to requests for hostnames that are configured on other listeners/filter chains (but this requires a global list of all configured hostnames).

I can think of 3 alternatives:

1. What if you could specify the HTTP response code for an RBAC filter DENY? Then the management server that configured the HCM can add an RBAC policy for the server names that it allows on that HCM and generate 421 on DENY.
2. The management server could program the SNI server name check in the Lua filter, generating a 421 when it didn't match.
3. Add a dedicated filter that can be configured with the acceptable SNI server names.

HarinadhD · 2020-06-18T03:02:23Z

What is the plan for fixing https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-11767 in envoy?

mattklein123 · 2020-06-18T15:01:23Z

AFAIK there is no plan currently. Someone needs to own this issue and drive a resolution if they are passionate about fixing it.

jpeach · 2020-07-17T01:32:14Z

I hacked up a 421 response when the virtual host lookup fails and this works for a simple case that I tried. If this is a reasonable approach I'd need a bit of help to polish it up and figure out how to write tests for it.

https://gist.github.com/jpeach/e01f5f752eed5ffd09ea1f18634d1fc5

lunighty · 2020-09-07T04:02:02Z

I think I managed to find a workaround:

On Envoy instance I added an "envoy.lua" HTTP filter, that checks if the response code is a 404 (the same code, that is being generated for non-existent route) AND checks if the "x-envoy-upstream-service-time" header is NOT present.

The Lua code:

function envoy_on_response(response_handle)
    if response_handle:headers():get(":status") == "404" and response_handle:headers():get("x-envoy-upstream-service-time") == nil then
        response_handle:headers():replace(":status", "421")
    end
end

Example configuration on Envoy (fetched by LDS):

"http_filters": [
    {
        "name": "envoy.lua",
        "typed_config": {
            "@type": "type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua",
            "inline_code": `
function envoy_on_response(response_handle)
    if response_handle:headers():get(":status") == "404" and response_handle:headers():get("x-envoy-upstream-service-time") == nil then
        response_handle:headers():replace(":status", "421")
    end
end`
        }
    },
    {
        "name": "envoy.filters.http.router"
    }
]

jpeach · 2020-09-07T06:16:56Z

I think I managed to find a workaround:

On Envoy instance I added an "envoy.lua" HTTP filter, that checks if the response code is a 404 (the same code, that is being generated for non-existent route) AND checks if the "x-envoy-upstream-service-time" header is NOT present.

Nice! That's a bit cleaner than my equivalent 👍

GoelDeepak · 2020-12-08T18:05:23Z

@jpeach @lunighty Is there a plan to have a fix in Envoy code or do you think the current approach with EnvoyFilter is sufficient?

howardjohn · 2021-02-10T01:17:44Z

May be a dumb question - if we have a scenario like

However, if www.example.com (with *.example.com certificate) is served by one listener/filter chain, and app.example.com is served by another listener/filter chain, then we have an issue, since connections are latched to a single listener/filter chain for the lifetime of the connection (again, please correct me if I'm wrong), and if the connection to www.example.com is established first, then requests to app.example.com will be coalesced on the same connection, using configuration for www.example.com, and forwarded to the wrong backend.

(from Piotr above)

How can we distinguish a browser coalescing the request from someone legitimately sending a request with SNI=www.example.com HOST=app.example.com? It seems like from Envoy's perspective, these are identical.

Also, RFC 7540 is about HTTP/2. The above example can be done with HTTP 1 - do we expect a 421 still?

lambdai · 2021-02-11T01:01:42Z

I am exploring the relationship among SNI, the SAN in the cert, and the Host in HTTP.
I will share a document afterwards.

maennchen · 2021-05-04T13:35:35Z

We now use this Lua Snippet:

function envoy_on_request(request_handle)
  local streamInfo = request_handle:streamInfo()
  -- Skip plain HTTP requests, where no SNI was sent.
  if streamInfo:requestedServerName() ~= "" then
    -- Wildcard server name: reject when :authority does not contain it.
    if (string.sub(streamInfo:requestedServerName(), 0, 2) == "*." and not string.find(request_handle:headers():get(":authority"), string.sub(streamInfo:requestedServerName(), 1))) then
      request_handle:respond({[":status"] = "421"}, "Misdirected Request")
    end
    -- Exact server name: reject when :authority differs from the SNI.
    if (string.sub(streamInfo:requestedServerName(), 0, 2) ~= "*." and streamInfo:requestedServerName() ~= request_handle:headers():get(":authority")) then
      request_handle:respond({[":status"] = "421"}, "Misdirected Request")
    end
  end
end

EDIT: Fixed for HTTP requests where requestedServerName is empty
EDIT 2: Fixed for Wildcard SNI described by @lambdai

kyessenov · 2024-01-19T04:43:38Z

(random passer by) @htuch This is an example of a very valuable and simple use case that needs Lua.

htuch · 2024-01-19T05:41:53Z

@kyessenov ack. Presumably Wasm could also work here at the expense of a heavier weight everything.

PiotrSikora · 2024-01-23T19:17:43Z

(random passer by) @htuch This is an example of a very valuable and simple use case that needs Lua.

This is such an integral part of HTTP/2, that it should be fixed in Envoy core, instead of requiring users to implement workarounds in Lua/Wasm.

htuch · 2024-01-23T21:59:20Z

I think the meta-point here is that Lua/Wasm are useful temporary patches until a release with a fix can be deployed.

kyessenov · 2024-01-23T22:11:02Z

@PiotrSikora yeah I agree. @lambdai do you still want to work on it, or should we leave it open for grabs?

ldemailly · 2024-01-24T18:13:14Z

temporary patch

opened this issue on May 1, 2019

(yes yes, inb4 why don't you make an MR ;-) )

a-b-v · 2024-05-15T17:17:19Z

It's not an Envoy problem, but a wrong Envoy config. The problem can be solved using standard means:

- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
  name: https
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 443
      protocol: TCP
  listener_filters:
  - name: "envoy.filters.listener.tls_inspector"
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
  filter_chains:
  - name: https-domain-with-sni-and-wilcard-certificate
    filter_chain_match:
      server_names:
      - "a.domain.com"
    filters:
    - name: https-domain-with-sni-and-wilcard-certificate
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
        stat_prefix: https-domain-with-sni
        route_config:
          name: https-domain-with-sni-and-wilcard-certificate
          virtual_hosts:
          - name: 'a.domain.com'
            domains:
            - "a.domain.com"
            routes:
            - match:
                prefix: "/"
              route:
                cluster: upstream
          - name: 'other_domains'
            domains:
            - "*"
            routes:
            - match:
                prefix: "/"
              direct_response:
                status: 421
        http_filters:
        - name: https-domain-with-sni-and-wilcard-certificate
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
    transport_socket:
      name: https-domain-with-sni-and-wilcard-certificate
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
        common_tls_context:
          tls_certificates:
          - certificate_chain:
              filename: wildcard.crt
            private_key:
              filename: wildcard.key

howardjohn · 2024-05-15T17:25:01Z

@a-b-v you (probably) don't want to 421 unless its a valid domain on a different SNI. Otherwise you should 404.

a-b-v · 2024-05-15T17:33:06Z

IMHO, if such a request is received, then either a valid domain covered by the wildcard certificate really does exist on a different SNI, or it's an attack and I don't care what we answer :)

a-b-v · 2024-05-16T08:43:28Z

Anyway, if a client receives a 421, it must (MAY, according to the RFC) retry the request over a different connection. That connection will have the correct SNI, and the server will return 404 if no such domain exists.

keithmattix · 2024-07-15T17:32:55Z

We haven't had a ton of movement on this; are we still looking for an owner?

ggreenway · 2024-07-15T18:38:58Z

Yes, needs an owner and a viable proposal for how to fix (without breaking something else).

keithmattix · 2024-07-15T19:09:42Z

I think @lambdai's first option makes the most sense:

HCM doesn't verify the requested server name (SNI) against :authority. Theoretically, even the first HTTP stream within the connection can raise the issue.
Should we provide an option in HCM to return 421 on the above conflict?

I'll probably need some help as I'm still onboarding to Envoy, but I'd be happy to drive this forward next release, especially since it looks like a CVE has been filed against this behavior.

keithmattix · 2024-07-15T20:09:02Z

/assign @keithmattix

ggreenway · 2024-07-15T20:31:00Z

I think the easiest solution is an option on the HCM to enforce matching SNI and :authority and return 421 if they don't match. That's not technically a full solution, because it may be valid to reuse a connection for some subdomains and not others, and this would preclude some valid connection reuse, but it would force everything into a working state, be straightforward to implement, and easy to understand and reason about.
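
A rough Python sketch of that strict check, for illustration only. The function name is hypothetical and this is not the HCM implementation; it only models "compare SNI with :authority and return 421 on mismatch".

```python
def strict_sni_check(sni: str, authority: str):
    """Return 421 when SNI and :authority disagree, otherwise None.

    An empty SNI (plaintext HTTP, or a client that sent no server name)
    is treated as "nothing to compare" and passes.
    """
    if not sni:
        return None
    # Drop a trailing :port from :authority before comparing.
    host, sep, port = authority.rpartition(":")
    if not (sep and host and port.isdigit()):
        host = authority
    if host.lower() == sni.lower():
        return None
    return 421
```

As noted above, this rejects some connection reuse that would otherwise be valid: any request whose :authority differs from the SNI of the connection gets a 421, even if both hostnames are served by the same filter chain.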

keithmattix · 2024-07-15T20:33:57Z

Yep that's exactly what I'm planning to implement. Optional flag on HCM for those who want stricter :authority -> SNI checking.

howardjohn · 2024-07-15T20:44:01Z

I think this will both break traffic that would otherwise work entirely, even outside of HTTP/2 re-use, result in a decrease in connection pooling, and incorrectly lead to clients retrying when they shouldn't since it was a legitimate 404 not a misdirect (hopefully the browsers are smart enough not to infinite loop!). A flag to do so... sure. But I wouldn't turn it on in Istio, for example.

keithmattix · 2024-07-15T20:49:21Z

@howardjohn what's the alternative? IIUC, each HCM would have to know about all of the other listeners/FCMs to distinguish between a true 404 vs. a retriable connection re-use error

ggreenway · 2024-07-15T21:36:03Z

With a list of all hostnames that are currently supported, it would be easy with the current capabilities to only return a 421 when appropriate: just add all the "wrong here but supported" to another vhost that has a direct_response 421, and then have a wildcard vhost that returns 404.

Or am I missing something in the details? There are a lot of comments between this issue and the Istio one; I may be missing something important.

keithmattix · 2024-07-16T17:27:01Z

@ggreenway not sure I completely understand your point. So each filter chain (match) data structure would have a list of all of the hostnames within its listener so it can know if the request is valid?

ggreenway · 2024-07-16T19:59:45Z

I think what @howardjohn is asking for is if a request for a known hostname but on the wrong filter chain arrives, respond with 421, and if a request for an unknown hostname arrives, respond with 404.

Assuming that is correct, this can be configured by adding routes (via VirtualHost) in each HCM (across all filter chains) for all known hostnames that are not correct for the current HCM, and responding with a 421 from those routes.
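
As an illustration of how a control plane might generate that configuration, here is a small Python sketch. The function name and the virtual host names are hypothetical; the field names (`domains`, `routes`, `direct_response`) follow the route configuration shown earlier in this thread.

```python
def build_virtual_hosts(local_hosts, all_hosts, cluster):
    """Build the virtual_hosts list for one HCM.

    local_hosts: hostnames that are correct for this filter chain.
    all_hosts:   every hostname known across all filter chains.
    cluster:     upstream cluster for the locally served hostnames.
    """
    catch_all = {"prefix": "/"}
    vhosts = [{
        "name": "local",
        "domains": sorted(local_hosts),
        "routes": [{"match": catch_all, "route": {"cluster": cluster}}],
    }]
    elsewhere = sorted(set(all_hosts) - set(local_hosts))
    if elsewhere:
        # Known hostnames that belong to another filter chain: 421.
        vhosts.append({
            "name": "misdirected",
            "domains": elsewhere,
            "routes": [{"match": catch_all, "direct_response": {"status": 421}}],
        })
    # Anything else is an unknown hostname: 404.
    vhosts.append({
        "name": "unknown",
        "domains": ["*"],
        "routes": [{"match": catch_all, "direct_response": {"status": 404}}],
    })
    return vhosts
```

This relies on exact domain entries taking precedence over the `*` virtual host, so the order of the returned list does not matter for matching.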

keithmattix · 2024-07-16T20:01:00Z

Ahh I understand now; yes I think that's plausible

ryanobjc · 2024-07-18T17:45:00Z

Hey everyone, glad to see some movement and excitement on this issue - that's great!

As someone who is affected by this bug, this is my workaround: certs are single domains only.

So connection pooling is already limited since clients cannot pool across all the hostnames that are served by the same envoy/istio instance. The problem I ran into is I had some certs with N domains, and some with 1 and it was all intermixed on a single load balancer envoy. You might be saying "well why don't you use a wildcard then?" and I would say "I am using lets encrypt which discourages/makes wildcard certs harder to use."

So I have N certs for N subdomains all under .foo.bar.com and that is how I like it (maybe, but with certmanager tooling it certainly isn't oppressively hard to manage).

ldemailly · 2024-07-18T20:01:19Z

Let's Encrypt rightfully has quotas on certs, and if you do generate a lot of URLs, say for a sandbox environment ($username-$servicename.sandbox.our.domain), using a wildcard is actually what is recommended/needed to not block your 'real' certs... and that's where this breaks

johnlanni · 2024-11-20T03:18:16Z

@keithmattix @lambdai @zhaohuabing @arkodg
I previously considered placing domains with the same certificate in the same filter chain and putting the VirtualHost configuration for each domain in the HCM's RDS. However, I later found that this was not a good approach, mainly because:

1. It increases the management complexity of the control plane; for example, it is difficult to maintain a simple mapping relationship between Gateway resources (Gateway API standard) and filter chains.
2. When the domain's certificate changes, it causes LDS to rebuild, resulting in downstream connections being disconnected.

I propose an idea that is currently implemented in the Higress fork of Envoy. Let's see if it's OK, I can submit this implementation as a PR:

1. Based on the SRDS mechanism, it supports dividing the scope according to the :authority request header and extends this to support wildcard domain lookup (through iterative lookup).
2. This way, all HCMs in the filter chain share the same SRDS configuration, decoupling the SNI matching process from the VirtualHost lookup process.
3. Special handling is required for mTLS scenarios, which can be achieved by adding an allow_server_names configuration to the HCM to prevent access to mTLS-protected VirtualHosts using unexpected SNI.

This architecture is actually similar to the implementation in Nginx, where in Nginx, SNI is used first to find the server block to complete the TLS handshake, and then the Host request header is used to find the server block to apply the HTTP routing. And during this process, special handling is done for mTLS scenarios:
nginx/nginx@b720f65
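
To make the iterative wildcard lookup concrete, here is a minimal Python sketch. It is not the Higress implementation: the function names and the `scopes` dictionary are hypothetical, and the semantics assumed here (a wildcard matching any number of leading labels, an empty allow_server_names list meaning "no restriction") are assumptions.

```python
def lookup_scope(authority_host: str, scopes: dict):
    """Find the route scope for a hostname taken from :authority.

    Tries the exact hostname first, then "*.<suffix>" for successively
    shorter suffixes, and finally a "*" catch-all entry if present.
    """
    host = authority_host.lower()
    if host in scopes:
        return scopes[host]
    labels = host.split(".")
    for i in range(1, len(labels)):
        candidate = "*." + ".".join(labels[i:])
        if candidate in scopes:
            return scopes[candidate]
    return scopes.get("*")


def sni_allowed(sni: str, allow_server_names) -> bool:
    """Model of the allow_server_names guard for mTLS-protected hosts."""
    return not allow_server_names or sni in allow_server_names
```

In this model the SNI only selects the filter chain for the TLS handshake, while `lookup_scope` picks the routing configuration from :authority, and `sni_allowed` is the extra check that keeps an unexpected SNI away from an mTLS-protected VirtualHost.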

trevorlinton mentioned this issue May 1, 2019

Istio does not adhere to HTTP/2 RFC 7540 istio/istio#13589

Open

mattklein123 added design proposal Needs design doc/proposal before implementation help wanted Needs help! labels May 2, 2019

alexbrand mentioned this issue Sep 12, 2019

Requests can be misrouted due to HTTP/2 Connection Coalescing under certain scenarios projectcontour/contour#1493

Closed

phylake mentioned this issue Oct 4, 2019

Envoy memory optimization: do better filter_chain_match organization projectcontour/contour#1636

Closed

mattklein123 added area/tls area/http labels Dec 30, 2019

haf mentioned this issue Apr 14, 2020

404 NR when using browser on multiple ingress gateways istio/istio#9429

Closed

jpeach mentioned this issue Apr 27, 2020

Possible envoy regression causes HTTP 404 projectcontour/contour#2468

Closed

howardjohn mentioned this issue Jan 22, 2021

Analyzer: duplicate certificate in Gateway istio/istio#29435

Merged

lambdai self-assigned this Feb 10, 2021

This was referenced Feb 19, 2021

Ingress V1: hostname wildcard support projectcontour/contour#2138

Closed

Lua filter: Expose streamInfo::requestedServerName() #15142

Closed

howardjohn mentioned this issue Mar 5, 2021

GEP: Client Certificate Verification for Gateway Listeners kubernetes-sigs/gateway-api#91

Open

howardjohn mentioned this issue Jan 26, 2024

Wildcard certificate do not work over multiple gateways and/or virtual services istio/istio#49023

Closed

2 tasks

arkodg mentioned this issue Feb 22, 2024

Multiple Gateway listeners with different hostnames and same certificate not working in a browser session envoyproxy/gateway#2675

Open

taoky mentioned this issue Jun 10, 2024

On Mac Safari 17.5, fetching an ISO returns 421 Misdirected Request ustclug/discussions#459

Closed

repokitteh-read-only bot assigned keithmattix Jul 15, 2024

farcaller mentioned this issue Aug 26, 2024

HTTP/2 connection coalescing breaks Gateway API istio/istio#52853

Open

2 tasks
