
h2 connection pool is limited by SETTINGS_MAX_CONCURRENT_STREAMS #2941

Closed
PiotrSikora opened this issue Mar 29, 2018 · 44 comments

@PiotrSikora
Contributor

PiotrSikora commented Mar 29, 2018

It turns out that the gRPC async client won't open more than SETTINGS_MAX_CONCURRENT_STREAMS concurrent streams to the xDS gRPC server (good!), but EDS monitors are long-lived streams that wait forever, and it doesn't look like the gRPC async client opens more than a single HTTP/2 connection to the backend (bad!). This means that the total number of working EDS subscriptions is limited by the xDS server's settings, and only the first SETTINGS_MAX_CONCURRENT_STREAMS of them will be able to establish an HTTP/2 stream and receive responses.

The solution is for the gRPC async client to open another HTTP/2 connection once it reaches the xDS server's SETTINGS_MAX_CONCURRENT_STREAMS limit.

A temporary workaround is to increase SETTINGS_MAX_CONCURRENT_STREAMS on the xDS server, but that breaks once there are middle proxies involved.
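As a concrete illustration of that workaround, a Go management server built on grpc-go can raise the advertised limit with the grpc.MaxConcurrentStreams server option. This is only a minimal sketch; the listen port and the commented-out service registration are placeholders rather than any particular xDS server's real code:

```go
package main

import (
	"log"
	"math"
	"net"

	"google.golang.org/grpc"
)

func main() {
	lis, err := net.Listen("tcp", ":15010") // placeholder xDS port
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}

	// Advertise a much larger SETTINGS_MAX_CONCURRENT_STREAMS so that Envoy's
	// single HTTP/2 connection can carry more streams than the usual default
	// (often 100 for gRPC servers).
	srv := grpc.NewServer(grpc.MaxConcurrentStreams(math.MaxUint32))

	// registerXdsServices(srv) // hypothetical: register ADS/EDS/CDS services here

	if err := srv.Serve(lis); err != nil {
		log.Fatalf("serve: %v", err)
	}
}
```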

See istio/istio#4593 for background.

cc @htuch @mattklein123 @costinm @ldemailly @lizan

@htuch
Member

htuch commented Mar 29, 2018

Unoptimized EDS is the root cause. With ADS, we have a single EDS stream globally, which is nice for the obvious reasons. With non-ADS, Envoy currently opens a new stream for each cluster, regardless of whether the clusters point at the same management server and could be converged. I think it would be best to go in and fix EDS to reuse streams. It's not that hard; I'd SWAG this at 2-3 days of effort, including tests.

@ldemailly

ldemailly commented Mar 29, 2018

That would only address Envoy->Pilot, but not Envoy->unmodified gRPC app.

We can't assume all applications will change their configuration to deal with Envoy's limitations?
(Unless you are saying that the issue is specific to the gRPC client used for *DS v2, but not to the gRPC/h2 filter?)

@lizan
Member

lizan commented Mar 29, 2018

I believe the same problem exists for the HTTP async client as well, if the cluster is configured with H2. The gRPC async client simply depends on that.

@lizan
Member

lizan commented Mar 29, 2018

@htuch Optimizing EDS is good and solves the exact issue for Istio, but I feel that fixing the gRPC async client (and the underlying HTTP async client) is more important. We cannot rely on gRPC stream optimization for all cases.

@mattklein123
Member

I don't really understand what EDS stream optimization means. Do you mean changing the API to put requests/responses over a single stream?

The proper fix here is actually at the HTTP/2 connection pool level. The pool should be able to create multiple connections per backend host if needed. There are other reasons we actually want this also (allowing for more connection/stream fan out in the case of middle proxies).

@mattklein123 changed the title from "gRPC async client is limited by SETTINGS_MAX_CONCURRENT_STREAMS" to "h2 connection pool is limited by SETTINGS_MAX_CONCURRENT_STREAMS" Mar 29, 2018
@htuch
Member

htuch commented Mar 29, 2018

@mattklein123 today we have 1 EDS stream per cluster. EDS can have multiple cluster subscriptions per stream, but this isn't done today outside of ADS.
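For reference, the v2 xDS API already allows this: a single DiscoveryRequest on one EDS stream can list several clusters in resource_names. A hedged Go sketch using go-control-plane's generated v2 types (the server address and cluster names are placeholders, and the response-handling loop is omitted):

```go
package main

import (
	"context"
	"log"

	v2 "github.com/envoyproxy/go-control-plane/envoy/api/v2"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("xds-server:15010", grpc.WithInsecure()) // placeholder address
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// One EDS stream for the whole set of clusters...
	stream, err := v2.NewEndpointDiscoveryServiceClient(conn).StreamEndpoints(context.Background())
	if err != nil {
		log.Fatal(err)
	}

	// ...subscribing to several clusters at once, instead of one stream per cluster.
	if err := stream.Send(&v2.DiscoveryRequest{
		TypeUrl:       "type.googleapis.com/envoy.api.v2.ClusterLoadAssignment",
		ResourceNames: []string{"cluster-a", "cluster-b", "cluster-c"},
	}); err != nil {
		log.Fatal(err)
	}
	// Responses for all subscribed clusters then arrive on this one stream (stream.Recv()).
}
```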

@htuch
Member

htuch commented Mar 29, 2018

There are no API changes, only Envoy implementation tuning.

@mattklein123
Member

mattklein123 commented Mar 29, 2018

@htuch OK, I see. Can we open a different issue for that? I would like to keep this issue open to track the h2 connection pool, which we should also fix at some point.

@htuch
Member

htuch commented Mar 29, 2018

@mattklein123 Sure, #2943.

@costinm
Contributor

costinm commented Mar 30, 2018

One possible option (or hack?) would be to override the server-sent 'max streams' until multiple connections are implemented. If the app happens to use an h2 stack that doesn't enforce max streams, it may work.

Or document how to set max streams for common languages and stacks (along with docs on how to set up h2 without mTLS, which is also required for upstream h2 apps behind Envoy).

The biggest problem was the lack of clear information about this limitation.

@PiotrSikora
Contributor Author

One possible option (or hack?) would be to override the server-sent 'max streams' until multiple connections are implemented. If the app happens to use an h2 stack that doesn't enforce max streams, it may work.

That's going to break more stuff than it fixes.

@costinm
Contributor

costinm commented Mar 30, 2018

Extra info:

  • The http2 implementation in Go 1.10 sets the default to 250 (with a comment "make it 100 as GFE"); golang gRPC defaults to 100.
  • Piotr is right - at least the Go implementation does check, and terminates the connection if the advertised max streams is not respected (processHeaders()).

So the real fix is required - or lots of docs and changes in upstream servers; there is no quick hack.
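On the documentation point, this is roughly what raising the limit looks like for a plain Go net/http server using golang.org/x/net/http2 (a sketch only, assuming TLS/ALPN; the address, handler, cert paths, and the chosen value of 10000 are placeholders):

```go
package main

import (
	"log"
	"net/http"

	"golang.org/x/net/http2"
)

func main() {
	srv := &http.Server{Addr: ":8443", Handler: http.DefaultServeMux} // placeholder address/handler

	// Override x/net/http2's default of 250 advertised concurrent streams per connection.
	if err := http2.ConfigureServer(srv, &http2.Server{MaxConcurrentStreams: 10000}); err != nil {
		log.Fatal(err)
	}

	// cert.pem / key.pem are placeholders; HTTP/2 is negotiated via ALPN over TLS here.
	log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
}
```

For a Go gRPC server, the equivalent knob is grpc.MaxConcurrentStreams, as in the earlier sketch.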

@mattklein123
Member

For folks looking for management server workarounds in the interim, here is an example of a Go management server change which is pretty trivial: projectcontour/contour#308

@mattklein123 added this to the 1.7.0 milestone Apr 1, 2018
@ldemailly

I am not able to reproduce this issue.

I wonder if the problem with Istio Pilot was that the proxy is/was set up as TCP? @costinm

The only setting I see when using Envoy as a gRPC (h2) proxy is:
22:17:01 http2: Framer 0xc420630000: read SETTINGS len=6, settings: INITIAL_WINDOW_SIZE=268435456

while when I go directly against my same Go server with a max streams of 16:

22:21:38 http2: Framer 0xc4204e8000: read SETTINGS len=6, settings: MAX_CONCURRENT_STREAMS=16

and either way I can do > 100 simultaneous streams on 1 connection.

@ldemailly

if I remove max streams on the backend side and really push it (doing 1000 simultaneous streams * 2 connections), I get about 15% errors:

22:33:59 W grpcrunner.go:96> Error making grpc call: rpc error: code = Internal desc = transport: received the unexpected content-type "text/plain"
...
Ended after 17.840324262s : 4000 calls. qps=224.21
Sleep times : count 2000 avg 4.5242081 +/- 1.416 min 1.570214834 max 6.106976232 sum 9048.41625
Aggregated Function Time : count 4000 avg 4.6826156 +/- 2.232 min 1.732992693 max 9.83520712 sum 18730.4624
# range, mid point, percentile, count
>= 1.73299 <= 2 , 1.8665 , 10.28, 411
> 2 <= 3 , 2.5 , 28.68, 736
> 3 <= 4 , 3.5 , 48.15, 779
> 4 <= 5 , 4.5 , 61.98, 553
> 5 <= 7.5 , 6.25 , 83.17, 848
> 7.5 <= 9.83521 , 8.6676 , 100.00, 673
# target 50% 4.13382
# target 75% 6.53597
# target 90% 8.44727
# target 99% 9.69641
# target 99.9% 9.82133
Ping SERVING : 3398
Ping -1 : 602

In the Envoy ingress logs I get (no errors on the Envoy sidecar):

[2018-04-03T05:29:44.084Z] "POST /fgrpc.PingServer/Ping HTTP/2" 503 UO 23 57 6096 - "10.138.0.12" "grpc-go/1.11.1" "cd0cf852-3577-9a57-a19c-d6435302ff2b" "fortio-stage.istio.io:80" "-"

What is UO?

@PiotrSikora
Contributor Author

PiotrSikora commented Apr 3, 2018

@ldemailly I'm pretty sure that the gRPC xDS client talks directly to the selected cluster over HTTP/2, so the listener settings (HTTP or TCP proxy) for Pilot wouldn't matter.

As for the difference in SETTINGS received when proxying via Envoy vs. connecting directly, those settings are per hop, so that's expected. When you say that you can do >100 simultaneous streams over a single connection, you mean a single connection to Envoy, correct? Does Envoy still establish multiple connections to the backend server (respecting 16 streams per connection)?

From the documentation:
UO: Upstream overflow (circuit breaking), in addition to the 503 response code.

@prune998

prune998 commented Apr 3, 2018

I've been dealing with this kind of issue with Envoy (Istio) since last week, while trying to DDoS my application. istio/istio#4658

Go client (Go 1.10 / gRPC) --TLS--> Envoy (Istio 0.7.0) --> server (Go 1.10 / gRPC)

Trying to DDoS my app, I start several Go clients, each of which in fact simulates 100 real clients, so 100 TCP connections to Envoy each.
Each client then sends a unary request to one endpoint, then opens a stream and starts sending the payload.
Usually, one client = 1 TCP + 1 GET + 1 STREAM.

I see Envoy opening 4 TCP connections to my server application and proxying every client connection into them.

Which means that, if my gRPC server is set up to allow 250 streams per client, I will only be able to handle 1000 clients (250 streams * 4 TCP connections).

While I'm not sure about the exact numbers, I can confirm the behaviour.
Setting a higher limit in the server using grpc.MaxConcurrentStreams(math.MaxInt32) mitigated the issue.

My conclusion is that:

  • the issue is not linked to config endpoints (RDS...).
  • according to the Go implementation (golang/net@1c05540): "Currently if the http2.Transport hits SettingsMaxConcurrentStreams for a server, it just makes a new TCP connection and creates the stream on the new connection. This CL updates that behavior to instead block RoundTrip until a new stream is available."

Which means that if Envoy only opens 1 TCP connection to the server, and if MaxConcurrentStreams is 250, Envoy will not be able to handle more than 250 clients.

I hope I'm wrong, but this is the behaviour I see.
While 100 or 250 is clearly too low, 100,000 sounds like a reasonable value for both gRPC and TCP to use maximum resources without exhaustion.

I can't find anything about the real behaviour Envoy should have regarding this, and I'm not a dev and have a really hard time understanding Envoy's C++ code.
Could anyone point us to the right piece of code and maybe explain what is going on in this situation?

Also, if my server is set to MaxConcurrentStreams, what will be transmitted to the client? Is it the same value, or the one (which one?) from the circuit breaker config?

@prune998

prune998 commented Apr 4, 2018

I also noted something special about the pool of TCP connections used by Envoy when using gRPC streams.

setup

  • Envoy installed as K8s Ingress using Istio
  • Istio DestinationPolicy defined for the service as:
spec:
  circuitBreaker:
    simpleCb:
      httpConsecutiveErrors: 10
      httpDetectionInterval: 1s
      httpMaxEjectionPercent: 100
      httpMaxPendingRequests: 1000
      httpMaxRequests: 100000
      httpMaxRequestsPerConnection: 500
      maxConnections: 100000
      sleepWindow: 30s
  destination:
    name: useredged
  • I can confirm Envoy is using my setup by doing a curl http://localhost:15000/clusters:
out.useredged.dev.svc.cluster.local|grpc::default_priority::max_connections::100000
out.useredged.dev.svc.cluster.local|grpc::default_priority::max_pending_requests::1000
out.useredged.dev.svc.cluster.local|grpc::default_priority::max_requests::100000
out.useredged.dev.svc.cluster.local|grpc::default_priority::max_retries::3
out.useredged.dev.svc.cluster.local|grpc::high_priority::max_connections::1024
out.useredged.dev.svc.cluster.local|grpc::high_priority::max_pending_requests::1024
out.useredged.dev.svc.cluster.local|grpc::high_priority::max_requests::1024
out.useredged.dev.svc.cluster.local|grpc::high_priority::max_retries::3

Note that the httpMaxRequestsPerConnection parameter is not displayed here...

load test

I'm using our gRPC client to load test. Each client opens 500 TCP connections to Envoy, then starts a bidirectional stream and sends one message per second.
I'm starting one client at a time, letting all the cx and rq get set up.

In the above setup, everything is fine until I reach 4000 active requests:

  • 2000 is OK
    cluster.out.useredged.dev.svc.cluster.local|grpc.upstream_rq_active: 2000
    cluster.out.useredged.dev.svc.cluster.local|grpc.upstream_cx_active: 6
  • 3000 is OK
    cluster.out.useredged.dev.svc.cluster.local|grpc.upstream_rq_active: 3000
    cluster.out.useredged.dev.svc.cluster.local|grpc.upstream_cx_active: 8
  • 4000 start failing some requests
    cluster.out.useredged.dev.svc.cluster.local|grpc.upstream_rq_503: 2500
    cluster.out.useredged.dev.svc.cluster.local|grpc.upstream_rq_5xx: 2500
    cluster.out.useredged.dev.svc.cluster.local|grpc.upstream_rq_active: 1500
    cluster.out.useredged.dev.svc.cluster.local|grpc.upstream_cx_active: 8

Discussion

This can be reproduced every time.
If I increase httpMaxRequestsPerConnection in the DestinationPolicy to 1000, I will be able to reach 8000 rq before getting 503s.

At the same time, I can see hundreds of active cx on an HTTP 1.1 endpoint.
Why can't Envoy scale the number of TCP connections to the backend server?

@prune998

I can now add some more info.
I still don't understand where the issue is.
My issue does not concern EDS or xDS, but Envoy proxying from my client to my server.
In this situation, Envoy creates ONE TCP connection to the upstream server per CPU; in my tests with 2 CPUs --> 2 TCP connections.
Envoy then multiplexes HTTP/2 gRPC calls onto the two TCP connections.
Here are the scenarios for N clients on servers with 1 CPU:

one Envoy

N client --> 1TCP/1Stream each --> Envoy --> 1TCP/N Stream --> server

Everything works as expected; I can go up to 1024 streams as per Envoy's default. If I try more, I do get some 503s, which is the expected behaviour.

two Envoy

N client --> 1TCP/1Stream each --> Envoy --> 1TCP/N Stream --> Envoy Sidecar --> 1TCP/N Stream --> server

In fact, what is happening is :

N client --> 1TCP/1Stream each --> Envoy --> 1TCP/100 Stream --> Envoy Sidecar --> 1TCP/100 Stream --> server

Which means if N > 100, I lose connections.

I can't find anything explaining why the limit is 100, but I can reproduce this situation every time.
If I have 2 CPUs, I see 2 TCP connections and 100 streams in each...

I'm still digging...

@prune998

I really think this issue is not linked to istio/istio#4940
see istio/istio#4940 (comment)

Maybe I should open a new issue here, as it really sounds like it lives inside Envoy?

@mattklein123
Member

@prune998 your issue is a different issue. I know what the issue is. Please delete all your comments from this issue as they are unrelated and open them in a new issue. Thanks.

@mattklein123 added this to the 1.10.0 milestone Dec 14, 2018
@mjpitz

mjpitz commented Jan 28, 2019

Doesn't seem like anyone is actively working on this. Mind if I look into implementing this?

@htuch
Member

htuch commented Jan 29, 2019

@mjpitz I have invited you to the envoyproxy org; if you accept, I can assign you the issue.

@mjpitz

mjpitz commented Jan 29, 2019

@htuch : Joined

@htuch removed the "help wanted" (Needs help!) label Jan 29, 2019
@mjpitz

mjpitz commented Jan 30, 2019

Alright. I spent a fair bit of time diving into the code base.

General Notes:

  • conn_pool.cc maintains the pooling semantic
  • manages 2 channels (primary_client_ and draining_client_)
  • all requests are sent over the primary_client_ until max_requests_per_connection is reached
  • if this is set to 0, then maxTotalStreams() is used
  • maxTotalStreams is set to 1 << 29 by default
  • when max_requests_per_connection is reached, the primary and draining clients are flipped
  • there's a comment in the movePrimaryClientToDraining method about it not being worth keeping a list
  • max_requests_per_connection differs from SETTINGS_MAX_CONCURRENT_STREAMS
    • max_requests_per_connection comes from the cds service

When primary is swapped to draining:

  • draining_client_ is closed
  • primary_client_ is moved to draining
  • primary_client_ is nulled out

Implementation proposal:

  • All changes isolated to source/common/http/http2/conn_pool.cc
  • Move away from primary/draining semantic to an elastic list
  • ActiveClientPtrs are pruned from the list when:
    • there are 0 active requests on the client
    • and there are newer connections in the pool
  • By leveraging a linked list, we will:
    • have constant time access to the "primary_client_" which will now be the last element in the list
    • easily prune "draining_clients_" which will now be older elements in the list

Thoughts on this approach? Anything major I'm missing or things I should consider?
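To make the shape of that proposal concrete, here is an illustrative sketch in Go (Envoy's actual implementation is C++ in source/common/http/http2/conn_pool.cc; all names here are invented for the sketch, and it assumes single-threaded use, mirroring a per-worker pool): new streams always land on the newest client, another connection is created when that one is at its stream limit, and older clients are pruned once they drain.

```go
package pool

// client stands in for one HTTP/2 connection to the upstream host.
type client struct {
	activeStreams int
	maxStreams    int // advertised SETTINGS_MAX_CONCURRENT_STREAMS for this connection
}

// Pool keeps an elastic list of clients: the last element plays the role of the
// old primary_client_, and older elements are the draining ones.
type Pool struct {
	clients    []*client
	maxStreams int // limit to assume for newly created connections
}

// NewStream picks (or creates) a connection for a new stream.
func (p *Pool) NewStream() *client {
	n := len(p.clients)
	if n == 0 || p.clients[n-1].activeStreams >= p.clients[n-1].maxStreams {
		// The newest connection is full (or absent): open another one instead
		// of queueing behind SETTINGS_MAX_CONCURRENT_STREAMS.
		p.clients = append(p.clients, &client{maxStreams: p.maxStreams})
	}
	c := p.clients[len(p.clients)-1]
	c.activeStreams++
	return c
}

// OnStreamClosed releases a stream and prunes drained, superseded clients.
func (p *Pool) OnStreamClosed(c *client) {
	c.activeStreams--
	kept := p.clients[:0]
	for i, cl := range p.clients {
		newest := i == len(p.clients)-1
		if cl.activeStreams == 0 && !newest {
			continue // drained and a newer connection exists: drop it
		}
		kept = append(kept, cl)
	}
	p.clients = kept
}
```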

@stale

stale bot commented Mar 1, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale bot added the "stale" label (stalebot believes this issue/PR has not been touched recently) Mar 1, 2019
@mattklein123 removed this from the 1.10.0 milestone Mar 1, 2019
@stale bot removed the "stale" label (stalebot believes this issue/PR has not been touched recently) Mar 1, 2019
@mjpitz

mjpitz commented Mar 6, 2019

Sorry I haven't gotten to this yet. I got sidetracked with a presentation. Hopefully I should be able to get to it in the next couple of weeks.

@stale

stale bot commented Apr 5, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale bot added the "stale" label (stalebot believes this issue/PR has not been touched recently) Apr 5, 2019
@ldemailly

keepalive

@stale bot removed the "stale" label (stalebot believes this issue/PR has not been touched recently) Apr 9, 2019
@mjpitz

mjpitz commented Apr 9, 2019

I've been talking myself in circles, so it would be nice to talk some of this through before implementing.

Since being assigned this ticket, I've been reading up on and understanding more about how the Envoy threading model works, which is well captured by the Envoy threading model blog post and tech talk.

A quick TL;DR: Envoy maintains a pool of workers (configured by --concurrency; the default is the number of cores on the machine). Each worker maintains its own connection (i.e. a thread-local connection). To me, this means that the maximum number of concurrent streams would actually be concurrency * MAX_CONCURRENT_STREAMS (assuming an even distribution of requests across connections). It would seem that if we need to increase the throughput of the system to get around MAX_CONCURRENT_STREAMS, we could simply modify the --concurrency parameter.

One concern that came to mind is how this impacts the purpose/intent of MAX_CONCURRENT_STREAMS. From what I can gather, its purpose isn't well documented as far as resources go. As an engineer, I think about this setting as a way for service owners to throttle workloads being performed by their clients. By implementing an "auto-scaling" feature like this, we effectively bypass this setting. I've read a few articles around this, and many people are advised to work with the service owner to better understand the workload they are trying to perform. Here's one issue under the http2-spec detailing a similar response.

Open Questions:

  • Is my understanding around concurrency * MAX_CONCURRENT_STREAMS true?
  • Does anyone have any insight into the intent / purpose / application of MAX_CONCURRENT_STREAMS?

Ultimately, I've found myself in a personal philosophical debate of "Sure, I could add this feature, but should I?" Getting some clarity around my questions above will help resolve my internal debate.

@ldemailly

Does anyone have any insight into the intent / purpose / application of MAX_CONCURRENT_STREAMS?

AFAIK, it is the maximum multiplexing of logical gRPC streams over a single TCP socket/connection; it's not meant to mean that no additional streams should exist between the client and server, but is just a limit on throughput and congestion over a single socket (IMO).

@mjpitz

mjpitz commented Apr 9, 2019

I know gRPC defaults to one connection per backend returned via name resolver (at least in Java). The google-apis/gax project has a ChannelPool that allows you to increase the number of connections per backend. (just to add another reference to where we're already bypassing this constraint)

@htuch
Member

htuch commented Apr 9, 2019

@mjpitz it's probably a good idea to distinguish between data and control plane here. For data plane and backends, we will effectively have an upper bound of concurrency * MAX_CONCURRENT_STREAMS. For the control plane, since this only runs on the main thread, we only have MAX_CONCURRENT_STREAMS.

I agree with the idea of making the connection pool support multiple connections. @oschaaf has also been thinking about this in the context of https://github.com/envoyproxy/envoy-perf/tree/master/nighthawk

@mattklein123 added the "help wanted" (Needs help!) label Apr 10, 2019
@mattklein123
Member

I agree with the idea of making the connection pool support multiple connections. @oschaaf has also been thinking about this in the context of https://github.com/envoyproxy/envoy-perf/tree/master/nighthawk

Yes, we should definitely do this. There are additional reasons, including mitigating head-of-line blocking in certain cases. I think we might have an issue open specifically for this, but I can't remember, and a quick search doesn't return anything.

@mjpitz

mjpitz commented Apr 10, 2019

Got an initial patch based on my proposal from back in January. Working on running tests. Here's the initial diff.

https://github.com/mjpitz/envoy/pull/1/files

@mjpitz

mjpitz commented Apr 10, 2019

@mattklein123: if we're also looking to mitigate HOL blocking, we probably want to provision the connections ahead of time, yeah?

edit: actually, preemptively establishing connections doesn't seem like it would help much... it looks like there are a couple of other good solutions though

@mattklein123
Member

@mjpitz yeah, pre-creation is another optimization, though IMO I would recommend tracking that under a new/different issue. I've discussed wanting to do this many times with @alyssawilk (who also has thoughts on allowing multiple h2 connections).

@htuch
Member

htuch commented Apr 10, 2019

Prefetch is also something that @oschaaf and I have discussed in the context of Nighthawk.

@mjpitz

mjpitz commented Apr 10, 2019

So I'm not sure how I originally thought I was going to get the SETTINGS info from the h2 conn_pool. Digging deeper into the code, it seems like that's pretty well encapsulated by the client_codec (probably as it should be).

I'm curious what other ideas are floating around so I'll check in on this again tomorrow.

@mattklein123
Member

Fixed with recent connection pool changes.
