
configurable HTTP/2 PING timeouts on kestrel #15104

Closed
chrisdot opened this issue Oct 17, 2019 · 21 comments · Fixed by #22565

@chrisdot

I'm trying to evaluate gRPC for my client/server context.

I looked briefly into ASP.NET Core 3 and its exciting new gRPC support. As I need to use gRPC streaming in a long-lived connection scenario, I need the HTTP/2 PING feature to detect zombie connections (basically to answer: "Do I still have an up-and-running connection to the client?").

So I opened a question on the gRPC project, but it seems this feature cannot be implemented, as there is no Kestrel API to control HTTP/2 PINGs.

Other HTTP/2 stacks seem to implement an API like that (for example, Node's HTTP/2 module).

So I would need support for HTTP/2 PINGs in Kestrel (and HttpClient) to be able to implement a heartbeat feature in grpc-dotnet.

@Tratcher
Member

In Kestrel, adding an API or timer for this would be pretty straightforward.

@shirhatti we should also consider http.sys. A timer would be easier there.

@Tratcher Tratcher added HTTP2 area-servers feature-kestrel enhancement This issue represents an ask for new feature or an enhancement to an existing one labels Oct 17, 2019
@chrisdot
Author

After a few discussions in the corefx repository about a similar change to HttpClient, it is now clear to me that a configurable timeout system for managing HTTP/2 PING behavior would be better than direct API access for sending PINGs.

My need comes from a gRPC use case where HTTP/2 PING frames are used explicitly. For that, I would need the equivalents of the following ping parameters from grpc-core to be available in grpc-dotnet (and therefore also in Kestrel):

  • GRPC_ARG_KEEPALIVE_TIME_MS: This channel argument controls the period (in milliseconds) after which a keepalive ping is sent on the transport.
  • GRPC_ARG_KEEPALIVE_TIMEOUT_MS: This channel argument controls the amount of time (in milliseconds) the sender of the keepalive ping waits for an acknowledgement. If it does not receive an acknowledgment within this time, it will close the connection.

I think that without such a feature, real gRPC streaming scenarios just won't be possible with grpc-dotnet.
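For concreteness, here is a minimal sketch of how these two channel arguments are wired up today on the C-core based Grpc.Core server (the string keys correspond to GRPC_ARG_KEEPALIVE_TIME_MS and GRPC_ARG_KEEPALIVE_TIMEOUT_MS); the port, credentials, and values are illustrative only:

```csharp
using Grpc.Core;

// Keepalive channel arguments understood by the C-core transport.
var keepAliveOptions = new[]
{
    // Send a keepalive PING after 30 s without activity on the transport.
    new ChannelOption("grpc.keepalive_time_ms", 30_000),
    // Close the connection if the PING ACK does not arrive within 10 s.
    new ChannelOption("grpc.keepalive_timeout_ms", 10_000)
};

var server = new Server(keepAliveOptions)
{
    // Services = { ... } would be registered here in a real server.
    Ports = { new ServerPort("0.0.0.0", 50051, ServerCredentials.Insecure) }
};
server.Start();
```

Kestrel/grpc-dotnet currently exposes no equivalent of these knobs, which is what this issue asks for.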

@chrisdot chrisdot changed the title Enable HTTP/2 PING API on kestrel configurable HTTP/2 PING timeouts on kestrel Oct 18, 2019
@Tratcher
Member

What about all of the advanced options listed in that keepalive doc?

@JamesNK

@JamesNK
Member

JamesNK commented Oct 18, 2019

I'm not familiar with the details of keepalive. I can't really comment on which are must haves.

@JamesNK
Member

JamesNK commented Oct 18, 2019

Netty is the primary Java gRPC server. You could look to it as an example of a more general purpose server and see what configuration it has for keepalive.

@analogrelay analogrelay added this to the Backlog milestone Oct 23, 2019
@analogrelay
Contributor

Backlogging for now. We don't have enough input to consider this high priority for 5.0 at this time.

@chrisdot
Author

> Backlogging for now. We don't have enough input to consider this high priority for 5.0 at this time.

:-( gRPC streaming will be tricky to use without that.
What kind of information do you need?

@Tratcher
Member

> :-( gRPC streaming will be tricky to use without that.

Streaming will work normally unless you have connections that are idle for a long time, correct?

@chwarr

chwarr commented Oct 23, 2019

Yes, gRPC streaming calls work without HTTP/2 PINGs. But they're often used for push-style notifications, so in my experience it's not uncommon for them to go idle for a long time.

For example, the client starts a streaming call and then the server can push notifications as they arrive. When messages are flowing, there's no need for PINGs: the application's messages convey useful information and double as connectivity checks. But when the stream is idle, HTTP/2 PINGs can be configured to kick in, both to know whether the connection is still alive and to keep it alive in the face of load balancers that close idle connections (e.g., AWS's Elastic Load Balancer defaults to 350 seconds, Azure's Load Balancer to 4 minutes, and Google Cloud's backend load balancer to 30 seconds, I think; I've not used that one). If the client detects that the connection has died while idle, it can also proactively reconnect to minimize delays when it later needs to send a message.

HTTP2 PINGs mean that the application protocol doesn't need an explicit "Still here" message just for detecting connection liveness. Also, one HTTP2 PING can be used for the entire connection, regardless of how many streaming calls are multiplexed atop it.
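A minimal sketch of the push-notification pattern described above, assuming a hypothetical Notifier service with a server-streaming Subscribe RPC; Notifier.NotifierBase, SubscribeRequest, Notification, and WaitForNextEventAsync are illustrative names, not a real API. Between writes the stream can sit idle for minutes, which is exactly when configurable HTTP/2 PINGs (rather than application messages) would verify and keep the connection alive:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Grpc.Core;

public class NotifierService : Notifier.NotifierBase
{
    public override async Task Subscribe(
        SubscribeRequest request,
        IServerStreamWriter<Notification> responseStream,
        ServerCallContext context)
    {
        // The call stays open for as long as the client is subscribed.
        while (!context.CancellationToken.IsCancellationRequested)
        {
            // Possibly minutes of idle time here: no frames flow on the stream,
            // so only HTTP/2 PINGs (or the TCP stack, eventually) can reveal a
            // dead connection or keep a load balancer from closing an idle one.
            Notification next = await WaitForNextEventAsync(context.CancellationToken);
            await responseStream.WriteAsync(next);
        }
    }

    // Placeholder for whatever event source the application actually uses.
    private Task<Notification> WaitForNextEventAsync(CancellationToken cancellationToken) =>
        throw new NotImplementedException();
}
```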

The gRPC C library allows PINGs to be configured client to server and server to client, depending on who needs to know.

Let me know if more info would be useful.

@analogrelay
Contributor

I think we understand the value and the feature, it's a matter of prioritization at this point :).

Is the idea just to keep the connection alive? If so, sending PING frames at some interval is likely to be sufficient. We do that in WebSockets and it seems reasonable.

It gets more complicated if we want to do things like checking for the PING+ACK responses and ensuring they come back in a timely fashion.

@chwarr and @chrisdot Would it be sufficient to just send PING frames on an interval and silently process the response PING+ACK frame?

@chwarr

chwarr commented Oct 23, 2019

> We don't have enough input to consider this high priority for 5.0 at this time.

> I think we understand the value and the feature, it's a matter of prioritization at this point :).

I wasn't sure if you needed more information, more use cases, or more people wanting it. So, I provided more information. :-)

> Would it be sufficient to just send PING frames on an interval and silently process the response PING+ACK frame?

Silent processing would be a good start and would probably address 60% of what PINGs are used for in practice by gRPC consumers.

For someone implementing a gRPC server atop Kestrel, completely silent processing is likely not sufficient. The server side will need some way of knowing that the PINGs have failed or been sufficiently delayed so that it can fail all the in-flight streaming calls atop that HTTP/2 connection. (Analogously for the client side, but that's already tracked by a different issue.) As mentioned before, @JamesNK is building a gRPC server stack atop Kestrel in the grpc-dotnet project.

The gRPC HTTP/2 protocol has application-visible semantics for failed/late PINGs:

> If a server initiated PING does not receive a response within the deadline expected by the runtime all outstanding calls on the server will be closed with a CANCELLED status. An expired client initiated PING will cause all calls to be closed with an UNAVAILABLE status.
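To make the required bookkeeping concrete, here is an illustrative-only sketch (an assumed design, not Kestrel internals or any shipped API): a periodic heartbeat sends a PING once the connection has been quiet for a configured delay, and signals the connection to abort (cancelling its streams with the statuses above) if the ACK is not seen within a configured timeout.

```csharp
using System;

enum KeepAlivePingAction { None, SendPing, AbortConnection }

// Illustrative bookkeeping only; names and behaviour are assumptions.
class KeepAlivePingState
{
    private readonly TimeSpan _delay;    // quiet time before a PING is sent
    private readonly TimeSpan _timeout;  // how long to wait for the PING ACK
    private DateTime _lastFrameReceived;
    private DateTime? _pingSentAt;

    public KeepAlivePingState(TimeSpan delay, TimeSpan timeout, DateTime now)
    {
        _delay = delay;
        _timeout = timeout;
        _lastFrameReceived = now;
    }

    // Any inbound frame (including the PING ACK) proves the peer is alive.
    public void OnFrameReceived(DateTime now)
    {
        _lastFrameReceived = now;
        _pingSentAt = null;
    }

    // Called from the periodic heartbeat; tells the connection what to do next.
    public KeepAlivePingAction OnHeartbeat(DateTime now)
    {
        if (_pingSentAt is DateTime sentAt)
        {
            // A PING is outstanding: abort the connection (failing all of its
            // streams) if the ACK is overdue beyond the configured timeout.
            return now - sentAt > _timeout
                ? KeepAlivePingAction.AbortConnection
                : KeepAlivePingAction.None;
        }

        if (now - _lastFrameReceived >= _delay)
        {
            _pingSentAt = now;
            return KeepAlivePingAction.SendPing;
        }

        return KeepAlivePingAction.None;
    }
}
```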

@analogrelay
Contributor

> I wasn't sure if you needed more information, more use cases, or more people wanting it. So, I provided more information. :-)

A little of both ;). Just trying to get an idea of the best bang for our buck, cost/benefit-wise.

> Silent processing would be a good start and would probably address 60% of what PINGs are used for in practice by gRPC consumers.

That's good. I think we could do something like this fairly quickly, as it's quite straightforward and we already have a heartbeat timer we can use for this.

> The server side will need some way of knowing that the PINGs have failed or been sufficiently delayed so that it can fail all the in-flight streaming calls atop that HTTP/2 connection.

And TCP ACKs are insufficient for this? Tracking outstanding PINGs is quite a bit more costly, but certainly not out of the picture. As you say though, the spec does expect it to be done. We've definitely had our fair share of bad run-ins with TCP ACKing.

I'll move this up to 5.0 to see how far we can get. This seems like a good win for gRPC streaming.

@analogrelay analogrelay removed this from the Backlog milestone Oct 24, 2019
@analogrelay
Contributor

Actually, clearing the milestone to put this in front of triage again for a bit of discussion before scheduling.

@Tratcher
Member

Tratcher commented Oct 24, 2019

> The server side will need some way of knowing that the PINGs have failed or been sufficiently delayed so that it can fail all the in-flight streaming calls atop that HTTP/2 connection.

> And TCP ACKs are insufficient for this? Tracking outstanding PINGs is quite a bit more costly, but certainly not out of the picture. As you say though, the spec does expect it to be done. We've definitely had our fair share of bad run-ins with TCP ACKing.

TCP ACKs are only single-hop, whereas PINGs should be end-to-end. In practice a higher-layer proxy still might not forward PINGs; for instance, you couldn't build a proxy with Kestrel that forwards PINGs, since it doesn't expose them in either direction.

@analogrelay
Contributor

> TCP ACKs are only single-hop, whereas PINGs should be end-to-end. In practice a higher-layer proxy still might not forward PINGs.

Very true, good point.

@jtattermusch

CC @jtattermusch

@jkotalik jkotalik added the triage-focus Add this label to flag the issue for focus at triage label Oct 31, 2019
@analogrelay analogrelay added this to the 5.0.0-preview1 milestone Nov 5, 2019
@analogrelay
Contributor

We'll look at generating new PINGs at a configurable interval for 5.0. Future work (considering clients disconnected if they don't PING+ACK, etc.) is TBD.
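For reference, the interval-plus-timeout shape discussed here is what eventually shipped in ASP.NET Core 5.0 via the linked PR. A minimal configuration sketch, assuming the Http2Limits keepalive ping properties; the values are examples rather than defaults, and gRPC service/endpoint registration is omitted:

```csharp
using System;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Hosting;

public static class Program
{
    public static void Main(string[] args) => CreateHostBuilder(args).Build().Run();

    public static IHostBuilder CreateHostBuilder(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .ConfigureWebHostDefaults(webBuilder =>
            {
                webBuilder.ConfigureKestrel(options =>
                {
                    // Send a keepalive PING once the connection has been quiet this long...
                    options.Limits.Http2.KeepAlivePingDelay = TimeSpan.FromSeconds(30);
                    // ...and tear the connection down if the ACK takes longer than this.
                    options.Limits.Http2.KeepAlivePingTimeout = TimeSpan.FromSeconds(10);
                });
                // gRPC endpoint/service registration omitted for brevity.
                webBuilder.Configure(app => { });
            });
}
```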

@shirhatti
Contributor

@JamesNK @jtattermusch How do you folks feel about this? Do we need this in Kestrel?

@shirhatti shirhatti added this to the Backlog milestone Mar 27, 2020
@JamesNK
Member

JamesNK commented Mar 27, 2020

Maybe. I looked at pings a few weeks ago and had some questions for Jan that have been answered. I'll look at it again.

@rolfik

rolfik commented May 13, 2020

As I wrote in dotnet/runtime#31198, we need a quick fix, please; otherwise we are forced to use the C-core gRPC server implementation.
Thank you.

@chrisdot
Author

Now that we have a nice proposal for HttpClient, wouldn't it be relevant to have the equivalent on the server side? @JamesNK?
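For context, the HttpClient-side proposal referenced here surfaced in .NET 5 as keepalive ping properties on SocketsHttpHandler. A minimal client sketch, assuming those properties plus Grpc.Net.Client's GrpcChannelOptions.HttpHandler; the address is a placeholder and the values are examples:

```csharp
using System;
using System.Net.Http;
using Grpc.Net.Client;

var handler = new SocketsHttpHandler
{
    // Send a PING after 30 s without activity on the connection...
    KeepAlivePingDelay = TimeSpan.FromSeconds(30),
    // ...fail the connection if the ACK takes more than 10 s...
    KeepAlivePingTimeout = TimeSpan.FromSeconds(10),
    // ...and keep pinging even when no requests are active.
    KeepAlivePingPolicy = HttpKeepAlivePingPolicy.Always
};

// The channel (and any streaming calls multiplexed on it) now rides on a
// connection whose liveness is checked by HTTP/2 PINGs.
using var channel = GrpcChannel.ForAddress(
    "https://localhost:5001",                     // placeholder address
    new GrpcChannelOptions { HttpHandler = handler });
```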

@JamesNK JamesNK modified the milestones: Backlog, 5.0.0 May 29, 2020
@JamesNK JamesNK self-assigned this May 29, 2020
@JamesNK JamesNK modified the milestones: 5.0.0, 5.0.0-preview7 Jun 1, 2020
@JamesNK JamesNK removed the triage-focus Add this label to flag the issue for focus at triage label Jun 1, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jul 15, 2020
@danroth27 danroth27 added the Done This issue has been fixed label Jul 20, 2020
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023