
configurable HTTP/2 PING timeouts on kestrel #15104

Closed
chrisdot opened this issue Oct 17, 2019 · 21 comments · Fixed by #22565

@chrisdot

I'm trying to evaluate gRPC for my client/server context.

I looked briefly into ASP.NET Core 3 and its exciting new gRPC support. As I need to use gRPC streaming in a long-lived connection scenario, I need the HTTP/2 PING feature to detect zombie connections (basically to answer: "Do I still have an up-and-running connection to the client?").

So I opened a question on the gRPC project, but it seems this feature cannot be implemented, as there is no Kestrel API to control HTTP/2 PINGs.

Other HTTP/2 stacks seem to implement an API like that (for example, Node's HTTP/2 module).

So I would need support for HTTP/2 PINGs in Kestrel (and HttpClient) to be able to implement a heartbeat feature in grpc-dotnet.

@Tratcher
Member

In Kestrel, adding an API or timer for this would be pretty straightforward.

@shirhatti we should also consider http.sys. A timer would be easier there.

@Tratcher Tratcher added HTTP2 area-servers feature-kestrel enhancement This issue represents an ask for new feature or an enhancement to an existing one labels Oct 17, 2019
@chrisdot
Author

After a few discussions in the corefx repository about a similar change to HttpClient, it is now clear to me that a configurable timeout system for managing HTTP/2 PING behavior would be better than direct API access for sending PINGs.

My need comes from a gRPC use case where HTTP/2 PING frames are used explicitly. For that, I would need the equivalents of the following ping parameters from grpc-core to be available in grpc-dotnet (and therefore also in Kestrel):

  • GRPC_ARG_KEEPALIVE_TIME_MS: This channel argument controls the period (in milliseconds) after which a keepalive ping is sent on the transport.
  • GRPC_ARG_KEEPALIVE_TIMEOUT_MS: This channel argument controls the amount of time (in milliseconds) the sender of the keepalive ping waits for an acknowledgement. If it does not receive an acknowledgment within this time, it will close the connection.

I think that without such a feature, real gRPC streaming scenarios just won't be possible with grpc-dotnet.
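For concreteness, here is a minimal sketch of how these two channel arguments are wired up today on the C-core based Grpc.Core server (the string keys correspond to GRPC_ARG_KEEPALIVE_TIME_MS and GRPC_ARG_KEEPALIVE_TIMEOUT_MS); the port, credentials, and values are illustrative only:

```csharp
using Grpc.Core;

// Keepalive channel arguments understood by the C-core transport.
var keepAliveOptions = new[]
{
    // Send a keepalive PING after 30 s without activity on the transport.
    new ChannelOption("grpc.keepalive_time_ms", 30_000),
    // Close the connection if the PING ACK does not arrive within 10 s.
    new ChannelOption("grpc.keepalive_timeout_ms", 10_000)
};

var server = new Server(keepAliveOptions)
{
    // Services = { ... } would be registered here in a real server.
    Ports = { new ServerPort("0.0.0.0", 50051, ServerCredentials.Insecure) }
};
server.Start();
```

Kestrel/grpc-dotnet currently exposes no equivalent of these knobs, which is what this issue asks for.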

@chrisdot chrisdot changed the title Enable HTTP/2 PING API on kestrel configurable HTTP/2 PING timeouts on kestrel Oct 18, 2019
@Tratcher
Member

What about all of the advanced options listed in that keepalive doc?

@JamesNK

@JamesNK
Member

JamesNK commented Oct 18, 2019

I'm not familiar with the details of keepalive. I can't really comment on which are must haves.

@JamesNK
Member

JamesNK commented Oct 18, 2019

Netty is the primary Java gRPC server. You could look to it as an example of a more general purpose server and see what configuration it has for keepalive.

@analogrelay analogrelay added this to the Backlog milestone Oct 23, 2019
@analogrelay
Contributor

Backlogging for now. We don't have enough input to consider this high priority for 5.0 at this time.

@chrisdot
Author

> Backlogging for now. We don't have enough input to consider this high priority for 5.0 at this time.

:-( gRPC streaming will be tricky to use without that.
What kind of information do you need?

@Tratcher
Member

> :-( gRPC streaming will be tricky to use without that.

Streaming will work normally unless you have connections that are idle for a long time, correct?

@chwarr

chwarr commented Oct 23, 2019

Yes, gRPC streaming calls work without HTTP/2 PINGs. But they're often used for push-style notifications, so in my experience it's not uncommon for them to go idle for a long time.

For example, the client starts a streaming call and then the server can push notifications as they arrive. When messages are flowing, there's no need for PINGs: the application's messages convey useful information and double as connectivity checks. But when the stream is idle, HTTP/2 PINGs can be configured to kick in, both to know whether the connection is still alive and to keep it alive in the face of load balancers that close idle connections (e.g., AWS's Elastic Load Balancer defaults to 350 seconds, Azure's Load Balancer to 4 minutes, and Google Cloud's backend load balancer to 30 seconds, I think; I've not used that one). If the client detects that the connection has died while idle, it can also proactively reconnect to minimize delays when it later needs to send a message.

HTTP2 PINGs mean that the application protocol doesn't need an explicit "Still here" message just for detecting connection liveness. Also, one HTTP2 PING can be used for the entire connection, regardless of how many streaming calls are multiplexed atop it.
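A minimal sketch of the push-notification pattern described above, assuming a hypothetical Notifier service with a server-streaming Subscribe RPC; Notifier.NotifierBase, SubscribeRequest, Notification, and WaitForNextEventAsync are illustrative names, not a real API. Between writes the stream can sit idle for minutes, which is exactly when configurable HTTP/2 PINGs (rather than application messages) would verify and keep the connection alive:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Grpc.Core;

public class NotifierService : Notifier.NotifierBase
{
    public override async Task Subscribe(
        SubscribeRequest request,
        IServerStreamWriter<Notification> responseStream,
        ServerCallContext context)
    {
        // The call stays open for as long as the client is subscribed.
        while (!context.CancellationToken.IsCancellationRequested)
        {
            // Possibly minutes of idle time here: no frames flow on the stream,
            // so only HTTP/2 PINGs (or the TCP stack, eventually) can reveal a
            // dead connection or keep a load balancer from closing an idle one.
            Notification next = await WaitForNextEventAsync(context.CancellationToken);
            await responseStream.WriteAsync(next);
        }
    }

    // Placeholder for whatever event source the application actually uses.
    private Task<Notification> WaitForNextEventAsync(CancellationToken cancellationToken) =>
        throw new NotImplementedException();
}
```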

The gRPC C library allows PINGs to be configured client to server and server to client, depending on who needs to know.

Let me know if more info would be useful.

@analogrelay
Contributor

I think we understand the value and the feature, it's a matter of prioritization at this point :).

Is the idea just to keep the connection alive? If so, sending PING frames at some interval is likely to be sufficient. We do that in WebSockets and it seems reasonable.

It gets more complicated if we want to do things like checking for the PING+ACK responses and ensuring they come back in a timely fashion.

@chwarr and @chrisdot Would it be sufficient to just send PING frames on an interval and silently process the response PING+ACK frame?

@chwarr

chwarr commented Oct 23, 2019

> We don't have enough input to consider this high priority for 5.0 at this time.

> I think we understand the value and the feature, it's a matter of prioritization at this point :).

I wasn't sure if you needed more information, more use cases, or more people wanting it. So, I provided more information. :-)

> Would it be sufficient to just send PING frames on an interval and silently process the response PING+ACK frame?

Silent processing would be a good start and would probably address 60% of what PINGs are used for in practice by gRPC consumers.

For someone implementing a gRPC server atop Kestrel, completely silent processing is likely not sufficient. The server side will need some way of knowing that the PINGs have failed or been sufficiently delayed so that it can fail all the in-flight streaming calls atop that HTTP/2 connection. (Analogously for the client side, but that's already tracked by a different issue.) As mentioned before, @JamesNK is building a gRPC server stack atop Kestrel in the grpc-dotnet project.

The gRPC HTTP/2 protocol has application-visible semantics for failed/late PINGs:

> If a server initiated PING does not receive a response within the deadline expected by the runtime all outstanding calls on the server will be closed with a CANCELLED status. An expired client initiated PING will cause all calls to be closed with an UNAVAILABLE status.
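To make the required bookkeeping concrete, here is an illustrative-only sketch (an assumed design, not Kestrel internals or any shipped API): a periodic heartbeat sends a PING once the connection has been quiet for a configured delay, and signals the connection to abort (cancelling its streams with the statuses above) if the ACK is not seen within a configured timeout.

```csharp
using System;

enum KeepAlivePingAction { None, SendPing, AbortConnection }

// Illustrative bookkeeping only; names and behaviour are assumptions.
class KeepAlivePingState
{
    private readonly TimeSpan _delay;    // quiet time before a PING is sent
    private readonly TimeSpan _timeout;  // how long to wait for the PING ACK
    private DateTime _lastFrameReceived;
    private DateTime? _pingSentAt;

    public KeepAlivePingState(TimeSpan delay, TimeSpan timeout, DateTime now)
    {
        _delay = delay;
        _timeout = timeout;
        _lastFrameReceived = now;
    }

    // Any inbound frame (including the PING ACK) proves the peer is alive.
    public void OnFrameReceived(DateTime now)
    {
        _lastFrameReceived = now;
        _pingSentAt = null;
    }

    // Called from the periodic heartbeat; tells the connection what to do next.
    public KeepAlivePingAction OnHeartbeat(DateTime now)
    {
        if (_pingSentAt is DateTime sentAt)
        {
            // A PING is outstanding: abort the connection (failing all of its
            // streams) if the ACK is overdue beyond the configured timeout.
            return now - sentAt > _timeout
                ? KeepAlivePingAction.AbortConnection
                : KeepAlivePingAction.None;
        }

        if (now - _lastFrameReceived >= _delay)
        {
            _pingSentAt = now;
            return KeepAlivePingAction.SendPing;
        }

        return KeepAlivePingAction.None;
    }
}
```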

@analogrelay
Contributor

> I wasn't sure if you needed more information, more use cases, or more people wanting it. So, I provided more information. :-)

A little of both ;). Just trying to get an idea of the best bang for our buck, cost/benefit-wise.

> Silent processing would be a good start and would probably address 60% of what PINGs are used for in practice by gRPC consumers.

That's good. I think we could do something like this fairly quickly, as it's quite straightforward and we already have a heartbeat timer we can use for this.

> The server side will need some way of knowing that the PINGs have failed or been sufficiently delayed so that it can fail all the in-flight streaming calls atop that HTTP/2 connection.

And TCP ACKs are insufficient for this? Tracking outstanding PINGs is quite a bit more costly, but certainly not out of the picture. As you say though, the spec does expect it to be done. We've definitely had our fair share of bad run-ins with TCP ACKing.

I'll move this up to 5.0 to see how far we can get. This seems like a good win for gRPC streaming.

@analogrelay analogrelay removed this from the Backlog milestone Oct 24, 2019
@analogrelay
Contributor

Actually, clearing the milestone to put this in front of triage again for a bit of discussion before scheduling.

@Tratcher
Member

Tratcher commented Oct 24, 2019

> The server side will need some way of knowing that the PINGs have failed or been sufficiently delayed so that it can fail all the in-flight streaming calls atop that HTTP/2 connection.

> And TCP ACKs are insufficient for this? Tracking outstanding PINGs is quite a bit more costly, but certainly not out of the picture. As you say though, the spec does expect it to be done. We've definitely had our fair share of bad run-ins with TCP ACKing.

TCP ACKs are only single-hop, whereas PINGs should be end-to-end. In practice a higher-layer proxy still might not forward PINGs; for instance, you couldn't build a proxy with Kestrel that forwards PINGs, since it doesn't expose them in either direction.

@analogrelay
Contributor

> TCP ACKs are only single-hop, whereas PINGs should be end-to-end. In practice a higher-layer proxy still might not forward PINGs.

Very true, good point.

@jtattermusch

CC @jtattermusch

@jkotalik jkotalik added the triage-focus Add this label to flag the issue for focus at triage label Oct 31, 2019
@analogrelay analogrelay added this to the 5.0.0-preview1 milestone Nov 5, 2019
@analogrelay
Contributor

We'll look at generating new PINGs at a configurable interval for 5.0. Future work (considering clients disconnected if they don't PING+ACK, etc.) is TBD.
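For reference, the interval-plus-timeout shape discussed here is what eventually shipped in ASP.NET Core 5.0 via the linked PR. A minimal configuration sketch, assuming the Http2Limits keepalive ping properties; the values are examples rather than defaults, and gRPC service/endpoint registration is omitted:

```csharp
using System;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Hosting;

public static class Program
{
    public static void Main(string[] args) => CreateHostBuilder(args).Build().Run();

    public static IHostBuilder CreateHostBuilder(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .ConfigureWebHostDefaults(webBuilder =>
            {
                webBuilder.ConfigureKestrel(options =>
                {
                    // Send a keepalive PING once the connection has been quiet this long...
                    options.Limits.Http2.KeepAlivePingDelay = TimeSpan.FromSeconds(30);
                    // ...and tear the connection down if the ACK takes longer than this.
                    options.Limits.Http2.KeepAlivePingTimeout = TimeSpan.FromSeconds(10);
                });
                // gRPC endpoint/service registration omitted for brevity.
                webBuilder.Configure(app => { });
            });
}
```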

@shirhatti
Contributor

@JamesNK @jtattermusch How do you folks feel about this? Do we need this in Kestrel?

@shirhatti shirhatti added this to the Backlog milestone Mar 27, 2020
@JamesNK
Member

JamesNK commented Mar 27, 2020

Maybe. I looked at pings a few weeks ago and had some questions for Jan that have been answered. I'll look at it again.

@rolfik

rolfik commented May 13, 2020

As I wrote in dotnet/runtime#31198, we need a quick fix, please; otherwise we are forced to use the C-core gRPC server implementation.
Thank you.

@chrisdot
Author

Now that we have a nice proposal for HttpClient, wouldn't it be relevant to have the equivalent on the server side? @JamesNK?
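For context, the HttpClient-side proposal referenced here surfaced in .NET 5 as keepalive ping properties on SocketsHttpHandler. A minimal client sketch, assuming those properties plus Grpc.Net.Client's GrpcChannelOptions.HttpHandler; the address is a placeholder and the values are examples:

```csharp
using System;
using System.Net.Http;
using Grpc.Net.Client;

var handler = new SocketsHttpHandler
{
    // Send a PING after 30 s without activity on the connection...
    KeepAlivePingDelay = TimeSpan.FromSeconds(30),
    // ...fail the connection if the ACK takes more than 10 s...
    KeepAlivePingTimeout = TimeSpan.FromSeconds(10),
    // ...and keep pinging even when no requests are active.
    KeepAlivePingPolicy = HttpKeepAlivePingPolicy.Always
};

// The channel (and any streaming calls multiplexed on it) now rides on a
// connection whose liveness is checked by HTTP/2 PINGs.
using var channel = GrpcChannel.ForAddress(
    "https://localhost:5001",                     // placeholder address
    new GrpcChannelOptions { HttpHandler = handler });
```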

@JamesNK JamesNK modified the milestones: Backlog, 5.0.0 May 29, 2020
@JamesNK JamesNK self-assigned this May 29, 2020
@JamesNK JamesNK modified the milestones: 5.0.0, 5.0.0-preview7 Jun 1, 2020
@JamesNK JamesNK removed the triage-focus Add this label to flag the issue for focus at triage label Jun 1, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jul 15, 2020
@danroth27 danroth27 added the Done This issue has been fixed label Jul 20, 2020
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023