Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kestrel doesn't reset KeepAlive timeout on HTTP2 PING frames #24601

Closed
davidni opened this issue Aug 5, 2020 · 18 comments · Fixed by #24644
Closed

Kestrel doesn't reset KeepAlive timeout on HTTP2 PING frames #24601

davidni opened this issue Aug 5, 2020 · 18 comments · Fixed by #24644
Assignees
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions enhancement This issue represents an ask for new feature or an enhancement to an existing one feature-kestrel feature-yarp This issue is related to work on yarp
Milestone

Comments

@davidni
Copy link

davidni commented Aug 5, 2020

Describe the bug

NOTE: This is related, but a separate issue from #15104. That one is about server-initiated PING's to the client to proactively detect zombie connections; whereas this one is about Kestrel not resetting its keepalive timeout in response to client-initiated PING's.

gRPC clients can be configured to send periodic pings to keep a channel open for prolonged periods of time even when there aren't any active streams. Kestrel however drops an idle connection after KestrelServerLimits.KeepAliveTimeout elapses. Http2Connection does not reset the TimeoutControl when PING frames are received, resulting in connections that (supposedly) should remain open to be aborted

To Reproduce

Use a gRPC client configured as follows (using gRPC for C#), with the following options:

var options = new List<ChannelOption>()
{
    // ...
    new ChannelOption("grpc.client_idle_timeout_ms", 15 * 60 * 1000),
    new ChannelOption("grpc.keepalive_time_ms", 10 * 1000),
    new ChannelOption("grpc.keepalive_permit_without_calls", 1),
};

This should result in PING frames being sent every 10 seconds, and the channel should remain open up to 15 minutes even when idle. However, Kestrel terminates the connection after its KeepAliveTimeout elapses (2 min by default).

Exceptions

N/A

Further technical details

  • ASP.NET Core 3.1 (code inspection indicates latest master has the same issue)

Suggested fix would be to add a call to TimeoutControl.SetTimeout(Limits.KeepAliveTimeout.Ticks, TimeoutReason.KeepAlive); when processing a PING frame in Http2Connection.ProcessPingFrameAsync.

Is this a legit bug or is it by design?

@JamesNK @Tratcher

@Tratcher Tratcher added area-servers feature-yarp This issue is related to work on yarp feature-kestrel labels Aug 5, 2020
@Tratcher
Copy link
Member

Tratcher commented Aug 5, 2020

This is currently by design, KeepAliveTimeout only tracks request activity, not pings. HttpClient does something similar, they're adding a new flag to give more control over the behavior. If you wanted this in Kestrel we'd need a similar new API.

dotnet/runtime#31198

@Tratcher Tratcher added the enhancement This issue represents an ask for new feature or an enhancement to an existing one label Aug 5, 2020
@JamesNK
Copy link
Member

JamesNK commented Aug 5, 2020

Hmmm, I'm not sure about that. The additional flag HttpClient has is to control whether keep alive pings are sent on a HTTP/2 connection without active streams.

PING frames should reset the timeout. In fact I think that is the behavior today.

@Tratcher
Copy link
Member

Tratcher commented Aug 5, 2020

I'd expect the server to be more defensive about long lived connections than the client due to resource constraints.

@JamesNK
Copy link
Member

JamesNK commented Aug 5, 2020

What are the rules today about Kestrel keeping a HTTP/2 connection active that has no active requests? Does it wait for a configured period before closing it, or does it wait for the client to close it?

Other HTTP/2 servers are quite strict by default when it comes to accepting incoming PINGS. Other servers, by default, don't allow pings if there are no active streams on the connection.

@JamesNK
Copy link
Member

JamesNK commented Aug 5, 2020

I was thinking a few days ago that keep alive pings really should be tested end-to-end. This issue is a good prompt. Really need to test:

  • Grpc.Core client + Grpc.Core server
  • HttpClient + Grpc.Core server
  • Grpc.Core client + Kestrel
  • HttpClient + Kestrel

And for each of them test what happens with active streams, and without active streams.

@Tratcher
Copy link
Member

Tratcher commented Aug 5, 2020

I think they're the same as HTTP/1, only active requests keep the connection alive. Otherwise it waits for the KeepAliveTimeout and closes it.

@JamesNK JamesNK self-assigned this Aug 5, 2020
@JamesNK
Copy link
Member

JamesNK commented Aug 5, 2020

Ok. I'm trying to think of whether this is a flaw in the keep alive ping feature. I didn't realize KeepAliveTimeout didn't reset on ping frames, and none of my tests went over 2 minutes. One of the scenarios for keep alive pings is to keep a connection warm, despite there being no active requests.

Option 1:

Recommend customers with servers that want long-lasting warm connections to set KeepAliveTimeout = TimeSpan.MaxValue and configure a server keep-alive delay + timeout. That allows a server to be pro-active about deciding whether to close a HTTP/2 connection. However that wouldn't work well if a server was serving both HTTP/1.1 and HTTP/2.

Option 2:

Http2Limits.KeepAlivePingDelay + Http2Limits.KeepAlivePingTimeout overrides KeepAliveTimeout for HTTP/2 (I really wish this was called something like IdleTimeout 😄).

Similar to option 1, but allows the server to be more specific about HTTP/2 behavior.

Option 3:

A setting to control whether pings reset the KeepAliveTimeout timer if there are no active requests. To be defensive it should default to false. Something like:

public class Http2Limits
{
    public bool AllowKeepAlivePingWithoutRequests { get; set; } = false;
    
    // or
    public bool KeepAlivePingsResetIdleTimer { get; set; } = false;
}
  • If it is false then Kestrel won't throw an error if it gets a ping but it won't reset KeepAliveTimeout timer. This matches current behavior.
  • If it is true then Kestrel resets KeepAliveTimeout timer.

@davidni
Copy link
Author

davidni commented Aug 6, 2020

One of the scenarios for keep alive pings is to keep a connection warm, despite there being no active requests.

This is precisely the scenario that led me to identifying this issue 😄

I really wish this was called something like IdleTimeout

Ditto. Something like Go's http server options perhaps.


  • Option 1 is a non-starter, we cannot allow infinite duration HTTP/1.1 connections, as you mentioned
  • Option 2 if I understand it correctly, this does not address the issue, and client ping's are still ignored for the purposes of managing server timeouts, right?
  • Option 3: KeepAlivePingsResetIdleTimer takes care of the issue at hand. Not sure why we would also need AllowKeepAlivePingWithoutRequests?

@JamesNK

@JamesNK
Copy link
Member

JamesNK commented Aug 6, 2020

  • Option 2 if I understand it correctly, this does not address the issue, and client ping's are still ignored for the purposes of managing server timeouts, right?

KeepAlivePingTimeout does use PING to reset its timer. Basically option 2 would turn off the passive KeepAliveTimeout for HTTP/2 and replace it with the active KeepAlivePingDelay/KeepAlivePingTimeout.

  • Option 3: KeepAlivePingsResetIdleTimer takes care of the issue at hand. Not sure why we would also need AllowKeepAlivePingWithoutRequests?

There is an OR there. They're alternative names for a property that does the same thing. I think I prefer KeepAlivePingsResetIdleTimer.


I think I like option 3 the most. Option 2 feels auto-magical and hard to describe, while option 3 is a distinct setting to enable an easy to understand behavior: HTTP/2 connections won't time out if they're pinged.

@davidni
Copy link
Author

davidni commented Aug 6, 2020

Gotcha, so Option 3 really is a strict superset of Option 2 (i.e. it can do everything that Option 2 can, but additionally supports the use case where clients are allowed to extend the lifetime of a connection with client-initiated PING's regardless of server-initiated PING's). I too prefer Option 3.

Side note: It may be interesting to consider making these settable per connection (as opposed to globally in Kestrel limits), though I don't have a concrete use case for that at the moment.

@JamesNK JamesNK added this to the 5.0.0-rc1 milestone Aug 7, 2020
@JamesNK
Copy link
Member

JamesNK commented Aug 7, 2020

Thoughts? @halter73 @Tratcher @davidfowl

@halter73
Copy link
Member

halter73 commented Aug 7, 2020

Nice catch @davidni. Pings should reset the keep-alive timeout period. No need for a setting.

With HTTP/1.1, there's no alternative to the client sending a new request to reset the keep-alive timeout. If there was a way for the client to ping an HTTP/1.1 connection, said ping would reset the timeout.

@Tratcher
Copy link
Member

Tratcher commented Aug 7, 2020

@halter73 won't that allow a client to keep a connection open indefinitely with no requests? That seems like something the server should have control over.

@halter73
Copy link
Member

halter73 commented Aug 7, 2020

Absolutely. And like I said, if a client could do so for an HTTP/1.1 connection, we would allow it.

Option 1 of telling people to set an effectively infinite keep-alive timeout is a nonstarter, because that allows a client to just disappear (never send a FIN or RST) and Kestrel would keep the connection alive forever. This would be a big problem for both HTTP/1.1 and HTTP/2. The OS nor anything else would do anything to tell Kestrel the socket is closed (at least not until ~30,000 more years when the TimeSpan.MaxValue finally elapses). TCP keep-alive is a rarely used feature, and I've never seen or heard of it used for any sort of HTTP connection.

The point of the KeepAliveTimout is to avoid exactly that scenario where Kestrel doesn't know that the socket is really gone. HTTP/2 pings tell us the socket is still active, and that's good enough. If the concern is that a malicious client could use a ping to keep the connection alive for cheap, that could already happen with a short GET request.

@JamesNK
Copy link
Member

JamesNK commented Aug 7, 2020

There are two kinds of pings to consider:

  1. The server has a keep alive ping configured. When there is no activity the server is responsible for starting the ping, it then gets a PING ACK frame (which could be made to contain a payload to validate it).
    We could make this ack ping reset the KeepAliveTimeout timer by default safely because a server has to opt-in to enabling keep alive pings, and it is the one initiating the ping. However this behavior would mean that if you wanted a server to have warm connections then the server would be the one that needs keep alive pings enabled.

  2. Client initiated ping. The server gets the ping, then ACKs it.
    I worry about these pings automatically resetting the KeepAliveTimeout timer because now there is no way to configure Kestrel to close idle connections. Although @halter73 brings up a good point that you can get around this anyway by using a GET request instead of a ping.

@JamesNK
Copy link
Member

JamesNK commented Aug 7, 2020

Actually relying on keep alive PING ACKs wouldn't work (1). The keep alive ping delay is based on receiving any frames. A server initiated keep alive ping wouldn't be sent if the server was receiving pings, but those pings wouldn't reset KeepAliveTimeout.

@halter73
Copy link
Member

halter73 commented Aug 7, 2020

I want to avoid adding too many Http2Limits. It's already quite a lot for the majority of people who aren't intimately familiar with the HTTP/2 protocol. Before adding yet another configurable limit, I'd like to know of a real customer/scenario that needs to make this configurable.

It's clear we need to fix pings not resetting the keep-alive timeout. That's just a bug. Really, any frame received while there are not active requests should reset the timeout.

@Tratcher
Copy link
Member

Tratcher commented Aug 7, 2020

that could already happen with a short GET request.

Fair. I think 2 makes more sense. It's the client that wants the connection to stay warm so let them initiate the pings.

@ghost ghost locked as resolved and limited conversation to collaborators Sep 10, 2020
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions enhancement This issue represents an ask for new feature or an enhancement to an existing one feature-kestrel feature-yarp This issue is related to work on yarp
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants