Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

configurable HTTP/2 PING timeouts in HttpClient #31198

Closed
chrisdot opened this issue Oct 17, 2019 · 54 comments · Fixed by #40257
Closed

configurable HTTP/2 PING timeouts in HttpClient #31198

chrisdot opened this issue Oct 17, 2019 · 54 comments · Fixed by #40257
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Net.Http
Milestone

Comments

@chrisdot
Copy link

chrisdot commented Oct 17, 2019

Justification

Long-running but idle connections (which happen in gRPC and long poll scenarios) can result in firewalls/etc. dropping connections out from under us, without any reset notification. This causes long timeouts on a client that is only reading data.

These APIs will have HttpClient periodically send ping frames over HTTP/2, which would prevent firewalls from seeing the connection as idle and detect otherwise broken connections sooner than we can right now.

API Proposal

public class SocketsHttpHandler
{
    // if we haven't received data on a connection in X time, send a PING frame.
    public TimeSpan KeepAlivePingDelay { get; set; }

    // if server does not respond to the PING in X time, kill the connection on our side.
    public TimeSpan KeepAlivePingTimeout { get; set; }

    // ****new since last review****
    public HttpKeepAlivePingPolicy KeepAlivePingPolicy { get; set; }
}

// Consider using a bool instead. Proposing enum for better extensibility in future.
public enum HttpKeepAlivePingPolicy
{
    // Only send pings if there are active requests.
    WithActiveRequests,

    // Send pings even for idle connections.
    // Common gRPC servers have behavior that will kill connections that use this. ¯\_(ツ)_/¯
    Always
}

Original issue below

I've been evaluating the new gRPC-dotnet and particularly its streaming feature in a long lived connected scenario.
I needed the HTTP/2 PING feature to detect the zombie connections (basically to know: "Am I still having an up and running connection to the client?") on both server and client side.
It seems that this feature won't be implement-able on the client side as long as there's no support of an API in the HttpClient, as the client part of gRPC-dotnet relies on HttpClient.

It seems that HttpClient already responds to ping but can not send any on demand. Am I right?

So I would like to request a new method in HttpClient class to support sending Http/2 pings or even better a kind of keep alive ping timeout to do it automatically in case there's no traffic...

BTW: I have also created an issue for the same feature for kestrel (I don't know what kestrel is using underneath).

@Tratcher
Copy link
Member

Adding an HttpClient API to send pings on demand would be really complicated, pings are per connection and HttpClient doesn't expose any connection level APIs.

Using a timer to send pings would be much easier. WebSockets has a similar feature already.

@scalablecory
Copy link
Contributor

Can you give some more detail on your scenario? Once you determine if a connection is still open, what actions will you be taking?

@chrisdot
Copy link
Author

Can you give some more detail on your scenario? Once you determine if a connection is still open, what actions will you be taking?

@scalablecory
My scenario is a client/server relation where both sides need to know if the connection is still up and running at any time (in case when connection fails, it should start to act in a degraded mode).

As I imagine it, a response to a PING should not imply any action at all. This is just normal operating. I consider the things just in the opposit way: if the PING does not get a response, then the connection should be closed (an by getting an exception?!)

@Tratcher
Copy link
Member

Even the act of sending a ping would likely trigger an immediate exception for local disconnect scenarios. Not so much if there's a proxy involved.

The API for a full round trip ping and ack would be tricky, especially since you could have multiple outstanding pings.

WinHttp WebSockets has a feature where it will periodically send unsolicited ping acks (pongs) just to check if the connection is open. It starts with acks because it doesn't want a response, it's just trying to detect socket errors.

@chrisdot
Copy link
Author

@Tratcher

The API for a full round trip ping and ack would be tricky, especially since you could have multiple outstanding pings.

I don't see how this would be a real world a scenario having multiple outstanding pings, at a given moment. A ping would be sent on a idle connection only. And there's no reason to send a second one before the first one was ACKed.
Furthermore, this may be solved by using the "8 octets of opaque data in the payload" of the PING frame that the ACK has to send back?

Maybe there should be 2 parameters:

  • an inactivity timeout after which a ping would be sent to test the line
  • a timeout after which the ack should have been done

@Tratcher
Copy link
Member

A ping would be sent on a idle connection only. And there's no reason to send a second one before the first one was ACKed.

If it were an internal feature based on an idle timer then yes, but if Ping is a callable API then none of that is guaranteed.

@scalablecory
Copy link
Contributor

in case when connection fails, it should start to act in a degraded mode

@chrisdot Thanks. Would you be using this as a way to minimize latency noticed by the user by opening alternative routes or offline alternatives prior to the next request?

Your app would still need to handle the case of a request failing, as even if PING is successful 1 second earlier, the connection could have been killed in the mean time. But if used as the above, I can see some usefulness here.

Furthermore, this may be solved by using the "8 octets of opaque data in the payload" of the PING frame that the ACK has to send back?

Indeed, I don't see this as being a particularly difficult thing to implement.

The one hiccup is that the API would only work for HTTP/2 (and in future, HTTP/3) connections. I don't see that as being a blocker, but I believe it'd be a first for HttpClient so we'd want to make sure we get the design right for such an API.

@chwarr
Copy link
Contributor

chwarr commented Oct 18, 2019

As an concrete example of HTTP2 PING frames, consider gRPC--which was the impetus for @chrisdot opening this issue.

The gRPC HTTP2 protocol explicitly uses PING frames:

Both clients and servers can send a PING frame that the peer must respond to by precisely echoing what they received. This is used to assert that the connection is still live as well as providing a means to estimate end-to-end latency. If a server initiated PING does not receive a response within the deadline expected by the runtime all outstanding calls on the server will be closed with a CANCELLED status. An expired client initiated PING will cause all calls to be closed with an UNAVAILABLE status. Note that the frequency of PINGs is highly dependent on the network environment, implementations are free to adjust PING frequency based on network and application requirements.

The https://github.com/grpc/grpc-dotnet/ project is building a gRPC implementation completely in .NET. One of its goals is to be able to interoperate with the other gRPC implementations.

The C core library implementation of gRPC implements the PING part of the protocol and calls it keepalive. The Go implementation is similar.

@chrisdot
Copy link
Author

chrisdot commented Oct 18, 2019

If it were an internal feature based on an idle timer then yes, but if Ping is a callable API then none of that is guaranteed.

@Tratcher : I think we don't need a direct method/API, a configurable timer manner (property) should be enough I guess. In my case I don't see the point of being able to send at the application level the PINGs. I think this is a transport level stuff and if it is transparent but configurable should be enough??

Would you be using this as a way to minimize latency noticed by the user by opening alternative routes or offline alternatives prior to the next request?

@scalablecory : yes, that's it! I would be able to know that the active connection is down even if on the application level I have no payload for a given time. And that's how it is used in gRPC in the streaming call model... (where everything started from)

The C core library implementation of gRPC implements the PING part of the protocol and calls it keepalive. The Go implementation is similar.

@chwarr: this is exactly the root of my request: I would need to be able to set the pendant of the following ping parameters (taken from the grpc-core) to the grpc-dotnet:

  • GRPC_ARG_KEEPALIVE_TIME_MS: This channel argument controls the period (in milliseconds) after which a keepalive ping is sent on the transport.
  • GRPC_ARG_KEEPALIVE_TIMEOUT_MS: This channel argument controls the amount of time (in milliseconds) the sender of the keepalive ping waits for an acknowledgement. If it does not receive an acknowledgment within this time, it will close the connection.

@chrisdot chrisdot changed the title HTTP/2 PING support API in HttpClient configurable HTTP/2 PING timeouts in HttpClient Oct 18, 2019
@jtattermusch
Copy link

CC @jtattermusch

@karelz
Copy link
Member

karelz commented Jan 23, 2020

@JamesNK do you think this will be needed for gRPC? Did you hear other requests?

Triage: We could implement just timeout (based on fraction of existing timeouts like PooledConnectionIdleTimeout), or we could expose new timeout.
Overall we are not convinced this is super-valuable. If we don't have strong need from users, we would suggest to do nothing, or just fraction-timer based solution.

@chrisdot
Copy link
Author

chrisdot commented Jan 27, 2020

Triage: We could implement just timeout (based on fraction of existing timeouts like PooledConnectionIdleTimeout), or we could expose new timeout.
Overall we are not convinced this is super-valuable. If we don't have strong need from users, we would suggest to do nothing, or just fraction-timer based solution.

I don't know how difficult that is, and I don't fully understand your proposition based on fraction of existing timers.
Timeout after which the PING ack should be done, is not a big deal: using the same timeout mechanism for any regular frame with payload could definitely be a valid option.
But for the inactivity timeout after which a ping would be sent to test the line, I'm sceptical. It may depend of the context constraints (depending to the answer of the question: "how quick should the applevel know that the transport layer is down?) This depend to the context, so maybe better configurable... And this may be better configurable as both client and server might use the same granularity...?

Sorry to insist :-) , but I think there is a tiny hole in the HTTP/2 implementation, which would make the streaming capability of gRPC quite unusable, and not interoperable with other stacks. But I guess there might be other applications over HTTP/2 that would make usage of the keep alive? (can't figure out what for the moment...)

@jtattermusch
Copy link

As @chrisdot have mentioned, this feature is critical for gRPC long lived RPCs and streaming RPCs.
It is also important for general interoperability with other gRPC implemenations (that do support keepalive)
Here is the gRPC's keepalive spec: https://github.com/grpc/grpc/blob/master/doc/keepalive.md

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the Future milestone Feb 1, 2020
@rolfik
Copy link

rolfik commented May 13, 2020

We need this functionality as well, but both in client and server .NET implementation.
We use server streaming for long time notifications which breaks after some time when no data is transferred, but clients have no idea about it.
Now we test this scenario:
Server: ASP.NET Core 3.1 with gRPC .NET implementation
Clients: ASP.NET Core 3.1 or .NET Framework 4.8 WPF with gRPC c implementation with keep alive channel options setup.

But now I wonder who will respond to client http2 pings on the server? We use Kestrel there.

We need quick fix please, otherwise we are forced to use c server gRPC implementation.
Thank You

@chrisdot
Copy link
Author

But now I wonder how will respond to client http2 pings on the server? We use Kestrel there.

This is where my other related issue comes in...

@JamesNK
Copy link
Member

JamesNK commented May 28, 2020

What I propose be added to SocketsHttpHandler:

public class SocketsHttpHandler
{
    public TimeSpan KeepAliveInterval { get; set; }
    public TimeSpan KeepAliveTimeout { get; set; }
    public bool KeepAliveWithoutRequests { get; set; }
}

KeepAliveInterval controls a timer that sends connection-level HTTP/2 pings to the server. Would start after interval of inactivity in reading incoming data. Has a minimum value of 10 seconds (avoid angering the server with frequent pings). Defaults to 0 (off).

KeepAliveTimeout controls how long the client waits for the ping reply. If the reply is not received in that time then the connection is closed. Defaults to 20 seconds.

KeepAliveWithoutRequests controls whether keep alive pings are sent if there are no active requests. This is useful if someone wants keep alive pings sent only when there are requests in-progress, such as a long running gRPC streaming request. If there are no active requests then keep alive pings would stop and the connection could be closed. Defaults to false.

These are the most important settings from how gRPC clients generally handle keep alive: https://github.com/grpc/grpc/blob/master/doc/keepalive.md. There are additional settings for finer grain control of pings (some are server specific). I don't believe they are needed.

@JamesNK
Copy link
Member

JamesNK commented May 28, 2020

The equivalent settings in the gRPC golang client:

type ClientParameters struct {
    // After a duration of this time if the client doesn't see any activity it
    // pings the server to see if the transport is still alive.
    // If set below 10s, a minimum value of 10s will be used instead.
    Time time.Duration // The current default value is infinity.
    // After having pinged for keepalive check, the client waits for a duration
    // of Timeout and if no activity is seen even after that the connection is
    // closed.
    Timeout time.Duration // The current default value is 20 seconds.
    // If true, client sends keepalive pings even with no active RPCs. If false,
    // when there are no active RPCs, Time and Timeout will be ignored and no
    // keepalive pings will be sent.
    PermitWithoutStream bool // false by default.
}

And gRPC Java client has keepAliveTime, keepAliveTimeout, keepAliveWithoutCalls:

https://grpc.github.io/grpc-java/javadoc/io/grpc/netty/NettyChannelBuilder.html#keepAliveTime-long-java.util.concurrent.TimeUnit-

Sets the time without read activity before sending a keepalive ping. An unreasonably small value might be increased, and Long.MAX_VALUE nano seconds or an unreasonably large value will disable keepalive. Defaults to infinite.
Clients must receive permission from the service owner before enabling this option. Keepalives can increase the load on services and are commonly "invisible" making it hard to notice when they are causing excessive load. Clients are strongly encouraged to use only as small of a value as necessary.

@karelz
Copy link
Member

karelz commented May 28, 2020

Given that the properties impact only HTTP/2, I wonder if we should modify the names as well ...
The KeepAliveTimeout default of 20s seems to be a bit excessive ... is that expected?

I was confused by the name KeepAliveWithoutRequests - "without requests" could be interpreted to include also "without active streaming requests". Maybe we could name it better?

cc @scalablecory @ManickaP

@karelz karelz modified the milestones: Future, 5.0 May 28, 2020
@karelz
Copy link
Member

karelz commented May 28, 2020

Marking it for 5.0 for now due to raising demand ...

@scalablecory
Copy link
Contributor

QUIC has pings as well.

We might get away with leaving out the WithoutRequests property as we already have an idle connection lifetime timeout.

@chrisdot, @JamesNK's proposal is not the more explicit 'is the connection alive right now' API you're asking for. Will keepalive be sufficient, or do you feel some sort of PingAsync(uri) is needed for your use?

@chrisdot
Copy link
Author

@chrisdot, @JamesNK's proposal is not the more explicit 'is the connection alive right now' API you're asking for. Will keepalive be sufficient, or do you feel some sort of PingAsync(uri) is needed for your use?

In my case I don't see the point in having a direct API to a ping itself... The need is just about checking the idle connections...

@wfurt
Copy link
Member

wfurt commented May 28, 2020

In my case I don't see the point in having a direct API to a ping itself... The need is just about checking the idle connections...

That can be done with TCP keep-alive as well. (once the dialers are implemented)

@karelz
Copy link
Member

karelz commented Jul 24, 2020

Naively, I would expect that Ping does not count as "normal" request and does not prolong idle timeout. User should set longer [PooledConnectionIdleTimeout](https://apisof.net/catalog/System.Net.Http.SocketsHttpHandler.PooledConnectionIdleTimeout) to keep idle connections around longer. That is my naive expectations without looking at code - @aik-jahoda can you please confirm?

@JamesNK
Copy link
Member

JamesNK commented Jul 24, 2020

Ok. So for someone who wants to keep the connection "warm" they would combine idle timeout with keepalive.

var socketsHttpHandler = new SocketsHttpHandler();
socketsHttpHandler.PooledConnectionIdleTimeout = TimeSpan.FromHours(2);
socketsHttpHandler.KeepAlivePingDelay = TimeSpan.FromSeconds(60);
socketsHttpHandler.KeepAlivePingTimeout = TimeSpan.FromSeconds(30);
socketsHttpHandler.EnableMultipleHttp2Connections = true;

Does this scenario make sense:

  • An app exists that makes bursts of requests once an hour and wants to keep connections warm to avoid latency when the burst happens
  • The app makes 1000 concurrent HTTP/2 requests and 10 connections are created. These 10 connections will live for at least 2 hours. Each pings the server every 60 seconds
  • The next hour the app only makes 500 concurrent HTTP/2 requests. It reuses 5 of the connections.
  • The next hour the app again makes 500 concurrent HTTP/2 requests. It reuses 5 of the connections. 5 idle connections hit the idle timeout and are closed
  • The next hour the app makes 1000 concurrent HTTP/2 requests. It reuses 5 of the connections, and opens 5 new connections

etc

I think this is how it will work, and it is probably fine.

@JamesNK
Copy link
Member

JamesNK commented Jul 24, 2020

On KeepAliveWithoutRequests, we discussed in this issue why it was useful for it to be true, but there a situations of when you want it to be false. The behavior of not having a KeepAliveWithoutRequests setting and default its behavior to true will create problems.

KeepAliveWithoutRequests isn't just a settings to slightly improve client performance. Many servers are aggressive about closing connections when pinged too much by clients. This isn't a problem with Kestrel - Kestrel handles pings so efficiently that it doesn't care how much you ping it - but other servers are strict. One of their techniques for detecting "malicious" pings is a client pinging when a connection has no active streams.

@Tratcher Have you looked closely at how Kestrel handles HTTP/2 pings compared to other servers? Update - I've looked up what gRPC servers do below.

@JamesNK
Copy link
Member

JamesNK commented Jul 25, 2020

By default gRPC client implementations don't send keep alive pings without active streams. And by default the servers (CCore based gRPC servers, Java+Netty, golang) will close connections with GO_AWAY and ENHANCE_YOUR_CALM if the client attempts to ping the server without active streams.

Pinging without active calls is different behavior from all other gRPC clients. Servers all have an option to allow it, but I think the difference behavior unnecessarily creates problems for .NET developers who want to call non-.NET gRPC servers and trying to get keep alive working.

I think we should add KeepAliveWithoutRequests and it should default to false.

@scalablecory
Copy link
Contributor

by default the servers (CCore based gRPC servers, Java+Netty, golang) will close connections with GO_AWAY and ENHANCE_YOUR_CALM if the client attempts to ping the server without active streams.

Wow, that is an interesting design choice. Okay, we do need this to be configurable then. Thanks for providing the data!

@scalablecory scalablecory added api-ready-for-review API is ready for review, it is NOT ready for implementation blocking Marks issues that we want to fast track in order to unblock other important work and removed api-approved API was approved in API review, it can be implemented labels Jul 25, 2020
@terrajobst terrajobst removed the blocking Marks issues that we want to fast track in order to unblock other important work label Jul 28, 2020
@terrajobst
Copy link
Member

terrajobst commented Jul 28, 2020

Video

  • It seems the setting Always is undesirable. Instead of adding this enum, we should just use WithActiveRequests as the default
  • The API is reasonable and an enum is preferred, but let's defer this until we know for sure we need a way to allow Always (even though all other clients have it).
namespace System.Net.Http
{
    public class SocketsHttpHandler
    {
        public TimeSpan KeepAlivePingDelay { get; set; }
        public TimeSpan KeepAlivePingTimeout { get; set; }
    }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jul 28, 2020
@terrajobst terrajobst reopened this Jul 28, 2020
@JamesNK
Copy link
Member

JamesNK commented Jul 28, 2020

It seems the setting Always is undesirable.

Yes it is desirable. I explained why it is desirable here - #31198 (comment)

To go into an example:

Imagine a client that infrequently sends bursts of data, e.g. 1000 HTTP requests once every hour. Without an option to send keep alive pings - even though there are no active calls - then a load balancer/proxy closes the connection for inactivity. Every time that client wakes up to send the 1000 HTTP requests, all have high latency while they wait for TCP connections to be created, TLS to be established, HTTP/2 connection to be established, and calls to be made.

Customers - including Azure 1st party (I don't know if I can mention names publicly) - have asked for this feature. They have observed that the scenario impacts their P90 P50 latency numbers.

@chwarr Also goes into detail about why KeepAliveWithoutRequests = true is useful here.

This isn't gRPC specific. Anyone using HTTP/2 could use this.

We're going around in circles. Could you please invite me to discuss this feature.

@scalablecory
Copy link
Contributor

scalablecory commented Jul 28, 2020

Thanks @JamesNK. The function of them was never under question -- pushback we got in API review was on specific scenarios. P90 latency is a good scenario.

@terrajobst
Copy link
Member

Excellent. Let's approve this over email.

@terrajobst terrajobst added api-ready-for-review API is ready for review, it is NOT ready for implementation api-approved API was approved in API review, it can be implemented and removed api-approved API was approved in API review, it can be implemented api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jul 28, 2020
@terrajobst
Copy link
Member

terrajobst commented Jul 29, 2020

  • Looks good as proposed.
namespace System.Net.Http
{
    public class SocketsHttpHandler
    {
        public TimeSpan KeepAlivePingDelay { get; set; }
        public TimeSpan KeepAlivePingTimeout { get; set; }
        public HttpKeepAlivePingPolicy KeepAlivePingPolicy { get; set; }
    }
    public enum HttpKeepAlivePingPolicy
    {
        WithActiveRequests,
        Always
    }
}

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Net.Http
Projects
None yet
Development

Successfully merging a pull request may close this issue.