Keep alive missing? #770

TheLever · 2020-02-13T13:32:52Z

For some reason the text of my question was removed, so I am reposting it here.

Is there a way to keep the connection between client and server alive in a C# gRPC app? My scenario is a bidirectional streaming one (essentially the FullStockTicker demo). What I see is that the server closes the HTTP/2 connection after a few minutes. I have tried to find a way to set the keep alive flag, but that does not seem possible from C#.

I am using Grpc.AspNetCore (2.27.0), Grpc.Tools (2.27.0), Google.Protobuf (3.11.3) and generated clients.

analogrelay · 2020-02-19T21:10:52Z

Can you provide a runnable sample that reproduces this problem? We don't believe the server should be terminating the connection in this situation, so it would help to see the full context of what you're doing.

It would also help if you could enable more detailed logging with the following configuration in your appsettings.json:

{
  "Logging": {
    "LogLevel": {
      "Default": "Debug",
      "System": "Information",
      "Microsoft": "Trace",
      "Grpc": "Debug"
    }
  }
}

ndglover · 2020-02-24T17:08:21Z

I have just hit the same issue. It took a while to track down, so it might well be worth noting in the docs somewhere. It looks like the fix is being worked on outside of this repo; see dotnet/aspnetcore#15104.

After further investigation, this turned out not to be the actual issue for me. Apologies for the confusion. I have struck through my original comment.

halter73 · 2020-02-24T22:03:44Z

@ndglover Can you provide the logs that @anurse asked for? Thanks.

thefringeninja · 2020-03-06T13:42:10Z

This is something we will definitely need, as we expect our subscriptions to run indefinitely.

JuliusSweetland · 2020-04-27T15:49:50Z

@JamesNK @anurse Hi James, Andrew, I believe I might need keep alive functionality where I am implementing bidi and server to client streams. These would be long running, and there may be long periods of no messages, which is what I understand the keep alive ping/pong behaviour to be designed for - essentially to test that the connection has not been reset without either side knowing. Would this be a suitable use case for keep alive? If so I'd like to throw my vote in for including it with Grpc-net. Thanks.

JuliusSweetland · 2020-04-27T16:32:53Z

If it helps, there is a C# project on GitHub that seems to have rolled its own gRPC client code, which allows it to set ChannelOptions on the underlying channel:

https://github.com/zeebe-io/zeebe-client-csharp/blob/5d416bc1bb6eccc3898f0727f639ecd1b2bfb434/Client/ZeebeClient.cs#L58

Which uses this GatewayGrpc client:

https://github.com/zeebe-io/zeebe-client-csharp/blob/5d416bc1bb6eccc3898f0727f639ecd1b2bfb434/Client/Impl/proto/GatewayGrpc.cs

JuliusSweetland · 2020-05-01T08:45:32Z

@JamesNK @anurse Hi again James, Andrew, can I ask where support for gRPC keepalive might be on your priority list? At present I am rolling my own heartbeating, which is very much suboptimal. I would love to rip out some of my code and rely on the underlying mechanism to be confident that long-standing TCP connections are still connected. Thanks.

JamesNK · 2020-05-09T23:53:03Z

At the moment it is low priority because we haven't had many people ask for it. What do you want built-in heartbeats for? Is it perf, i.e. keeping the TCP connection open so there is no delay re-establishing the connection after a pause?

JuliusSweetland · 2020-05-10T13:31:19Z

A combination of requiring long running streams for performance reasons, but I also require an up to date view of which streaming connections are still live as we require a client to have connections to certain streams before we allow them to perform operations against other endpoints. If we cannot guarantee the client connections are currently active then we cannot enforce policies which we require.

MaxXor · 2020-05-11T15:13:57Z

@JamesNK I'm planning to switch to gRPC, but this issue has held me back. My project has long-running idle connections that need to be responsive as soon as they are used.

vadimi · 2020-05-14T03:21:50Z

We use keepalives in Grpc.Core when exposing gRPC services through an AWS NLB load balancer; without them, the load balancer would close TCP connections ungracefully and gRPC clients would get errors. My understanding is that Azure Load Balancer works the same way. When running services in Kubernetes, kube-proxy also has idle timeout settings (https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/) that will ungracefully terminate TCP connections as well. Keepalives really help in these scenarios and make apps and services a lot more stable.

JuliusSweetland · 2020-05-25T11:13:33Z

@JamesNK Hi James - has any of the above feedback influenced how you are thinking about Keep Alive?

JamesNK · 2020-05-26T10:05:36Z

I'm planning to investigate what options are in Grpc.Core, and what ones would be useful to have in grpc-dotnet.

If people can give feedback on which settings are important, and why, then that would be useful.

To set expectations: Adding new features to the gRPC on the server - including Kestrel - will be easier than the client. I can prioritize implementing server-side features myself. Client-side features that need to be added to HttpClient are done by a different team. They might not make the cut in the .NET 5 time-frame.

chwarr · 2020-05-26T22:00:01Z

Below I've included what my product is currently using to keep Azure load balancers alive when we have a lull in traffic and our gRPC channels don't have any active calls or messages. (The Azure load balancers will sometimes drop idle connections without sending RST packets to either endpoint.)

We use keepalives to reduce latency, by having a connection to our backend already established (so we don't have to do TCP, TLS, gRPC, &c. handshakes in the middle of a client's request), and to improve reliability, by detecting when connections die during idle times.

As has been mentioned before, yes, we could do this by adding some sort of heartbeat to our application protocol. The keepalive functionality in the gRPC Core library has, thus far, been sufficient.

This is an excerpt of the keepalive options that are set on a gRPC Core channel.

The ExtendedChannelOptions class is a collection of C# string constants for the gRPC Core channel options. Its definition is below.

// This channel argument controls the period (in milliseconds) after which a
// keepalive ping is sent on the transport.
new ChannelOption(ExtendedChannelOptions.KeepAliveTimeMilliseconds, 2500),

// This channel argument controls the amount of time (in milliseconds), the sender of
// the keepalive ping waits for an acknowledgment. If it does not receive an
// acknowledgment within this time, it will close the connection.
new ChannelOption(ExtendedChannelOptions.KeepAliveTimeoutMilliseconds, 60000),

// This channel argument if set to 1 (0 : false; 1 : true), allows keepalive pings to be sent even if there are no calls in flight.
new ChannelOption(ExtendedChannelOptions.KeepAlivePermitWithoutCalls, 1),

// This channel argument controls the maximum number of pings that can be sent when
// there is no other data (data frame or header frame) to be sent. GRPC Core will not
// continue sending pings if we run over the limit. Setting it to 0 allows sending
// pings without sending data.
new ChannelOption(ExtendedChannelOptions.MaxPingsWithoutData, 0),

// If there is no data being sent on the transport, this channel argument controls
// the minimum time (in milliseconds) gRPC Core will wait between successive pings.
new ChannelOption(ExtendedChannelOptions.MinTimeBetweenPingsMilliseconds, 1000),

ExtendedChannelOptions definition

public static class ExtendedChannelOptions
{
    /// <summary>
    /// This channel argument controls the period (in milliseconds) after
    /// which a keepalive ping is sent on the transport.
    /// </summary>
    public const string KeepAliveTimeMilliseconds = "grpc.keepalive_time_ms";

    /// <summary>
    /// This channel argument controls the amount of time (in
    /// milliseconds), the sender of the keepalive ping waits for an
    /// acknowledgement. If it does not receive an acknowledgement within
    /// this time, it will close the connection.
    /// </summary>
    public const string KeepAliveTimeoutMilliseconds = "grpc.keepalive_timeout_ms";

    /// <summary>
    /// This channel argument if set to 1 (0 : false; 1 : true), allows
    /// keepalive pings to be sent even if there are no calls in flight.
    /// </summary>
    public const string KeepAlivePermitWithoutCalls = "grpc.keepalive_permit_without_calls";

    /// <summary>
    /// The load balancer policy that should be used.
    /// </summary>
    public const string LoadBalancerPolicyName = "grpc.lb_policy_name";

    /// <summary>
    /// This channel argument controls the maximum number of pings that can
    /// be sent when there is no other data (data frame or header frame) to
    /// be sent. GRPC Core will not continue sending pings if we run over
    /// the limit. Setting it to 0 allows sending pings without sending
    /// data.
    /// </summary>
    public const string MaxPingsWithoutData = "grpc.http2.max_pings_without_data";

    /// <summary>
    /// If there is no data being sent on the transport, this channel
    /// argument controls the minimum time (in milliseconds) gRPC Core will
    /// wait between successive pings.
    /// </summary>
    public const string MinTimeBetweenPingsMilliseconds = "grpc.http2.min_time_between_pings_ms";

    /// <summary>
    /// If there is no data being sent on the transport, this channel
    /// argument on the server side controls the minimum time (in
    /// milliseconds) that gRPC Core would expect between receiving
    /// successive pings. If the time between successive pings is less
    /// than this time, then the ping will be considered a bad ping from
    /// the peer. Such a ping counts as a ‘ping strike’. On the client
    /// side, this does not have any effect.
    /// </summary>
    public const string MinPingIntervalWithoutDataMilliseconds = "grpc.http2.min_ping_interval_without_data_ms";

    /// <summary>
    /// This arg controls the maximum number of bad pings that the server
    /// will tolerate before sending an HTTP2 GOAWAY frame and closing the
    /// transport. Setting it to 0 allows the server to accept any number
    /// of bad pings.
    /// </summary>
    public const string MaxPingStrikes = "grpc.http2.max_ping_strikes";

    /// <summary>
    /// Maximum time that a channel may have no outstanding RPCs. Int valued,
    /// milliseconds. INT_MAX means unlimited.
    /// </summary>
    public const string MaxConnectionIdleMilliseconds = "grpc.max_connection_idle_ms";
}
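For context, here is a minimal sketch of how options like these reach a Grpc.Core channel, assuming the Grpc.Core NuGet package. The class name KeepAliveChannelFactory, the target address, and the numeric values are illustrative placeholders rather than recommendations, and raw option strings are used so the snippet stands alone without the ExtendedChannelOptions class. Per the MinPingIntervalWithoutDataMilliseconds and MaxPingStrikes comments above, a gRPC Core server can treat pings that arrive too often as bad pings and close the transport, so the server's ping policy has to tolerate whatever interval the client picks.

```csharp
using System.Collections.Generic;
using Grpc.Core;

public static class KeepAliveChannelFactory
{
    // Builds a Grpc.Core channel that sends HTTP/2 keepalive pings even when idle.
    // Hypothetical helper; the values are placeholders.
    public static Channel Create(string target)
    {
        var options = new List<ChannelOption>
        {
            // Send a keepalive ping on the transport every 30 s.
            new ChannelOption("grpc.keepalive_time_ms", 30_000),
            // Close the connection if a ping is not acknowledged within 10 s.
            new ChannelOption("grpc.keepalive_timeout_ms", 10_000),
            // 1 = allow pings even when no calls are in flight.
            new ChannelOption("grpc.keepalive_permit_without_calls", 1),
            // 0 = no limit on pings sent without accompanying data/header frames.
            new ChannelOption("grpc.http2.max_pings_without_data", 0),
        };

        // Insecure credentials keep the sketch short; use SslCredentials for TLS.
        return new Channel(target, ChannelCredentials.Insecure, options);
    }
}
```

The resulting channel is passed to a generated client's constructor as usual. Grpc.Core connects lazily, so creating the channel does not by itself open a connection.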

JuliusSweetland · 2020-05-27T22:20:41Z

@JamesNK As detailed in the previous post, the most useful options for me would be the first three:

/// <summary>
/// This channel argument controls the period (in milliseconds) after
/// which a keepalive ping is sent on the transport.
/// </summary>
public const string KeepAliveTimeMilliseconds = "grpc.keepalive_time_ms";

/// <summary>
/// This channel argument controls the amount of time (in
/// milliseconds), the sender of the keepalive ping waits for an
/// acknowledgement. If it does not receive an acknowledgement within
/// this time, it will close the connection.
/// </summary>
public const string KeepAliveTimeoutMilliseconds = "grpc.keepalive_timeout_ms";

/// <summary>
/// This channel argument if set to 1 (0 : false; 1 : true), allows
/// keepalive pings to be sent even if there are no calls in flight.
/// </summary>
public const string KeepAlivePermitWithoutCalls = "grpc.keepalive_permit_without_calls";

vadimi · 2020-05-27T22:41:59Z

@JamesNK here are the options that we normally use for keepalives on the client:

var opts = new List<ChannelOption>();
opts.Add(new ChannelOption("grpc.keepalive_time_ms", connOptions.KeepaliveTimeout));
opts.Add(new ChannelOption("grpc.keepalive_timeout_ms", ConnectionOptions.DefaultKeepaliveWaitTimeout));
opts.Add(new ChannelOption("grpc.keepalive_permit_without_calls", 1));
opts.Add(new ChannelOption("grpc.http2.min_time_between_pings_ms", connOptions.TimeBetweenPings));
opts.Add(new ChannelOption("grpc.http2.max_pings_without_data", 0));

On the server, this one is useful for disabling the keepalive ping policy if needed:

new ChannelOption("grpc.http2.max_ping_strikes", 0)

Not strictly related to keepalives, but we also use grpc.max_connection_idle_ms and grpc.max_connection_age_ms.
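A sketch of the server side along the same lines, again assuming Grpc.Core; the class name KeepAliveServerFactory and the values are hypothetical. Setting grpc.http2.max_ping_strikes to 0 makes the server accept any number of bad pings (as the option's documentation comment above describes), so frequent client keepalives will not get the connection closed. The idle and age limits are the two connection options mentioned just above.

```csharp
using System.Collections.Generic;
using Grpc.Core;

public static class KeepAliveServerFactory
{
    // Creates (but does not start) a Grpc.Core server with a relaxed ping policy.
    // Register services via server.Services.Add(...) before calling server.Start().
    public static Server Create(int port)
    {
        var options = new List<ChannelOption>
        {
            // 0 = tolerate any number of "bad" (too frequent) pings instead of sending GOAWAY.
            new ChannelOption("grpc.http2.max_ping_strikes", 0),
            // Close connections that have had no outstanding RPCs for 10 minutes.
            new ChannelOption("grpc.max_connection_idle_ms", 600_000),
            // Limit how long any single connection may exist (1 hour).
            new ChannelOption("grpc.max_connection_age_ms", 3_600_000),
        };

        var server = new Server(options);
        // Insecure credentials keep the sketch short; port 0 picks an unused port.
        server.Ports.Add(new ServerPort("0.0.0.0", port, ServerCredentials.Insecure));
        return server;
    }
}
```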

JamesNK · 2020-05-27T22:43:40Z

Is the typical pattern here that the client is responsible for keeping the connection alive? I.e., the client pings the server on an interval and the server is only responsible for replying to the ping.

chwarr · 2020-05-27T23:03:26Z

We run the pings in both directions so that both client and server can detect when traffic has disappeared into a black hole during idle times. (AFAIK, there's no setting to anticipate pings and update connectivity state if one hasn't been seen in a while.)

For us, client pinging server would be more useful than server pinging clients. We do have bi-directional streaming, but the idle times are usually between calls, not while a call is active. (E.g., my service doesn't have long-lived calls pushing data to clients. Yet. 😉)
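For server-to-client pings in grpc-dotnet, Kestrel in ASP.NET Core 5.0 added Http2Limits.KeepAlivePingDelay and Http2Limits.KeepAlivePingTimeout; server pings stay off unless a delay is configured. A minimal sketch, with a hypothetical helper name and illustrative values:

```csharp
using System;
using Microsoft.AspNetCore.Server.Kestrel.Core;

public static class KestrelKeepAlive
{
    // Enables server-initiated HTTP/2 keepalive pings (ASP.NET Core 5.0 or later).
    public static void Configure(KestrelServerOptions options)
    {
        // Ping the client if no frames have been received on a connection for 60 s.
        options.Limits.Http2.KeepAlivePingDelay = TimeSpan.FromSeconds(60);
        // Close the connection if the ping is not acknowledged within 30 s.
        options.Limits.Http2.KeepAlivePingTimeout = TimeSpan.FromSeconds(30);
    }
}
```

In a typical host this would be wired up with `webBuilder.ConfigureKestrel(KestrelKeepAlive.Configure);` inside `ConfigureWebHostDefaults`.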

JamesNK · 2020-05-27T23:10:00Z

I briefly looked into Grpc.Core's keep-alive pings a few months ago. The packets in Wireshark from pings indicated that they were connection level pings. The pings had a StreamID=0. I didn't see any stream level pings where the ping's StreamID=ActiveStreamID.

Do pings serve any purpose when it comes to keeping streams alive? I am thinking about potentially long running calls on those streams like bi-directional streaming that might not actively be sending messages.

chwarr · 2020-05-28T00:08:07Z

AFAIK, it just keeps the TCP, TLS, and HTTP2 connection alive. (My product doesn't have long-lived streams.) To control stream/call lifetime, I think one needs to use deadlines.
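A sketch of the deadline approach, assuming Grpc.Core.Api (used by both Grpc.Core and grpc-dotnet clients). Generated client methods accept CallOptions, and a call that outlives its deadline fails with StatusCode.DeadlineExceeded. The helper name and lifetime value are hypothetical.

```csharp
using System;
using Grpc.Core;

public static class StreamLifetime
{
    // Returns call options whose deadline caps how long a streaming call may run.
    // gRPC deadlines are absolute points in time; UTC is used here.
    public static CallOptions WithMaxLifetime(TimeSpan maxLifetime) =>
        new CallOptions(deadline: DateTime.UtcNow.Add(maxLifetime));
}
```

A client might then call something like `client.Subscribe(request, StreamLifetime.WithMaxLifetime(TimeSpan.FromMinutes(30)))` (client, Subscribe and request are hypothetical) and start a fresh call after catching the DeadlineExceeded RpcException.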

JamesNK · 2020-05-28T00:18:06Z

Ok.

Envoy supports timing out individual streams - see stream_idle_timeout at https://www.envoyproxy.io/docs/envoy/latest/api-v2/config/filter/network/http_connection_manager/v2/http_connection_manager.proto

I think to keep a long-running streaming call open in an environment where stream_idle_timeout is enabled you would need to send occasional dummy keepalive messages through gRPC.
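A minimal sketch of such a dummy-message heartbeat, assuming the application's message contract has room for a heartbeat (for example a hypothetical IsHeartbeat field) that the receiver ignores. It uses only the base class library; the actual write is passed in as a delegate. gRPC stream writers do not allow a new write to start while a previous one is still pending, so the delegate should share a lock (such as a SemaphoreSlim) with the code that writes real messages.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Heartbeat
{
    // Invokes sendHeartbeat every `interval` until the token is cancelled.
    // Exceptions thrown by sendHeartbeat (e.g. the stream has closed) propagate to the caller.
    public static async Task RunAsync(Func<Task> sendHeartbeat, TimeSpan interval, CancellationToken cancellationToken)
    {
        try
        {
            while (true)
            {
                await Task.Delay(interval, cancellationToken).ConfigureAwait(false);
                await sendHeartbeat().ConfigureAwait(false);
            }
        }
        catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
        {
            // Normal shutdown: the call ended or the caller stopped the heartbeat.
        }
    }
}
```

In a streaming service method, the token would typically be `context.CancellationToken`, so the heartbeat stops when the call ends.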

JamesNK · 2020-05-28T11:55:25Z

Proposal for APIs added to HttpClient - dotnet/runtime#31198 (comment)

These could then be set when creating a handler for a gRPC channel.

JuliusSweetland · 2020-05-28T18:39:54Z

@JamesNK "Is the typical pattern here that the client is responsible for keeping the connection alive?"

For me it's server as well. My server maintains client subscriptions and does work for each subscriber, so knowing when the connection is dropped and cleanup can occur is useful.

JamesNK · 2020-05-28T22:16:00Z

Keep alive doesn't tell you when a connection is dropped. Its purpose is to send regular pings so connections are not closed due to inactivity.

thefringeninja · 2020-05-28T22:21:51Z

But if you don't get a ping back surely that means the connection was dropped? Unless the act of sending a ping re-establishes the connection.

JamesNK · 2020-05-28T22:30:18Z

"But if you don't get a ping back surely that means the connection was dropped?"

If the client or server doesn't get a ping back in time, then it means there is a problem and the connection will be dropped. What I meant is that there is no API as part of this feature that will tell you a ping timed out and the connection was dropped.
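On a grpc-dotnet server, the practical signal that a call has gone away (the client disconnected, a deadline expired, or the connection was aborted, for example after a keepalive timeout) is that ServerCallContext.CancellationToken is cancelled. A minimal sketch of subscriber cleanup driven by that token, using only the base class library; SubscriptionRegistry and HoldSubscriptionAsync are hypothetical names, and the infinite wait stands in for a real loop that writes messages.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class SubscriptionRegistry
{
    private readonly ConcurrentDictionary<Guid, string> _subscribers =
        new ConcurrentDictionary<Guid, string>();

    public int Count => _subscribers.Count;

    // Keeps a subscriber registered for the lifetime of a streaming call and
    // removes it once the call's cancellation token fires.
    public async Task HoldSubscriptionAsync(string clientName, CancellationToken callCancelled)
    {
        var id = Guid.NewGuid();
        _subscribers[id] = clientName;
        try
        {
            // Placeholder for the real work of streaming messages to the client.
            await Task.Delay(Timeout.Infinite, callCancelled).ConfigureAwait(false);
        }
        catch (OperationCanceledException)
        {
            // Expected: the call has ended.
        }
        finally
        {
            _subscribers.TryRemove(id, out _);
        }
    }
}
```

A server-streaming method would pass `context.CancellationToken` as `callCancelled`.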

JuliusSweetland · 2020-05-28T22:51:22Z

If you have a server streaming endpoint and the client has keep alive configured, should the connection become unusable, then wouldn't the subsequent failed keep alive attempt from the client result in something detectable in the client, such as an exception, or the termination of the response stream?

JamesNK · 2020-05-28T23:24:15Z

If you were awaiting the response stream for more data and the connection was terminated then you would immediately get an error that the connection has closed.

JuliusSweetland · 2020-05-29T00:00:38Z

How does that work? If the client is waiting to receive data, can it not end up waiting forever if something like an intermediate router fails? The server at some point tries to send data, fails, and disconnects, but wouldn't the client keep waiting on the half-open connection? This is one of the scenarios I wanted to solve with keep alive: the client instead regularly tests connectivity and fails faster if the connection has become half-open.

JamesNK · 2020-05-29T00:19:37Z

Operations awaiting HttpClient's request and response stream will error if the TCP connection is broken.

JuliusSweetland · 2020-05-29T06:52:05Z

Can you point me in the right direction to read about this? I don't understand how this is possible during periods of inactivity, unless there is some sort of packet to check whether the peer can be reached. This was what I believed keep alive packets would probe. Thanks.

JuliusSweetland · 2020-06-02T08:08:26Z

@JamesNK To clarify the above question: assume a streaming server. The client is awaiting the next response when an intermediary router is powered down and the connection is left half open. The server detects this and closes, but what happens with the client? What stops the client from sitting and waiting forever?

halter73 · 2020-06-02T18:11:46Z

@JuliusSweetland You're right that it's possible for one side of an idle TCP connection to disconnect without the other side observing it. Without any keep alives or timeouts a connection can stay in this state indefinitely.

Keep-alives could avoid this. In the meantime, you could have the client periodically send application-level pings/keep-alives or something similar.
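One way to detect the half-open case from the client without transport keepalives is an application-level idle watchdog: the server sends periodic heartbeat messages, and the client gives up if nothing (data or heartbeat) arrives within a timeout. A minimal sketch using only the base class library; IdleWatchdog is a hypothetical name. It relies on CancellationTokenSource.CancelAfter, where each new call reschedules the pending cancellation.

```csharp
using System;
using System.Threading;

public sealed class IdleWatchdog : IDisposable
{
    private readonly CancellationTokenSource _cts;
    private readonly TimeSpan _timeout;

    public IdleWatchdog(TimeSpan timeout)
    {
        _timeout = timeout;
        // Starts counting immediately: cancels unless Reset() is called within `timeout`.
        _cts = new CancellationTokenSource(timeout);
    }

    // Cancelled once no activity has been reported for `timeout`.
    public CancellationToken Token => _cts.Token;

    // Call for every message (including heartbeats) received from the peer.
    public void Reset()
    {
        if (!_cts.IsCancellationRequested)
        {
            _cts.CancelAfter(_timeout);
        }
    }

    public void Dispose() => _cts.Dispose();
}
```

A client could read with `await foreach (var message in call.ResponseStream.ReadAllAsync(watchdog.Token)) { watchdog.Reset(); ... }`, where `call` and the message handling are hypothetical. When the watchdog fires, the pending read ends with an exception, and the client can dispose the call and reconnect.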

JuliusSweetland · 2020-06-02T18:21:12Z

Thanks @halter73. Between the discussed use cases and the fact that the recipient of a stream (client or server) can be left waiting for a message that isn't coming, I believe gRPC Core keep alive functionality is required on both the client and the server side.

@JamesNK do you agree?

@halter73 I am using manual pings and monitoring in both directions for my bidirectional streaming use case, but it would be great to simplify my code and push this down to a lower level.

JamesNK · 2020-09-01T04:24:26Z

Keep alive pings are supported in 5.0. This can be closed.

Best practice documentation related to performance: https://docs.microsoft.com/en-us/aspnet/core/grpc/performance?view=aspnetcore-5.0#keep-alive-pings
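For reference, a sketch along the lines of that documentation, for .NET 5 or later with the Grpc.Net.Client package. The class name, address, and timing values are illustrative; the SocketsHttpHandler properties and GrpcChannelOptions.HttpHandler are the real APIs.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using Grpc.Net.Client;

public static class KeepAliveGrpcChannel
{
    // Requires .NET 5 or later: the KeepAlivePing* properties do not exist in earlier runtimes.
    public static SocketsHttpHandler CreateHandler() => new SocketsHttpHandler
    {
        // Never close idle pooled connections; keep them for later calls.
        PooledConnectionIdleTimeout = Timeout.InfiniteTimeSpan,
        // Ping the server if no frames have been received on a connection for 60 s.
        KeepAlivePingDelay = TimeSpan.FromSeconds(60),
        // Treat the connection as broken if the ping isn't acknowledged within 30 s.
        KeepAlivePingTimeout = TimeSpan.FromSeconds(30),
        // Ping even when there are no active requests on the connection.
        KeepAlivePingPolicy = HttpKeepAlivePingPolicy.Always,
        // Allow extra HTTP/2 connections when the concurrent stream limit is reached.
        EnableMultipleHttp2Connections = true,
    };

    public static GrpcChannel Create(string address) =>
        GrpcChannel.ForAddress(address, new GrpcChannelOptions { HttpHandler = CreateHandler() });
}
```

Usage: `var channel = KeepAliveGrpcChannel.Create("https://localhost:5001");`, then pass the channel to a generated client.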

ASL07 · 2021-01-28T08:47:15Z

Keepalive options in SocketsHttpHandler don't seem to be available in dotnet-core, only in .NET 5.0: https://docs.microsoft.com/en-us/dotnet/api/system.net.http.socketshttphandler.keepalivepingdelay?view=net-5.0#System_Net_Http_SocketsHttpHandler_KeepAlivePingDelay
Is there a workaround for dotnet-core?

i3arnon · 2021-01-31T21:30:41Z

"Keepalive options in SocketsHttpHandler don't seem to be available in dotnet-core. Only .NET 5.0"

.NET 5.0 is .NET Core 5.0. It's just rebranded.
https://devblogs.microsoft.com/dotnet/introducing-net-5/

Kiddinglife · 2023-04-08T21:59:13Z

Can we have grpc.max-idle-connection-timeout? Otherwise the gRPC server will cut off the connection after 60s of inactivity and cause a "stream removed" error in the gRPC client. This is not what you want when you need a stable streaming RPC.

TheLever added the question Further information is requested label Feb 13, 2020

analogrelay added the needs-author-feedback label Feb 19, 2020

JamesNK mentioned this issue Mar 2, 2020

How can I set more ChannelOption, such as keepalive_time_ms? #797

Closed

analogrelay added this to the 5.0 milestone Mar 4, 2020

analogrelay added investigate and removed needs-author-feedback labels Mar 4, 2020

analogrelay assigned JamesNK Mar 4, 2020

JuliusSweetland mentioned this issue May 23, 2020

Termination of idle requests and connections #921

Closed

halter73 mentioned this issue Jun 2, 2020

[WebSockets] How to send Ping/Pong control frames, events for Ping frame sent by client. dotnet/runtime#35473

Closed

JunTaoLuo mentioned this issue Jun 12, 2020

Unable to keep bi-directal streaming connection open #948

Closed

JamesNK closed this as completed Sep 1, 2020

George-Payne mentioned this issue Jan 11, 2021

PersistentSubscription silently drops EventStore/EventStore-Client-NodeJS#112

Closed

xlegalles mentioned this issue May 12, 2023

Migrate to grpc-dotnet camunda-community-hub/zeebe-client-csharp#348

Closed

Tommo56700 mentioned this issue Jan 3, 2025

Definative guidance for keepalive using gRPC server side streaming #2588

Open
