Keep alive missing? #770

Closed
TheLever opened this issue Feb 13, 2020 · 38 comments
Labels: investigate, question (Further information is requested)

@TheLever

TheLever commented Feb 13, 2020

For some reason my question text was removed from the question - redoing it here.

Is there a way to keep the connection between client and server alive in a C# gRPC app? My scenario is a bidirectional streaming one (essentially the FullStockTicker demo). What I see is that the server closes the HTTP/2 connection after a few minutes. I have tried to find a way to set the keep alive flag, but that does not seem possible from C#.

I am using Grpc.AspNetCore (2.27.0), Grpc.Tools (2.27.0), Google.Protobuf (3.11.3) and generated clients.

@TheLever added the question (Further information is requested) label on Feb 13, 2020
@analogrelay
Contributor

Can you provide a runnable sample that reproduces this problem? We don't believe the server should be terminating the connection in this situation, so it would help to see the full context of what you're doing.

In addition, it would help if you could enable more detailed logging using the following config in your appsettings.json.

{
  "Logging": {
    "LogLevel": {
      "Default": "Debug",
      "System": "Information",
      "Microsoft": "Trace",
      "Grpc": "Debug"
    }
  }
}

@ndglover

ndglover commented Feb 24, 2020

I have just hit the same issue. It's taken a while to track down so might well be worth noting in the docs somewhere. It looks like the work is being done outside of this repo to fix it. See dotnet/aspnetcore#15104

After further investigation this proves not to be the actual issue for me. Apologies for the confusion. I have crossed out my original comment above.

@halter73

@ndglover Can you provide the logs that @anurse asked for? Thanks.

@thefringeninja
Contributor

This is something we will definitely need, as we expect our subscriptions to run indefinitely.

@JuliusSweetland

JuliusSweetland commented Apr 27, 2020

@JamesNK @anurse Hi James, Andrew, I believe I might need keep alive functionality where I am implementing bidi and server to client streams. These would be long running, and there may be long periods of no messages, which is what I understand the keep alive ping/pong behaviour to be designed for - essentially to test that the connection has not been reset without either side knowing. Would this be a suitable use case for keep alive? If so I'd like to throw my vote in for including it with Grpc-net. Thanks.

@JuliusSweetland

If it helps there is one C# project on GitHub which seems to have rolled their own grpc client code which allows them to set channeloptions on the underlying channel:

https://github.com/zeebe-io/zeebe-client-csharp/blob/5d416bc1bb6eccc3898f0727f639ecd1b2bfb434/Client/ZeebeClient.cs#L58

Which uses this GatewayGrpc client:

https://github.com/zeebe-io/zeebe-client-csharp/blob/5d416bc1bb6eccc3898f0727f639ecd1b2bfb434/Client/Impl/proto/GatewayGrpc.cs

@JuliusSweetland

@JamesNK @anurse Hi again James, Andrew, can I ask where support for gRPC keepalive might be on your priority list? At present I am rolling my own heart beating, which is very much sub-optimal. I would love to rip out some of my code and use the underlying mechanism to get confidence that longstanding tcp connections are still connected. Thanks.

@JamesNK
Member

JamesNK commented May 9, 2020

At the moment it is low priority because we haven't had many people ask for it. What do you want built-in heart beats for? Is it perf, i.e. keeping the TCP connection open so there is not a delay reestablishing the connection after a pause?

@JuliusSweetland

A combination of requiring long running streams for performance reasons, but I also require an up to date view of which streaming connections are still live as we require a client to have connections to certain streams before we allow them to perform operations against other endpoints. If we cannot guarantee the client connections are currently active then we cannot enforce policies which we require.

@MaxXor

MaxXor commented May 11, 2020

@JamesNK I'm planning to switch to gRPC but this issue has held me back. My project has long-running idle connections that need to be responsive as soon as they are used.

@vadimi

vadimi commented May 14, 2020

We use keepalives in gRPC Core when exposing gRPC services through an AWS NLB load balancer. Without them, the load balancer would ungracefully close TCP connections and gRPC clients would get errors. My understanding is that Azure Load Balancer works the same way. Kubernetes also has idle timeout settings (https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/) that will ungracefully terminate TCP connections. Keepalives really help in these scenarios and make apps and services a lot more stable.

@JuliusSweetland

@JamesNK Hi James - has any of the above feedback influenced how you are thinking about Keep Alive?

@JamesNK
Member

JamesNK commented May 26, 2020

I'm planning to investigate what options are in Grpc.Core, and what ones would be useful to have in grpc-dotnet.

If people can give feedback on which settings are important, and why, then that would be useful.

To set expectations: adding new features to gRPC on the server, including Kestrel, will be easier than on the client. I can prioritize implementing server-side features myself. Client-side features that need to be added to HttpClient are done by a different team. They might not make the cut in the .NET 5 time-frame.

@chwarr

chwarr commented May 26, 2020

Below I've included what my product is currently using to keep Azure load balancers alive when we have a lull in traffic and our gRPC channels don't have any active calls or messages. (The Azure load balancers will sometimes drop idle connections without sending RST packets to either endpoint.)

We use keepalives to

  1. reduce latency by having a connection to our backend already established (don't have to do TCP, TLS, gRPC, &c. handshakes while in the middle of a client's request) and
  2. improve reliability by detecting when connections die during idle times.

As has been mentioned before, yes, we could do this by adding some sort of heartbeat in our application protocol. The keepalive functionality in the gRPC Core library has, thus far, been sufficient.

This is an excerpt of the keepalive options that are set on a gRPC Core channel.

The ExtendedChannelOptions class is a collection of C# string constants for the gRPC Core channel options. Its definition is below.

// This channel argument controls the period (in milliseconds) after which a
// keepalive ping is sent on the transport.
new ChannelOption(ExtendedChannelOptions.KeepAliveTimeMilliseconds, 2500),

// This channel argument controls the amount of time (in milliseconds), the sender of
// the keepalive ping waits for an acknowledgment. If it does not receive an
// acknowledgment within this time, it will close the connection.
new ChannelOption(ExtendedChannelOptions.KeepAliveTimeoutMilliseconds, 60000),

// This channel argument if set to 1 (0 : false; 1 : true), allows keepalive pings to be sent even if there are no calls in flight.
new ChannelOption(ExtendedChannelOptions.KeepAlivePermitWithoutCalls, 1),

// This channel argument controls the maximum number of pings that can be sent when
// there is no other data (data frame or header frame) to be sent. GRPC Core will not
// continue sending pings if we run over the limit. Setting it to 0 allows sending
// pings without sending data.
new ChannelOption(ExtendedChannelOptions.MaxPingsWithoutData, 0),

// If there is no data being sent on the transport, this channel argument controls
// the minimum time (in milliseconds) gRPC Core will wait between successive pings.
new ChannelOption(ExtendedChannelOptions.MinTimeBetweenPingsMilliseconds, 1000),

ExtendedChannelOptions definition:

public static class ExtendedChannelOptions
{
    /// <summary>
    /// This channel argument controls the period (in milliseconds) after
    /// which a keepalive ping is sent on the transport.
    /// </summary>
    public const string KeepAliveTimeMilliseconds = "grpc.keepalive_time_ms";

    /// <summary>
    /// This channel argument controls the amount of time (in
    /// milliseconds), the sender of the keepalive ping waits for an
    /// acknowledgement. If it does not receive an acknowledgement within
    /// this time, it will close the connection.
    /// </summary>
    public const string KeepAliveTimeoutMilliseconds = "grpc.keepalive_timeout_ms";

    /// <summary>
    /// This channel argument if set to 1 (0 : false; 1 : true), allows
    /// keepalive pings to be sent even if there are no calls in flight.
    /// </summary>
    public const string KeepAlivePermitWithoutCalls = "grpc.keepalive_permit_without_calls";

    /// <summary>
    /// The load balancer policy that should be used.
    /// </summary>
    public const string LoadBalancerPolicyName = "grpc.lb_policy_name";

    /// <summary>
    /// This channel argument controls the maximum number of pings that can
    /// be sent when there is no other data (data frame or header frame) to
    /// be sent. GRPC Core will not continue sending pings if we run over
    /// the limit. Setting it to 0 allows sending pings without sending
    /// data.
    /// </summary>
    public const string MaxPingsWithoutData = "grpc.http2.max_pings_without_data";

    /// <summary>
    /// If there is no data being sent on the transport, this channel
    /// argument controls the minimum time (in milliseconds) gRPC Core will
    /// wait between successive pings.
    /// </summary>
    public const string MinTimeBetweenPingsMilliseconds = "grpc.http2.min_time_between_pings_ms";

    /// <summary>
    /// If there is no data being sent on the transport, this channel
    /// argument on the server side controls the minimum time (in
    /// milliseconds) that gRPC Core would expect between receiving
    /// successive pings. If the time between successive pings is less
    /// than this time, then the ping will be considered a bad ping from
    /// the peer. Such a ping counts as a ‘ping strike’. On the client
    /// side, this does not have any effect.
    /// </summary>
    public const string MinPingIntervalWithoutDataMilliseconds = "grpc.http2.min_ping_interval_without_data_ms";

    /// <summary>
    /// This arg controls the maximum number of bad pings that the server
    /// will tolerate before sending an HTTP2 GOAWAY frame and closing the
    /// transport. Setting it to 0 allows the server to accept any number
    /// of bad pings.
    /// </summary>
    public const string MaxPingStrikes = "grpc.http2.max_ping_strikes";

    /// <summary>
    /// Maximum time that a channel may have no outstanding rpcs. Int valued,
    /// milliseconds. INT_MAX means unlimited.
    /// </summary>
    public const string MaxConnectionIdleMilliseconds = "grpc.max_connection_idle_ms";
}
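
For context, here is a minimal sketch of how options like these are typically handed to a Grpc.Core channel. The class name, the target address, and the insecure credentials are placeholders for illustration and are not part of the configuration described above. The option names and values are taken from the excerpt.

```csharp
using System.Collections.Generic;
using Grpc.Core;

public static class KeepAliveChannelSketch
{
    // Builds a Grpc.Core channel with keepalive channel arguments applied.
    // "localhost:50051" and ChannelCredentials.Insecure are placeholders (assumptions).
    public static Channel Create()
    {
        var options = new List<ChannelOption>
        {
            // Send a keepalive ping every 2.5 seconds.
            new ChannelOption("grpc.keepalive_time_ms", 2500),
            // Wait up to 60 seconds for the ping acknowledgement.
            new ChannelOption("grpc.keepalive_timeout_ms", 60000),
            // Allow pings even when no calls are in flight.
            new ChannelOption("grpc.keepalive_permit_without_calls", 1),
        };

        return new Channel("localhost:50051", ChannelCredentials.Insecure, options);
    }
}
```

This applies to the Grpc.Core (C core based) client only. These channel arguments are not something the thread shows being accepted by grpc-dotnet.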

@JuliusSweetland

@JamesNK As detailed in the previous post. For me the most useful would be the first 3 options:

/// <summary>
/// This channel argument controls the period (in milliseconds) after
/// which a keepalive ping is sent on the transport.
/// </summary>
public const string KeepAliveTimeMilliseconds = "grpc.keepalive_time_ms";

/// <summary>
/// This channel argument controls the amount of time (in
/// milliseconds), the sender of the keepalive ping waits for an
/// acknowledgement. If it does not receive an acknowledgement within
/// this time, it will close the connection.
/// </summary>
public const string KeepAliveTimeoutMilliseconds = "grpc.keepalive_timeout_ms";

/// <summary>
/// This channel argument if set to 1 (0 : false; 1 : true), allows
/// keepalive pings to be sent even if there are no calls in flight.
/// </summary>
public const string KeepAlivePermitWithoutCalls = "grpc.keepalive_permit_without_calls";

@vadimi

vadimi commented May 27, 2020

@JamesNK here are the options that we normally use for keepalives on the client:

var opts = new List<ChannelOption>();
opts.Add(new ChannelOption("grpc.keepalive_time_ms", connOptions.KeepaliveTimeout));
opts.Add(new ChannelOption("grpc.keepalive_timeout_ms", ConnectionOptions.DefaultKeepaliveWaitTimeout));
opts.Add(new ChannelOption("grpc.keepalive_permit_without_calls", 1));
opts.Add(new ChannelOption("grpc.http2.min_time_between_pings_ms", connOptions.TimeBetweenPings));
opts.Add(new ChannelOption("grpc.http2.max_pings_without_data", 0));

on the server this one is useful to disable keepalive policy if needed:

new ChannelOption("grpc.http2.max_ping_strikes", 0)

not strictly related to keepalives, but we also use grpc.max_connection_idle_ms and grpc.max_connection_age_ms

@JamesNK
Member

JamesNK commented May 27, 2020

Is the typical pattern here that the client is responsible for keeping the connection alive? I.e. the client pings the server on an interval and the server is only responsible for replying to the ping?

@chwarr

chwarr commented May 27, 2020

We run the pings in both directions so that both client and server can detect when traffic has disappeared into a black hole during idle times. (AFAIK, there's no setting to anticipate pings and update connectivity state if one hasn't been seen in a while.)

For us, client pinging server would be more useful than server pinging clients. We do have bi-directional streaming, but the idle times are usually between calls, not while a call is active. (E.g., my service doesn't have long-lived calls pushing data to clients. Yet. 😉)

@JamesNK
Member

JamesNK commented May 27, 2020

I briefly looked into Grpc.Core's keep-alive pings a few months ago. The packets in Wireshark from pings indicated that they were connection level pings. The pings had a StreamID=0. I didn't see any stream level pings where the ping's StreamID=ActiveStreamID.

Do pings serve any purpose when it comes to keeping streams alive? I am thinking about potentially long running calls on those streams like bi-directional streaming that might not actively be sending messages.

@chwarr

chwarr commented May 28, 2020

AFAIK, it just keeps the TCP, TLS, and HTTP2 connection alive. (My product doesn't have long-lived streams.) To control stream/call lifetime, I think one needs to use deadlines.

@JamesNK
Member

JamesNK commented May 28, 2020

Ok.

Envoy supports timing out individual streams - see stream_idle_timeout at https://www.envoyproxy.io/docs/envoy/latest/api-v2/config/filter/network/http_connection_manager/v2/http_connection_manager.proto

I think to keep a long-running streaming call open in an environment where stream_idle_timeout is enabled you would need to send occasional dummy keepalive messages through gRPC.
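
A rough sketch of what such occasional dummy messages could look like, assuming a hypothetical `sendHeartbeat` delegate that writes an empty or "ping" message to the call's stream. The class and parameter names are illustrative only, and the interval would need to be shorter than whatever idle timeout the proxy enforces.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class HeartbeatSketch
{
    // Calls sendHeartbeat once per interval until the token is cancelled.
    // sendHeartbeat is a hypothetical delegate supplied by the caller.
    public static async Task RunAsync(
        Func<Task> sendHeartbeat,
        TimeSpan interval,
        CancellationToken cancellationToken)
    {
        try
        {
            while (true)
            {
                await Task.Delay(interval, cancellationToken);
                await sendHeartbeat();
            }
        }
        catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
        {
            // Normal shutdown: the call finished or was cancelled.
        }
    }
}
```

One caveat: gRPC stream writers allow only one pending write at a time, so heartbeat writes would need to be serialized with real message writes (for example with a lock or a single writer queue).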

@JamesNK
Member

JamesNK commented May 28, 2020

Proposal for APIs added to HttpClient - dotnet/runtime#31198 (comment)

These could then be set when creating a handler for a gRPC channel.

@JuliusSweetland

@JamesNK "Is the typical pattern here that the client is responsible for keeping the connection alive?"

For me it's server as well. My server maintains client subscriptions and does work for each subscriber, so knowing when the connection is dropped and cleanup can occur is useful.

@JamesNK
Member

JamesNK commented May 28, 2020

Keep alive doesn't tell you when a connection is dropped. Its purpose is to send regular pings so connections are not closed from inactivity.

@thefringeninja
Contributor

thefringeninja commented May 28, 2020 via email

@JamesNK
Member

JamesNK commented May 28, 2020

"But if you don't get a ping back surely that means the connection was dropped?"

If the client or server doesn't get a ping back in time then it means there is a problem and the connection will be dropped. What I meant is there is no API as part of this feature that will tell you a ping timed out and connection was dropped.

@JuliusSweetland

JuliusSweetland commented May 28, 2020 via email

@JamesNK
Member

JamesNK commented May 28, 2020

If you were awaiting the response stream for more data and the connection was terminated then you would immediately get an error that the connection has closed.

@JuliusSweetland

JuliusSweetland commented May 29, 2020 via email

@JamesNK
Member

JamesNK commented May 29, 2020

Operations awaiting HttpClient's request and response stream will error if the TCP connection is broken.

@JuliusSweetland

JuliusSweetland commented May 29, 2020 via email

@JuliusSweetland

@JamesNK To clarify the above question: assume a streaming server. The client is awaiting the next response when an intermediary router is powered down and the connection is left half open. The server detects this and closes, but what happens with the client? What stops the client from sitting and waiting forever?

@halter73

halter73 commented Jun 2, 2020

@JuliusSweetland You're right that it's possible for one side of an idle TCP connection to disconnect without the other side observing it. Without any keep alives or timeouts a connection can stay in this state indefinitely.

Keep-alives could avoid this. In the meantime, you could have the client periodically send application-level pings/keep-alives or something similar.
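
On the receiving side, application-level pings only help if the waiting party gives up when they stop arriving. A minimal sketch of such an idle watchdog follows, assuming the caller passes in the task for its pending read. The class and method names are hypothetical, and how the pending read maps onto a particular gRPC stream reader is left to the caller.

```csharp
using System;
using System.Threading.Tasks;

public static class IdleWatchdogSketch
{
    // Returns the result of pendingRead, or throws TimeoutException if it does
    // not complete within idleTimeout. On timeout the read itself is left
    // running; the caller is expected to cancel or dispose the underlying call.
    public static async Task<T> WithIdleTimeoutAsync<T>(Task<T> pendingRead, TimeSpan idleTimeout)
    {
        var winner = await Task.WhenAny(pendingRead, Task.Delay(idleTimeout));
        if (!ReferenceEquals(winner, pendingRead))
        {
            throw new TimeoutException($"No message received within {idleTimeout}.");
        }

        // Propagates the read's result, or its exception if the read faulted.
        return await pendingRead;
    }
}
```

The timeout would typically be set to a few multiples of the peer's ping interval, so that a single delayed ping does not tear down a healthy stream.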

@JuliusSweetland

Thanks @halter73 so I believe, between the discussed use cases and the fact that it is possible for the recipient of a stream (client or server) to be waiting for a message which isn't coming, that gRPC Core keep alive functionality is required on both the client and server side.

@JamesNK do you agree?

@halter73 I am using manual pings and monitoring in both directions for my bi directional streaming use case, but it would be great to simplify my code and push this down to a lower level.

@JamesNK
Member

JamesNK commented Sep 1, 2020

Keep alive pings are supported in 5.0. This can be closed.

Best practice documentation related to performance: https://docs.microsoft.com/en-us/aspnet/core/grpc/performance?view=aspnetcore-5.0#keep-alive-pings
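
As a sketch of what the client-side configuration can look like on .NET 5 or later: the timeout values below are illustrative rather than recommendations, and the class name is hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Threading;

public static class KeepAliveHandlerSketch
{
    // Requires .NET 5 or later; the KeepAlivePing* properties do not exist earlier.
    public static SocketsHttpHandler Create()
    {
        return new SocketsHttpHandler
        {
            // Do not close pooled connections just because they are idle.
            PooledConnectionIdleTimeout = Timeout.InfiniteTimeSpan,
            // Send an HTTP/2 PING after this long without receiving any frames.
            KeepAlivePingDelay = TimeSpan.FromSeconds(60),
            // Close the connection if the PING is not acknowledged in this time.
            KeepAlivePingTimeout = TimeSpan.FromSeconds(30),
            // Ping even when there are no active requests on the connection.
            KeepAlivePingPolicy = HttpKeepAlivePingPolicy.Always,
        };
    }
}
```

The handler would then be passed to the channel through `GrpcChannelOptions.HttpHandler` when calling `GrpcChannel.ForAddress`. On the server, Kestrel in .NET 5 exposes corresponding `KeepAlivePingDelay` and `KeepAlivePingTimeout` settings under `Limits.Http2`.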

@ASL07

ASL07 commented Jan 28, 2021

Keepalive options in SocketsHttpHandler don't seem to be available in dotnet-core
Only .NET 5.0 https://docs.microsoft.com/en-us/dotnet/api/system.net.http.socketshttphandler.keepalivepingdelay?view=net-5.0#System_Net_Http_SocketsHttpHandler_KeepAlivePingDelay
Is there a workaround for dotnet-core ?

@i3arnon

i3arnon commented Jan 31, 2021

"Keepalive options in SocketsHttpHandler don't seem to be available in dotnet-core
Only .NET 5.0"

.NET 5.0 is .NET Core 5.0. It's just rebranded.
https://devblogs.microsoft.com/dotnet/introducing-net-5/

@Kiddinglife

Kiddinglife commented Apr 8, 2023

Can we have grpc.max-idle-connection-timeout? Otherwise the gRPC server will cut off the connection after 60s of inactivity and cause a stream remove error in the gRPC client. This is not expected when you want to have stable streaming RPC.
