
Provide custom keepalive ping handler #2166

Closed
gyuho opened this issue Jun 20, 2018 · 5 comments
Labels
Type: Feature New features or improvements in behavior

Comments

gyuho (Contributor) commented Jun 20, 2018

What version of gRPC are you using?

v1.12.2

What version of Go are you using (go version)?

Go 1.10.3

What operating system (Linux, Windows, …) and version?

Linux

What did you do?

Use keepalive.

What did you expect to see?

Customizable keepalive ping handler.

What did you see instead?

Internal to gRPC.


Currently, the server can only configure keepalive durations:

// ServerParameters is used to set keepalive and max-age parameters on the server-side.
type ServerParameters struct {
// MaxConnectionIdle is a duration for the amount of time after which an idle connection would be closed by sending a GoAway.
// Idleness duration is defined since the most recent time the number of outstanding RPCs became zero or the connection establishment.
MaxConnectionIdle time.Duration // The current default value is infinity.

And its keepalive handler is internal to gRPC:

func (t *http2Server) keepalive() {
p := &ping{}
var pingSent bool
maxIdle := time.NewTimer(t.kp.MaxConnectionIdle)
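For context, those durations are the only keepalive knobs a server can set today, passed via grpc.KeepaliveParams; a minimal sketch with illustrative values (not recommendations):

import (
  "time"

  "google.golang.org/grpc"
  "google.golang.org/grpc/keepalive"
)

// newServer configures keepalive durations; the ping handling itself stays
// inside the HTTP/2 transport and is not pluggable.
func newServer() *grpc.Server {
  return grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
    MaxConnectionIdle: 5 * time.Minute,  // send GoAway after this much idleness
    Time:              2 * time.Hour,    // ping an inactive client after this long
    Timeout:           20 * time.Second, // close if the ping is not acked in time
  }))
}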

etcd uses this keepalive ping to detect an unavailable server or client for stream RPCs (e.g. the Watch API).

etcd is typically clustered with 3 to 5 nodes. A client watch stream may fail over to other nodes when a node becomes unavailable. However, once a stream is established to a node that then becomes network-partitioned, the watch client gets stuck with the partitioned node.
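For reference, this is roughly how a client enables those transport-level pings today; the values are illustrative, not etcd's actual settings, and they only detect a dead transport, not a node that has lost its leader:

import (
  "time"

  "google.golang.org/grpc"
  "google.golang.org/grpc/keepalive"
)

// dialWithKeepalive is a hypothetical helper: ping the server every 10s even
// when the watch stream is idle, and drop the connection if no ack arrives
// within 3s.
func dialWithKeepalive(target string) (*grpc.ClientConn, error) {
  return grpc.Dial(target,
    grpc.WithInsecure(),
    grpc.WithKeepaliveParams(keepalive.ClientParameters{
      Time:                10 * time.Second,
      Timeout:             3 * time.Second,
      PermitWithoutStream: true,
    }))
}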

Related issues are:

etcd's current workaround is to specify WithRequireLeader in the client context to pass metadata to the server interceptor, but this is optional in etcd. Another workaround is to add a stream progress request (etcd-io/etcd#9869) to check whether the node is up to date.

It would be very useful if etcd could configure the server-side keepalive ping handler to do the following (a hypothetical sketch of such a hook follows the list):

  1. receive the client ping
  2. the receiving node checks whether it has an active leader
  3. if the node has lost the leader, close the client connection
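
To make the request concrete, here is a purely hypothetical sketch of what such a hook could look like; nothing like PingHandler exists in gRPC today, and the field name and shape are invented for illustration:

type ServerParameters struct {
  // ... existing duration fields ...

  // PingHandler (hypothetical) would be invoked by the transport for every
  // keepalive ping received. Returning a non-nil error would close the
  // connection, letting the application veto "liveness".
  PingHandler func() error
}

// etcd could then plug in something like (hasLeader is an etcd-side stand-in):
//   kp.PingHandler = func() error {
//       if !hasLeader() {
//           return errors.New("no leader; close the connection")
//       }
//       return nil
//   }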

Do we have any plan to expose keepalive ping handler in gRPC?

/cc @jpbetz @xiang90

dfawley (Member) commented Jun 21, 2018

@gyuho, no, there are no current plans to do this. Is there a reason you want to perform this check when you get keepalive pings as opposed to alternatives (e.g. polling periodically)? It may be possible to do something similar using a custom handshaker on the server that kills the connection when certain conditions arise, or by making an RPC initiated by the client that takes the place of the keepalive pings. I'm not sure I fully understand your design, however, so this may or may not help.
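
On the "RPC initiated by the client" alternative: one option that already exists in grpc-go is the standard health checking service, which the client can poll instead of relying on transport pings; a rough sketch (the empty service name and wiring are illustrative):

import (
  "context"

  "google.golang.org/grpc"
  "google.golang.org/grpc/health"
  healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

// Server side: flip the serving status when the node loses its leader.
func registerHealth(s *grpc.Server) *health.Server {
  hs := health.NewServer()
  healthpb.RegisterHealthServer(s, hs)
  hs.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)
  return hs // later: hs.SetServingStatus("", healthpb.HealthCheckResponse_NOT_SERVING)
}

// Client side: poll periodically; an error or NOT_SERVING means "try another node".
func isServing(ctx context.Context, cc *grpc.ClientConn) bool {
  resp, err := healthpb.NewHealthClient(cc).Check(ctx, &healthpb.HealthCheckRequest{})
  return err == nil && resp.Status == healthpb.HealthCheckResponse_SERVING
}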

gyuho (Contributor, Author) commented Jun 22, 2018

@dfawley I wasn't clear enough.

Our main use case for keepalive pings is to detect an unresponsive server from the client side for long-running stream RPCs. Keepalive detects disconnects at the transport layer, but there is no way to extend it to cover application-layer conditions (as mentioned above, a partitioned node will still answer transport-level keepalive pings, so the connection is marked active and the client stream stays stuck with the partitioned node).

Is there a reason you want to perform this check when you get keepalive pings as opposed to alternatives (e.g. polling periodically)?

We want to use keepalive since it's light (an 8-byte HTTP/2 ping) and built into the gRPC server. We have a similar use case in the gRPC server interceptor layer, where every RPC goes through a leader check to see whether there's an active leader. I would imagine we could do the same thing for a keepalive handler (the existing interceptor check is roughly the shape sketched below).
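
The interceptor-level leader check referred to here is roughly of this shape; hasLeader is a stand-in for etcd's actual check, not real etcd or gRPC API:

import (
  "context"

  "google.golang.org/grpc"
  "google.golang.org/grpc/codes"
  "google.golang.org/grpc/status"
)

// leaderCheckUnary rejects every RPC while the node has no leader.
func leaderCheckUnary(hasLeader func() bool) grpc.UnaryServerInterceptor {
  return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
    handler grpc.UnaryHandler) (interface{}, error) {
    if !hasLeader() {
      return nil, status.Error(codes.Unavailable, "no leader")
    }
    return handler(ctx, req)
  }
}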

We haven't tried https://godoc.org/google.golang.org/grpc/credentials#TransportCredentials, but it seems like this is meant for the connection handshake?

Thanks!

dfawley (Member) commented Jun 22, 2018

@gyuho,

If you were to kill the connection, how would the client know which node to reconnect to?

Could you perform this check in a server interceptor and kill the watch stream (instead of the connection itself) when the partitioning happens? Note that your streaming interceptor has full control of the stream and can terminate it early with an error at any time -- not just at the start of the RPC or when traffic happens. E.g. something like this might work:

func interceptor(...) error {
  errChan := make(chan error, 1)
  go func() {
    errChan <- handler(srv, ss)
    close(errChan)
  }()
  select {
  case <-partitioned: // written to by something monitoring for partitioning
    return partitionedErr
  case err := <-errChan:
    return err
  }
}
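
For reference, the elided signature above would be the standard grpc.StreamServerInterceptor form, and the interceptor is registered as a server option; newWatchServer is a hypothetical name, and partitioned/partitionedErr are assumed to come from etcd-side monitoring:

import "google.golang.org/grpc"

func newWatchServer() *grpc.Server {
  // interceptor has the standard signature:
  //   func(srv interface{}, ss grpc.ServerStream, info *grpc.StreamServerInfo,
  //        handler grpc.StreamHandler) error
  return grpc.NewServer(grpc.StreamInterceptor(interceptor))
}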

We haven't tried godoc.org/google.golang.org/grpc/credentials#TransportCredentials but seems like this is for connection handshake?

TransportCredentials is intended for auth, but it could also be used to do anything desired at the connection level. That would be a bit of an abuse, though, so probably not a great idea. Instead, if you really want connection-level control, the Listener you give to gRPC could be wrapped in something that wraps the Conns it hands out to do checks like this (a rough sketch follows).
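
A minimal sketch of that wrapping idea, assuming the application supplies a shouldClose-style check (all names here are illustrative, not gRPC API):

import (
  "net"
  "time"
)

// checkedListener hands out connections that a small watchdog goroutine will
// force-close once an application-level condition (e.g. "lost the leader") trips.
type checkedListener struct {
  net.Listener
  shouldClose func() bool // supplied by the application
}

func (l *checkedListener) Accept() (net.Conn, error) {
  c, err := l.Listener.Accept()
  if err != nil {
    return nil, err
  }
  go func() {
    // Crude polling; a real version would also stop when the conn closes normally.
    for !l.shouldClose() {
      time.Sleep(time.Second)
    }
    c.Close() // gRPC sees the broken transport and the client reconnects elsewhere
  }()
  return c, nil
}

// Usage: grpcServer.Serve(&checkedListener{Listener: lis, shouldClose: noLeader})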

gyuho (Contributor, Author) commented Jun 22, 2018

@dfawley

If you were to kill the connection, how would the client know which node to reconnect to?

Once disconnected, the etcd clientv3 balancer's picker policy will either round-robin or switch to other nodes based on the latest node health status to retry the requests. We just need some indication that this node is not responding.

Could you perform this check in a server interceptor and kill the watch stream (instead of the connection itself) when the partitioning happens?

Yes, and we support that as an option (users can specify require-leader metadata in their contexts).

Instead, if you really wanted connection-level control, the Listener you give to gRPC could be wrapped in something that wraps the Conns it hands out that do checks like this.

I see. We will look into this.

I was just curious whether the gRPC team has any plan to support a custom keepalive handler. If you recommend the server interceptor pattern, we are happy with that for now.

Thanks for the response! Please feel free to close this if there's no plan to support a custom keepalive handler in the near future. We will revisit when we find other use cases.

dfawley (Member) commented Jun 22, 2018

Yes, I think the interceptor pattern would be recommended for this. Keepalives should only be necessary for our internal implementation, so I would like to avoid exporting a hook for them unless there is a strong enough need to justify it (i.e. there's no other/better way to accomplish the same thing). Thanks for the request.

dfawley closed this as completed Jun 22, 2018
lock bot locked as resolved and limited conversation to collaborators Dec 19, 2018