Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CloseConnection and invocation cancellation / retry #3151

Open
bernardnormier opened this issue Nov 13, 2024 · 1 comment
Open

CloseConnection and invocation cancellation / retry #3151

bernardnormier opened this issue Nov 13, 2024 · 1 comment
Milestone

Comments

@bernardnormier
Copy link
Member

A connection sends a CloseConnection frame to its peer to initiate a graceful connection close (aka shutdown).

The CloseConnection frame is sent by initiateShutdown only when there is no outstanding upcall, for example:

os.writeByte(Protocol.closeConnectionMsg);

if (_state == StateClosing && _upcallCount == 0)

if (_upcallCount == 0)

For upcallCount, see:

// The upcall count keeps track of the number of dispatches, AMI (response) continuations, sent callbacks and

(it's the same in other languages)

upcallCount == 0 in particular guarantees there is no outstanding dispatch (since a dispatch is an upcall). This includes the dispatch of oneway requests.

Connection.close calls initiateShutdown only once there is no outstanding two-way invocations. See:

doApplicationClose();

doApplicationClose();

but other events (OA deactivation, Communicator destruction) don't check two-way invocations and will happily initiate shutdown (= send CloseConnection) with outstanding two-way invocations:

case ObjectAdapterDeactivated:

Now, this graceful connection closure is usually fast and successful, but it can sometimes take a while and even fail.

The typical reason for taking a while is the peer is busy with its own upcalls:

  • dispatching two-way requests
  • dispatching oneway requests
  • dispatching two-way requests for invocations that have timed-out in the caller (= similar to oneway)

In the current code, we wait until the connection closure is complete to cancel two-way invocations in the peer (= the side that didn't send CloseConnection) that are outstanding when the peer receives the CloseConnection frame:

foreach (OutgoingAsyncBase o in _asyncRequests.Values)

All these invocations are safe to cancel & retry since the CloseConnection frame guarantees none of these requests were dispatched (remember, the initiator waits until upcallCount is 0).

// A CloseConnectionException indicates graceful server shutdown, and is therefore

This wait delays the retrying of the invocations with a new connection (to the same server or another replica). Worse, these retry-able invocations may not be retried at all if the connection closure is not graceful for some reason (typically because it times out).

We should fix this behavior by canceling these invocations sooner--as soon as we receive the CloseConnection frame.

@bernardnormier bernardnormier added this to the Future milestone Nov 13, 2024
@bernardnormier
Copy link
Member Author

In icerpc-csharp's implementation of the ice protocol, we do cancel outstanding invocations immediately when we receive a CloseConnection frame:

https://github.com/icerpc/icerpc-csharp/blob/071848c40365c3fe204f4e1d7866c22e4300dc1a/src/IceRpc/Internal/IceProtocolConnection.cs#L1201

https://github.com/icerpc/icerpc-csharp/blob/071848c40365c3fe204f4e1d7866c22e4300dc1a/src/IceRpc/IceRpcError.cs#L27

You need the retry interceptor to retry automatically on IceRpcError.InvocationCanceled:
https://docs.icerpc.dev/api/csharp/api/IceRpc.Retry.RetryInterceptor.html#IceRpc_Retry_RetryInterceptor_remarks

Without it, the caller gets the IceRpcError.InvocationCanceled error code and can then safely retry at the application level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant