add support for sending error codes on session close #121

Open
wants to merge 9 commits into
base: master

Conversation

sukunrt
Member

@sukunrt sukunrt commented Aug 26, 2024

This adds support for sending error codes on closing a session.

The error code isn't guaranteed to reach the remote. Delivery depends on the LINGER value of the TCP socket and on whether there is any unread data in the receive buffer when close is called. In either situation, the GoAway frame might be dropped.

To reliably send error codes, we'd have to send a TCP FIN packet and wait for the remote to close its half of the connection. This would also require sending everything that's pending in the kernel write buffer. To avoid introducing this 1-RTT delay on close, I've opted to make this a best-effort implementation.
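For illustration, the reliable alternative described above (write the frame, send a FIN, then wait for the remote to close its half) could look roughly like the following. This is a minimal sketch, not code from this PR: `closeAfterFlush` and `demo` are hypothetical names, the payload is a placeholder rather than a real GoAway frame, and the wait duration is arbitrary. It only uses the standard library's `(*net.TCPConn).CloseWrite` and read deadlines.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"time"
)

// closeAfterFlush is a hypothetical helper (not part of yamux). It writes a
// final frame, half-closes the write side so a FIN is queued behind the
// frame, then waits up to `wait` for the peer to close its half before
// fully closing the socket.
func closeAfterFlush(conn *net.TCPConn, frame []byte, wait time.Duration) error {
	defer conn.Close()
	if _, err := conn.Write(frame); err != nil {
		return err
	}
	if err := conn.CloseWrite(); err != nil {
		return err
	}
	if err := conn.SetReadDeadline(time.Now().Add(wait)); err != nil {
		return err
	}
	// Drain until the peer's FIN arrives; io.Copy reports EOF as a nil error.
	_, err := io.Copy(io.Discard, conn)
	return err
}

// demo runs the helper against a loopback peer and returns what the peer read.
func demo() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	got := make(chan string, 1)
	go func() {
		c, err := ln.Accept()
		if err != nil {
			got <- ""
			return
		}
		defer c.Close()
		b, _ := io.ReadAll(c) // returns once the sender's FIN arrives
		got <- string(b)
	}()

	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	tcp, ok := c.(*net.TCPConn)
	if !ok {
		c.Close()
		return "", errors.New("not a TCP connection")
	}
	// "go-away" is a placeholder payload, not a real yamux frame.
	if err := closeAfterFlush(tcp, []byte("go-away"), 2*time.Second); err != nil {
		return "", err
	}
	return <-got, nil
}

func main() {
	got, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("peer received %q\n", got)
}
```

The extra wait for the peer's FIN is the round trip that the best-effort approach avoids.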

@sukunrt sukunrt changed the title add support for sening error codes on session close add support for sending error codes on session close Aug 26, 2024
@sukunrt sukunrt force-pushed the sukun/conn-error-2 branch 4 times, most recently from af1ac71 to 18a75f1 Compare August 26, 2024 16:35
@Stebalien
Member

Is there an issue describing how this will be used? I usually want to send errors when closing a stream, less when closing a connection. Is the plan to use this in the connection manager?

@sukunrt
Member Author

sukunrt commented Aug 27, 2024

Apologies! The specs are here: libp2p/specs#623
The corresponding change in go-libp2p, which only has QUIC support for now: libp2p/go-libp2p#2927

In go-libp2p we will mostly use Connection Close error codes from the connection manager. Applications can define their own error codes.

Stream error codes will be introduced in a separate PR.

@sukunrt sukunrt marked this pull request as ready for review August 27, 2024 10:13
@Stebalien
Member

Oh, I see.

Member

@Stebalien Stebalien left a comment

This looks like it probably works, but blocking on connection close is "new" (as far as I know) so we need to make sure it's not going to cause issues with other parts of the code.

Comment on lines +289 to +291
// Attempts to send a GoAway before closing the connection. The GoAway may not actually be sent depending on the
// semantics of the underlying net.Conn. For TCP connections, it may be dropped depending on LINGER value or
// if there's unread data in the kernel receive buffer.
Member

Why send an error in this case? I'm concerned that we're changing the semantics to block for up to 15 seconds.

// The GoAway may not actually be sent depending on the semantics of the underlying net.Conn.
// For TCP connections, it may be dropped depending on LINGER value or if there's unread data in the kernel
// receive buffer.
func (s *Session) CloseWithError(errCode uint32) error {
Member

Have we updated the connection manager to be able to deal with potential blocking here? Also, we should probably document it.

const.go Outdated
@@ -117,6 +152,7 @@ const (
// It's not an implementation choice, the value defined in the specification.
initialStreamWindow = 256 * 1024
maxStreamWindow = 16 * 1024 * 1024
goAwayWaitTime = 5 * time.Second
Member

I'd make this much shorter, e.g., 100ms. We shouldn't need to wait on a round-trip here.
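A rough sketch of what a short, bounded best-effort send could look like: set a write deadline, attempt the write, and close regardless of the outcome. `closeBestEffort` and `demo` are hypothetical names, the 12-byte buffer is only a placeholder for a frame, and `net.Pipe` stands in for a real connection; none of this is taken from the PR.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// closeBestEffort is a hypothetical helper (not part of yamux). It tries to
// write a final frame within `wait`, then closes the connection whether or
// not the write succeeded. `sent` reports whether the write completed.
func closeBestEffort(conn net.Conn, frame []byte, wait time.Duration) (sent bool, err error) {
	_ = conn.SetWriteDeadline(time.Now().Add(wait))
	_, werr := conn.Write(frame)
	return werr == nil, conn.Close()
}

// demo exercises the helper against a peer that reads and one that does not.
func demo() (sentToReader, sentToStalled bool) {
	frame := make([]byte, 12) // placeholder bytes standing in for a frame

	// Peer that reads: the write completes before the deadline.
	a, b := net.Pipe()
	done := make(chan struct{})
	go func() {
		defer close(done)
		buf := make([]byte, 16)
		_, _ = b.Read(buf)
	}()
	sentToReader, _ = closeBestEffort(a, frame, 100*time.Millisecond)
	<-done
	b.Close()

	// Peer that never reads: the write hits the deadline and we close anyway.
	c, d := net.Pipe()
	defer d.Close()
	sentToStalled, _ = closeBestEffort(c, frame, 100*time.Millisecond)
	return sentToReader, sentToStalled
}

func main() {
	r, s := demo()
	fmt.Println("sent to reading peer:", r)
	fmt.Println("sent to stalled peer:", s)
}
```

With this shape, the worst-case blocking time on close is bounded by the deadline rather than by the peer's behavior.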

@sukunrt
Member Author

sukunrt commented Aug 28, 2024

Blocking on connection close is unfortunate. The "correct" way here would be to send the error code with the RST packet. The latest TCP RFC also recommends this: https://www.rfc-editor.org/rfc/rfc9293.html#name-reset-processing

TCP implementations SHOULD allow a received RST segment to include data (SHLD-2). It has been suggested that a RST segment could contain diagnostic data that explains the cause of the RST. No standard has yet been established for such data.

But no implementation provides this API at the moment.

@sukunrt
Member Author

sukunrt commented Aug 28, 2024

but blocking on connection close is "new"

We can consider running this asynchronously in a separate goroutine. That'll use more memory and mess up the resource manager accounting for a short duration, but it'll cause fewer issues with existing code.

The current implementation wouldn't block in most cases. If there's receive window available at the remote end, it won't block.

@Stebalien
Member

So, we usually only close connections from the connection manager, right? IMO, we should consider:

  1. Being more aggressive on timeouts.
  2. Close in parallel (in the connection manager).

On the other hand... spawning a goroutine isn't terrible. The resource consumption is fairly minimal in modern Go, and blocking will tie up resources just the same.
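As a sketch of the two options listed above (aggressive timeouts and closing in parallel), a caller could close each connection in its own goroutine and stop waiting after a fixed budget. Everything here is assumed for illustration: the `errorCloser` interface mirrors the `CloseWithError(errCode uint32) error` signature from the diff, while `closeAll`, `fakeConn`, and `demo` are hypothetical and not part of yamux or go-libp2p.

```go
package main

import (
	"fmt"
	"time"
)

// errorCloser is an assumed interface matching the CloseWithError signature
// shown in the diff.
type errorCloser interface {
	CloseWithError(errCode uint32) error
}

// closeAll is a hypothetical helper. It closes every connection in its own
// goroutine and waits at most `timeout` in total, returning how many closes
// finished without error in that time.
func closeAll(conns []errorCloser, errCode uint32, timeout time.Duration) (closed int) {
	// Buffered so closers that finish after the timeout never block.
	results := make(chan error, len(conns))
	for _, c := range conns {
		go func(c errorCloser) { results <- c.CloseWithError(errCode) }(c)
	}
	deadline := time.After(timeout)
	for range conns {
		select {
		case err := <-results:
			if err == nil {
				closed++
			}
		case <-deadline:
			return closed
		}
	}
	return closed
}

// fakeConn simulates a connection whose close takes `delay` to complete.
type fakeConn struct{ delay time.Duration }

func (f fakeConn) CloseWithError(uint32) error {
	time.Sleep(f.delay)
	return nil
}

// demo closes two fast connections and one slow one under a 200ms budget.
func demo() (closed int, withinBudget bool) {
	conns := []errorCloser{
		fakeConn{delay: 10 * time.Millisecond},
		fakeConn{delay: 10 * time.Millisecond},
		fakeConn{delay: 2 * time.Second}, // simulates a close stuck on a stalled peer
	}
	start := time.Now()
	closed = closeAll(conns, 1, 200*time.Millisecond)
	return closed, time.Since(start) < time.Second
}

func main() {
	closed, ok := demo()
	fmt.Println("closed in time:", closed, "returned within budget:", ok)
}
```

The slow close keeps running in its goroutine after `closeAll` returns, which is the resource-accounting trade-off discussed in the thread.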

@Stebalien
Member

Actually... we already have a goroutine. Can we reuse it? Are we not accounting for that one in the resource manager?

@MarcoPolo
Contributor

So, we usually only close connections from the connection manager, right?

A common case will also be the resource manager closing connections after seeing some limit has been reached.

3 participants