Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quic: rcmgr: Fix connection accounting #2025

Closed
wants to merge 1 commit into from

Conversation

MarcoPolo
Copy link
Collaborator

The quic transport didn't end the connection reservation in some error cases. The fix here is to use the "err defer" pattern to handle all error cases by checking for errors in a defer statement and finishing the scope then. This pattern is more resilient to future changes since future code doesn't have to remember to free resources on every err return statement.

Tested by having Kubo run for a while.

Before:
Screenshot 2023-01-27 at 10 09 03 AM

After:
Screenshot 2023-01-27 at 12 30 16 PM

This was debugged with this commit. It would give some memory to the connections in the swarm and then print out any connection scopes that didn't have memory and were not done. This meant that they were created but not passed down to the swarm and not finished.

Copy link
Contributor

@marten-seemann marten-seemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Do we have the same problem in the WebTransport transport?

@@ -84,7 +82,7 @@ func (l *listener) Accept() (tpt.CapableConn, error) {
}
}

func (l *listener) setupConn(qconn quic.Connection) (*conn, error) {
func (l *listener) setupConn(qconn quic.Connection) (_c *conn, _err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the _ needed? I've never seen this pattern.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed, I don't want to shadow or deal with shadowed vars. I also don't want this to be used in the body of the function.

I also want to avoid the use of the named return values except for this deferred error handling.

If you have a better naming suggestion I'm open to it.

@marten-seemann
Copy link
Contributor

There's a flaky rcmgr test:

  === RUN   TestResourceManagerAcceptDenied/reuseport_off
      conn_test.go:59: using a RSA key: QmUJNXf8tyYMn2LJbsBoTBBT9Xf2YXDBKSK4KV9dc3G5Yh
      conn_test.go:59: using a Secp256k1 key: 16Uiu2HAm5X88sRQ9t6x9JVuR3h9Bp9jApD83kRUXF9AawHGbtvA2
      controller.go:269: missing call(s) to *mocknetwork.MockConnManagementScope.Done() /home/runner/work/go-libp2p/go-libp2p/p2p/transport/quic/conn_test.go:238
      controller.go:269: aborting test due to missing call(s)
  --- FAIL: TestResourceManagerAcceptDenied (1.17s)
      --- PASS: TestResourceManagerAcceptDenied/reuseport_on (0.48s)
      --- FAIL: TestResourceManagerAcceptDenied/reuseport_off (0.69s)

@BigLep
Copy link
Contributor

BigLep commented Jan 27, 2023

Thanks @MarcoPolo . Do you think this would have impacted System.ConsInbound? If so, I agree it would be nice to include in Kubo 0.18.1 . If not, I think it can wait.

@MarcoPolo
Copy link
Collaborator Author

Do we have the same problem in the WebTransport transport?

WebTransport doesn't use the err defer pattern but looks okay.

@MarcoPolo
Copy link
Collaborator Author

Do you think this would have impacted System.ConsInbound

Looking at the data and the code, conns inbound should be safe from this miscount.

@marten-seemann
Copy link
Contributor

Thanks @MarcoPolo . Do you think this would have impacted System.ConsInbound? If so, I agree it would be nice to include in Kubo 0.18.1 . If not, I think it can wait.

It looks like this won't affect the code path that Kubo is using for its resource management in v0.18. Cutting and bubbling up a patch release is pretty low overhead though, so we should consider doing it anyway. This has confused Antonio and it has come up in https://github.com/protocol/bifrost-infra/issues/2308. We might be able to avoid some spurious bug reports if we get this fix out asap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants