Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/transport: close transport reader go routine before closing the connection #2

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

eshitachandwani
Copy link
Owner

@eshitachandwani eshitachandwani commented Sep 3, 2024

Fixes: grpc#2869
Previously, clientconn.Close() did not guarantee termination of reader goroutines upon return, as it only waited on channels like ctxDone, writerDone, and goAway.

Now, clientconn.Close() ensures that:

  • It waits for the readerDone channel to signal completion.
  • Transports are closed simultaneously, and clientconn.Close() waits for all transports to be closed before returning.

RELEASE NOTES:

internal/transport : Wait until all transports close and all go routines exit before clientconn.Close() returns

@purnesh42H
Copy link

Some mechanical comments

  1. title of the PR should be in format: <package/area>: PR statement. In your case it could be internal/transport: close transport reader go routine before closing the connection
  2. PR description should be in format: <fixes/address>: <#issue num>
  3. With fix and without fix logs can be in a comment. Also, its better to just copy paste the logs instead of linking unless its too big
  4. transport.close is not a function so let's not mention it that way. You can may be say clientconn.close()
  5. Try to concise the PR description.

@@ -0,0 +1,63 @@
<?xml version="1.0" encoding="UTF-8"?>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove. You can may be configure vscode to be ignored by git

clientconn.go Outdated Show resolved Hide resolved
clientconn.go Outdated Show resolved Hide resolved
case <-t.readerDone:
//Signal received, proceed with closing
default:
//The signal has already been received, so don't wait again
Copy link

@purnesh42H purnesh42H Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be 2 cases right? 1) Either the close signal for readerDone was already consumed somewhere else or 2) the signal was never sent to channel?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we are sure we don't have 1) so its most likely 2)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but its only failing in the cases where t.close() is being called internally and also by the teardown, so it is closed once, and whenever it is closed, we stop waiting again by using the default case.

@@ -1023,6 +1023,12 @@ func (t *http2Client) Close(err error) {
case <-timer.C:
t.logger.Infof("Failed to write a GOAWAY frame as part of connection close after %s. Giving up and closing the transport.", goAwayLoopyWriterTimeout)
}
select {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can just have single comment above select. Something like // wait for the reader channel to close before proceeding further

In the default case, we should probably log if running in a debug mode. Something similar to what we do in case of writerDone

if t.logger.V(logLevel) {
t.logger.Infof("Closing: %v", err)
}

@eshitachandwani eshitachandwani changed the title Waiting for the reader GO routine to end in transport.close() internal/transport: close transport reader go routine before closing the connection Sep 10, 2024
@@ -1023,6 +1023,14 @@ func (t *http2Client) Close(err error) {
case <-timer.C:
t.logger.Infof("Failed to write a GOAWAY frame as part of connection close after %s. Giving up and closing the transport.", goAwayLoopyWriterTimeout)
}
// Wait for the reader channel to close
select {
case <-t.readerDone:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to happen after the conn is closed, otherwise the reader will never exit.

case <-t.readerDone:
default:
if t.logger.V(logLevel) {
t.logger.Infof("The reader channel has already been closed: %v", err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what a default case means in a select with a channel.

I think we can pretty safely just close conn and do a blocking <-t.readerDone.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did that when I did not know that the tests were failing when t.Close() was being called from inside reader function. I will remove it and refactor the code to make it work.

clientconn.go Outdated Show resolved Hide resolved
if t.logger.V(logLevel) {
t.logger.Infof("The reader channel has already been closed: %v", err)
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also potentially a keepalive goroutine that we need to wait for.

@@ -1110,6 +1110,7 @@ func (cc *ClientConn) ResetConnectBackoff() {

// Close tears down the ClientConn and all underlying connections.
func (cc *ClientConn) Close() error {
var wg sync.WaitGroup
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: declare this immediately before it's needed.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1025,6 +1027,11 @@ func (t *http2Client) Close(err error) {
}
t.cancel()
t.conn.Close()
// Wait for the reader channel to close
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments should explain why, not what. So this is // Wait for the reader goroutine to exit to ensure all resources are cleaned up before Close can return.. Similar below. Also make sure comments that are sentences are punctuated correctly (e.g. end with a period).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1627,12 +1632,12 @@ func (t *http2Client) readServerPreface() error {
// reader verifies the server preface and reads all subsequent data from
// network connection. If the server preface is not read successfully, an
// error is pushed to errCh; otherwise errCh is closed with no error.
func (t *http2Client) reader(errCh chan<- error) {
func (t *http2Client) readerUtil(errCh chan<- error) error {
defer close(t.readerDone)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, how about instead of another function and the awkward naming problem, do:

func (t ...) reader(....) {
	var err error
	defer func() {
		close(t.readerDone)
		if err != nil {
			t.Close(err)
		}
	}()
	// ...
}

@@ -1684,7 +1688,12 @@ func (t *http2Client) reader(errCh chan<- error) {
case *http2.PingFrame:
t.handlePing(frame)
case *http2.GoAwayFrame:
t.handleGoAway(frame)
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the code block here?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because t.handleGoAway also calls t.close which again poses the same problem.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then...I don't think this does what you think? A bare code block has no effect beyond scoping variables.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand? I just want that t.Close should be called after the t.readerDone is closed. So, I just return the error from t.handleGoAway and call close when error!=null

Comment on lines +1692 to +1693
err := t.handleGoAway(frame)
if err != nil {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be a compound if err := t.handle...(); err != nil { ... }

// keepalive running in a separate goroutine makes sure the connection is alive by sending pings.
func (t *http2Client) keepalive() {
func (t *http2Client) keepaliveUtil() error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as above re: this pattern.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -86,7 +86,8 @@ type http2Client struct {
writerDone chan struct{} // sync point to enable testing.
// goAway is closed to notify the upper layer (i.e., addrConn.transportMonitor)
// that the server sent GoAway on this transport.
goAway chan struct{}
goAway chan struct{}
keepAliveDone chan struct{}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment for keepAliveDone channel

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Wait for the reader channel to close
<-t.readerDone
if t.keepaliveEnabled {
<-t.keepAliveDone //Wait for the keepAlive go routine to end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: space after //

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ClientConn.Close does not wait for connections to be closed before returning
3 participants