Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added logs for reasons causing connection and transport close #5840

Merged
merged 8 commits into from
Dec 9, 2022

Conversation

zasweq
Copy link
Contributor

@zasweq zasweq commented Dec 3, 2022

A flaky keepalive test is trying to read and write to a closed network connection. In some of the cases, the server was receiving too many pings, which led to one of the root causes of flakiness and the subsequent fix, but I couldn't figure out why the network connection was closing for any other case. Discussing this, the go team realized that the transport layer logs with respect to transport and connection closure were very sparse, and populating logs with information as to why transports and connections were triggered to close could help a user debug.

RELEASE NOTES:

  • transport: Added logs for transport and connection closure

@zasweq zasweq added the Type: Internal Cleanup Refactors, etc label Dec 3, 2022
@zasweq zasweq added this to the 1.52 Release milestone Dec 3, 2022
@zasweq zasweq force-pushed the log-closures-transport-layer branch from 9382de1 to cfba279 Compare December 3, 2022 03:41
@@ -416,6 +416,9 @@ func (c *controlBuffer) get(block bool) (interface{}, error) {
select {
case <-c.ch:
case <-c.done:
if logger.V(logLevel) {
logger.Infof("transport: trigger loopy to exit (with a subsequent conn closure) because transports context expired.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ErrConnClosing significant here? If not, can we instead return a meaningful error and capture all the exit reasons via a single log after loopy returns (which IIUC is already there)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look like ErrConnClosing is used as a conditional/invariant anywhere in the system. Shoving this message/information into Error and logging at loopyWriter level sgtm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/transports/transport's/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done @easwars.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm blind, and ErrConnClosing is used as a conditional in many many places at the control buf layer. Leaving this as is for now, unless you have a better suggestion.

internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
@dfawley dfawley assigned zasweq and unassigned dfawley Dec 5, 2022
@@ -416,6 +416,9 @@ func (c *controlBuffer) get(block bool) (interface{}, error) {
select {
case <-c.ch:
case <-c.done:
if logger.V(logLevel) {
logger.Infof("transport: trigger loopy to exit (with a subsequent conn closure) because transports context expired.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/transports/transport's/

internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
Copy link
Contributor Author

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments y'all!

internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
@@ -416,6 +416,9 @@ func (c *controlBuffer) get(block bool) (interface{}, error) {
select {
case <-c.ch:
case <-c.done:
if logger.V(logLevel) {
logger.Infof("transport: trigger loopy to exit (with a subsequent conn closure) because transports context expired.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done @easwars.

internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
@@ -416,6 +416,9 @@ func (c *controlBuffer) get(block bool) (interface{}, error) {
select {
case <-c.ch:
case <-c.done:
if logger.V(logLevel) {
logger.Infof("transport: trigger loopy to exit (with a subsequent conn closure) because transports context expired.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm blind, and ErrConnClosing is used as a conditional in many many places at the control buf layer. Leaving this as is for now, unless you have a better suggestion.

internal/transport/http2_server.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
@zasweq zasweq assigned dfawley and unassigned zasweq Dec 6, 2022
Copy link
Contributor

@easwars easwars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clientconn.go Outdated
@@ -1274,6 +1274,9 @@ func (ac *addrConn) createTransport(addr resolver.Address, copts transport.Conne

newTr, err := transport.NewClientTransport(connectCtx, ac.cc.ctx, addr, copts, onGoAway, onClose)
if err != nil {
if logger.V(2) {
logger.Errorf("NewClientTransport failed to connect to %s. Err: %v", addr, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find that it is better to avoid filler words like "failed" etc in error strings and it leads to better/concise error strings. This could be written as:

logger.Errorf("NewClientTransport(%q): %v", addr, err)

or if you prefer ,ore human readable names over function names

logger.Errorf("Creating new client transport to %q: %v", addr, err)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like human readable for the high level concept rather than the specific implementation detail (and that exported func just forwards call to internal method). Switched to second option. Thanks for suggestion.

@@ -416,6 +416,9 @@ func (c *controlBuffer) get(block bool) (interface{}, error) {
select {
case <-c.ch:
case <-c.done:
if logger.V(logLevel) {
logger.Infof("transport: trigger loopy to exit (with a subsequent conn closure) because transport's context expired.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if this is appropriate at INFO level or is WARNING more appropriate?

See: https://github.com/grpc/grpc-go/blob/master/Documentation/log_levels.md

Same for other logs in this file/PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this case, ErrConnClosing was being returned to the run() goroutine of loopy, which would trigger the exit with an info log. However, I am changing this to plumb data into ErrConnClosing (or a certain type), and will log the error message plumbed in (i.e. I will put the message from this into the error) because ErrConnClosing has nothing in it currently. See https://github.com/grpc/grpc-go/blob/master/internal/transport/controlbuf.go#L536 for how it currently works on master.

func (ht *serverHandlerTransport) Close() {
func (ht *serverHandlerTransport) Close(err error) {
if logger.V(logLevel) {
logger.Infof("closing serverHandlerTransport due to err: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. A concise error message could look something like
logger.Infof("Closing server handler transport: %v", err)

Also, note that log messages should start with a capitalized string unlike error messages. Applies here and everywhere else in this PR (unless ofcourse where it starts with the package name).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to concise and capitalized beginning. NEED TO GET TO OTHER ONES WHICH REQUIRE CAPITALIZED STRING.

@@ -236,7 +239,7 @@ func (ht *serverHandlerTransport) WriteStatus(s *Stream, st *status.Status) erro
})
}
}
ht.Close()
ht.Close(fmt.Errorf("finished writing status"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finished writing status.

Why is this an error?

Is this error propagated to the user? Or is this used by another component, and if so how?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this felt a bit weird to me too. As part of this PR, Doug wanted a general point we log errors that close transport to be in the Close() method on the Client Transport. To avoid my multiple code additions on the server transport at each Close() call (which didn't take an error server side) we decided to log the error/transport is closing in the Close() call server side, which required adding err to API. This was passed simply so that the transport could have a reason to explain why it closed. I'll switch this to errors.New...and if you or Doug prefers this or another way (even sending a nil error) I can switch it.

@@ -346,7 +349,7 @@ func (ht *serverHandlerTransport) HandleStreams(startStream func(*Stream), trace
case <-ht.req.Context().Done():
}
cancel()
ht.Close()
ht.Close(fmt.Errorf("request is done processing"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Why is the fact that you are done processing with the request an error message. What value does it provide and to whom?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same response as previous comment.

t.Close(connectionErrorf(true, nil, "keepalive ping failed to receive ACK within timeout"))
err := connectionErrorf(true, nil, "keepalive ping failed to receive ACK within timeout")
if logger.V(logLevel) {
logger.Errorf("transport: closing transport due to error in keepalive handling: %v.", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you actually look at how this log statement shows up in practice? Should you be avoiding some duplicate information?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry I was supposed to delete this, I switched all these logs to log in Close with the data in the error passed in at beginning of Close for consolidation/consistency/debugability.

internal/transport/http2_server.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
server.go Outdated Show resolved Hide resolved
@easwars easwars assigned zasweq and unassigned easwars Dec 6, 2022
@dfawley dfawley removed their assignment Dec 6, 2022
Copy link
Contributor Author

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments/pass :D!

internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
@@ -447,7 +453,7 @@ func newHTTP2Client(connectCtx, ctx context.Context, addr resolver.Address, opts
err := t.loopy.run()
if err != nil {
if logger.V(logLevel) {
logger.Errorf("transport: loopyWriter.run returning. Err: %v", err)
logger.Errorf("transport: loopyWriter.run returned. Err: %v. Closing connection.", err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loopyWriter returned? loopyWriter exited? loopyWriter finished? I went with third option. Line switched to: logger.Errorf("transport: loopyWriter exited. Closing connection. Err: %v", err).

t.Close(connectionErrorf(true, nil, "keepalive ping failed to receive ACK within timeout"))
err := connectionErrorf(true, nil, "keepalive ping failed to receive ACK within timeout")
if logger.V(logLevel) {
logger.Errorf("transport: closing transport due to error in keepalive handling: %v.", err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry I was supposed to delete this, I switched all these logs to log in Close with the data in the error passed in at beginning of Close for consolidation/consistency/debugability.

clientconn.go Outdated
@@ -1274,6 +1274,9 @@ func (ac *addrConn) createTransport(addr resolver.Address, copts transport.Conne

newTr, err := transport.NewClientTransport(connectCtx, ac.cc.ctx, addr, copts, onGoAway, onClose)
if err != nil {
if logger.V(2) {
logger.Errorf("NewClientTransport failed to connect to %s. Err: %v", addr, err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like human readable for the high level concept rather than the specific implementation detail (and that exported func just forwards call to internal method). Switched to second option. Thanks for suggestion.

@@ -951,6 +957,9 @@ func (t *http2Client) Close(err error) {
t.mu.Unlock()
return
}
if logger.V(logLevel) {
logger.Infof("closing transport due to err: %v, will close the connection.", err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to: logger.Infof("Closing transport, will close the connection. Err: %v", err). See above comment for rationale of "Err:".

internal/transport/http2_server.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
server.go Outdated Show resolved Hide resolved
@@ -416,6 +416,9 @@ func (c *controlBuffer) get(block bool) (interface{}, error) {
select {
case <-c.ch:
case <-c.done:
if logger.V(logLevel) {
logger.Infof("transport: trigger loopy to exit (with a subsequent conn closure) because transport's context expired.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this case, ErrConnClosing was being returned to the run() goroutine of loopy, which would trigger the exit with an info log. However, I am changing this to plumb data into ErrConnClosing (or a certain type), and will log the error message plumbed in (i.e. I will put the message from this into the error) because ErrConnClosing has nothing in it currently. See https://github.com/grpc/grpc-go/blob/master/internal/transport/controlbuf.go#L536 for how it currently works on master.

… switched some errors from ErrConnClosing to errors with information inside
@zasweq zasweq assigned easwars and dfawley and unassigned zasweq Dec 7, 2022
@easwars
Copy link
Contributor

easwars commented Dec 7, 2022

Do we need a release note for logs?

clientconn.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
@@ -951,6 +954,9 @@ func (t *http2Client) Close(err error) {
t.mu.Unlock()
return
}
if logger.V(logLevel) {
logger.Infof("Closing transport, will close the connection. Err: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would end up as something like:
Closing transport, will close the connection. Err: finished serving streams for the server transport.

Isn't this enough:
Closing transport: finished serving streams for the server transport

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole point of this PR was to see when conn closes. Are we trying to say that since transport Close closes the connection the reader of the log should know that? I feel like it's fine to mention that it closes Conn, just like we log conn Closure when loopy returns (we don't just lump this conn.Close() operation in the log that loopyWriter returns).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closing transport, will close the connection: %v

@@ -333,7 +334,7 @@ func NewServerTransport(conn net.Conn, config *ServerConfig) (_ ServerTransport,
t.loopy.ssGoAwayHandler = t.outgoingGoAwayHandler
if err := t.loopy.run(); err != nil {
if logger.V(logLevel) {
logger.Errorf("transport: loopyWriter.run returning. Err: %v", err)
logger.Infof("transport: loopyWriter exited. Closing connection. Err: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we use the term transport to mean a connection. If you want to distinguish between the transport type and the underlying net.Conn, maybe you can be a more explicit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked about these in chat. These are logically separate things (the transport and conn) and it's important to have a distinction, because this closes the connection, not the transport. I think Doug had suggestion to switch this to net.Conn or conn so maybe I'll chose one of those.

t.mu.Lock()
if t.state == closing {
t.mu.Unlock()
return
}
if logger.V(logLevel) {
logger.Infof("Closing transport, will close the connection. Err: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about the log message being too verbose. Concise messages (without compromising on the information to be conveyed) will help greatly during debugging. So, please do consider how these logs will show up. Run a test or two and see how these show up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger.Infof("Closing transport, will close the connection: %v", err). See http2_client comment for my reasoning.

@easwars easwars removed their assignment Dec 8, 2022
@easwars easwars assigned zasweq and unassigned dfawley Dec 8, 2022
clientconn.go Outdated Show resolved Hide resolved
internal/transport/controlbuf.go Outdated Show resolved Hide resolved
internal/transport/controlbuf.go Outdated Show resolved Hide resolved
internal/transport/controlbuf.go Outdated Show resolved Hide resolved
internal/transport/handler_server.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
internal/transport/transport.go Outdated Show resolved Hide resolved
server.go Outdated Show resolved Hide resolved
server.go Outdated Show resolved Hide resolved
@dfawley
Copy link
Member

dfawley commented Dec 8, 2022

Please also fix the PR title

Copy link
Contributor Author

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments everyone :D!

internal/transport/controlbuf.go Outdated Show resolved Hide resolved
clientconn.go Outdated Show resolved Hide resolved
clientconn.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/handler_server.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
@zasweq zasweq changed the title Added logs throughout the transport layer for connection and transport close Added logs throughout for connection and transport close reasons Dec 8, 2022
@zasweq zasweq changed the title Added logs throughout for connection and transport close reasons Added logs for reasons causing connection and transport close Dec 8, 2022
@zasweq zasweq assigned easwars and dfawley and unassigned zasweq Dec 8, 2022
internal/transport/controlbuf.go Outdated Show resolved Hide resolved
internal/transport/controlbuf.go Outdated Show resolved Hide resolved
internal/transport/handler_server.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
server.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
server.go Outdated Show resolved Hide resolved
Copy link
Contributor Author

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments!

internal/transport/controlbuf.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
server.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/controlbuf.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/handler_server.go Outdated Show resolved Hide resolved
internal/transport/controlbuf.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
@zasweq zasweq requested review from dfawley and easwars and removed request for dfawley December 9, 2022 00:21
@easwars easwars assigned zasweq and unassigned easwars Dec 9, 2022
@zasweq zasweq merged commit a9709c3 into grpc:master Dec 9, 2022
1 check passed
@@ -236,7 +239,7 @@ func (ht *serverHandlerTransport) WriteStatus(s *Stream, st *status.Status) erro
})
}
}
ht.Close()
ht.Close(errors.New("finished writing status"))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now spamming our logs, is it actually useful?

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants