-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding logging to tchannel handler error cases #1561
Adding logging to tchannel handler error cases #1561
Conversation
@@ -589,5 +589,4 @@ func TestMiddlewareFailureSnapshot(t *testing.T) { | |||
}, | |||
}, | |||
} | |||
assert.Equal(t, want, snap, "Unexpected snapshot of metrics.") | |||
} | |||
assert.Equal(t, want, snap, "Unexpected snapshot of metrics."} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like you have two commits changing things back and forth. That's pretty confusing, so squash the changes into one or more logical, independently correct commits. For example, one commit that makes the change to responseWriter
interface with new tests and another with the logging feature.
|
||
if t.logger == nil { | ||
t.logger = zap.NewNop() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t
is available at a higher scope and you're banging on t.logger
in a loop when it's invariant. If you intend to have a side effect in start
then it should be at the top of start
.
Such a side effect is a bit confusing since newChannelTransport
already sets a nop logger. This block doesn't seem necessary.
transport/tchannel/handler.go
Outdated
_ = call.Response().SendSystemError(getSystemError(err)) | ||
h.logger.Error("tchannel callHandler error", zap.Error(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"callHandler error" is a bit generic. This could be more descriptive, like "tchannel send system error."
transport/tchannel/handler.go
Outdated
response inboundCallResponse | ||
isApplicationError bool | ||
headerCase headerCase | ||
type tchannelResponseWriter struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stuttering: tchannel.tchannelResponseWriter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree and was thinking about this for a while. I felt that newResponseWriter was the best name for the interface (since it was the most generic) and this was the best name I thought of for the original responseWriter struct - any suggestions for a better name?
transport/tchannel/handler.go
Outdated
Buffer *bufferpool.Buffer | ||
Response inboundCallResponse | ||
ApplicationError bool | ||
HeaderCase headerCase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to change this struct at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so. From my understanding, the fields of handlerWriter to be exposed as public in order to allow faultyResponseWriter struct{ handlerWriter } to embed its methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to modify this since the use is in faultyHandlerWriter
and its methods don't need access to these fields.
2f69a2b
to
dfaee48
Compare
transport/tchannel/handler_test.go
Outdated
mockCtrl.Finish() | ||
} | ||
} | ||
|
||
func TestResponseWriter(t *testing.T) { | ||
tests := []struct { | ||
type test struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to make a local type for this. The previous form is conical.
transport/tchannel/handler_test.go
Outdated
@@ -510,10 +602,12 @@ func TestResponseWriterEmptyBodyHeaders(t *testing.T) { | |||
} | |||
|
|||
func TestGetSystemError(t *testing.T) { | |||
tests := []struct { | |||
type test struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
|
||
// faultyHandlerWriter mocks a responseWriter.Close() error to test logging behaviour | ||
// inside tchannel.Handle. | ||
type faultyHandlerWriter struct{ handlerWriter } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could do this:
type faultyHandlerWriter struct{}
func newFaultyHandlerWriter(response inboundCallResponse, format tchannel.Format, headerCase headerCase) responseWriter {
return &faultyHandlerWriter{}
}
Then you don't need to change the fields in handlerWriter
and record the options passed in that you don't use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
transport/tchannel/handler.go
Outdated
_ = call.Response().SendSystemError(getSystemError(err)) | ||
h.logger.Error("tchannel transport handler request failed", zap.Error(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually want to log when SendSystemError
has an error. That is, our client's handler returned an error, it's a system error hence we should pass it along, but sending that error had an error (it's an error-ception)! See uber/tchannel-go#716
transport/tchannel/handler.go
Outdated
_ = call.Response().SendSystemError(getSystemError(err)) | ||
h.logger.Error("tchannel responseWriter failed to close", zap.Error(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here. :-)
27a3b42
to
6c17f84
Compare
@twilly I made all of the round 2 changes you requested! I made some other cleanup such as clarifying the names of certain variables in the test cases and cleaning up the error messages, but i felt those were too minor to include in a commit message. |
6c17f84
to
54d7ed8
Compare
transport/tchannel/handler.go
Outdated
_ = call.Response().SendSystemError(getSystemError(err)) | ||
if err != nil && !responseWriter.IsApplicationError() { | ||
sendErr := call.Response().SendSystemError(getSystemError(err)) | ||
if sendErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typical go uses this pattern for errors:
if err := foo(); err != nil {
do(err)
}
You also don't need to call it sendErr
since the scope would be limited to inside the if block.
transport/tchannel/handler.go
Outdated
// TODO: log error | ||
_ = call.Response().SendSystemError(getSystemError(err)) | ||
sendErr := call.Response().SendSystemError(getSystemError(err)) | ||
if sendErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, of course.
transport/tchannel/handler.go
Outdated
Buffer *bufferpool.Buffer | ||
Response inboundCallResponse | ||
ApplicationError bool | ||
HeaderCase headerCase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to modify this since the use is in faultyHandlerWriter
and its methods don't need access to these fields.
transport/tchannel/handler.go
Outdated
if sendErr != nil { | ||
h.logger.Error("SendSystemError failed", zap.Error(sendErr)) | ||
} | ||
h.logger.Error("tchannel transport handler request failed", zap.Error(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: A slightly cleaner message would be "SendSystemError failed" and "handler failed" respectfully. Zap's loggers can add a name to its logger. Doing so would give the user more context without repeating yourself.
54d7ed8
to
3f1e836
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good. I left a few comments that you can address, but otherwise +1 here. Let's get a second review.
@@ -139,7 +139,7 @@ func (t *ChannelTransport) start() error { | |||
for s := range services { | |||
sc := t.ch.GetSubChannel(s) | |||
existing := sc.GetHandlers() | |||
sc.SetHandler(handler{existing: existing, router: t.router, tracer: t.tracer}) | |||
sc.SetHandler(handler{existing: existing, router: t.router, tracer: t.tracer, logger: t.logger, newResponseWriter: newHandlerWriter}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe it's better to have the response writer factory in ChannelTransport
to copy into handlers, like with the router, tracer, and logger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved in the second commit in this PR
transport/tchannel/handler_test.go
Outdated
format tchannel.Format | ||
headers []byte | ||
wantHeaders map[string]string | ||
faultyFuncs map[string]bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using a map of flags, just set your writer and recorder fields with the ones you want. Then your test instantiation is simpler and looks like this:
newResponseWriter = tt.newResponseWriter
respRecorder = tt.newResponseRecorder()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 will change
transport/tchannel/handler_test.go
Outdated
headers []byte | ||
wantHeaders map[string]string | ||
faultyFuncs map[string]bool | ||
wantNumLogs int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also more complicated than necessary. You leverage zero values to know if you should validate log messages. Then you don't need to count messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twilly I also use this field to count that the expected number of logs are written into the logger for the variety of error cases. Were you suggesting that I should turn this field into a boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite. For example, wantLogMessage
is a string. If it's not initialized, then it'll be the zero value, which the empty string. You can leverage this behavior by testing like this:
if tt.wantLogMessage != "" {
assert.Equals(t, logs.FilterMessage(tt.wantLogMessage).Len(), 1)
}
If you have multiple messages, you can check with a slice of message strings or LoggedEntry, followed by iterating and comparing.
There's a little test philosophy going on here too. When testing log emissions I generally don't like being too strict, like expecting an exact amount and order of messages. Such a test makes mutating code difficult when we're usually expecting an approximate and fuzzy existence. Usually this is because logs are a loose and flexible interface rather than a rigidly necessary one for a functioning system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s not strictly necessary to add any features the the deprecated ChannelTransport; we’re content to let it die on the vine. The Transport is the authoritative version, and the only version that can be constructed from configuration.
👍 to @twilly’s nits.
6f63e0d
to
cdf9cf7
Compare
cdf9cf7
to
d70430a
Compare
d70430a
to
74447e9
Compare
Codecov Report
@@ Coverage Diff @@
## dev #1561 +/- ##
==========================================
+ Coverage 90.24% 90.25% +0.01%
==========================================
Files 223 223
Lines 11160 11169 +9
==========================================
+ Hits 10071 10081 +10
Misses 753 753
+ Partials 336 335 -1
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## dev #1561 +/- ##
==========================================
+ Coverage 90.24% 90.25% +0.01%
==========================================
Files 223 223
Lines 11160 11169 +9
==========================================
+ Hits 10071 10081 +10
Misses 753 753
+ Partials 336 335 -1
Continue to review full report at Codecov.
|
1 similar comment
Codecov Report
@@ Coverage Diff @@
## dev #1561 +/- ##
==========================================
+ Coverage 90.24% 90.25% +0.01%
==========================================
Files 223 223
Lines 11160 11169 +9
==========================================
+ Hits 10071 10081 +10
Misses 753 753
+ Partials 336 335 -1
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## dev #1561 +/- ##
==========================================
+ Coverage 90.24% 90.25% +0.01%
==========================================
Files 223 223
Lines 11160 11169 +9
==========================================
+ Hits 10071 10081 +10
Misses 753 753
+ Partials 336 335 -1
Continue to review full report at Codecov.
|
Zap logging was added to understand when handler calls failed, when SendSystemError failed, and when responseWriter failed to close. To test these changes, I added a responseWriter interface factory to the struct of the handler so I could control reponseWriter during testing. A recorder interface was also added to test failure cases in responseRecorder.
NewResponseWriter was moved to ChannelTransport and Transport to keep tchannel handler's constructor parameters consisent with the pattern in which these parameters come from the fields of ChannelTransport and ChannelTransport.
2f57705
to
d58d50b
Compare
d58d50b
to
afefba8
Compare
This drops the "handler failed" log in the TChannel transport. This log was unnecessarily added when we were increasing observability around TChannel internal errors in #1561. The context error override in #1930 ensures that makes this log redundant as richer information exists in observability logs, including latency and request attributes. Furthermore, we've had issues with this log since the latency is included in the message and makes aggregation extremely difficult. Ref T5802517
This drops the "handler failed" log in the TChannel transport. This log was unnecessarily added when we were increasing observability around TChannel internal errors in #1561. The context error override in #1930 ensures that makes this log redundant as richer information exists in observability logs, including latency and request attributes. Furthermore, we've had issues with this log since the latency is included in the message and makes aggregation extremely difficult. Ref T5802517
This drops the "handler failed" log in the TChannel transport. This log was unnecessarily added when we were increasing observability around TChannel internal errors in #1561. The context error override in #1930 ensures that makes this log redundant as richer information exists in observability logs, including latency and request attributes. Furthermore, we've had issues with this log since the latency is included in the message and makes aggregation extremely difficult. Ref T5802517
This fixes a minor logging annoyance. Some users have reported large amounts of `SendSystemError failed` and `responseWriter failed to close` logs. This was introduced in #1561 to help improve observability of TChannel system failures. However, if a client times out, we'll still log the failures; these logs are are uninformative/unactionable to users as they are expected. With this diff, we only log TChannel system failures when the client is still waiting for a response, ie has yet to time out. Ref T5305225#107015299, supersedes #1814
This fixes a minor logging annoyance. Some users have reported large amounts of `SendSystemError failed` and `responseWriter failed to close` logs. This was introduced in #1561 to help improve observability of TChannel system failures. However, if a client times out, we'll still log the failures; these logs are are uninformative/unactionable to users as they are expected. With this diff, we only log TChannel system failures when the client is still waiting for a response, ie has yet to time out. Ref T5305225#107015299, supersedes #1814
This fixes a minor logging annoyance. Some users have reported large amounts of `SendSystemError failed` and `responseWriter failed to close` logs. This was introduced in #1561 to help improve observability of TChannel system failures. However, if a client times out, we'll still log the failures; these logs are are uninformative/unactionable to users as they are expected. With this diff, we only log TChannel system failures when the client is still waiting for a response, ie has yet to time out. Ref T5305225#107015299, supersedes #1814
…ut (#1933) This fixes a minor logging annoyance. Some users have reported large amounts of `SendSystemError failed` and `responseWriter failed to close` logs. This was introduced in #1561 to help improve observability of TChannel system failures. However, if a client times out, we'll still log the failures; these logs are are uninformative/unactionable to users as they are expected. With this diff, we only log TChannel system failures when the client is still waiting for a response, ie has yet to time out. Ref T5305225#107015299, supersedes #1814
Zap logging was added to understand when handler calls failed, when
SendSystemError failed, and when responseWriter failed to close.
To test these changes, I added a responseWriter interface factory to
the struct of the handler so I could control reponseWriter during
testing. A recorder interface was also added to test failure cases
n responseRecorder.