
Adding logging to tchannel handler error cases #1561

Merged
merged 2 commits into yarpc:dev from rhang/adding-logging-to-tchannel-handler on Sep 19, 2018

Conversation

r-hang
Contributor

@r-hang r-hang commented Sep 11, 2018

Zap logging was added to understand when handler calls fail, when
SendSystemError fails, and when the responseWriter fails to close.
To test these changes, I added a responseWriter interface factory to
the handler struct so I could control the responseWriter during
testing. A recorder interface was also added to test failure cases
in responseRecorder.

@CLAassistant

CLAassistant commented Sep 11, 2018

CLA assistant check
All committers have signed the CLA.

@r-hang r-hang requested a review from twilly September 11, 2018 19:33
@@ -589,5 +589,4 @@ func TestMiddlewareFailureSnapshot(t *testing.T) {
},
},
}
assert.Equal(t, want, snap, "Unexpected snapshot of metrics.")
}
assert.Equal(t, want, snap, "Unexpected snapshot of metrics.")

It seems like you have two commits changing things back and forth. That's pretty confusing, so squash the changes into one or more logical, independently correct commits. For example, one commit that makes the change to responseWriter interface with new tests and another with the logging feature.


if t.logger == nil {
t.logger = zap.NewNop()
}

t is available at a higher scope and you're banging on t.logger in a loop when it's invariant. If you intend to have a side effect in start then it should be at the top of start.

Such a side effect is a bit confusing since newChannelTransport already sets a nop logger. This block doesn't seem necessary.
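For illustration, a minimal sketch of defaulting the logger once at construction time (trimmed-down, hypothetical types here, not the actual yarpc constructor), so code in start()'s per-service loop never needs to re-check it:

package example

import "go.uber.org/zap"

// ChannelTransport is a trimmed-down stand-in for the real transport.
type ChannelTransport struct {
    logger *zap.Logger
}

// Option configures a ChannelTransport.
type Option func(*ChannelTransport)

// Logger sets a custom zap logger on the transport.
func Logger(l *zap.Logger) Option {
    return func(t *ChannelTransport) { t.logger = l }
}

// newChannelTransport applies the options, then falls back to a no-op
// logger exactly once, so later code can use t.logger unconditionally.
func newChannelTransport(opts ...Option) *ChannelTransport {
    t := &ChannelTransport{}
    for _, opt := range opts {
        opt(t)
    }
    if t.logger == nil {
        t.logger = zap.NewNop()
    }
    return t
}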

_ = call.Response().SendSystemError(getSystemError(err))
h.logger.Error("tchannel callHandler error", zap.Error(err))

"callHandler error" is a bit generic. This could be more descriptive, like "tchannel send system error."

response inboundCallResponse
isApplicationError bool
headerCase headerCase
type tchannelResponseWriter struct {

Stuttering: tchannel.tchannelResponseWriter

Contributor Author

I agree and was thinking about this for a while. I felt that newResponseWriter was the best name for the interface (since it was the most generic) and this was the best name I thought of for the original responseWriter struct - any suggestions for a better name?

Buffer *bufferpool.Buffer
Response inboundCallResponse
ApplicationError bool
HeaderCase headerCase

Do we need to change this struct at all?

Contributor Author
@r-hang r-hang Sep 12, 2018

I believe so. From my understanding, the fields of handlerWriter need to be exposed as public in order to allow faultyResponseWriter struct{ handlerWriter } to embed its methods.


I don't think we need to modify this since the use is in faultyHandlerWriter and its methods don't need access to these fields.

https://play.golang.org/p/qLbk_7Kwi1_U
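For reference, a standalone sketch of the point behind that playground link (the field and method names below are made up for the example): embedding promotes handlerWriter's methods onto faultyHandlerWriter without requiring handlerWriter's fields to be exported.

package main

import (
    "errors"
    "fmt"
)

// handlerWriter keeps its fields unexported; its own methods operate on them.
type handlerWriter struct {
    headers map[string]string
}

func (hw *handlerWriter) AddHeader(key, value string) {
    if hw.headers == nil {
        hw.headers = make(map[string]string)
    }
    hw.headers[key] = value
}

func (hw *handlerWriter) Close() error { return nil }

// faultyHandlerWriter embeds handlerWriter: AddHeader is promoted onto it
// even though handlerWriter exports no fields; Close is overridden to
// simulate a failure.
type faultyHandlerWriter struct{ handlerWriter }

func (fw *faultyHandlerWriter) Close() error {
    return errors.New("faulty handler writer: close failed")
}

func main() {
    fw := &faultyHandlerWriter{}
    fw.AddHeader("foo", "bar") // promoted method from the embedded handlerWriter
    fmt.Println(fw.Close())    // faulty handler writer: close failed
}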

@r-hang r-hang force-pushed the rhang/adding-logging-to-tchannel-handler branch 4 times, most recently from 2f69a2b to dfaee48 Compare September 12, 2018 22:27
mockCtrl.Finish()
}
}

func TestResponseWriter(t *testing.T) {
tests := []struct {
type test struct {

No need to make a local type for this. The previous form is canonical.
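For reference, a small sketch of that canonical table-driven form (the test name and cases below are illustrative, not taken from this PR):

package example

import (
    "errors"
    "testing"

    "github.com/stretchr/testify/assert"
)

func TestErrorMessages(t *testing.T) {
    // The anonymous struct slice declared inline is the canonical form;
    // no named local type is needed.
    tests := []struct {
        msg  string
        give error
        want string
    }{
        {msg: "plain error", give: errors.New("boom"), want: "boom"},
        {msg: "wrapped text", give: errors.New("call failed: boom"), want: "boom"},
    }
    for _, tt := range tests {
        t.Run(tt.msg, func(t *testing.T) {
            assert.Contains(t, tt.give.Error(), tt.want)
        })
    }
}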

@@ -510,10 +602,12 @@ func TestResponseWriterEmptyBodyHeaders(t *testing.T) {
}

func TestGetSystemError(t *testing.T) {
tests := []struct {
type test struct {

Ditto.


// faultyHandlerWriter mocks a responseWriter.Close() error to test logging behaviour
// inside tchannel.Handle.
type faultyHandlerWriter struct{ handlerWriter }

You could do this:

type faultyHandlerWriter struct{}
func newFaultyHandlerWriter(response inboundCallResponse, format tchannel.Format, headerCase headerCase) responseWriter {
    return &faultyHandlerWriter{}
}

Then you don't need to change the fields in handlerWriter, or record options that are passed in but never used.

Contributor Author

👍

_ = call.Response().SendSystemError(getSystemError(err))
h.logger.Error("tchannel transport handler request failed", zap.Error(err))

We actually want to log when SendSystemError has an error. That is, our client's handler returned an error; it's a system error, so we should pass it along, but sending that error itself failed (it's an error-ception)! See uber/tchannel-go#716
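A self-contained sketch of the pattern being asked for, using a hypothetical stand-in for the inbound call response rather than the tchannel-go API: forward the handler error as a system error, and if that send itself fails, log the send failure instead of dropping it.

package main

import (
    "errors"

    "go.uber.org/zap"
)

// responseSender is a hypothetical stand-in for the inbound call response.
type responseSender interface {
    SendSystemError(err error) error
}

// brokenResponse always fails to send, standing in for a closed connection.
type brokenResponse struct{}

func (brokenResponse) SendSystemError(err error) error {
    return errors.New("connection already closed")
}

func main() {
    logger := zap.NewExample()
    defer logger.Sync()

    handlerErr := errors.New("handler failed")
    var resp responseSender = brokenResponse{}

    // Forward the handler error as a system error; if the send itself
    // fails, log that failure alongside the original handler error.
    if err := resp.SendSystemError(handlerErr); err != nil {
        logger.Error("SendSystemError failed",
            zap.Error(err),
            zap.NamedError("handlerError", handlerErr))
    }
}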

_ = call.Response().SendSystemError(getSystemError(err))
h.logger.Error("tchannel responseWriter failed to close", zap.Error(err))

Same thing here. :-)

@r-hang r-hang force-pushed the rhang/adding-logging-to-tchannel-handler branch 5 times, most recently from 27a3b42 to 6c17f84 Compare September 13, 2018 23:21
@r-hang
Contributor Author

r-hang commented Sep 13, 2018

@twilly I made all of the round 2 changes you requested! I made some other cleanup, such as clarifying the names of certain variables in the test cases and cleaning up the error messages, but I felt those were too minor to include in a commit message.

@r-hang r-hang force-pushed the rhang/adding-logging-to-tchannel-handler branch from 6c17f84 to 54d7ed8 Compare September 13, 2018 23:33
_ = call.Response().SendSystemError(getSystemError(err))
if err != nil && !responseWriter.IsApplicationError() {
sendErr := call.Response().SendSystemError(getSystemError(err))
if sendErr != nil {

Typical Go uses this pattern for errors:

if err := foo(); err != nil {
  do(err)
}

You also don't need to call it sendErr since the scope would be limited to inside the if block.

// TODO: log error
_ = call.Response().SendSystemError(getSystemError(err))
sendErr := call.Response().SendSystemError(getSystemError(err))
if sendErr != nil {

Ditto, of course.


if sendErr != nil {
h.logger.Error("SendSystemError failed", zap.Error(sendErr))
}
h.logger.Error("tchannel transport handler request failed", zap.Error(err))

nit: Slightly cleaner messages would be "SendSystemError failed" and "handler failed", respectively. Zap loggers can also be given a name; doing so gives the user that context without repeating it in every message.
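For reference, a tiny standalone example of zap's Named: the logger name carries the transport context so the individual messages can stay short.

package main

import (
    "errors"

    "go.uber.org/zap"
)

func main() {
    base := zap.NewExample()
    defer base.Sync()

    // Named attaches "tchannel" to every entry emitted by this logger,
    // so the messages themselves can stay short.
    logger := base.Named("tchannel")
    logger.Error("handler failed", zap.Error(errors.New("boom")))
    logger.Error("SendSystemError failed", zap.Error(errors.New("connection closed")))
}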

@r-hang r-hang force-pushed the rhang/adding-logging-to-tchannel-handler branch from 54d7ed8 to 3f1e836 Compare September 17, 2018 06:46
@twilly twilly left a comment

This looks pretty good. I left a few comments that you can address, but otherwise +1 here. Let's get a second review.

@@ -139,7 +139,7 @@ func (t *ChannelTransport) start() error {
for s := range services {
sc := t.ch.GetSubChannel(s)
existing := sc.GetHandlers()
sc.SetHandler(handler{existing: existing, router: t.router, tracer: t.tracer})
sc.SetHandler(handler{existing: existing, router: t.router, tracer: t.tracer, logger: t.logger, newResponseWriter: newHandlerWriter})

nit: maybe it's better to have the response writer factory in ChannelTransport to copy into handlers, like with the router, tracer, and logger.
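A rough sketch of that suggestion, using hypothetical stand-in types rather than the real yarpc ones: the transport owns the response-writer factory and copies it into each handler it creates, alongside the router, tracer, and logger.

package example

import "go.uber.org/zap"

// Hypothetical stand-ins for the real yarpc types.
type router struct{}
type tracer struct{}
type responseWriter interface{ Close() error }

// handler receives its collaborators by value from the transport.
type handler struct {
    router            *router
    tracer            *tracer
    logger            *zap.Logger
    newResponseWriter func() responseWriter
}

// ChannelTransport owns the response-writer factory, just like the
// router, tracer, and logger, and copies it into every handler.
type ChannelTransport struct {
    router            *router
    tracer            *tracer
    logger            *zap.Logger
    newResponseWriter func() responseWriter
}

// newHandler copies the transport's dependencies, including the factory,
// into a handler; tests can swap the factory on the transport instead of
// on each handler.
func (t *ChannelTransport) newHandler() handler {
    return handler{
        router:            t.router,
        tracer:            t.tracer,
        logger:            t.logger,
        newResponseWriter: t.newResponseWriter,
    }
}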

Contributor Author

Moved in the second commit in this PR

format tchannel.Format
headers []byte
wantHeaders map[string]string
faultyFuncs map[string]bool

Instead of using a map of flags, just set your writer and recorder fields with the ones you want. Then your test instantiation is simpler and looks like this:

newResponseWriter = tt.newResponseWriter
respRecorder = tt.newResponseRecorder()

Contributor Author

👍 will change

headers []byte
wantHeaders map[string]string
faultyFuncs map[string]bool
wantNumLogs int

This is also more complicated than necessary. You can leverage zero values to know whether you should validate log messages; then you don't need to count messages.

Contributor Author
@r-hang r-hang Sep 18, 2018

@twilly I also use this field to count that the expected number of logs are written into the logger for the variety of error cases. Were you suggesting that I should turn this field into a boolean?


Not quite. For example, wantLogMessage is a string. If it's not initialized, then it'll be the zero value, which is the empty string. You can leverage this behavior by testing like this:

if tt.wantLogMessage != "" {
  assert.Equal(t, 1, logs.FilterMessage(tt.wantLogMessage).Len())
}

If you have multiple messages, you can check with a slice of message strings or LoggedEntry, followed by iterating and comparing.

There's a little test philosophy going on here too. When testing log emissions I generally don't like being too strict, like expecting an exact amount and order of messages. Such a test makes mutating code difficult when we're usually expecting an approximate and fuzzy existence. Usually this is because logs are a loose and flexible interface rather than a rigidly necessary one for a functioning system.
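A standalone sketch of this style of assertion using zap's observer package (the test and message names below are illustrative): capture log entries in memory and check that a particular message is present, rather than asserting an exact count and ordering of everything logged.

package example

import (
    "errors"
    "testing"

    "github.com/stretchr/testify/assert"
    "go.uber.org/zap"
    "go.uber.org/zap/zapcore"
    "go.uber.org/zap/zaptest/observer"
)

func TestLogsFilteredByMessage(t *testing.T) {
    // observer captures log entries in memory so the test can inspect them.
    core, logs := observer.New(zapcore.ErrorLevel)
    logger := zap.New(core)

    logger.Error("SendSystemError failed", zap.Error(errors.New("boom")))

    // Presence check keyed on the message, not a strict count of all entries.
    assert.Equal(t, 1, logs.FilterMessage("SendSystemError failed").Len())
}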

Contributor
@kriskowal kriskowal left a comment

It’s not strictly necessary to add any features to the deprecated ChannelTransport; we’re content to let it die on the vine. The Transport is the authoritative version, and the only version that can be constructed from configuration.

👍 to @twilly’s nits.

@r-hang r-hang force-pushed the rhang/adding-logging-to-tchannel-handler branch 4 times, most recently from 6f63e0d to cdf9cf7 Compare September 18, 2018 18:28
@abhinav abhinav added the CI label Sep 18, 2018
@r-hang r-hang force-pushed the rhang/adding-logging-to-tchannel-handler branch from cdf9cf7 to d70430a Compare September 18, 2018 18:44
@r-hang r-hang force-pushed the rhang/adding-logging-to-tchannel-handler branch from d70430a to 74447e9 Compare September 18, 2018 18:57
@codecov

codecov bot commented Sep 18, 2018

Codecov Report

Merging #1561 into dev will increase coverage by 0.01%.
The diff coverage is 89.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev    #1561      +/-   ##
==========================================
+ Coverage   90.24%   90.25%   +0.01%     
==========================================
  Files         223      223              
  Lines       11160    11169       +9     
==========================================
+ Hits        10071    10081      +10     
  Misses        753      753              
+ Partials      336      335       -1
Impacted Files Coverage Δ
transport/tchannel/channel_transport.go 85.93% <100%> (+0.22%) ⬆️
transport/tchannel/transport.go 85.58% <100%> (+0.26%) ⬆️
transport/tchannel/handler.go 84.67% <85.71%> (+0.7%) ⬆️
transport/tchannel/peer.go 93.84% <0%> (+1.53%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8184eb8...2f57705. Read the comment docs.

@codecov

codecov bot commented Sep 18, 2018

Codecov Report

Merging #1561 into dev will increase coverage by 0.01%.
The diff coverage is 89.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev    #1561      +/-   ##
==========================================
+ Coverage   90.24%   90.25%   +0.01%     
==========================================
  Files         223      223              
  Lines       11160    11169       +9     
==========================================
+ Hits        10071    10081      +10     
  Misses        753      753              
+ Partials      336      335       -1
Impacted Files Coverage Δ
transport/tchannel/channel_transport.go 85.93% <100%> (+0.22%) ⬆️
transport/tchannel/transport.go 85.58% <100%> (+0.26%) ⬆️
transport/tchannel/handler.go 84.67% <85.71%> (+0.7%) ⬆️
transport/grpc/peer.go 91.26% <0%> (-1.95%) ⬇️
transport/tchannel/peer.go 93.84% <0%> (+1.53%) ⬆️
dispatcher_startup.go 92.85% <0%> (+1.78%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8184eb8...2f57705. Read the comment docs.


@codecov

codecov bot commented Sep 18, 2018

Codecov Report

Merging #1561 into dev will increase coverage by 0.01%.
The diff coverage is 89.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev    #1561      +/-   ##
==========================================
+ Coverage   90.24%   90.25%   +0.01%     
==========================================
  Files         223      223              
  Lines       11160    11169       +9     
==========================================
+ Hits        10071    10081      +10     
  Misses        753      753              
+ Partials      336      335       -1
Impacted Files Coverage Δ
transport/tchannel/channel_transport.go 85.93% <100%> (+0.22%) ⬆️
transport/tchannel/transport.go 85.58% <100%> (+0.26%) ⬆️
transport/tchannel/handler.go 84.67% <85.71%> (+0.7%) ⬆️
transport/tchannel/peer.go 93.84% <0%> (+1.53%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3961924...afefba8. Read the comment docs.

Zap logging was added to understand when handler calls fail, when
SendSystemError fails, and when the responseWriter fails to close.
To test these changes, I added a responseWriter interface factory to
the handler struct so I could control the responseWriter during
testing. A recorder interface was also added to test failure cases
in responseRecorder.
NewResponseWriter was moved to ChannelTransport and Transport to keep
the tchannel handler's constructor parameters consistent with the
pattern in which these parameters come from the fields of
ChannelTransport and Transport.
@r-hang r-hang force-pushed the rhang/adding-logging-to-tchannel-handler branch from 2f57705 to d58d50b Compare September 19, 2018 16:56
@twilly twilly changed the title rhang/adding-logging-to-tchannel-handler Adding logging to tchannel handler error cases Sep 19, 2018
@r-hang r-hang force-pushed the rhang/adding-logging-to-tchannel-handler branch from d58d50b to afefba8 Compare September 19, 2018 17:37
@r-hang r-hang merged commit 360ecaa into yarpc:dev Sep 19, 2018
@r-hang r-hang deleted the rhang/adding-logging-to-tchannel-handler branch September 19, 2018 17:48
peats-bond pushed a commit that referenced this pull request May 15, 2020
This drops the "handler failed" log in the TChannel transport. This
log was unnecessarily added when we were increasing observability
around TChannel internal errors in #1561.

The context error override in #1930 makes this log redundant, as
richer information exists in observability logs, including latency
and request attributes.

Furthermore, we've had issues with this log since the latency is
included in the message and makes aggregation extremely difficult.

Ref T5802517
peats-bond pushed a commit that referenced this pull request May 19, 2020
…ut (#1933)

This fixes a minor logging annoyance. Some users have reported large
amounts of `SendSystemError failed` and `responseWriter failed to
close` logs. This was introduced in #1561 to help improve
observability of TChannel system failures.

However, if a client times out, we'll still log the failures; these
logs are uninformative/unactionable to users as they are expected.

With this diff, we only log TChannel system failures when the client
is still waiting for a response, ie has yet to time out.

Ref T5305225#107015299, supersedes #1814