observability: Override responses with Context errors #1930
Merged
Codecov Report
@@            Coverage Diff             @@
##              dev    #1930      +/-   ##
==========================================
+ Coverage   85.07%   87.07%   +2.00%
==========================================
  Files         240      249       +9
  Lines       12512    13133     +621
==========================================
+ Hits        10645    11436     +791
+ Misses       1470     1320     -150
+ Partials      397      377      -20
peats-bond
pushed a commit
that referenced
this pull request
May 15, 2020
This drops the "handler failed" log in the TChannel transport. This log was unnecessarily added when we were increasing observability around TChannel internal errors in #1561. The context error override in #1930 makes this log redundant, as richer information exists in observability logs, including latency and request attributes. Furthermore, we've had issues with this log since the latency is included in the message, which makes aggregation extremely difficult. Ref T5802517
peats-bond
pushed a commit
that referenced
this pull request
May 18, 2020
This drops the "handler failed" log in the TChannel transport. This log was unnecessarily added when we were increasing observability around TChannel internal errors in #1561. The context error override in #1930 makes this log redundant, as richer information exists in observability logs, including latency and request attributes. Furthermore, we've had issues with this log since the latency is included in the message, which makes aggregation extremely difficult. Ref T5802517
AllenLuUber
reviewed
May 18, 2020
AllenLuUber
approved these changes
May 18, 2020
This duplicates the `EndWithAppError` functions so that `Call` and `Handle` have different call paths, `EndCallWithAppError` and `EndHandleWithAppError` respectively. The handle signature will change in the next commit. No functionality has changed.
The majority of code here is identical to that introduced and validated in D4334363. Here, we add a context override function that determines whether or not to throw away the user's response, in favour of the context error. This ensures that YARPC metrics better reflect timeouts, including higher correlation with sidecar/proxy metrics.
This enables the `End{Call,Handle}WithAppError` methods to set additional log fields. This will be used in the following commit to add observability into dropped responses. No functionality has changed.
Now that we are potentially overriding handler responses with context errors, we need to let users know somehow. This change adds a `dropped` log field into the existing metrics, showing the underlying success, error or application error that was dropped.
If a caller timed out waiting for a response, it would emit a timeout. However,
the YARPC callee would continue fulfilling the request and emit/log the success
or error, never reporting a timeout. This misaligns YARPC metrics with
caller/sidecar metrics, because YARPC callees never returned timeout errors.
To fix this, this change overrides handler responses with the context error if
the observability middleware sees that the context deadline expired or that the
context was cancelled. Users' responses are thrown away, but logged in
the existing YARPC logs under a `dropped` field.

Commits are individually reviewable and will be rebased onto dev.