This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

Error flagging with status codes #136

Merged
merged 33 commits into from
Sep 17, 2020
bd0177d
Add error flagging proposal
Sep 9, 2020
54121cd
removed half finished sentence
Sep 9, 2020
725cbc0
Remove error.message, it is redundant with the status message.
Sep 9, 2020
ae86379
whitespace
Sep 9, 2020
74dd7f5
whitespace
Sep 9, 2020
0d34ea1
clarify convenience methods are in a seprate package
Sep 9, 2020
d951674
whitespace
Sep 9, 2020
c814974
whitespace
Sep 9, 2020
e787f94
use correct RFC language
Sep 9, 2020
3a25b4c
spelling
Sep 9, 2020
4a42c3a
spelling; Update text/trace/0000-error_flagging.md
Sep 9, 2020
d204bc7
capitalization
Sep 9, 2020
f551758
Capitalization
Sep 9, 2020
44b1fa4
error mapping -> status mapping
Sep 9, 2020
bb1bfb6
whitespace
Sep 9, 2020
ccc0f5e
replace override status codes with user_override boolean
Sep 9, 2020
241e2aa
rewrite based on error WG feedback
Sep 10, 2020
db506d6
indicate that status source is a new field.
Sep 10, 2020
7e5ef2b
remove some garbage
Sep 10, 2020
6fb4bcb
Consolidate OPERATOR and APPLICATION into USER
Sep 10, 2020
639e661
spelling
Sep 10, 2020
0c8899b
nominal -> normal
Sep 10, 2020
ec77065
markdownlint
Sep 11, 2020
788844d
Add PR number to file name
Sep 11, 2020
ed50989
more lint
Sep 11, 2020
a6de9cc
clarify end users
Sep 15, 2020
19a3d23
clarify end user
Sep 15, 2020
cc8f305
clarify terms, update old intro
Sep 15, 2020
2532107
fix intro
Sep 15, 2020
7de13e9
clarify the meaning of normal
Sep 15, 2020
1537cc3
clarify status mapping
Sep 15, 2020
2b9e943
Update text/trace/0136-error_flagging.md
Sep 16, 2020
76f0597
final nits
Sep 16, 2020

clarify terms, update old intro
Ted Young committed Sep 15, 2020
commit cc8f30509461badbe70c70dd743a934c16619d94
12 changes: 7 additions & 5 deletions text/trace/0136-error_flagging.md
@@ -1,14 +1,16 @@
# Error Flagging with Status Codes

-This proposal adds two status codes explicitly for use as overrides by the end user, and proposes a canonical mapping of semantic conventions to status codes. This clarifies how error reporting should work in OpenTelemetry.
+This proposal reduces the number of status codes to three, a new field to identify status codes set by application developers and operators, and proposes a canonical mapping of semantic conventions to status codes. This clarifies how error reporting should work in OpenTelemetry.

+Note: the term **end user** is defined as the application developers and operators of the system running opentelemetry. The term **instrumentation** refers to [instrumentation libraries](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/glossary.md#instrumentation-library) for common code shared between different systems, such as web frameworks and database clients.

## Motivation

-Error reporting is a fundamental use case for distributed tracing. While we prefer that error flagging occurs within analysis tools, and not within instrumentation plugins, a number of currently supported analysis tools and protocols rely on the existence of an explicit error flag reported from instrumentation. In OpenTelemetry, the error flag is called "status codes".
+Error reporting is a fundamental use case for distributed tracing. While we prefer that error flagging occurs within analysis tools, and not within instrumentation, a number of currently supported analysis tools and protocols rely on the existence of an explicit error flag reported from instrumentation. In OpenTelemetry, the error flag is called "status codes".

-However, there is confusion over the mapping of semantic conventions to status codes, and concern over the subjective nature of errors. Which network failures count as an error? Are 404s an error? The answer is often dependent on the situation, but without even a baseline of suggested status codes for each convention, the instrumentation author is placed under the heavy burden of making the decision. Worse, the decisions will not be in sync across different instrumentation packages.
+However, there is confusion over the mapping of semantic conventions to status codes, and concern over the subjective nature of errors. Which network failures count as an error? Are 404s an error? The answer is often dependent on the situation, but without even a baseline of suggested status codes for each convention, the instrumentation author is placed under the heavy burden of making the decision. Worse, the decisions will not be in sync across different instrumentation.

-There is one other missing piece, required for proper error flagging. Both application developers and operators have a deep understanding of what constitutes an error in their system. OpenTelemetry must provide a way for these users to control error flagging, and explicitly indicate that it is the end user setting the status code, and not instrumentation plugins. In these specific cases, the error flagging is known to be correct: the end user has decided the status of the span, and they do not want another interpretation.
+There is one other missing piece, required for proper error flagging. Both application developers and operators have a deep understanding of what constitutes an error in their system. OpenTelemetry must provide a way for these users to control error flagging, and explicitly indicate that it is the end user setting the status code, and not instrumentation. In these specific cases, the error flagging is known to be correct: the end user has decided the status of the span, and they do not want another interpretation.

While generic instrumentation can only provide a generic schema, end users are capable of making subjective decisions about their systems. And, as the end user, they should get to have the final call in what constitutes an error. In order to accomplish this, there must be a way to differentiate between errors flagged by instrumentation, and errors flagged by the end user.
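
The distinction described in the paragraph above can be sketched in a few lines of Python. This is only an illustration, not the API the proposal defines: the enum members `UNSET`, `OK`, `ERROR`, `INSTRUMENTATION`, and `USER`, the `Status` dataclass, and the `resolve` helper are all hypothetical names chosen for this example. The only rule taken from the text is that a status set by the end user should not be replaced by one coming from instrumentation.

```python
from dataclasses import dataclass
from enum import Enum


class StatusCode(Enum):
    # Hypothetical names for the three status codes; the diff does not list them.
    UNSET = 0
    OK = 1
    ERROR = 2


class StatusSource(Enum):
    # Hypothetical names for the new "who set this status" field.
    INSTRUMENTATION = 0
    USER = 1


@dataclass(frozen=True)
class Status:
    code: StatusCode = StatusCode.UNSET
    source: StatusSource = StatusSource.INSTRUMENTATION


def resolve(current: Status, proposed: Status) -> Status:
    """Pick the status a span keeps, giving the end user the final call."""
    # A status already set by the end user is not overwritten by instrumentation.
    if current.source is StatusSource.USER and proposed.source is not StatusSource.USER:
        return current
    # Otherwise the most recent status wins.
    return proposed
```

Under these assumptions, an HTTP instrumentation library might flag a 404 as `ERROR`, while an application developer who treats that 404 as an expected outcome marks the span `OK` with source `USER`; `resolve` then keeps the developer's decision regardless of the order in which the two calls arrive.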

@@ -52,7 +54,7 @@ Note that these convenience methods simply wire together multiple API calls. The

## Internal details

-This proposal is mostly backwards compatible with existing code, protocols, and the OpenTracing bridge. The only potential exception is the removal of status codes enums from the current OTLP protocol, and the rewriting of the small number of instrumentation plugins that were making use of them.
+This proposal is mostly backwards compatible with existing code, protocols, and the OpenTracing bridge. The only potential exception is the removal of status codes enums from the current OTLP protocol, and the rewriting of the small number of instrumentation that were making use of them.

## BUT ERRORS ARE SUBJECTIVE!! HOW CAN WE KNOW WHAT IS AN ERROR? WHO ARE WE TO DEFINE THIS?