Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is intended to be reported as an "error" from the HTTP plugin? #297

Closed
timrogers opened this issue Sep 29, 2018 · 13 comments
Closed

What is intended to be reported as an "error" from the HTTP plugin? #297

timrogers opened this issue Sep 29, 2018 · 13 comments

Comments

@timrogers
Copy link
Contributor

What kinds of errors are we trying to report as "errors" from the HTTP(S) plugin?

req.on('error', err => {
span.addTags({
'error.type': err.name,
'error.msg': err.message,
'error.stack': err.stack
})
span.finish()
})

I've observed that 4XX and 5XX HTTP responses don't seem to lead to errors being recorded in Datadog APM. Perhaps this is the expected behaviour, but strikes me as a little counterintuitive and worth documenting if it is how it is meant to work.

I realise that this approach may align with the JavaScript ecosystem more generally (for example fetch doesn't natively treat HTTP error responses as errors.

@rochdev
Copy link
Member

rochdev commented Sep 30, 2018

Right now you are correct that only network errors are considered as errors. I'm actually not sure what the best behavior would be.

For example, from an HTTP client standpoint:

  • The source of network errors is usually unknown from the client perspective, so it probably should be an error.
  • 5xx errors are actually errors upstream, not errors from the client. It may or may not cause the caller to decide to fail. These can also be retried.
  • 4xx are definitely errors from the client. Retrying makes no sense since the same invalid request would simply be resent.

I'd say maybe it makes sense to make all of these errors as you've proposed. At the same time, I'd like to keep the behavior consistent across all our tracers so we'll have to discuss this internally first.

In any case, there should be a configuration for this, similar to validateStatus on HTTP server integrations, which could be added in the meantime while we reconsider the defaults.

@timrogers
Copy link
Contributor Author

My gut feeling is that for most use cases, recording 4XXs and 5XXs as errors would be valuable.

Most of the time, an elevated rate of these would indicate a problem with the health of my application which I want to be able to see at a glance, receive alerts for, etc.

That said, there are cases where (at least for certain requests), some 4XX or 5XX errors are an expected part of the ordinary operation of the application and don't represent a problem, so having the flexibility to decide what is and isn't an error would be valuable. (On the other hand, it seems like you would always want to consider timeouts and network errors an "error").

A validateStatus function as part of the configuration could help with this. At the minimum, I'd expect it to be passed the status, but perhaps some other context (for example the request) would be useful as well to allow me to decide on a more fine-grained basis what is an "error" and what isn't (although obviously that level of configurability adds complexity).

Nevertheless, it feels like recording these as errors would be a reasonable default.

Can I suggest that we?:

  • (a) document explicitly what is treated as an error and what isn't
  • (b) add the simplest possible configuration, a validateStatus function which is passed the response code

@timrogers
Copy link
Contributor Author

timrogers commented Sep 30, 2018 via email

@timrogers
Copy link
Contributor Author

I'm happy to write the code for this. How would you want the errors to be recorded? There seem to be three properties: msg, type and stack. stack is going to be challenging - we might have to omit it if that's permitted (or potentially we could generate a stack trace from the current line and remove entries that came from dd-trace-js).

@rochdev
Copy link
Member

rochdev commented Oct 1, 2018

In this case, since there is no actual error, I would probably omit the error object and simply set the error tag. Then it will be possible to infer the reason of the error from the status code.

I wouldn't generate an actual error whenever it's possible not to do it because error generation is pretty heavy in JavaScript.

@timrogers
Copy link
Contributor Author

I have an implementation for this locally. I’ll push it up tomorrow.

timrogers pushed a commit to timrogers/dd-trace-js that referenced this issue Oct 4, 2018
The [OpenTracing spec][1] states that a span should be tagged as
an error "if and only if the application considers the operation
represented by the Span to have failed".

The `dd-trace-js` HTTP plugin allows you to automatically
instrument outgoing HTTP requests.

Currently, it only tags a span as an error when the request
emits an `error` event, which only happens in the case of "hard
failures" like a timeout or connection problem.

https://github.com/DataDog/dd-trace-js/blob/34c168555f44a961a61ede95a39341d7c527701a/src/plugins/http.js#L59-L67

This means that requests that return responses never are
considered errors, even if a 4XX or 5XX response is received.

Having spans tagged as errors is useful, because it means they
are flagged as such in the Datadog APM interface, useful for
reporting.

This commit adds support for a new `validateStatus` option for
the HTTP plugin which allows the user to supply a
`validateStatus` function which, when a response is received, is
passed the response status and expected to return `false` if the
response should be considered an error, or `true` if not. If it
returns `false`, the span will be marked as an error.

By default, we will continue not to consider any requests that
receive a response to be errors, but this configuration option
allows the user to "opt in" if they want to see these spans as
errors.

Fixes DataDog#297.

[1]: https://github.com/opentracing/specification/blob/master/semantic_conventions.md
timrogers pushed a commit to timrogers/dd-trace-js that referenced this issue Oct 6, 2018
The [OpenTracing spec][1] states that a span should be tagged as
an error "if and only if the application considers the operation
represented by the Span to have failed".

The `dd-trace-js` HTTP plugin allows you to automatically
instrument outgoing HTTP requests.

Currently, it only tags a span as an error when the request
emits an `error` event, which only happens in the case of "hard
failures" like a timeout or connection problem.

https://github.com/DataDog/dd-trace-js/blob/34c168555f44a961a61ede95a39341d7c527701a/src/plugins/http.js#L59-L67

This means that requests that return responses never are
considered errors, even if a 4XX or 5XX response is received.

Having spans tagged as errors is useful, because it means they
are flagged as such in the Datadog APM interface, useful for
reporting.

This commit adds support for a new `validateStatus` option for
the HTTP plugin which allows the user to supply a
`validateStatus` function which, when a response is received, is
passed the response status and expected to return `false` if the
response should be considered an error, or `true` if not. If it
returns `false`, the span will be marked as an error.

By default, we will consider requests that return `4XX` responses
to be errors, since a `4XX` indicates a failure which is within
the user's control.

Fixes DataDog#297.

[1]: https://github.com/opentracing/specification/blob/master/semantic_conventions.md
rochdev pushed a commit that referenced this issue Oct 23, 2018
The [OpenTracing spec][1] states that a span should be tagged as
an error "if and only if the application considers the operation
represented by the Span to have failed".

The `dd-trace-js` HTTP plugin allows you to automatically
instrument outgoing HTTP requests.

Currently, it only tags a span as an error when the request
emits an `error` event, which only happens in the case of "hard
failures" like a timeout or connection problem.

https://github.com/DataDog/dd-trace-js/blob/34c168555f44a961a61ede95a39341d7c527701a/src/plugins/http.js#L59-L67

This means that requests that return responses never are
considered errors, even if a 4XX or 5XX response is received.

Having spans tagged as errors is useful, because it means they
are flagged as such in the Datadog APM interface, useful for
reporting.

This commit adds support for a new `validateStatus` option for
the HTTP plugin which allows the user to supply a
`validateStatus` function which, when a response is received, is
passed the response status and expected to return `false` if the
response should be considered an error, or `true` if not. If it
returns `false`, the span will be marked as an error.

By default, we will consider requests that return `4XX` responses
to be errors, since a `4XX` indicates a failure which is within
the user's control.

Fixes #297.

[1]: https://github.com/opentracing/specification/blob/master/semantic_conventions.md
@johnnydimas
Copy link

Hey @timrogers, thanks for submitting this change. We were racking our brains trying to figure out why our 4xx errors weren't being reported in APM as errors.

I have pointed our npm package to the latest in master as your branch was merged, but not yet published.

I was curious if you could provide an example of how you implement this? Does it require any changes or does it work by default now that 4xx errors are treated as errors, since by definition they are errors?

Thanks!

@timrogers
Copy link
Contributor Author

timrogers commented Oct 24, 2018 via email

@rochdev
Copy link
Member

rochdev commented Oct 24, 2018

@johnnydimas Please note that the feature was merged in the v0.7.0 branch and not master.

@johnnydimas
Copy link

Thanks for the responses!

@rochdev rochdev added this to the 0.7.0 milestone Oct 26, 2018
@johnnydimas
Copy link

@rochdev Is there a timeline on when this will be available in a release?

@rochdev
Copy link
Member

rochdev commented Oct 29, 2018

@johnnydimas I don't have a specific timeline but it should be fairly soon. In the meantime, you can use one of the intermediary beta releases, such as 0.7.0-beta.2 which I've just released.

@timrogers
Copy link
Contributor Author

I’m going to close this as my change was merged, if not released. In an idea world, you’d be able to customise on a per-request basis but it’s hard to imagine how you’d do that in a nice automatic way!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants