What is intended to be reported as an "error" from the HTTP plugin? #297

timrogers · 2018-09-29T08:40:28Z

What kinds of errors are we trying to report as "errors" from the HTTP(S) plugin?

Lines 59 to 67 in 34c1685

    
           req.on('error', err => { 
        
             span.addTags({ 
        
               'error.type': err.name, 
        
               'error.msg': err.message, 
        
               'error.stack': err.stack 
        
             }) 
        
             span.finish() 
        
           })

I've observed that 4XX and 5XX HTTP responses don't seem to lead to errors being recorded in Datadog APM. Perhaps this is the expected behaviour, but strikes me as a little counterintuitive and worth documenting if it is how it is meant to work.

I realise that this approach may align with the JavaScript ecosystem more generally (for example fetch doesn't natively treat HTTP error responses as errors.

The text was updated successfully, but these errors were encountered:

rochdev · 2018-09-30T17:21:13Z

Right now you are correct that only network errors are considered as errors. I'm actually not sure what the best behavior would be.

For example, from an HTTP client standpoint:

The source of network errors is usually unknown from the client perspective, so it probably should be an error.
5xx errors are actually errors upstream, not errors from the client. It may or may not cause the caller to decide to fail. These can also be retried.
4xx are definitely errors from the client. Retrying makes no sense since the same invalid request would simply be resent.

I'd say maybe it makes sense to make all of these errors as you've proposed. At the same time, I'd like to keep the behavior consistent across all our tracers so we'll have to discuss this internally first.

In any case, there should be a configuration for this, similar to validateStatus on HTTP server integrations, which could be added in the meantime while we reconsider the defaults.

timrogers · 2018-09-30T19:36:27Z

My gut feeling is that for most use cases, recording 4XXs and 5XXs as errors would be valuable.

Most of the time, an elevated rate of these would indicate a problem with the health of my application which I want to be able to see at a glance, receive alerts for, etc.

That said, there are cases where (at least for certain requests), some 4XX or 5XX errors are an expected part of the ordinary operation of the application and don't represent a problem, so having the flexibility to decide what is and isn't an error would be valuable. (On the other hand, it seems like you would always want to consider timeouts and network errors an "error").

A validateStatus function as part of the configuration could help with this. At the minimum, I'd expect it to be passed the status, but perhaps some other context (for example the request) would be useful as well to allow me to decide on a more fine-grained basis what is an "error" and what isn't (although obviously that level of configurability adds complexity).

Nevertheless, it feels like recording these as errors would be a reasonable default.

Can I suggest that we?:

(a) document explicitly what is treated as an error and what isn't
(b) add the simplest possible configuration, a validateStatus function which is passed the response code

timrogers · 2018-09-30T21:28:22Z

Thinking about it, you might well have logic to retry connection failures too (if the request is idempotent), and despite having handled the error, you’d still probably want it recorded as an error. I think the same goes for 500s.

…

On Sun, 30 Sep 2018 at 18:21, Roch Devost ***@***.***> wrote: Right now you are correct that only network errors are considered as errors. I'm actually not sure what the best behavior would be. For example, from an HTTP client standpoint: - The source of network errors is usually unknown from the client perspective, so it probably should be an error. - 5xx errors are actually errors upstream, not errors from the client. It may or may not cause the caller to decide to fail. These can also be retried. - 4xx are definitely errors from the client. Retrying makes no sense since the same invalid request would simply be resent. I'd say maybe it makes sense to make all of these errors as you've proposed. At the same time, I'd like to keep the behavior consistent across all our tracers so we'll have to discuss this internally first. In any case, there should be a configuration for this, similar to validateStatus on HTTP server integrations, which could be added in the meantime while we reconsider the defaults. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#297 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAHFpim3N1WjL57LZbCVmaxmA1wCbE2wks5ugP2MgaJpZM4XAZ8L> .

timrogers · 2018-10-01T15:34:25Z

I'm happy to write the code for this. How would you want the errors to be recorded? There seem to be three properties: msg, type and stack. stack is going to be challenging - we might have to omit it if that's permitted (or potentially we could generate a stack trace from the current line and remove entries that came from dd-trace-js).

rochdev · 2018-10-01T15:41:13Z

In this case, since there is no actual error, I would probably omit the error object and simply set the error tag. Then it will be possible to infer the reason of the error from the status code.

I wouldn't generate an actual error whenever it's possible not to do it because error generation is pretty heavy in JavaScript.

timrogers · 2018-10-03T17:20:59Z

I have an implementation for this locally. I’ll push it up tomorrow.

The [OpenTracing spec][1] states that a span should be tagged as an error "if and only if the application considers the operation represented by the Span to have failed". The `dd-trace-js` HTTP plugin allows you to automatically instrument outgoing HTTP requests. Currently, it only tags a span as an error when the request emits an `error` event, which only happens in the case of "hard failures" like a timeout or connection problem. https://github.com/DataDog/dd-trace-js/blob/34c168555f44a961a61ede95a39341d7c527701a/src/plugins/http.js#L59-L67 This means that requests that return responses never are considered errors, even if a 4XX or 5XX response is received. Having spans tagged as errors is useful, because it means they are flagged as such in the Datadog APM interface, useful for reporting. This commit adds support for a new `validateStatus` option for the HTTP plugin which allows the user to supply a `validateStatus` function which, when a response is received, is passed the response status and expected to return `false` if the response should be considered an error, or `true` if not. If it returns `false`, the span will be marked as an error. By default, we will continue not to consider any requests that receive a response to be errors, but this configuration option allows the user to "opt in" if they want to see these spans as errors. Fixes DataDog#297. [1]: https://github.com/opentracing/specification/blob/master/semantic_conventions.md

The [OpenTracing spec][1] states that a span should be tagged as an error "if and only if the application considers the operation represented by the Span to have failed". The `dd-trace-js` HTTP plugin allows you to automatically instrument outgoing HTTP requests. Currently, it only tags a span as an error when the request emits an `error` event, which only happens in the case of "hard failures" like a timeout or connection problem. https://github.com/DataDog/dd-trace-js/blob/34c168555f44a961a61ede95a39341d7c527701a/src/plugins/http.js#L59-L67 This means that requests that return responses never are considered errors, even if a 4XX or 5XX response is received. Having spans tagged as errors is useful, because it means they are flagged as such in the Datadog APM interface, useful for reporting. This commit adds support for a new `validateStatus` option for the HTTP plugin which allows the user to supply a `validateStatus` function which, when a response is received, is passed the response status and expected to return `false` if the response should be considered an error, or `true` if not. If it returns `false`, the span will be marked as an error. By default, we will consider requests that return `4XX` responses to be errors, since a `4XX` indicates a failure which is within the user's control. Fixes DataDog#297. [1]: https://github.com/opentracing/specification/blob/master/semantic_conventions.md

The [OpenTracing spec][1] states that a span should be tagged as an error "if and only if the application considers the operation represented by the Span to have failed". The `dd-trace-js` HTTP plugin allows you to automatically instrument outgoing HTTP requests. Currently, it only tags a span as an error when the request emits an `error` event, which only happens in the case of "hard failures" like a timeout or connection problem. https://github.com/DataDog/dd-trace-js/blob/34c168555f44a961a61ede95a39341d7c527701a/src/plugins/http.js#L59-L67 This means that requests that return responses never are considered errors, even if a 4XX or 5XX response is received. Having spans tagged as errors is useful, because it means they are flagged as such in the Datadog APM interface, useful for reporting. This commit adds support for a new `validateStatus` option for the HTTP plugin which allows the user to supply a `validateStatus` function which, when a response is received, is passed the response status and expected to return `false` if the response should be considered an error, or `true` if not. If it returns `false`, the span will be marked as an error. By default, we will consider requests that return `4XX` responses to be errors, since a `4XX` indicates a failure which is within the user's control. Fixes #297. [1]: https://github.com/opentracing/specification/blob/master/semantic_conventions.md

johnnydimas · 2018-10-24T07:55:15Z

Hey @timrogers, thanks for submitting this change. We were racking our brains trying to figure out why our 4xx errors weren't being reported in APM as errors.

I have pointed our npm package to the latest in master as your branch was merged, but not yet published.

I was curious if you could provide an example of how you implement this? Does it require any changes or does it work by default now that 4xx errors are treated as errors, since by definition they are errors?

Thanks!

timrogers · 2018-10-24T08:04:35Z

It should just work. I haven’t updated yet 🙈 What’s missing here is the ability to customise what is considered an error based on how the specific API you’re calling works, but if you’re not worried about that and can configure globally, this should work for you.

…

On Wed, 24 Oct 2018 at 08:55, Johnny ***@***.***> wrote: Hey @timrogers <https://github.com/timrogers>, thanks for submitting this change. We were racking our brains trying to figure out why our 4xx errors weren't being reported in APM as errors. I have pointed our npm package to the latest in master as your branch was merged, but not yet published. I was curious if you could provide an example of how you implement this? Does it require any changes or does it work by default now that 4xx errors are treated as errors, since by definition they are errors? Thanks! — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#297 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAHFpgQMg-2vWDw83FtS_1-UDSs-AF_Yks5uoBzpgaJpZM4XAZ8L> .

rochdev · 2018-10-24T14:39:20Z

@johnnydimas Please note that the feature was merged in the v0.7.0 branch and not master.

johnnydimas · 2018-10-24T19:25:21Z

Thanks for the responses!

johnnydimas · 2018-10-29T14:49:17Z

@rochdev Is there a timeline on when this will be available in a release?

rochdev · 2018-10-29T17:29:43Z

@johnnydimas I don't have a specific timeline but it should be fairly soon. In the meantime, you can use one of the intermediary beta releases, such as 0.7.0-beta.2 which I've just released.

timrogers · 2018-11-06T09:55:26Z

I’m going to close this as my change was merged, if not released. In an idea world, you’d be able to customise on a per-request basis but it’s hard to imagine how you’d do that in a nice automatic way!

rochdev added community feature-request integrations labels Sep 30, 2018

timrogers mentioned this issue Oct 4, 2018

Add validateStatus option to HTTP plugin #301

Merged

rochdev added this to the 0.7.0 milestone Oct 26, 2018

timrogers closed this as completed Nov 6, 2018

luozhaoyu mentioned this issue May 8, 2019

Http 5xx errors status are categorized as "OK" instead of "Error" in APM tracing for Bottle plugin DataDog/dd-trace-py#927

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What is intended to be reported as an "error" from the HTTP plugin? #297

What is intended to be reported as an "error" from the HTTP plugin? #297

timrogers commented Sep 29, 2018

rochdev commented Sep 30, 2018

timrogers commented Sep 30, 2018

timrogers commented Sep 30, 2018 via email

timrogers commented Oct 1, 2018

rochdev commented Oct 1, 2018

timrogers commented Oct 3, 2018

johnnydimas commented Oct 24, 2018

timrogers commented Oct 24, 2018 via email

rochdev commented Oct 24, 2018

johnnydimas commented Oct 24, 2018

johnnydimas commented Oct 29, 2018

rochdev commented Oct 29, 2018

timrogers commented Nov 6, 2018

What is intended to be reported as an "error" from the HTTP plugin? #297

What is intended to be reported as an "error" from the HTTP plugin? #297

Comments

timrogers commented Sep 29, 2018

rochdev commented Sep 30, 2018

timrogers commented Sep 30, 2018

timrogers commented Sep 30, 2018 via email

timrogers commented Oct 1, 2018

rochdev commented Oct 1, 2018

timrogers commented Oct 3, 2018

johnnydimas commented Oct 24, 2018

timrogers commented Oct 24, 2018 via email

rochdev commented Oct 24, 2018

johnnydimas commented Oct 24, 2018

johnnydimas commented Oct 29, 2018

rochdev commented Oct 29, 2018

timrogers commented Nov 6, 2018