-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
4xx client errors to be left unset #2309
4xx client errors to be left unset #2309
Conversation
This statement does not seem that convincing that this is the right thing to do :) |
@bogdandrutu what are your thoughts? Should we change it or keep it as is? I can think of reasons both to change it or keep it as is - I don't feel strongly either way. I'm making this PR so we can make a decision, as it is a noted decision point in getting the HTTP semconv to stable. |
HTTP statuses 4xx mean "client errors". They usually mean that client made some mistake: it was not authenticated, it asked for a wrong url etc. IMO client should be made aware of this mistake so that it could fix it. I vote against this change. |
From my memory, the span status was introduced after a long discussion where the agreement was that usually it is context-dependent if something is an error or not. Thus, the accepted OTEP defined the error codes UNSET, ERROR and OK together with a source field that indicates whether the status is set by an instrumentation or the user. Please see https://github.com/open-telemetry/oteps/blob/main/text/trace/0136-error_flagging.md This distinction was deemed important, because an instrumentation-set code can only be a vague heuristic hint, because it does not know the context in which it is used. E.g. did the HTTP library request some fixed API URL like Of course, the source field then did not make it from the OTEP to the specification, so this got a bit unclear. But given that we already have a definition that considers 4xx an error status on the client side at least, and we did deliberately introduce a client/server distinction when changing the original definition that also considered it an error on the server side, I am against this PR. I think the case for this backwards-incompatible change is not strong enough. |
@Oberon00 I think this is a good example to prove that we should consider adding the source to the specs. |
I had people reach out to me with exactly this use case, I think this often comes up when instrumenting libraries that use HTTP under the hood. For me there is a general open question regarding the error status on a span: if a child span has a status of |
This part still seems to be unclear for me and generates even more questions, like
I feel it might be beneficial to relax the current requirements (yet the spec is still experimental), so we improve here in the future, for example by introducing some configuration options/language, so a user can adjust the instrumentation behavior according to their case (e.g., [400-403, 405-407, 411-419] -> ERROR, the rest 4xx -> UNSET). |
Such configuration can get quite complex and would need to be re-implemented in every OTel language (at least, if you do code sharing between different instrumentations of the same language). I feel like this is better handled on the backend or collector. There you could define a configuration like:
Although I can see an argument for deploying such configuration along with the application. |
@Oberon00 100% I agree that configuring error status should happen in a collector (or even farther down the pipe). The question remains, though: what is the best default? In practice, does counting 4xx as errors just create noise that most users will immediately want to turn off? In other words, do users want to configure 4xx errors by selectively turn them on? Or by selectively turning them off? In my experience, users want these errors suppressed, except in specific circumstances. So the proper default should be that 4xx are not marked as errors. If others disagree, I am curious: with the observability system you currently use (and/or work on), what is the default? What default do you think operators prefer in practice, and why? |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Up? |
@carlosalberto seems we have some folks on either side of the argument, and as a result this PR isn't going anywhere. As the assignee do you have any suggestion on how we should proceed? |
Hey @jamesmoessis I think we need more actual eyes on this, so let's present it on the next Spec call (Tuesday). Hopefully you are there, as plan would be not to poke the community, but hopefully discuss the pros and cons ;) |
@carlosalberto I would love to come to a spec meeting but unfortunately they are at 3am my local time, and I'm away on vacation for the next week. I'm happy to discuss the pros and cons here (and in SIGs that I can make). That being said it would be great if it had more eyes on it. The community needs to make a yes/no decision on this PR in order to make a step towards a stable HTTP semconv. |
@jamesmoessis Thanks for the heads up. So I will present that in that Spec call and will report here what's the result of that discussion. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Hey @jamesmoessis We discussed this in the last Spec SIG call, and we want to stick with what we have right now, based on:
Let us know what you think. |
Thank you a bunch for raising this in the spec SIG @carlosalberto! I think the conclusions reached and reasons for conclusion are completely reasonable. I'm happy to close this PR now that it's been discussed, and seemingly a decision has been made. |
Fixes part of OTEP-174
Changes
As part of OTEP-174 it was suggested that span error statuses should be left unset for client 4xx spans. The reason being, a 4xx on the client does not necessarily indicate an error. For example, a 429 might not be considered an error on the client side.
Related oteps #