Standardize Server-Timing: traceparent "propagator" across vendors #3811
Comments
FYI @cedricziel as you expressed interest in writing a proposal for this in slack before |
Absolutely in favor of this. Pity ServerTiming is still not supported in Safari, though. |
In safari js has access to it for xhr and fetch (possibly with |
Seconding @mmanciop here. We would love to see a sustainable and supported way to communicate server context to client side technology. Server-Timing is widely used even beyond the implementations mentioned and I think OTel would benefit a lot from a specification of using it for the purpose of forwarding context to clients. |
Talk about timing! I was just discussing this yesterday with a few other folks and even opened this here: Here are a few notes based on my investigation so far:
|
I like all of what @jpkrohling has to say. I like the idea of using |
To explain the impact to others landing here: Conversely, SPAs sending XHR requests have no problem correlating their requests with the corresponding distributed traces server-side, as the JS in the browser can inject the |
Great point, @mmanciop. I was having problems understanding why we couldn't do it with our current solutions until @cedricziel showed me this diagram he created: |
As mentioned in a previous comment, I was getting ready to propose a spec change related to this, and here's the draft I had. Note that I was breaking down the task in smaller chunks, the first one being expanding the notion of propagators so that we define what's a "client propagator". The next one, based on the outcome of the W3C Trace Context issue I listed earlier, would be to define the first client propagator based on
When working with client-side instrumentation, such as the ones being developed under the Client Instrumentation SIG, there’s currently no reliable way to obtain the trace context or any references to the trace generated by the backend during the initial document request on the client.
While the client (browser, mobile app, …) might generate their trace IDs and send them via regular trace propagation mechanisms for correlation at the backend (like span links, or as the parent span), other scenarios might still be hard to implement. For instance, the response of a backend might cause a re-render of a UI component, and currently, it’s not possible to link the trace related to that re-render to the root span of the backend trace unless the trace ID has been created by the frontend and reused in the backend.
This spec change proposal enhances the concept of propagators to differentiate between “backward propagators” (or response propagators) and “forward propagators” (or request propagators):
Without this differentiation, when implementing context propagation to clients, it would result in headers being sent in the request and response payloads that are not intended to be there, which might cause ambiguity, conflicts, and increased payload size. For example, an application configured with TraceContext propagator and a new hypothetical ClientPropagator might end up sending the following headers to all their outgoing requests to downstream services and to their responses to callers:
This spec change is agnostic to the payload and relates only to enhancing the definition of propagators. The payload that would be used in the first recommended backward propagator is still under definition by the W3C Trace Context working group and is, therefore, out of scope for this change. |
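To make the request/response split in the draft above concrete, here is a minimal sketch of a registry that keeps forward and backward propagators apart, so that each header only appears where it is intended. Everything in it is illustrative: the `Propagation` class, the injector functions, and the `x-example-trace-response` header name are hypothetical and are not part of any OpenTelemetry API or of the draft itself.

```python
from typing import Callable, Dict, List

Headers = Dict[str, str]
# An injector writes one piece of context into a header carrier.
Injector = Callable[[str, Headers], None]


def inject_traceparent(traceparent: str, headers: Headers) -> None:
    """Forward (request) propagator: W3C `traceparent` request header."""
    headers["traceparent"] = traceparent


def inject_example_response_header(traceparent: str, headers: Headers) -> None:
    """Backward (response) propagator using a made-up header name."""
    headers["x-example-trace-response"] = traceparent


class Propagation:
    """Hypothetical registry that keeps the two propagator kinds separate."""

    def __init__(self, request: List[Injector], response: List[Injector]) -> None:
        self._request = request
        self._response = response

    def outgoing_request_headers(self, traceparent: str) -> Headers:
        headers: Headers = {}
        for inject in self._request:
            inject(traceparent, headers)
        return headers

    def response_headers(self, traceparent: str) -> Headers:
        headers: Headers = {}
        for inject in self._response:
            inject(traceparent, headers)
        return headers


propagation = Propagation(
    request=[inject_traceparent],
    response=[inject_example_response_header],
)
```

With this separation, a call to downstream services carries only `traceparent`, and a response to the caller carries only the response header, which is the ambiguity the draft is trying to avoid.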
Relates to open-telemetry#3811 Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Yes, we should support passing this via the Server-Timing headers for browsers that support it. |
About
and
Note that the addition of custom request headers in XHR/fetch instrumentations is prone to cause same-origin policy issues. This can be worked around using CORS, but this causes significant friction and is commonly misunderstood. Ideally, a correlation solution does not have to (solely) rely on additional request headers. |
The W3C distributed tracing working group met with the Web Performance Working Group about exactly this today. Notes are in the tracking issue created by @jpkrohling w3c/trace-context#556 The short version is that we are encouraged by the possibility of using server timing. The group had previously decided to define a custom header purely because server-timing was nascent, but the landscape is significantly improved now. The next step is for the tracing working group to translate its existing draft response header spec into a version which uses a server-timing metric. |
The biggest concern currently with server-timing is browser support. It is not available in Safari or on iOS currently, and according to https://caniuse.com/server-timing it is available for about 75% of users. After discussion with the web performance group, it seems that Safari support is held back due to privacy concerns and is likely to be restricted to a same-origin policy regardless of CORS opt-in or timing-allow-origin. |
@dyladan, what I understood from w3c/trace-context#556 regarding this last concern is that we'd face the same challenges with Safari, so, we'd be in no better position if trace context would decide to have its own response header, right ? |
Now, yes. In 2018 when the question was first considered the answer was less clear. I wasn't sharing it as a reason not to use server timing, just trying to make sure everyone was aware of the limitations. |
@jpkrohling @johnbley The current |
For completeness, there is another way to propagate the context back to the client for web document load, and that is by writing a meta tag in the HTML content. This is currently implemented in the OTel document-load instrumentation. This at least has the advantage of working on Safari as well. |
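As a rough sketch of the meta tag approach, a server-side template helper could render the current trace context into the document head. This assumes the tag is named `traceparent` and carries the W3C traceparent string in its `content` attribute, which is my understanding of what the document-load instrumentation reads; the helper name `traceparent_meta_tag` is hypothetical.

```python
import html


def traceparent_meta_tag(traceparent: str) -> str:
    """Render a <meta> tag carrying the server-side trace context.

    Assumption: the client looks for a meta element named "traceparent".
    The value is HTML-escaped so it cannot break out of the attribute.
    """
    escaped = html.escape(traceparent, quote=True)
    return f'<meta name="traceparent" content="{escaped}">'
```

The resulting string would be placed inside `<head>` of the initial HTML response.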
This seems non-trivial enough to need an OTEP with more details. |
I think this is the status: #3825 (comment) |
Discussed in the 4/23/24 Spec SIG. Given that the problem is very related to browsers, it might be appropriate for the client SIG to work on this. I've set the status to |
Given that I opened a PR for this already (#3825), I'm OK being the sponsor. |
Maybe out of scope for this issue (or the PR above raised by @jpkrohling ) but if response propagators were to be configured to propagate context back to callers, would this be a good use of |
What are you trying to achieve?
Multiple otel vendors have used HTTP Server-Timing headers to propagate server-side instrumentation context back to client instrumentation. I would like the otel specification to canonicalize the names, formats, and configuration options for this, and for the various otel implementations to accept donated implementations of this concept.
Additional context.
Client-side instrumentation (in the sense of web or mobile apps) may set outbound context
via http headers which may be received by server-side instrumentation. However, there are a few
cases where this breaks down:
untrusted clients to influence the way their server-side instrumentation behaves
can't influence)
caused by adding headers to fetch/xhr requests
Multiple otel vendors have landed on a solution to the second point above, by using Server-Timing response headers generated by server-side instrumentation and received by client-side instrumentation.
A few links for your reference:
Server-Timing response headers are keyed to a name (conceptually it could be used like "app=400, db=300, env=prod3"). Several otel vendors/contributors have independently used this in the fairly obvious way, where the key used is traceparent and the value is the full traceparent-format string. A complete example would be:
Some existing examples from around the otel universe:
Each one uses the exact same "propagation" concept (traceparent value format, traceparent is the key name in the server-timing header). They do differ in configuration/setup, and they also differ in client usage - for example, Microsoft's product (to my knowledge) uses it on browser page load to set the actual trace context for the page load, while Splunk clients add the server-side trace context as a trace link to the appropriate client-side http client span. In my opinion the specification can sidestep this issue (directed usage of the propagated context) for now, or recommend making it configurable.
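Whichever usage a client picks (parent context or trace link), it first has to pull the trace context out of the header. A minimal sketch of that extraction step follows, assuming the `traceparent;desc="..."` shape and a naive comma/semicolon split (adequate here because a traceparent string contains neither character). `RemoteContext` and `extract_traceparent` are hypothetical names.

```python
from typing import NamedTuple, Optional


class RemoteContext(NamedTuple):
    trace_id: str
    span_id: str
    trace_flags: str


def extract_traceparent(server_timing: str) -> Optional[RemoteContext]:
    """Find the `traceparent` entry in a Server-Timing header value."""
    for entry in server_timing.split(","):
        name, *params = [part.strip() for part in entry.split(";")]
        if name != "traceparent":
            continue
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() != "desc":
                continue
            fields = value.strip().strip('"').split("-")
            if len(fields) == 4:
                _version, trace_id, span_id, flags = fields
                return RemoteContext(trace_id, span_id, flags)
    return None  # no usable traceparent entry
```

The returned IDs could then be used either as the parent of the page-load span or as a link on the client-side HTTP span, depending on configuration.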
Questions towards a specification
server-side instrumentation to be tracecontext,baggage,servertiming and the servertiming propagator would propagate back to the client?
configured on/off (e.g., environment variables)?
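One way the environment-variable option could look, purely as a sketch: reuse the existing `OTEL_PROPAGATORS` variable and treat a `servertiming` entry as a response-side propagator. The `servertiming` name is the hypothetical value floated in the questions above, not something the specification defines today, and `split_propagators` is an illustrative helper.

```python
from typing import List, Mapping, Tuple

# Hypothetical: names that would be treated as response (backward) propagators.
RESPONSE_PROPAGATORS = {"servertiming"}


def split_propagators(environ: Mapping[str, str]) -> Tuple[List[str], List[str]]:
    """Split OTEL_PROPAGATORS into (request, response) propagator names."""
    raw = environ.get("OTEL_PROPAGATORS", "tracecontext,baggage")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    request = [n for n in names if n not in RESPONSE_PROPAGATORS]
    response = [n for n in names if n in RESPONSE_PROPAGATORS]
    return request, response
```

In a real process the argument would be `os.environ`; leaving `servertiming` out of the list would then switch the response header off.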