Allow to specify ActivityContext of new Activity to StartActivity(..) to unblock interop with real-world vendors #42786
Comments
This scenario is currently not supported. It is something we can consider looking at in future releases. |
Thanks, @tarekgh / Thanks! |
@macrogreg your feedback is valuable, so please don't stop sending it. I was just trying to clarify that we are done with 5.0. Thank you too. |
Thanks for confirming, @tarekgh . |
@macrogreg - I don't think I fully understand how the scenario is ideally intended to work. Could you walk me through a simple example? The setup I am imagining is that we have some non-W3C system sending a message to a monitored .NET service. The message still has some type of correlation identifier attached to it, but it doesn't follow trace-context conventions. It sounds like the goal is that, inside the .NET process, we generate a W3C-compliant ID, and I am guessing a monitoring tool like Datadog wants to record some telemetry establishing the parent-child relationship at the boundary between these two services. Assuming you had a free hand to implement the behavior of the .NET process however you wanted to get the best result...
Thanks! |
Hi @noahfalk , When I submitted this last weekend I operated in the mode "it's unlikely that we can change anything for .NET 5, but let's still raise the point". For a subsequent release we may want to consider going one step further: We may face an architectural weakness here: Let me switch to a more concrete case. Let me use Datadog to describe how I would implement context propagation in a potentially heterogeneous monitoring environment. (I think other vendors would be in a very similar spot.) Together, we can decide if this makes sense or if something is missing. :) W3C defined the length of the traceId and the spanId. It recommends that they be random.
This case is particularly easy, because Datadog uses:
Datadog is using a shorter trace ID (for historical reasons, 16 bytes instead of 32 as recommended by W3C). We also have a specific algorithm to generate those IDs that we prefer over the Guid-based approach built into Activity. We use proprietary headers to propagate trace context.
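A minimal sketch of the ID-width mismatch described above, written in Python only to illustrate the format; it is not Datadog's or .NET's implementation. The W3C trace-id is 16 bytes, rendered as 32 lowercase hex characters, so the "16" and "32" counts in this discussion appear to correspond to hex characters of a 64-bit versus a 128-bit ID. The function names are hypothetical, and left-padding with zeros is an assumed policy (one of the ways a shorter ID can be carried in the wider field).

```python
def widen_to_w3c_trace_id(short_id: int) -> str:
    """Left-pad an unsigned 64-bit vendor trace ID to a 32-hex-character W3C trace-id.

    Assumption: the unused high-order half is filled with zeros.
    """
    if not 0 < short_id < 2**64:
        raise ValueError("expected a non-zero unsigned 64-bit ID")
    return format(short_id, "032x")


def narrow_from_w3c_trace_id(trace_id: str) -> int:
    """Keep only the low 64 bits (the last 16 hex characters) of a W3C trace-id."""
    if len(trace_id) != 32:
        raise ValueError("a W3C trace-id has exactly 32 hex characters")
    return int(trace_id[-16:], 16)
```

Narrowing a full 128-bit trace-id is lossy: two different W3C trace-ids that share their low 64 bits map to the same vendor ID, which is why the thread cares about who generates the ID in the first place.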
|
Thanks @macrogreg : )
Does this mean the id that flows in-process would be the complete 32 byte trace id or you would prefer to truncate it to the 16 byte id with zeros? It was unclear what value is in that outbound W3C header.
Are you anticipating all Activities to have 16 byte zero filled ids, or just certain ones that DataDog code is responsible for creating? |
I am not sure I understand the question. When propagating downstream, we would include both DD and W3C header because we do not know what monitoring solution the downstream system is using.
If valid & consistent W3C info came in with the request we can use all 32 bytes of it in the way described above. If the Trace ID is invented by a DD system, it should only populate 16 bytes of the trace ID and leave the rest constant. However, the current API forces us to leave the trace id creation to the Activity internal logic. It will populate all 32 bytes and we will use only 16 of them for DD ids. See what I mean? :) |
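The extract-and-propagate strategy described in the last two replies can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the actual tracer code: the header names `x-datadog-trace-id` and `x-datadog-parent-id` (decimal 64-bit values) are assumed here since the thread only says "proprietary headers", header keys are assumed to be lower-cased already, and the choice to let the vendor header win when the two disagree is a hypothetical policy that the thread does not specify.

```python
from typing import Dict, Mapping, Optional, Tuple

DD_TRACE = "x-datadog-trace-id"    # assumed header name, decimal uint64
DD_PARENT = "x-datadog-parent-id"  # assumed header name, decimal uint64


def extract(headers: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (trace_id as 32 hex chars, parent span_id as 16 hex chars), or None."""
    w3c = None
    parts = headers.get("traceparent", "").split("-")
    if len(parts) == 4 and len(parts[1]) == 32 and len(parts[2]) == 16:
        w3c = (parts[1], parts[2])

    dd = None
    if DD_TRACE in headers and DD_PARENT in headers:
        dd = (format(int(headers[DD_TRACE]), "032x"),
              format(int(headers[DD_PARENT]), "016x"))

    if w3c and dd:
        # "Consistent" here means the low 64 bits of the W3C trace-id match the
        # vendor ID; if so, keep the full W3C value. Otherwise (assumed policy)
        # fall back to the vendor header.
        return w3c if w3c[0][-16:] == dd[0][-16:] else dd
    return w3c or dd


def inject(trace_id: str, span_id: str, sampled: bool = True) -> Dict[str, str]:
    """Emit both header families, since the downstream vendor is unknown."""
    return {
        "traceparent": f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}",
        DD_TRACE: str(int(trace_id[-16:], 16)),
        DD_PARENT: str(int(span_id, 16)),
    }
```

When no incoming header exists at all, `extract` returns `None` and the trace ID has to be invented locally, which is exactly the case where the current API leaves generation to the Activity internals.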
Then you guessed well because you still answered it. I wanted to know if you would preserve all 32 bytes of the W3C identifier when you propagate the id in memory and you confirmed that yes, you would (assuming it matches the DataDog header).
I am trying to figure out in what circumstances you want DD to be in charge of inventing these IDs. For example the code below is a hypothetical customer console app that has no pre-existing W3C or DataDog header (because there was no incoming message at all). Are you aiming for DataDog to gain control of the TraceId generated for this Activity? I'm guessing no but I am still trying to fill in the gaps in my understanding : )
|
I think this is a great example. If there is no Listener then there is no Activity. The meta point is that the ID format is the "business" of the telemetry consumer, not the telemetry producer. |
I am assuming DataDog would have done some code-less attach to get a listener hooked up, right? So you are saying in this scenario you are hoping for DataDog to be in control of creating the ID?
While I agree that consumers are the ones who care, increasingly there isn't a single consumer and the app developer may play a role as both consumer and producer. ILogger puts the IDs in log messages, customers can write their own ActivityListeners, and OpenTelemetry/code-less instrumentation may be storing the IDs as well. It feels like your goal to have DataDog be able to control the IDs of all root Activities is fundamentally at odds with the platform goal to use a standard format. Instead of seeking to control the ID that shows up on Activity.Id, I would treat it as a W3C based system that you need to interoperate with. I think you have several mechanisms at your disposal to do so:
|
For Datadog we do not have a problem, we can follow the strategy described earlier (see remarks above). I am trying to look at it from the BCL perspective. It is also relevant as we are working on OTel auto-instrumentation. There, we do not know who the vendor is and what ID format they are using, so I am trying to be generic. :) There may be many exporters. None of them is the "leader" so they cannot be determining the id.
Is the standard format really a goal? If yes, OK. But (arguably), if Activity is a data exchange type, it should not prescribe a distributed id format. After all, the distributed trace may use different languages, vendors, versions for the respective microservices. And:
Exactly, that's what I am getting at. :) |
Yes : ) For multiple reasons:
I'm not suggesting that you get to set an arbitrary format, I'm suggesting you get to define what W3C trace context will be treated as the parent, if any. |
I am not sure I understand. :)
Fair enough. This can be a problem for a vendor who is not using W3C. E.g. OTel is not (yet) using it. In the long-term this makes complete sense for .NET as having this standardization would make things easier. On the other hand, if a vendor is not using a W3C compatible system they may be disincentivized from using Activities and ship their own libraries that represent Spans.
Gotcha. I completely see the desire for simplicity. It's just that everyone who wants to use Activities must be W3C compatible and as a result adoption may be slower. Perhaps, this is a fine bet to take. But let's be aware of the risk. :) |
SpanId changes at each child and this is part of trace context.
I hope so, it is the bet we took : ) Risk acknowledged! |
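The point above that the span ID changes at each child while remaining part of the trace context can be illustrated with a small sketch. This is a Python model of the W3C fields only, not the .NET Activity type; `TraceContext`, `new_span_id`, and `child_of` are hypothetical names introduced for the example.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceContext:
    trace_id: str      # 32 lowercase hex characters, shared by the whole trace
    span_id: str       # 16 lowercase hex characters, unique per span
    flags: str = "01"  # 2 hex characters; "01" means sampled

    def traceparent(self) -> str:
        """Serialize to the W3C traceparent header value (version 00)."""
        return f"00-{self.trace_id}-{self.span_id}-{self.flags}"


def new_span_id() -> str:
    """Generate a random span ID, skipping the all-zero value, which is invalid."""
    while True:
        candidate = secrets.token_hex(8)
        if candidate != "0" * 16:
            return candidate


def child_of(parent: TraceContext) -> TraceContext:
    """A child keeps the parent's trace_id and flags but gets a fresh span_id."""
    return TraceContext(parent.trace_id, new_span_id(), parent.flags)
```

In this model the caller controls only which context is the parent; the child's own span ID is generated internally, which mirrors the limitation the issue describes.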
Your link is referring to wire formats. For in-memory formats (the Span in OTel or Activity in .NET), OTel is standardized on W3C. Activity itself doesn't specify any behavior for how an ID is deserialized from an inbound message or serialized onto an outbound one. The bridge between wire formats and in-memory formats is controlled by the networking libraries with varying amounts of customization opportunity for 3rd parties. |
Closing this issue as we are going to track the solution proposal in the issue #46704. Feel free to comment there. You may look at the proposed change in the comment #46704 (comment). Thanks! |
Another one :)
(On a personal note: I really wished we had started this work stream at the OTel group earlier, so that these requests had more chance to flow into .NET 5, but I believe that it still has value to point out potential improvement opportunities. :) )
I propose that from the perspective of OpenTelemetry, it is a critical scenario that a distributed operation can be traced across microservices that are monitored by a heterogeneous vendor ecosystem, using manual and/or auto-instrumentation. Such vendor systems may use proprietary, non-W3C-based trace and span IDs. As long as each local vendor system interoperates with the W3C trace context in a manner compliant with the W3C specification, the trace context will flow correctly across service boundaries. For that, the .NET Activity API MUST allow explicitly specifying the W3C trace context details (trace and span IDs) for new spans in a manner compliant with the W3C recommendation [L1, L2]. Currently, that is not supported.
(I'd be very happy to be corrected if I am missing something. :) )
W3C specifies how to interoperate between non-W3C-based trace and span IDs potentially used by vendors and W3C-compliant trace contexts:
On a high level, that involves:
At the very least, it should be possible to restrict the number of significant bytes in the IDs, as described by W3C [L1, L2].
However, that is currently not possible with the ActivitySource based API. It is possible to supply the parent context, but the user has no control over the generation of the trace context for the new span.
We should consider adding ActivitySource.StartActivity(..) overloads that allow specifying an ActivityContext for the span being created. If it is too late for this in .NET 5, we should make and document an explicit workaround recommendation.