-
Notifications
You must be signed in to change notification settings - Fork 772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Step 1 - Use new Activity to Replace OT Span #660
Step 1 - Use new Activity to Replace OT Span #660
Conversation
public static void EnableOpenTelemetry(string source, ActivityProcessor activityProcessor) | ||
{ | ||
ActivitySource.AddActivityListener( | ||
activitySource => activitySource.Name.Equals(source), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, indent.
{ | ||
try | ||
{ | ||
disposableExporter.Dispose(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we're assuming that the exporter is only referenced by this processor (instead of being shared in multiple processors)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current PR allows single pipeline only. But I think you are right, we cannot assume single ownership/dispose responsibility for exporters. I need to think this scenario and come back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Not a blocker for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the preview @cijothomas - let's know when the NuGet package is available, we will have more useful feedback when dealing with the translations.
public static void EnableOpenTelemetry(string source, ActivityProcessor activityProcessor) | ||
{ | ||
ActivitySource.AddActivityListener( | ||
activitySource => activitySource.Name.Equals(source), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cijothomas crossing this with dotnet/runtime#35326 I can understand the parameters but it will be good to name the parameters or add comments here, especially for the second and third parameters. Anyway, the signatures are already better per https://source.dot.net/#System.Diagnostics.DiagnosticSource/System/Diagnostics/ActivityListener.cs,42a5d087d90a1839,references
@pjanotti the current NuGet package is available as a single file System.Diagnostics.DiagnosticSource.5.0.0-dev.nupkg, maybe we can include in the PR and later clean it up (we might also consider rewrite git history if people hates binary files)? |
@reyang I agree with that. @cijothomas can you add it to this PR? |
Yes will do that. (packed day, so this would happen at end of day) |
Most likely we'll get the package from an alternate feed (like dotnet.myget...something) much earlier than may 14. So we wont have to checkin binaries. Waiting for official confirmation on this from .net team. |
// The following is generating activity. | ||
// A library would simply write the following line of code. | ||
var source = new ActivitySource("DemoSource"); | ||
using (var parent = source.StartActivity("parent")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is parent
end up being display name or "enumeration" name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not a display name, please add code that sets the display name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StartActivity("anythinghere") - "anythinghere" becomes OperationName, DisplayName by default. DisplayName can be overridden, but OperationName cannot be.
I think the intention is DisplayName to become OT SpanName, as OT allows changing it after starting, but Activity.OperationName is fixed at creation.
{ | ||
child?.AddTag("child location", "child location"); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If parent may be null (see parent?.
), will the using
statement fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Parent will be null if none is listening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be a real pain, many many people will forgot the null check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To sergey's qn - the using statement doesn't fail.
But any other place where parent is not nullchecked will fail. Hence the parent?
usage throughout. Users are expected to do this, and this is unit testable easily.
var source = new ActivitySource("DemoSource"); | ||
using (var parent = source.StartActivity("parent")) | ||
{ | ||
parent?.AddTag("parent location", "parent location"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are integers and other types allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only string - this is a drift from OT.
using (var child = source.StartActivity("child")) | ||
{ | ||
child?.AddTag("child location", "child location"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add example with the result setting. I bet the most of implementation will need try{}catch
to indicate that the activity failed
{ | ||
try | ||
{ | ||
// do not await, just start export |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, I think this is supposed to wait until exporter returned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its fire and forget, so no await.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be different from other languages, for example Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without a backpressure system couldn't this become an issue if the producer overloads ability to process these Task
s?
Perhaps at least we can keep track how how many Tasks are in-flight and drop items if it reaches a certain configurable limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this deserve a revisit. I followed the model from SimpleSpanProcessor.
@bruno-garcia let me know if this is a blocker for this PR or not. I can address this in next step, if you have no objections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, without back pressure support we could easily over-produce and not understand where the blockage is. I think we need to at least have a plan for how we'd like to address it, even if the implementation comes at a later point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is not introducing this (I am just taking current SimpleSpanProcessor implementation) - These concerns/issues exist today, and is not addressed in this PR.
We definitely need to address this, and opening an issue to track this as this shouldn't block this PR :#674
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that we don't want to block this PR.
{ | ||
if (!this.disposed) | ||
{ | ||
this.disposed = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not thread safe?
} | ||
} | ||
|
||
parent?.AddTag("http.status_code", "200"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the spec it is an integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup. Tags are currently string,string only. This is a drift from OT spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think http.status_code
should use parent?.SetCustomProperty("http.status_code", 200)
rather than tag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current adapters use Attributes for storing status_code with ("http.status_code", 200). Attributes support int, but its replacement Activity.Tags only support string.
On top of this, Spans have Status, which don't have a place in Activity yet - the suggestion was to use activity.SetCustomProperty, for the SpanStatus. This was not finalized as well, as there was plans to drop the Status from Span?.
Agree this needs to be finalized - do we need it for this PR or we can do it when more clarity is achieved?
My simplest Exporter is not looking at activity.CustomProperties. Would add it in subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussion in dotnet activity PR: https://github.com/dotnet/designs/pull/98/files#diff-25bba2a21bc64ddeccd374c1c28a0fefR406
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tags are currently string,string only. This is a drift from OT spec.
It will be good if exporters (at least the ones on this repo) have consistent behavior in this regard, eg.: by default have every value
as a string
and optionally enable a best-effort to convert to some "supported" types by the exporter format.
... as there was plans to drop the Status from Span?.
is anybody actively pursuing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the discussion here regarding supporting the OT Status type, then SetCustomProperty can be used here. But if we are talking about adding simple HTTP status code, then I believe using Tags will be more reasonable as it will log a simple code and no further information attached to it (e.g. description or even the thrown exception).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For HTTP the existing semantic conventions are already enough (and even the OTel Status can be derived from that). For others wanting to set span status or equivalent, we need further conventions to do the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any pending work to support non-string values as activity tags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any pending work to support non-string values as activity tags?
Yes. Tarek is working on proposals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some non-blocking issues that need to be revisited after this PR.
One thing that feels kind of off to me is... |
Yes. |
{ | ||
try | ||
{ | ||
// do not await, just start export |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without a backpressure system couldn't this become an issue if the producer overloads ability to process these Task
s?
Perhaps at least we can keep track how how many Tasks are in-flight and drop items if it reaches a certain configurable limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have missed some of the conversation, but is this expected to supersede span / tracer for manual instrumentation? Is there any recorded discussion on this topic?
I find the ActivitySource API difficult to use and not a good match for the OpenTelemetry spec. I've highlighted some issues in my review.
If this was just to extend support for existing ActivitySource I think it would be okay as long as the main programming interface was spans and the Activities were translated to spans.
/// </summary> | ||
/// <param name="processorFactory">Factory that creates exporting processor from the exporter.</param> | ||
/// <returns>Returns <see cref="ActivityProcessorPipelineBuilder"/>.</returns> | ||
public ActivityProcessorPipelineBuilder SetExportingProcessor(Func<ActivityExporter, ActivityProcessor> processorFactory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we determine the 'exporting' processor is the last processor in the list instead of explicitly setting it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same behavior as current SpanProcessorPipelineBuilder - I'd defer any changes to current design in subsequent PRs to unblock progress on this PR.
|
||
// The below commented out line shows more likely code in a real world webserver. | ||
// using (var parent = source.StartActivity("HttpIn", ActivityKind.Server, HttpContext.Request.Headers["traceparent"] )) | ||
using (var parent = source.StartActivity("HttpIn", ActivityKind.Server)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really do not like this API - it's very easy to missing a null check and introduce a non-obvious error. It also goes against the OpenTelemetry spec that a no-op tracer / span is returned in such a way no code needs to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's very easy to missing a null check and introduce a non-obvious error.
StartActivity is not different than any other API that can return a null value. we use the ?
for simplifying the code. Usually, users adopt new APIs will look at the doc and usage patterns. Also, usually, anyone writes code like that, is going to test it. simple case testing will catch this problem very easily.
It also goes against the OpenTelemetry spec that a no-op tracer/span is returned in such a way no code needs to change.
There is no code that needs to change anyway. right? The code will be the same either there are listeners or not. please notice, Activity is a type that is used for many years. In our design, we wanted to serve both users already using Activity and OT scenarios.
CC @noahfalk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @tarekgh - I didn't mean my comments to come across so negative, sorry. I'll try to be more constructive.
I guess my main issue is that it is significantly different in usage compared to the OpenTelemetry specification. The span & tracer interfaces are designed to be robust and easy to use, providing an efficient no-op design if configuration is incomplete.
The Activity API can return null and must be checked for manually every time a usr wishes to interact with it, which becomes error prone if using the shorthand null check. It also does not support non-string tags (see http status code) which deviates from the spec and will cause propagation issues between language implementations (all other languages allow non-strings).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MikeGoldsmith I appreciate your feedback, it is helpful. So, don't hesitate to send more :-)
Yes, I understand there are some differences between the OT specs and the .NET APIs. We have been working hard to get them close and fill the gaps but at the same time, we are restricted by some other constraints. We had to come with a design that not confuses a lot of developers who currently using the Activity APIs and at the same allow the OT scenarios. Think about it as we are not strictly implementing the OT spec as it is but we are enabling all scenarios that OT can do.
We'll continue to look at the feedback and try to figure out how we can help more with that. Thanks again for your feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't mean my comments to come across so negative
No worries at all @MikeGoldsmith, your initial feedback was straightforward and the objections were grounded. It was fine feedback : )
I guess my main issue is that it is significantly different in usage compared to the OpenTelemetry specification. The span & tracer interfaces are designed to be robust and easy to use, providing an efficient no-op design if configuration is incomplete
I agree, it is different. From my perspective deciding to reuse Activity APIs and integrate into the platform rather than creating a competing API has a lot of advantages, but one disadvantage is that it creates additional constraints on the design. In this case OT's premise is that it is easy and cheap to create a no-op Span so that nobody has to do a null check. We considered a no-op Activity, but didn't like the options:
- If the no-op Activity behaves like a normal Activity then it is stateful and we can't use a global singleton. This means we are paying for allocations and all the normal initialization. Some of the work unfortunately isn't that cheap.
- If the no-op Activity doesn't behave like a normal Activity we can make it cheap, but now we've created some risk that pre-existing code assets had assumptions about how Activity works which are invalidated. For example maybe the existing code assumes that properties can be read back after they are set, or that the timestamp of a started activity isn't 0.
So we ended at option 3, which was let the user see the null result. Our hope is that compiler nullable reference analysis makes it obvious when you've done the wrong thing, or if you are using an older compiler then code snippets, docs, and testing. Certainly its possible that this will be more problematic than we estimated so getting this kind of feedback is important (thanks!). So far we haven't heard much concern raised around this, but our overall feedback volume is probably still low.
It also does not support non-string tags
This one should probably be separated into its own issue. @cijothomas @reyang - have you guys experimented much with tags that would be non-string types in other languages? Certainly I see some potential for friction here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one should probably be separated into its own issue. @cijothomas @reyang - have you guys experimented much with tags that would be non-string types in other languages? Certainly I see some potential for friction here.
Yep. @cijothomas @tarekgh and I were chatting about this topic earlier today. We do see a good value of having non-string types.
} | ||
} | ||
|
||
parent?.AddTag("http.status_code", "200"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any pending work to support non-string values as activity tags?
{ | ||
child?.AddTag("child location", "child location"); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be a real pain, many many people will forgot the null check.
@@ -0,0 +1,76 @@ | |||
// <copyright file="OpenTelemetrySdk1.cs" company="OpenTelemetry Authors"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// <copyright file="OpenTelemetrySdk1.cs" company="OpenTelemetry Authors"> | |
// <copyright file="OpenTelemetrySdk.cs" company="OpenTelemetry Authors"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be a real pain, many many people will forgot the null check
I don't think so. but anyway we'll watch the usage of the API.
Is there any pending work to support non-string values as activity tags?
No. Tags support strings only. OT support numbers/Bool which can still be used as strings.
{ | ||
try | ||
{ | ||
// do not await, just start export |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, without back pressure support we could easily over-produce and not understand where the blockage is. I think we need to at least have a plan for how we'd like to address it, even if the implementation comes at a later point.
I had a short discussion just before in the OTel .NET gitter channel, and I'm copying here as I felt @cijothomas's answers were helpful to me.
|
/// <remarks> | ||
/// Basic implementation only. Most logic from TracerBuilder will be ported here. | ||
/// </remarks> | ||
public static void EnableOpenTelemetry(Action<OpenTelemetryBuilder> configureOpenTelemetryBuilder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something feels a bit off about OpenTelemetry.Trace.Configuration.OpenTelemetrySdk.EnableOpenTelemetry
. How would that look with both tracing and metrics? Would we need to do...
OpenTelemetry.Trace.Configuration.OpenTelemetrySdk.EnableOpenTelemetry(...);
OpenTelemetry.Metrics.Configuration.OpenTelemetrySdk.EnableOpenTelemetry(...);
I'm thinking it should be more like...
OpenTelemetry.OpenTelemetrySdk
.EnableTracing(...)
.EnableMetrics(...)
Doesn't need to be done as part of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this PR:
- We probably want to consider moving Configuration outside Trace/Metrics namespace.
- In Python it is called opentelemetry.sdk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Something simpler like OpenTelemetry.Configuration or similar.
@cijothomas I left a few comments but LGTM. |
Notes of discussion are here: https://docs.google.com/document/d/1yjjD6aBcLxlRazYrawukDgrhZMObwHARJbB9glWdHj8/edit# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per discussion on the SIG let's merge this PR and follow up on the discussion about the Trace API as a separate issue. Merging it makes it easier to experiment with ActivitySource and provide better feedback and decide about the API.
Agree. Let's give @bruno-garcia and @MikeGoldsmith till tomorrow mid day to comment if there is something blocking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! We just need to keep in mind the fire and forget bits to address in the future. Thanks @cijothomas
Step1 of moving to use
Activity
to represent OT Spans. This is in alignment with the overall direction (https://medium.com/opentelemetry/opentelemetry-net-sdk-progress-3a63dcdc6cb0). There are some gaps between the OT Span API and .NET Activity API. They are captured in this issue : #675This PR doesn't replace existing Spans, but instead coexists with existing, so everything working currently using Span/Tracer would work as is.
In a subsequent PR, all Span/Tracer would be removed, and replaced with
Activity
,ActivitySource
- This will obviously be a breaking change (and a quite big one) for any existing early adopter customers.Reference to Activity design:
dotnet/designs#98 this is the .NET Activity design proposals PR. This PR is based off on the designs explained in there.
What can be done once this change is in:
ActivityProcessor
andActivityExporter
Example instrumentation in Libraries:
Example application code:
TestConsoleActivity
shows a fully working example.Next steps:
Port most logic of TracerBuilder to OpenTelemetryBuilder.- add batching, multiple pipeline support.
Respect Sampling decisions.
Add Jaeger exporter for Activity (may skip this and jump).
---breaking change now onwards----
Replace Span with Activity - would involve replacing all exporter and adapter code.....
Decide default behavior - should OT subscribe to all ActivitySources? Or selected ones? or All except disabled?