Propagating SamplingFlags as per W3CTraceContext guidelines. #2156
Comments
Looks good to me. Alternative proposal is backward compatible and addresses the issue.
Updating TraceFlags on the Activity will not fix this issue entirely, as we rely on HttpClient and Azure clients to propagate
Few workarounds:
At the root of the issue is the fact that we are trying to modify TraceFlags for an activity that has already started. This is only an issue for "legacy" instrumentations, which create activities the legacy way (i.e. new Activity().Start()). It is not an issue for the ActivitySource/OpenTelemetry way, because the samplers are invoked before the activity is created, and hence TraceFlags are set correctly. Given that this issue only affects legacy instrumentations:
Adding initial analysis (needs to be validated). There are two possible approaches. Both need access to the TelemetryConfiguration object to get the TelemetryProcessorChain.
Approach 1:
a. If the first sampling processor is of type SamplingTelemetryProcessor, which includes Dependency telemetry.
b. If the first sampling processor is of type AdaptiveSamplingTelemetryProcessor, which includes Dependency telemetry.
Limitations:
Approach 2:
Limitations:
Background:
This SDK has adopted the W3C Trace Context recommendation as the default for distributed context propagation. In recent versions of .NET Core (3.1 and newer), propagation is done by the libraries themselves (HttpClient, ASP.NET Core), not by this SDK. In older versions, the auto-collectors in this repo take care of propagation. Propagation relies on the Activity class, whose Activity.Id field corresponds to the W3C Trace Context "traceparent" header.
Examples:
For HttpClient in .NET Core 3.1 and newer, the HttpClient library itself adds the traceparent header to outbound calls, with Activity.Current.Id as the value.
For HttpClient in .NET Core 2.1, the DependencyCollectionTelemetryModule adds the traceparent header to outbound calls, with Activity.Current.Id as the value.
For ASP.NET Core 3.1 and newer, the ASP.NET Core framework itself extracts traceparent from incoming request headers, and starts an Activity with the incoming traceparent as its parent.
For ASP.NET Core 2.1, the RequestCollectionModule extracts traceparent from incoming request headers, and starts an Activity with the incoming traceparent as its parent.
Current issue with context propagation and Sampling
Activity.Id (traceparent) is composed of "version-traceid-spanid-traceflags", with traceflags currently supporting a single flag, "Sampled". Depending on the sampling strategy, this flag may or may not be used. For example, if an application makes a "delayed" sampling decision, it may not propagate traceflags indicating that decision. An application that receives an incoming request with traceflags can choose to ignore them for a variety of reasons, such as security considerations or differences in load.
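To make the header layout concrete, here is a minimal sketch that splits a traceparent value into its four fields and reads the Sampled bit. It is written in Python purely for illustration and is not SDK code. The names `TraceParent` and `parse_traceparent` are hypothetical, and the field lengths checked are those of traceparent version 00.

```python
from typing import NamedTuple

SAMPLED_FLAG = 0x01  # the single trace flag currently defined


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    span_id: str
    trace_flags: int

    @property
    def sampled(self) -> bool:
        # True when the Sampled bit is set in trace_flags.
        return bool(self.trace_flags & SAMPLED_FLAG)


def parse_traceparent(header: str) -> TraceParent:
    """Split 'version-traceid-spanid-traceflags' into its parts (version 00 layout)."""
    parts = header.strip().split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}")
    version, trace_id, span_id, flags = parts
    # Hex-encoded field widths for version 00: 2, 32, 16 and 2 characters.
    if (len(version), len(trace_id), len(span_id), len(flags)) != (2, 32, 16, 2):
        raise ValueError("unexpected field length")
    return TraceParent(version, trace_id, span_id, int(flags, 16))
```

A receiver that ignores the flag, as described above, would simply never consult `sampled`.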
Application Insights follows a "tail-based" or "delayed" sampling model, where the sampling decision is made after the entire telemetry item has been collected. For example, a RequestTelemetry item is created first, its fields are populated by the auto-collector module, and all TelemetryInitializers are run, which can further populate or enrich the telemetry. The sampler runs after these, as a TelemetryProcessor.
Because of its delayed sampling model, Application Insights does not respect the incoming trace flags, and does not modify traceflags when propagating to the next node.
This causes issues when interoperating with other systems that follow a different algorithm. Because sampling traceflags are not supported, it's not possible to coordinate sampling decisions between two nodes running different sampling algorithms.
Proposal
Switch ApplicationInsights to head-based sampling. This shift would allow ApplicationInsights to respect the incoming traceflags in its sampling decision, and to propagate its sampling decision to the next node.
However, this is a major behavior change in ApplicationInsights, and can be done only with a major version bump to 3.0.
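As a rough illustration of what a head-based decision could look like (a sketch only, not a design for 3.0), the decision is taken once, when the operation starts, and the same result is written to the outgoing trace flags. Honoring the parent's Sampled bit whenever one is present is an assumed policy, and the function names are hypothetical.

```python
import random
from typing import Callable, Optional

SAMPLED_FLAG = 0x01


def head_sampling_decision(
    incoming_flags: Optional[int],
    sampling_percentage: float,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide at the start of the operation, before any telemetry is collected."""
    if incoming_flags is not None:
        # Assumed policy: follow the upstream node's decision when there is one.
        return bool(incoming_flags & SAMPLED_FLAG)
    # No parent: make a fresh probabilistic decision.
    return rng() * 100.0 < sampling_percentage


def outgoing_trace_flags(sampled: bool) -> str:
    """Hex trace-flags field to propagate to the next node."""
    return "01" if sampled else "00"
```

Because the decision is fixed up front, the flag sent downstream always matches what this node actually keeps, which is what tail-based sampling cannot guarantee.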
Alternate proposal
Modify the Sampled flag based on the "expected" probability of the telemetry being sampled in.
This involves calculating the sampling score, comparing it with current_sampling_rate, and adjusting the Sampled flag accordingly before the request is sent.
For example:
While an HTTP call is being made, the Activity's TraceId is used to calculate the SamplingScore, which is compared with the SamplingPercentage at that time. If the score is less than the SamplingPercentage, the trace flags are modified to Sampled. The traceparent is propagated as before.
The actual sampling decision may still be to drop this item. This can occur if too many other telemetry items were produced, or if the SamplingPercentage changed in between. As per the W3C Trace Context spec, this is okay.
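The steps in this example can be sketched as follows, in Python for illustration only. The hash used for `sampling_score` is a stand-in (SHA-256), not the SDK's actual scoring function. Clearing the flag when the score is not below the percentage is also an assumption, since the proposal only describes setting it. Both function names are hypothetical.

```python
import hashlib

SAMPLED_FLAG = 0x01


def sampling_score(trace_id: str) -> float:
    """Map a trace id to a stable score in [0, 100).

    Stand-in hash for illustration; the SDK's own scoring is not reproduced here.
    """
    digest = hashlib.sha256(trace_id.encode("ascii")).digest()
    bucket = int.from_bytes(digest[:4], "big")  # 32-bit bucket
    return bucket / 2**32 * 100.0


def adjust_traceparent(traceparent: str, sampling_percentage: float) -> str:
    """Set or clear the Sampled bit from the expected sampling outcome."""
    version, trace_id, span_id, flags = traceparent.split("-")
    flag_bits = int(flags, 16)
    if sampling_score(trace_id) < sampling_percentage:
        flag_bits |= SAMPLED_FLAG
    else:
        # Assumption: clear the bit when the item is expected to be dropped.
        flag_bits &= ~SAMPLED_FLAG & 0xFF
    # trace id and span id are left untouched; only the flags field changes.
    return f"{version}-{trace_id}-{span_id}-{flag_bits:02x}"
```

Since the score depends only on the TraceId, every node using the same function and percentage reaches the same expected decision for a given trace. The flag remains a prediction, because the real decision is still made later by the sampler.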
This is not a breaking change, so it can be included in a 2.x version.
It involves the following:
HttpClient (.NET Core) - Yes
HttpClient (.NET Fw) - Yes
Azure clients - To be investigated.