Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Scope not being applied to OpenTelemetry spans in ASP.NET Core #2690

Merged
merged 15 commits into from
Oct 9, 2023

Conversation

jamescrosswell
Copy link
Collaborator

@jamescrosswell jamescrosswell commented Oct 3, 2023

Problem

Fixes #2644... however Transaction.Request.Method was just one of a number of attributes missing that would ordinarily be applied to Transactions from the Scope:

// Apply scope data
var currentScopeAndClient = ScopeManager.GetCurrent();
var scope = currentScopeAndClient.Key;
scope.Evaluate();
scope.Apply(transaction);

SentryMiddleware and SentrySpanProcessor don't share a ScopeAndClientStack

Part of the problem is that OpenTelemetry spans get created as a result of System.Diagnostic events that Otel subscribes to... and those events are received/processed in a different AsyncLocal context to the rest of our middleware pipeline. As such, our SentryMiddleware and our SentrySpanProcessor don't share the same ScopeAndClientStack:

ScopeStackContainer = options.ScopeStackContainer ?? (
options.IsGlobalModeEnabled
? new GlobalScopeStackContainer()
: new AsyncLocalScopeStackContainer());

Scope popped before SentrySpanProcessor.OnEnd is called

Initially I tried to work around the problem above by using a ScopeAndClientStack that was bound to Activity.Current.RootId rather than to the current AsyncLocal context... and that was possible. However it revealed another problem, which is that by the time SentrySpanProcessor.OnEnd is called, the Scope that was pushed by our middleware had already been popped from the ScopeAndClientStack... The fact that Otel is using a System.Diagnostics activity event subscription means that processing of these events happens async (kind of like events on a queue).

Solution

The second of those two problems forced me to come up with a fairly simple solution in the end.. which is to:

  1. Save a clone of the scope from the SentryMiddleware, once the request has finished
  2. Restore that saved scope to the ScopeAndClientStack inside the SentrySpanProcessor prior to finishing and capturing the transaction

if (_options.Instrumenter == Instrumenter.OpenTelemetry && Activity.Current is {} activity)
{
// The middleware pipeline finishes up before the Otel Activity.OnEnd callback is invoked so we need
// so save a copy of the scope that can be restored by our SentrySpanProcessor
hub.ConfigureScope(scope => activity.SetFused(scope));
}

var activityScope = data.GetFused<Scope>();
if (activityScope is Scope savedScope && _hub is IHubEx hub)
{
hub.RestoreScope(savedScope);
}

@jamescrosswell jamescrosswell changed the title WIP: Fix/no scopes for otel Fix: Scope not being applied to OpenTelemetry spans in ASP.NET Core Oct 4, 2023
@jamescrosswell jamescrosswell marked this pull request as ready for review October 4, 2023 06:24
@bitsandfoxes
Copy link
Contributor

bitsandfoxes commented Oct 5, 2023

Nice findings! Thanks for digging through that!
If I understand this correctly the RestoreScope basically pushes a "too soon popped" scope back on the stack so the whole "apply scope to transaction" logic down the line works with GetCurrentScope?
But does that mean that the whole stack gets copied with every request?
You've already found a way to link the activity with the scope. Can we reuse the OtelEnricher or a transaction processor for applying the scope instead?

This seems relevant: #972

src/Sentry/Internal/SentryScopeManager.cs Show resolved Hide resolved
src/Sentry/SentrySdk.cs Outdated Show resolved Hide resolved
src/Sentry/SentrySdk.cs Outdated Show resolved Hide resolved
@jamescrosswell
Copy link
Collaborator Author

If I understand this correctly the RestoreScope basically pushes a "too soon popped" scope back on the stack so the whole "apply scope to transaction" logic down the line works with GetCurrentScope?

Kind of yeah. currently our middleware includes this line... note the using - so dispose will get called on the return value when it falls out of scope.:

using (hub.PushAndLockScope())

That makes sense, for the most part, since the ScopeManager we use in ASP.NET Core is using an AsyncLocalScopeStackContainer... our scope/client stack is designed to be kept "per request" and we're using the AsyncLocal context as a proxy for "per request". That's a fair assumption - I looked at the implementation for IHttpContextAccessor and it also uses AsyncLocal.

All good, until we want to start tracing stuff to do with the request via Diagnostic events that probably get emmited in the context of that AsyncLocal middleware pipeline but don't get received/consumed in that same AsyncLocal context. By the time the OpenTelemetry .NET SDK (and therefore our SentrySpanProcessor) receive those diagnostic events, the ScopeManager no longer holds a reference to that Scope (and it may have been garbage collected).

Looking at what happens in the Dispose method, it looks like the Scope itself doesn't get disposed - rather the scope/client call stack gets reset to it's previous state (which doesn't include the new scope that was cloned/pushed to the top of the stack). So if we retained a reference to that scope, we could prevent it from being garbage collected. Maybe we could avoid having to clone it then - I'll play around and see if I can get that working.

@jamescrosswell jamescrosswell self-assigned this Oct 9, 2023
@jamescrosswell
Copy link
Collaborator Author

But does that mean that the whole stack gets copied with every request?

It was never the whole stack (just the top scope on the stack) but actually the Scope itself doesn't get disposed explicitly, so we can avoid even that (see latest commit). No cloning required then!

@jamescrosswell
Copy link
Collaborator Author

Can we reuse the OtelEnricher or a transaction processor for applying the scope instead?

@bitsandfoxes I'm not sure how easy/practical that would be. We'd essentially have to replicate all of the logic here somehow and execute it in our SpanProcessor... then we probably don't want to run that logic again in the Hub.CaptureTransaction method. Seems like a much more major refactoring, so if we want to do that I reckon we do it outside this bug fix PR.

Copy link
Contributor

@bitsandfoxes bitsandfoxes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@bitsandfoxes bitsandfoxes merged commit 89a50ab into main Oct 9, 2023
6 of 7 checks passed
@bitsandfoxes bitsandfoxes deleted the fix/no-scopes-for-otel branch October 9, 2023 15:16
@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

Fails
🚫 Please consider adding a changelog entry for the next release.

Instructions and example for changelog

Please add an entry to CHANGELOG.md to the "Unreleased" section. Make sure the entry includes this PR's number.

Example:

## Unreleased

- Scope not being applied to OpenTelemetry spans in ASP.NET Core ([#2690](https://github.com/getsentry/sentry-dotnet/pull/2690))

If none of the above apply, you can opt out of this check by adding #skip-changelog to the PR description.

Generated by 🚫 dangerJS against aa2f820

vaind added a commit that referenced this pull request Oct 17, 2023
Co-authored-by: Ivan Dlugos <dlugos.ivan@gmail.com>
Co-authored-by: James Crosswell <jamescrosswell@users.noreply.github.com>
Co-authored-by: Ivan Dlugos <6349682+vaind@users.noreply.github.com>
Co-authored-by: Stefan Jandl <reg@bitfox.at>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: getsentry-bot <bot@sentry.io>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
fix build on mac m1 (#2702)
Fix: Scope not being applied to OpenTelemetry spans in ASP.NET Core (#2690)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTTP Method column missing as OTEL attribute doesn't appear to parse
3 participants