Implement new POTel span processor #3223
* create a new otel context `_SCOPES_KEY` that will hold a tuple of `(current_scope, isolation_scope)`
* the `current_scope` will always be forked (like on every span creation/context update in practice)
  * note that this is on `attach`, so not on all copy-on-write context object creation but only on apis such as [`trace.use_span`](https://github.com/open-telemetry/opentelemetry-python/blob/ba22b165471bde2037620f2c850ab648a849fbc0/opentelemetry-api/src/opentelemetry/trace/__init__.py#L547) or [`tracer.start_as_current_span`](https://github.com/open-telemetry/opentelemetry-python/blob/ba22b165471bde2037620f2c850ab648a849fbc0/opentelemetry-api/src/opentelemetry/trace/__init__.py#L329)
  * basically every otel `context` fork corresponds to our `current_scope` fork
* the `isolation_scope` currently will not be forked
  * these will later be updated, for instance when we update our top level scope apis that fork isolation scope, that will also have a corresponding change in this `attach` function
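The forking rule described above can be sketched without OpenTelemetry at all. The snippet below is only an illustration built on the standard library `contextvars` module: `Scope`, `attach`, `detach`, `get_current` and `get_scopes` are hypothetical stand-ins, not the SDK's or OTel's actual classes, and the context is modelled as a plain dict that is copied on write. It shows the one behaviour the notes describe: every `attach` forks the current scope and reuses the isolation scope.

```python
import contextvars
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

# Stand-in for the otel context key named in the notes above.
_SCOPES_KEY = "_SCOPES_KEY"

Context = Dict[str, Any]


@dataclass
class Scope:
    """Hypothetical minimal scope: just a bag of tags."""

    tags: Dict[str, str] = field(default_factory=dict)

    def fork(self) -> "Scope":
        # A fork gets its own copy of the data, so later writes
        # do not leak back into the parent scope.
        return Scope(tags=dict(self.tags))


# The "current context". The default dict is never mutated in place.
_current = contextvars.ContextVar("sketch_context", default={})


def get_current() -> Context:
    return _current.get()


def get_scopes() -> Optional[Tuple[Scope, Scope]]:
    return get_current().get(_SCOPES_KEY)


def attach(context: Context) -> contextvars.Token:
    """Make `context` current, forking the current scope on the way in."""
    scopes = context.get(_SCOPES_KEY)
    if scopes is None:
        current_scope, isolation_scope = Scope(), Scope()
    else:
        current_scope, isolation_scope = scopes

    # Fork the current scope, keep the same isolation scope object.
    new_scopes = (current_scope.fork(), isolation_scope)
    new_context = {**context, _SCOPES_KEY: new_scopes}
    return _current.set(new_context)


def detach(token: contextvars.Token) -> None:
    """Restore whatever context was current before the matching attach."""
    _current.reset(token)
```

Calling `attach(get_current())` twice in a row yields two distinct current scopes that share one isolation scope, and `detach` with the inner token brings the outer current scope back.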
Cool ✨
# if have a root span ending, we build a transaction and send it
self._flush_root_span(span)
I'm thinking what happens if there's a child span that, for whatever reason (weird async cases? wonky instrumentation?), hasn't finished before the root span. Since we're not using `on_start` at all, there'll be virtually no record of the span at this point and the parent transaction will be sent. Once the child finishes, we'll run `on_end` with the now orphaned span and it'll be saved in `_children_spans`, but never sent, since the parent transaction is already gone. So we might have a potential leak there.
Do we need some sort of cleanup of orphaned spans? Should we wait a bit before flushing the transaction to account for child spans possibly ending very close to the parent span, but a bit late? IIRC I've noticed JS also having a small sleep in place.
TBH not sure how much of a real world problem late child spans are, but I can imagine they happen.
Another option: Also use `on_start` to create some record of the span and then in `on_end`, if a transaction comes and we detect that it has unfinished child spans, wait for a grace period to give them time to finish?
Yes, some heuristic cleanup logic will be done in a follow-up PR. I haven't thought through exactly how we'll do it, but JS just has a cutoff logic of 5 minutes.
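For reference, a cutoff-based cleanup could look roughly like the sketch below. This is not what the follow-up PR does (that design is explicitly undecided above); `ChildSpanBuffer`, its methods and the `now` parameter are hypothetical names used for illustration, and the 5 minute value is simply the JS cutoff mentioned in this thread. Each buffered child span is stored with the time it arrived, and `prune` drops anything older than the cutoff so orphaned spans cannot pile up forever.

```python
import time
from collections import defaultdict
from typing import Any, DefaultDict, List, Optional, Tuple

# The JS cutoff mentioned in the thread; using it here is an assumption.
CUTOFF_SECONDS = 5 * 60


class ChildSpanBuffer:
    """Hypothetical buffer of finished child spans waiting for their root."""

    def __init__(self, cutoff: float = CUTOFF_SECONDS) -> None:
        self._cutoff = cutoff
        # parent span_id -> list of (time received, span)
        self._children: DefaultDict[str, List[Tuple[float, Any]]] = defaultdict(list)

    def add(self, parent_id: str, span: Any, now: Optional[float] = None) -> None:
        received = time.monotonic() if now is None else now
        self._children[parent_id].append((received, span))

    def prune(self, now: Optional[float] = None) -> int:
        """Drop spans older than the cutoff and return how many were dropped."""
        now = time.monotonic() if now is None else now
        dropped = 0
        # Iterate over a snapshot of the keys because entries may be deleted.
        for parent_id in list(self._children):
            entries = self._children[parent_id]
            fresh = [(t, s) for (t, s) in entries if now - t <= self._cutoff]
            dropped += len(entries) - len(fresh)
            if fresh:
                self._children[parent_id] = fresh
            else:
                del self._children[parent_id]
        return dropped

    def __len__(self) -> int:
        return sum(len(entries) for entries in self._children.values())
```

`prune` could be called from `on_end` itself or from a periodic task; which of the two fits better is a design question this sketch leaves open.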
# we construct the event from scratch here
# and not use the current Transaction class for easier refactoring
This is cool, haven't thought of completely bypassing this. I'll have to think about the implications for the granular instrumenter if we isolate OTel and our instrumentation like this.
so for now I'm just creating a raw dict, but I'm thinking eventually of a `TypedDict` for `Span` too and the current `Span/Transaction` classes will basically be completely removed.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##  neel/potel/initial-scope-management   #3223   +/-  ##
=======================================================================
- Coverage     79.27%   78.97%   -0.30%
=======================================================================
  Files           134      135      +1
  Lines         14255    14335     +80
  Branches       2990     3009     +19
=======================================================================
+ Hits          11300    11321     +21
- Misses         2107     2170     +63
+ Partials        848      844      -4
@sentrivana moved op/description/status logic to utils, but status mapping will be done separately. Maybe @szokeasaurusrex can look into making this logic as similar to JS as possible, but for now this is good enough for this PR. https://github.com/getsentry/sentry-javascript/blob/master/packages/opentelemetry/src/utils/mapStatus.ts
a626201 to 74f1589
74f1589 to 2c29711
* only acts on `on_end` instead of both `on_start/on_end` as before
* store children spans in a dict mapping `span_id -> children`
* new dict only stores otel span objects and no sentry transaction/span objects so we save a bit of useless memory allocation
* I'm not using our current `Transaction/Span` classes at all to build the event because when we add our APIs later, we'll need to rip these out and we also avoid having to deal with the `instrumenter` problem
* if we get a root span (without parent), we recursively walk the dict and find the children and package up the transaction event and send it
  * I didn't do it like JS because I think this way is better
  * they [group an array of `finished_spans`](https://github.com/getsentry/sentry-javascript/blob/7e298036a21a5658f3eb9ba184165178c48d7ef8/packages/opentelemetry/src/spanExporter.ts#L132) every time a root span ends and I think this uses more cpu than what I did
  * and the dict like I used it doesn't take more space than the array either
* if we get a span with a parent we just update the dict to find the span later
* moved the common `is_sentry_span` logic to utils
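The flow in the list above can be sketched in a few lines. This is a simplified stand-in, not the code in this PR: `FakeSpan` replaces the real otel span object, the event dict carries only a handful of illustrative keys, and `sent` is a hypothetical list standing in for actually capturing the event. Only `_children_spans` and `_flush_root_span` are names taken from the diff under review.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, DefaultDict, Dict, List, Optional


@dataclass
class FakeSpan:
    """Stand-in for an otel span; the real object has far more fields."""

    span_id: str
    parent_id: Optional[str]
    name: str


class SketchSpanProcessor:
    def __init__(self) -> None:
        # parent span_id -> finished children of that span
        self._children_spans: DefaultDict[str, List[FakeSpan]] = defaultdict(list)
        # Hypothetical sink; the real processor would send the event instead.
        self.sent: List[Dict[str, Any]] = []

    def on_end(self, span: FakeSpan) -> None:
        if span.parent_id is None:
            # Root span ended: build the transaction and send it.
            self._flush_root_span(span)
        else:
            # Child span: just remember it under its parent for later.
            self._children_spans[span.parent_id].append(span)

    def _flush_root_span(self, span: FakeSpan) -> None:
        event = {
            "type": "transaction",
            "transaction": span.name,
            "spans": self._collect_children(span),
        }
        self.sent.append(event)

    def _collect_children(self, span: FakeSpan) -> List[Dict[str, Any]]:
        collected: List[Dict[str, Any]] = []
        # pop() removes the entry, so flushed spans do not stay in memory.
        for child in self._children_spans.pop(span.span_id, []):
            collected.append(
                {
                    "span_id": child.span_id,
                    "parent_span_id": child.parent_id,
                    "description": child.name,
                }
            )
            collected.extend(self._collect_children(child))
        return collected
```

Because children normally end before their parents, the dict is already complete by the time the root span arrives, and a single recursive walk from the root's `span_id` gathers the whole tree. A child that ends after its root stays in `_children_spans`, which is the orphan case raised in the review.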
* Skeletons for new components
* Add simple scope management whenever a context is attached
* Don't parse DSN twice
* wip
* mypy fixes
* working span processor
* lint
* Port over op/description/status extraction
* defaultdict
* naive impl
* wip
* fix args
* wip
* remove extra docs
* Add simple scope management whenever a context is attached (#3159)
  * create a new otel context `_SCOPES_KEY` that will hold a tuple of `(current_scope, isolation_scope)`
  * the `current_scope` will always be forked (like on every span creation/context update in practice)
    * note that this is on `attach`, so not on all copy-on-write context object creation but only on apis such as [`trace.use_span`](https://github.com/open-telemetry/opentelemetry-python/blob/ba22b165471bde2037620f2c850ab648a849fbc0/opentelemetry-api/src/opentelemetry/trace/__init__.py#L547) or [`tracer.start_as_current_span`](https://github.com/open-telemetry/opentelemetry-python/blob/ba22b165471bde2037620f2c850ab648a849fbc0/opentelemetry-api/src/opentelemetry/trace/__init__.py#L329)
    * basically every otel `context` fork corresponds to our `current_scope` fork
  * the `isolation_scope` currently will not be forked
    * these will later be updated, for instance when we update our top level scope apis that fork isolation scope, that will also have a corresponding change in this `attach` function
* Implement new POTel span processor (#3223)
  * only acts on `on_end` instead of both `on_start/on_end` as before
  * store children spans in a dict mapping `span_id -> children`
  * new dict only stores otel span objects and no sentry transaction/span objects so we save a bit of useless memory allocation
  * I'm not using our current `Transaction/Span` classes at all to build the event because when we add our APIs later, we'll need to rip these out and we also avoid having to deal with the `instrumenter` problem
  * if we get a root span (without parent), we recursively walk the dict and find the children and package up the transaction event and send it
    * I didn't do it like JS because I think this way is better
    * they [group an array of `finished_spans`](https://github.com/getsentry/sentry-javascript/blob/7e298036a21a5658f3eb9ba184165178c48d7ef8/packages/opentelemetry/src/spanExporter.ts#L132) every time a root span ends and I think this uses more cpu than what I did
    * and the dict like I used it doesn't take more space than the array either
  * if we get a span with a parent we just update the dict to find the span later
  * moved the common `is_sentry_span` logic to utils
* Basic test cases for potel (#3286)
* Proxy POTelSpan.set_data to underlying otel span attributes (#3297)
* ref(tracing): Simplify backwards-compat code (#3379)
  With this change, we aim to simplify the backwards-compatibility code for POTel tracing. We do this as follows:
  - Remove `start_*` functions from `tracing`
  - Remove unused parameters from `tracing.POTelSpan.__init__`.
  - Make all parameters to `tracing.POTelSpan.__init__` kwarg-only.
  - Allow `tracing.POTelSpan.__init__` to accept arbitrary kwargs, which are all ignored, for compatibility with old `Span` interface.
  - Completely remove `start_inactive_span`, since inactive spans can be created by setting `active=False` when constructing a `POTelSpan`.
* New Scope implementation based on OTel Context (#3389)
  * New `PotelScope` inherits from scope and reads the scope from the otel context key `SENTRY_SCOPES_KEY`
  * New `isolation_scope` and `new_scope` context managers just use the context manager forking and yield with the scopes living on the above context key
  * isolation scope forking is done with the `SENTRY_FORK_ISOLATION_SCOPE_KEY` boolean context key
* Fix circular imports (#3431)
* Random tweaks (#3437)
* Origin improvements (#3432)
* Tweak OTel timestamp utils (#3436)
* Create spans on scope (#3442)
* Fill out more property/method stubs (#3441)
* Cleanup origin handling and defaults (#3445)
* add note to migration guide
* Attribute namespace for tags, measurements (#3448)

---------

Co-authored-by: Neel Shah <neel.shah@sentry.io>
Co-authored-by: Neel Shah <neelshah.sa@gmail.com>
Co-authored-by: Daniel Szoke <7881302+szokeasaurusrex@users.noreply.github.com>