-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROF-10679] Add preview support for correlating profiling with otel ruby gem #3984
[PROF-10679] Add preview support for correlating profiling with otel ruby gem #3984
Conversation
…tel gem Things missing: * Specs conflict with ddtrace otel specs (need to poke at appraisals) * Missing endpoint support
While we don't need the actual span object to read the span ids, we will need it to read the endpoint names.
… specs I'm... unhappy about this, but couldn't think of anything better that wouldn't involve refactoring the ddtrace tracing otel support and that seems even worse.
Sigh old rubies...
This setting is off by default for now, while we're still previewing the feature.
Could not make it use the more specific signature, unfortunately.
...otherwise this is a confusing sharp edge that I myself just ran into.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3984 +/- ##
=========================================
Coverage ? 97.84%
=========================================
Files ? 1314
Lines ? 78701
Branches ? 3909
=========================================
Hits ? 77004
Misses ? 1697
Partials ? 0 ☔ View full report in Codecov by Sentry. |
BenchmarksBenchmark execution time: 2024-10-09 15:39:55 Comparing candidate commit 9c40c0e in PR branch Found 0 performance improvements and 1 performance regressions! Performance is the same for 22 metrics, 2 unstable metrics. scenario:profiler - sample timeline=false
|
expect(@t1_span_id.to_i).to be > 0 | ||
expect(@t1_local_root_span_id.to_i).to be > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matter of personal preference probably, but this would also work:
expect(@t1_span_id.to_i).to be_positive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a very strong reason, but I like the > 0 as being more explicit if zero is included or not.
In Ruby, 0 is neither positive nor negative... But you can still get a few weirdnesses:
[14] pry(main)> 0.0
=> 0.0
[15] pry(main)> -0.0
=> -0.0
[16] pry(main)> (0.0).object_id
=> -9223372036854775806
[17] pry(main)> (-0.0).object_id
=> 2660
[19] pry(main)> [0.0, -0.0].map(&:positive?)
=> [false, false]
[20] pry(main)> [0.0, -0.0].map(&:negative?)
=> [false, false]
so IDK, I feel like > 0 is ultra-clear while positive may leave you wondering :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, I didn't know about -0.0.negative?
thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it makes sense for -0.0
have negative? => false
because -0.0 == 0.0
but yea... computers 🤣
if (context_storage == Qnil || !RB_TYPE_P(context_storage, T_ARRAY)) return; | ||
|
||
VALUE otel_current_span_key = get_otel_current_span_key(state); | ||
if (otel_current_span_key == Qfalse) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Noting that all the other short-circuiting ifs are essentially of the form if (xxx == Qnil)
. Only otel_current_span_key
behaves differently. I can imagine some time in the future when this code is less fresh in our minds this could bite us back. Granted tests hopefully catch it but perhaps not a wild idea to flip the values we use as not-found and not-resolved markers so all short-circuiting ifs can be of the == Qnil
form?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VALUE span_context = span_and_context.span_context; | ||
if (span_context == Qnil) return; | ||
|
||
VALUE span_id = rb_ivar_get(span_context, at_span_id_id /* @span_id */); | ||
VALUE span_trace_id = rb_ivar_get(span_context, at_trace_id_id /* @trace_id */); | ||
if (span_id == Qnil || span_trace_id == Qnil || !RB_TYPE_P(span_id, T_STRING) || !RB_TYPE_P(span_trace_id, T_STRING)) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every time you call otel_span_and_context_from
you end up doing these things (refer to lines 1655-1661 above).
Could we not include all of these into otel_span_and_context_from
? It seems like it'd improve readability and maintainability a fair bit? Would also lead otel_span
to be something like { .span_id, .trace_id, .span }
which also feels like a simpler API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
VALUE span_id = rb_ivar_get(span_context, at_span_id_id /* @span_id */); | ||
VALUE span_trace_id = rb_ivar_get(span_context, at_trace_id_id /* @trace_id */); | ||
if (span_id == Qnil || span_trace_id == Qnil || !RB_TYPE_P(span_id, T_STRING) || !RB_TYPE_P(span_trace_id, T_STRING)) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Giving up on the whole thing at this point seems wasteful.
I realize this is part of your strategy of assuming any unexpected thing meaning we may not be operating under the same constraints we considered when we first added this. But at this point we know we have a valid active_span and may have already found a chain of multiple valid spans.
Would it be that dangerous to break
rather than return
here and just carry on, taking the last valid span in the chain as the root span?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm it's a reasonable point.
My thinking is twofold:
-
If we get the wrong local root span id then I think we'd be sending incorrect data downstream. I can imagine us trying to investigate "hey, why does this span not show up correctly when 'span and children' is selected, but if I select it directly it works?" and how hard it would be to track it down to us being lenient here.
-
If this class gets changed, then potentially we won't be able to extract data from any instance. It seems unlikely (but not impossible, I'll admit), that we'll have some instances with the data, but not others. So I'm not sure if in practice this "mixed situation would happen"
...so I'm not entirely convinced it's worth trying to be lenient here 🤔
when 'both' | ||
:both | ||
when 'only' | ||
:only | ||
else | ||
false | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: I was wondering if this was a style choice or if there's some limitation of the settings system that prevents us from using :both
, :only
and false
directly as the settings' normalized values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks folks! As we want to get this out asap, I'll merge the PR as-is and open a follow-up PR to address your feedback. |
…etry context This PR documents the setting introduced in version 2.4.0 of the Datadog Ruby library (DataDog/dd-trace-rb#3984) to have the Ruby profiler directly read from the opentelemetry context. This allows customers that want to use the opentelemetry Ruby library together with profiling to get the full profiler feature set. (I've also gone ahead and did some small tweaks to the text: Suggest using the latest version in the config example; Suggest using Ruby 3.1+ in the requirements).
What does this PR do?
This PR picks up from where #3510 left off and adds preview support for correlating profiling data with traces/spans being tracked by the otel ruby gem (including endpoint profiling). (And specifically this PR just adds a few things on top of #3510, but does not otherwise change it significantly).
Note that this differs from the recommended way of using opentelemetry/otel with dd-trace-rb, which is to follow the instructions from https://docs.datadoghq.com/tracing/trace_collection/custom_instrumentation/otel_instrumentation/ruby/ .
The key difference is -- this PR makes code hotspots work even for setups that opt to not use
require 'datadog/opentelemetry'
(which we still recommend as the easier path).The approach taken here is similar to #2342 and #3466: we peek inside the implementation of the otel gem to extract the information we need (namely the span id, local root span id, trace type, and trace endpoint). This approach is potentially brittle, which is why the code is written very defensively, with the aim of never breaking the application (or profiling) if something is off -- it just won't collect code hotspots.
This feature is off by default. This PR adds a new setting to control this feature:
DD_PROFILING_PREVIEW_OTEL_CONTEXT_ENABLED
-> set toonly
/both
/false
/true
)c.profiling.advanced.preview_otel_context_enabled =
-> set to"only"
/"both"
/true
/false
Setting this feature to
both
means that the profiler will automatically pick up context from either dd-trace-rb tracing or otel library; whereas setting it toonly
means that the profiler will skip dd-trace-rb tracing entirely and only read context from the otel library.(This was suggested by @AlexJF in #3510 (comment), and also doubles as a way to raise confidence in releasing this since it allows us to have it off by default)
Motivation:
We have a customer interested in running this setup, so hopefully they'll be able to start using this feature.
Additional Notes:
For this feature to work, otel traces need to be tagged with the dd-trace-rb runtime-id. See below for an example of how to do this.
If the runtime-id is not set up, it will not be possible to see profiles scoped to individual requests in the datadog UX.
How to test the change?
On top of the added test coverage, I was able to see code hotspots working for the following sinatra example app:
And here's how it looks:
and