Add support for tagging profiles with opentelemetry trace identifiers #1568

Closed
wants to merge 10 commits into from
18 changes: 18 additions & 0 deletions Appraisals
@@ -8,6 +8,16 @@ def self.gem_cucumber(version)
end
end

def self.apraise_opentelemetry
appraise 'opentelemetry-pre-1-0' do
gem 'opentelemetry-sdk', '< 1'
end

appraise 'opentelemetry-1-0' do
gem 'opentelemetry-sdk', '>= 1.0.0.rc2'
end
end

if Gem::Version.new(RUBY_VERSION) < Gem::Version.new(Datadog::VERSION::MINIMUM_RUBY_VERSION)
raise NotImplementedError, "Ruby versions < #{Datadog::VERSION::MINIMUM_RUBY_VERSION} are not supported!"
elsif Gem::Version.new('2.1.0') <= Gem::Version.new(RUBY_VERSION) \
@@ -784,6 +794,8 @@ elsif Gem::Version.new('2.5.0') <= Gem::Version.new(RUBY_VERSION) \
gem 'resque', '>= 2.0'
end

apraise_opentelemetry

(3..5).each { |v| gem_cucumber(v) }

appraise 'contrib' do
@@ -966,6 +978,8 @@ elsif Gem::Version.new('2.6.0') <= Gem::Version.new(RUBY_VERSION) \
gem 'resque', '>= 2.0'
end

apraise_opentelemetry

(3..5).each { |v| gem_cucumber(v) }

appraise 'contrib' do
@@ -1149,6 +1163,8 @@ elsif Gem::Version.new('2.7.0') <= Gem::Version.new(RUBY_VERSION) \
gem 'resque', '>= 2.0'
end

apraise_opentelemetry

(3..5).each { |v| gem_cucumber(v) }

appraise 'contrib' do
@@ -1247,6 +1263,8 @@ elsif Gem::Version.new('3.0.0') <= Gem::Version.new(RUBY_VERSION)
gem 'resque', '>= 2.0'
end

apraise_opentelemetry

(3..5).each { |v| gem_cucumber(v) }

appraise 'contrib' do
25 changes: 24 additions & 1 deletion Rakefile
@@ -18,7 +18,7 @@ namespace :spec do
RSpec::Core::RakeTask.new(:main) do |t, args|
t.pattern = 'spec/**/*_spec.rb'
t.exclude_pattern = 'spec/**/{contrib,benchmark,redis,opentracer,opentelemetry,auto_instrument}/**/*_spec.rb,'\
' spec/**/auto_instrument_spec.rb'
' spec/**/auto_instrument_spec.rb,spec/**/profiling/**/**opentelemetry**_spec.rb'
t.rspec_opts = args.to_a.join(' ')
end

@@ -68,6 +68,11 @@ namespace :spec do
t.rspec_opts = args.to_a.join(' ')
end

RSpec::Core::RakeTask.new(:'profiling-opentelemetry') do |t, args|
t.pattern = 'spec/**/profiling/**/**opentelemetry**_spec.rb'
t.rspec_opts = args.to_a.join(' ')
end

RSpec::Core::RakeTask.new(:contrib) do |t, args|
contrib_paths = [
'analytics',
@@ -628,6 +633,11 @@ task :ci do
declare 'bundle exec appraisal cucumber3 rake spec:cucumber'
declare 'bundle exec appraisal cucumber4 rake spec:cucumber'
declare 'bundle exec appraisal cucumber5 rake spec:cucumber'

# Profiling
declare 'bundle exec appraisal opentelemetry-pre-1-0 rake spec:profiling-opentelemetry'
declare 'bundle exec appraisal opentelemetry-1-0 rake spec:profiling-opentelemetry'

elsif Gem::Version.new('2.6.0') <= Gem::Version.new(RUBY_VERSION) \
&& Gem::Version.new(RUBY_VERSION) < Gem::Version.new('2.7.0')
# Main library
@@ -726,6 +736,10 @@ task :ci do
declare 'bundle exec appraisal cucumber3 rake spec:cucumber'
declare 'bundle exec appraisal cucumber4 rake spec:cucumber'
declare 'bundle exec appraisal cucumber5 rake spec:cucumber'

# Profiling
declare 'bundle exec appraisal opentelemetry-pre-1-0 rake spec:profiling-opentelemetry'
declare 'bundle exec appraisal opentelemetry-1-0 rake spec:profiling-opentelemetry'
end
elsif Gem::Version.new('2.7.0') <= Gem::Version.new(RUBY_VERSION) \
&& Gem::Version.new(RUBY_VERSION) < Gem::Version.new('3.0.0')
@@ -824,6 +838,10 @@ task :ci do
declare 'bundle exec appraisal cucumber3 rake spec:cucumber'
declare 'bundle exec appraisal cucumber4 rake spec:cucumber'
declare 'bundle exec appraisal cucumber5 rake spec:cucumber'

# Profiling
declare 'bundle exec appraisal opentelemetry-pre-1-0 rake spec:profiling-opentelemetry'
declare 'bundle exec appraisal opentelemetry-1-0 rake spec:profiling-opentelemetry'
end
elsif Gem::Version.new('3.0.0') <= Gem::Version.new(RUBY_VERSION)
# Main library
@@ -898,6 +916,11 @@ task :ci do
declare 'bundle exec appraisal cucumber3 rake spec:cucumber'
declare 'bundle exec appraisal cucumber4 rake spec:cucumber'
declare 'bundle exec appraisal cucumber5 rake spec:cucumber'

# Profiling
declare 'bundle exec appraisal opentelemetry-pre-1-0 rake spec:profiling-opentelemetry'
declare 'bundle exec appraisal opentelemetry-1-0 rake spec:profiling-opentelemetry'

end
end
end
22 changes: 22 additions & 0 deletions docs/GettingStarted.md
@@ -86,6 +86,9 @@ To contribute, check out the [contribution guidelines][contribution docs] and [d
- [For application runtime](#for-application-runtime)
- [OpenTracing](#opentracing)
- [Profiling](#profiling)
- [Troubleshooting](#troubleshooting)
- [Profiling Resque jobs](#profiling-resque-jobs)
- [Linking OpenTelemetry traces with profiles](#linking-opentelemetry-traces-with-profiles)
- [Known issues and suggested configurations](#known-issues-and-suggested-configurations)
- [Payload too large](#payload-too-large)
- [Stack level too deep](#stack-level-too-deep)
@@ -2491,12 +2494,31 @@ However, additional instrumentation provided by Datadog can be activated alongsi

To get started with profiling, follow the [Profiler Getting Started Guide](https://docs.datadoghq.com/tracing/profiler/getting_started/?code-lang=ruby).

#### Troubleshooting

If you run into issues with profiling, please check the [Profiler Troubleshooting Guide](https://docs.datadoghq.com/tracing/profiler/profiler_troubleshooting/?code-lang=ruby).

#### Profiling Resque jobs

When profiling [Resque](https://github.com/resque/resque) jobs, you should set the `RUN_AT_EXIT_HOOKS=1` option described in the [Resque](https://github.com/resque/resque/blob/v2.0.0/docs/HOOKS.md#worker-hooks) documentation.

Without this flag, profiles for short-lived Resque jobs will not be available, because Resque kills worker processes before they have a chance to submit this information.
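
The effect of the flag can be illustrated with plain Ruby, independent of Resque. The sketch below assumes, purely for illustration, that the final profile is reported from an `at_exit`-style hook. It shows that a process ending with `exit!` skips such hooks, while one ending with `exit` runs them. `CHILD_SCRIPT` and `child_output` are hypothetical names and are not part of Resque or ddtrace.

```ruby
require 'rbconfig'

# Script run in a separate Ruby process (hypothetical, for illustration only).
# It registers an at_exit hook and then terminates in one of two ways.
CHILD_SCRIPT = <<~RUBY
  at_exit { print 'profile flushed' }
  ARGV.first == 'run_hooks' ? exit : exit!
RUBY

# Runs the script above in a child process and returns whatever it printed.
def child_output(mode)
  IO.popen([RbConfig.ruby, '-e', CHILD_SCRIPT, mode], &:read)
end

with_hooks = child_output('run_hooks')     # child ends with `exit`, hooks run
without_hooks = child_output('skip_hooks') # child ends with `exit!`, hooks are skipped

puts "with hooks: #{with_hooks.inspect}"
puts "without hooks: #{without_hooks.inspect}"
```

In a real deployment the flag is passed as an environment variable when starting the Resque worker, as described in the Resque documentation linked above.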

#### Linking OpenTelemetry traces with profiles

The profiler's support for [Investigating Code Hotspots from Traces](https://docs.datadoghq.com/tracing/profiler/connect_traces_and_profiles)
is also available when using the [OpenTelemetry](https://github.com/open-telemetry/opentelemetry-ruby) Ruby libraries.

To enable this feature, first follow the regular steps for enabling profiling, and then
modify your OpenTelemetry gem configuration to add an additional span processor:

```ruby
OpenTelemetry::SDK.configure do |c|
c.add_span_processor(Datadog::Profiling::Ext::OpenTelemetryTraceLinking.new)
# ... rest of your configuration for using OpenTelemetry
end
```

## Known issues and suggested configurations

### Payload too large
50 changes: 37 additions & 13 deletions docs/ProfilingDevelopment.md
@@ -16,10 +16,13 @@ Components below live inside <../lib/ddtrace/profiling>:
* `Ext::CThread`: Extension used to enable CPU-time profiling via use of Pthread's `getcpuclockid`.
* `Ext::Forking`: Monkey patches `Kernel#fork`, adding a `Kernel#at_fork` callback mechanism which is used to restore
profiling abilities after the VM forks (such as re-instrumenting the main thread, and restarting profiler threads).
* `Ext::OpenTelemetryTraceLinking`: Helper class to automatically add the runtime id tag to OpenTelemetry traces.
Optionally added by users to their OpenTelemetry gem configuration to enable linking OpenTelemetry traces with profiles.
* `Pprof::*` (in <../lib/ddtrace/profiling/pprof>): Converts samples captured in the `Recorder` into the pprof format.
* `Tasks::Setup`: Takes care of loading our extensions/monkey patches to handle fork() and CPU profiling.
* `Transport::*` (in <../lib/ddtrace/profiling/transport>): Implements transmission of profiling payloads to the Datadog agent
or backend.
* `TraceIdentifiers::*`: Used to retrieve trace id and span id from tracers, to be used to connect traces to profiles.
* `BacktraceLocation`: Entity class used to represent an entry in a stack trace.
* `Buffer`: Bounded buffer used to store profiling events.
* `Exporter`: Writes profiling data to a given transport.
@@ -41,22 +44,14 @@ flow:
4. The `Setup` task activates our extensions
* `Datadog::Profiling::Ext::Forking`
* `Datadog::Profiling::Ext::CPU`
5. Still inside `Datadog::Components`, the `build_profiler` method then creates and wires up the Profiler:
```ruby
recorder = build_profiler_recorder(settings)
collectors = build_profiler_collectors(settings, recorder)
exporters = build_profiler_exporters(settings)
scheduler = build_profiler_scheduler(settings, recorder, exporters)

Datadog::Profiler.new(collectors, scheduler)
```
5. Still inside `Datadog::Components`, the `build_profiler` method then creates and wires up the Profiler as such:
```asciiflow
+------------+
| Profiler |
+-+--------+-+
| |
v v
+---------+--+ +--+--------+
+-+-------+--+
| |
v v
+---------+--+ +-+---------+
| Collectors | | Scheduler |
+---------+--+ +-+-------+-+
| | |
@@ -86,3 +81,32 @@ takes care of encoding the data and reporting it to the datadog agent (or to the
## How CPU-time profiling works

**TODO**: Document our pthread-based approach to getting CPU-time for threads.

## How linking of traces to profiles works

The [code hotspots feature](https://docs.datadoghq.com/tracing/profiler/connect_traces_and_profiles) allows users to start
from a trace and then to investigate the profile that corresponds to that trace.

This works in two steps:
1. Linking a trace to the profile that was gathered while it executed
2. Enabling the filtering of a profile to contain only the samples relating to a given trace/span

To link a trace to a profile, we must ensure that both have the same `runtime-id` tag.
The value of this tag comes from `Datadog::Runtime::Identity.id`, and both the tracer and the profiler automatically add
it to the traces/profiles they report.

(For traces reported using the OpenTelemetry gem, users need to add our `Datadog::Profiling::Ext::OpenTelemetryTraceLinking`
class to their configuration, so that this tag gets added.)
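
As a rough illustration of the idea (not the class added by this PR), a span processor that stamps every new span with the runtime id could look like the sketch below. `RuntimeIdSpanProcessor` and `FakeSpan` are hypothetical names; the attribute key `'runtime-id'` mirrors the tag name used above, and the callback names follow the OpenTelemetry Ruby span processor interface as I understand it. A real processor would likely also return the SDK's status constants from `force_flush` and `shutdown`.

```ruby
# Hypothetical sketch of a span processor that tags spans with a runtime id.
class RuntimeIdSpanProcessor
  def initialize(runtime_id)
    @runtime_id = runtime_id
  end

  # Invoked when a span starts; this is where the tag is added.
  def on_start(span, _parent_context)
    span.set_attribute('runtime-id', @runtime_id)
  end

  # Nothing to do for the remaining callbacks in this sketch.
  def on_finish(_span); end

  def force_flush(timeout: nil); end

  def shutdown(timeout: nil); end
end

# Minimal stand-in for an OpenTelemetry span, so the sketch runs without the gem.
FakeSpan = Struct.new(:attributes) do
  def set_attribute(key, value)
    attributes[key] = value
    self
  end
end

span = FakeSpan.new({})
RuntimeIdSpanProcessor.new('example-runtime-id').on_start(span, nil)
puts span.attributes.inspect
```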

The profiler backend links a trace covering a given time interval to the profiles covering the same time interval,
whenever they share the same `runtime-id`.
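
The matching rule can be sketched as follows. This is only an illustration of the rule as described here, under the assumption that "covering the same time interval" means the two intervals overlap; it is not the backend's actual implementation, and `Interval` and `linked?` are hypothetical names.

```ruby
# Hypothetical record for either a trace or a profile: who produced it and when.
Interval = Struct.new(:runtime_id, :start_time, :end_time)

# A trace and a profile are considered linked when they come from the same
# runtime and their time intervals overlap.
def linked?(trace, profile)
  trace.runtime_id == profile.runtime_id &&
    trace.start_time <= profile.end_time &&
    profile.start_time <= trace.end_time
end

trace = Interval.new('runtime-a', 10, 12)
same_runtime_profile = Interval.new('runtime-a', 0, 60)
other_runtime_profile = Interval.new('runtime-b', 0, 60)
later_profile = Interval.new('runtime-a', 60, 120)

puts linked?(trace, same_runtime_profile)
puts linked?(trace, other_runtime_profile)
puts linked?(trace, later_profile)
```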

To further enable filtering of a profile to show only samples related to a given trace/span, each sample taken by the
profiler is tagged with the trace_id and span_id for the given trace/span.

This is done using the `Datadog::Profiling::TraceIdentifiers::Helper` that retrieves a trace_id and span_id, if
available, from the supported tracers. This helper is called by the `Collectors::Stack` during sampling.
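
A minimal sketch of such a helper, assuming each supported tracer is represented by a callable that returns either `[trace_id, span_id]` or `nil` for a given thread, is shown below. The class name `TraceIdentifiersSketch`, the method name, and the callables are hypothetical and are not the actual `TraceIdentifiers::Helper` API.

```ruby
# Hypothetical sketch: asks each supported tracer in turn for identifiers.
class TraceIdentifiersSketch
  # extractors: callables taking a thread and returning [trace_id, span_id] or nil.
  def initialize(extractors)
    @extractors = extractors
  end

  # Returns the first complete [trace_id, span_id] pair found, or nil if no
  # tracer has an active trace for the given thread.
  def trace_identifiers_for(thread)
    @extractors.each do |extractor|
      trace_id, span_id = extractor.call(thread)
      return [trace_id, span_id] if trace_id && span_id
    end
    nil
  end
end

# Stand-ins for real tracers: one with no active trace, one with fixed ids.
no_active_trace = ->(_thread) { nil }
fake_tracer = ->(_thread) { [123, 456] }

helper = TraceIdentifiersSketch.new([no_active_trace, fake_tracer])
ids = helper.trace_identifiers_for(Thread.current)
puts ids.inspect
```

Returning `nil` when nothing is found lets the caller (the stack collector, in this document's terms) record the sample without trace information instead of failing.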

Note that if a given trace executes too quickly, it's possible that the profile will not contain any samples for that
specific trace. Nevertheless, the linking still works and is useful, as it allows users to explore what was going on in their
profile at that time, even if they can't filter down to the specific request.
35 changes: 35 additions & 0 deletions gemfiles/jruby_9.2.0.0_opentelemetry_1_0.gemfile
