Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor reading of trace identifiers in profiler #1591

Merged
merged 4 commits into from
Jul 14, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions docs/GettingStarted.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,8 @@ To contribute, check out the [contribution guidelines][contribution docs] and [d
- [For application runtime](#for-application-runtime)
- [OpenTracing](#opentracing)
- [Profiling](#profiling)
- [Troubleshooting](#troubleshooting)
- [Profiling Resque jobs](#profiling-resque-jobs)
- [Known issues and suggested configurations](#known-issues-and-suggested-configurations)
- [Payload too large](#payload-too-large)
- [Stack level too deep](#stack-level-too-deep)
Expand Down Expand Up @@ -2491,6 +2493,10 @@ However, additional instrumentation provided by Datadog can be activated alongsi

To get started with profiling, follow the [Profiler Getting Started Guide](https://docs.datadoghq.com/tracing/profiler/getting_started/?code-lang=ruby).

#### Troubleshooting

If you run into issues with profiling, please check the [Profiler Troubleshooting Guide](https://docs.datadoghq.com/tracing/profiler/profiler_troubleshooting/?code-lang=ruby).

#### Profiling Resque jobs

When profiling [Resque](https://github.com/resque/resque) jobs, you should set the `RUN_AT_EXIT_HOOKS=1` option described in the [Resque](https://github.com/resque/resque/blob/v2.0.0/docs/HOOKS.md#worker-hooks) documentation.
Expand Down
45 changes: 32 additions & 13 deletions docs/ProfilingDevelopment.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ Components below live inside <../lib/ddtrace/profiling>:
* `Tasks::Setup`: Takes care of loading our extensions/monkey patches to handle fork() and CPU profiling.
* `Transport::*` (in <../lib/ddtrace/profiling/transport>): Implements transmission of profiling payloads to the Datadog agent
or backend.
* `TraceIdentifiers::*`: Used to retrieve trace id and span id from tracers, to be used to connect traces to profiles.
* `BacktraceLocation`: Entity class used to represent an entry in a stack trace.
* `Buffer`: Bounded buffer used to store profiling events.
* `Exporter`: Writes profiling data to a given transport.
Expand All @@ -41,22 +42,14 @@ flow:
4. The `Setup` task activates our extensions
* `Datadog::Profiling::Ext::Forking`
* `Datadog::Profiling::Ext::CPU`
5. Still inside `Datadog::Components`, the `build_profiler` method then creates and wires up the Profiler:
```ruby
recorder = build_profiler_recorder(settings)
collectors = build_profiler_collectors(settings, recorder)
exporters = build_profiler_exporters(settings)
scheduler = build_profiler_scheduler(settings, recorder, exporters)

Datadog::Profiler.new(collectors, scheduler)
```
5. Still inside `Datadog::Components`, the `build_profiler` method then creates and wires up the Profiler as such:
```asciiflow
+------------+
| Profiler |
+-+--------+-+
| |
v v
+---------+--+ +--+--------+
+-+-------+--+
| |
v v
+---------+--+ +-+---------+
| Collectors | | Scheduler |
+---------+--+ +-+-------+-+
| | |
Expand Down Expand Up @@ -86,3 +79,29 @@ takes care of encoding the data and reporting it to the datadog agent (or to the
## How CPU-time profiling works

**TODO**: Document our pthread-based approach to getting CPU-time for threads.

## How linking of traces to profiles works

The [code hotspots feature](https://docs.datadoghq.com/tracing/profiler/connect_traces_and_profiles) allows users to start
from a trace and then to investigate the profile that corresponds to that trace.

This works in two steps:
1. Linking a trace to the profile that was gathered while it executed
2. Enabling the filtering of a profile to contain only the samples relating to a given trace/span

To link a trace to a profile, we must ensure that both have the same `runtime-id` tag.
This tag is in `Datadog::Runtime::Identity.id` and is automatically added by both the tracer and the profiler to reported
traces/profiles.

The profiler backend links a trace covering a given time interval to the profiles covering the same time interval,
whenever they share the same `runtime-id`.

To further enable filtering of a profile to show only samples related to a given trace/span, each sample taken by the
profiler is tagged with the trace_id and span_id for the given trace/span.

This is done using the `Datadog::Profiling::TraceIdentifiers::Helper` that retrieves a trace_id and span_id, if
available, from the supported tracers. This helper is called by the `Collectors::Stack` during sampling.

Note that if a given trace executes too fast, it's possible that the profiler will not contain any samples for that
specific trace. Nevertheless, the linking still works and is useful, as it allows users to explore what was going on their
profile at that time, even if they can't filter down to the specific request.
11 changes: 7 additions & 4 deletions lib/ddtrace/configuration/components.rb
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,7 @@ def build_tracer(settings, agent_settings)
tracer
end

def build_profiler(settings, agent_settings)
def build_profiler(settings, agent_settings, tracer)
return unless Datadog::Profiling.supported? && settings.profiling.enabled

unless defined?(Datadog::Profiling::Tasks::Setup)
Expand Down Expand Up @@ -107,8 +107,10 @@ def build_profiler(settings, agent_settings)

# NOTE: Please update the Initialization section of ProfilingDevelopment.md with any changes to this method

trace_identifiers_helper = Datadog::Profiling::TraceIdentifiers::Helper.new(tracer: tracer)

recorder = build_profiler_recorder(settings)
collectors = build_profiler_collectors(settings, recorder)
collectors = build_profiler_collectors(settings, recorder, trace_identifiers_helper)
exporters = build_profiler_exporters(settings, agent_settings)
scheduler = build_profiler_scheduler(settings, recorder, exporters)

Expand Down Expand Up @@ -164,10 +166,11 @@ def build_profiler_recorder(settings)
Datadog::Profiling::Recorder.new(event_classes, settings.profiling.max_events)
end

def build_profiler_collectors(settings, recorder)
def build_profiler_collectors(settings, recorder, trace_identifiers_helper)
[
Datadog::Profiling::Collectors::Stack.new(
recorder,
trace_identifiers_helper: trace_identifiers_helper,
max_frames: settings.profiling.max_frames
# TODO: Provide proc that identifies Datadog worker threads?
# ignore_thread: settings.profiling.ignore_profiler
Expand Down Expand Up @@ -209,7 +212,7 @@ def initialize(settings)
@tracer = self.class.build_tracer(settings, agent_settings)

# Profiler
@profiler = self.class.build_profiler(settings, agent_settings)
@profiler = self.class.build_profiler(settings, agent_settings, @tracer)

# Runtime metrics
@runtime_metrics = self.class.build_runtime_metrics_worker(settings)
Expand Down
1 change: 1 addition & 0 deletions lib/ddtrace/profiling.rb
Original file line number Diff line number Diff line change
Expand Up @@ -126,6 +126,7 @@ def self.load_profiling
require 'ddtrace/profiling/transport/http'
require 'ddtrace/profiling/profiler'
require 'ddtrace/profiling/native_extension'
require 'ddtrace/profiling/trace_identifiers/helper'
require 'ddtrace/profiling/pprof/pprof_pb'

true
Expand Down
17 changes: 5 additions & 12 deletions lib/ddtrace/profiling/collectors/stack.rb
Original file line number Diff line number Diff line change
Expand Up @@ -11,8 +11,7 @@ module Collectors
# Collects stack trace samples from Ruby threads for both CPU-time (if available) and wall-clock.
# Runs on its own background thread.
#
# rubocop:disable Metrics/ClassLength
class Stack < Worker
class Stack < Worker # rubocop:disable Metrics/ClassLength
include Workers::Polling

DEFAULT_MAX_TIME_USAGE_PCT = 2.0
Expand All @@ -22,13 +21,15 @@ class Stack < Worker
attr_reader \
:recorder,
:max_frames,
:trace_identifiers_helper,
:ignore_thread,
:max_time_usage_pct,
:thread_api

def initialize(
recorder,
max_frames:,
trace_identifiers_helper:, # Usually an instance of Datadog::Profiling::TraceIdentifiers::Helper
ignore_thread: nil,
max_time_usage_pct: DEFAULT_MAX_TIME_USAGE_PCT,
thread_api: Thread,
Expand All @@ -38,6 +39,7 @@ def initialize(
)
@recorder = recorder
@max_frames = max_frames
@trace_identifiers_helper = trace_identifiers_helper
@ignore_thread = ignore_thread
@max_time_usage_pct = max_time_usage_pct
@thread_api = thread_api
Expand Down Expand Up @@ -123,7 +125,7 @@ def collect_thread_event(thread, wall_time_interval_ns)
locations = convert_backtrace_locations(locations)

thread_id = thread.respond_to?(:pthread_thread_id) ? thread.pthread_thread_id : thread.object_id
trace_id, span_id = get_trace_identifiers(thread)
trace_id, span_id = trace_identifiers_helper.trace_identifiers_for(thread)
cpu_time = get_cpu_time_interval!(thread)

Events::StackSample.new(
Expand Down Expand Up @@ -162,14 +164,6 @@ def get_cpu_time_interval!(thread)
interval
end

def get_trace_identifiers(thread)
return unless thread.is_a?(::Thread)
return unless Datadog.respond_to?(:tracer) && Datadog.tracer.respond_to?(:active_correlation)

identifier = Datadog.tracer.active_correlation(thread)
[identifier.trace_id, identifier.span_id]
end

def compute_wait_time(used_time)
# We took used_time to get the last sample.
#
Expand Down Expand Up @@ -256,7 +250,6 @@ def reset_cpu_time_tracking
end
end
end
# rubocop:enable Metrics/ClassLength
end
end
end
25 changes: 25 additions & 0 deletions lib/ddtrace/profiling/trace_identifiers/ddtrace.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# frozen_string_literal: true

module Datadog
module Profiling
module TraceIdentifiers
# Used by Datadog::Profiling::TraceIdentifiers::Helper to get the trace identifiers (trace id and span id) for a
# given thread, if there is an active trace for that thread in Datadog.tracer.
class Ddtrace
def initialize(tracer: nil)
@tracer = (tracer if tracer.respond_to?(:active_correlation))
end

def trace_identifiers_for(thread)
return unless @tracer

correlation = @tracer.active_correlation(thread)
trace_id = correlation.trace_id
span_id = correlation.span_id

[trace_id, span_id] if trace_id && trace_id != 0 && span_id && span_id != 0
end
end
end
end
end
34 changes: 34 additions & 0 deletions lib/ddtrace/profiling/trace_identifiers/helper.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
# frozen_string_literal: true

require 'ddtrace/profiling/trace_identifiers/ddtrace'

module Datadog
module Profiling
module TraceIdentifiers
# Helper used to retrieve the trace identifiers (trace id and span id) for a given thread,
# if there is an active trace for that thread for the supported tracing APIs.
#
# This data is used to connect profiles to the traces -- samples in a profile will be tagged with this data and
# the profile can be filtered down to look at only the samples for a given trace.
class Helper
DEFAULT_SUPPORTED_APIS = [
::Datadog::Profiling::TraceIdentifiers::Ddtrace
].freeze
private_constant :DEFAULT_SUPPORTED_APIS

def initialize(tracer:, supported_apis: DEFAULT_SUPPORTED_APIS.map { |api| api.new(tracer: tracer) })
@supported_apis = supported_apis
end

def trace_identifiers_for(thread)
@supported_apis.each do |api|
trace_identifiers = api.trace_identifiers_for(thread)
return trace_identifiers unless trace_identifiers.nil?
end

nil
end
end
end
end
end
5 changes: 3 additions & 2 deletions spec/ddtrace/configuration/components_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@
.and_return(tracer)

expect(described_class).to receive(:build_profiler)
.with(settings, instance_of(Datadog::Configuration::AgentSettingsResolver::AgentSettings))
.with(settings, instance_of(Datadog::Configuration::AgentSettingsResolver::AgentSettings), tracer)
.and_return(profiler)

expect(described_class).to receive(:build_runtime_metrics_worker)
Expand Down Expand Up @@ -666,8 +666,9 @@
describe '::build_profiler' do
let(:agent_settings) { Datadog::Configuration::AgentSettingsResolver.call(settings, logger: nil) }
let(:profiler) { build_profiler }
let(:tracer) { instance_double(Datadog::Tracer) }

subject(:build_profiler) { described_class.build_profiler(settings, agent_settings) }
subject(:build_profiler) { described_class.build_profiler(settings, agent_settings, tracer) }

context 'when profiling is not supported' do
before { allow(Datadog::Profiling).to receive(:supported?).and_return(false) }
Expand Down
Loading