
[PROF-5860] Allow new CPU Profiling 2.0 **alpha** profiler to be enabled #2209

Merged: 1 commit into master from ivoanjo/prof-5860-new-profiler-configuration on Aug 17, 2022

Conversation

@ivoanjo (Member) commented Aug 5, 2022

The new Ruby profiler, aka "CPU Profiling 2.0", is considered to be in alpha state. We do not recommend turning it on.

But! We actually can turn it on now -- by using `DD_PROFILING_FORCE_ENABLE_NEW=true`.

The rest of the pieces have been put into place in previous PRs.

What does this PR do?:

Add a setting that allows choosing between the "old" profiler codepath and the new "CPU Profiling 2.0" codepath.
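
For illustration, here's roughly the shape of that toggle as a minimal, hypothetical sketch -- `DD_PROFILING_FORCE_ENABLE_NEW` is the real flag, but the collector names and selection code below are illustrative, not dd-trace-rb's actual internals:

```ruby
# Hypothetical sketch of env-var-driven codepath selection.
force_new = ENV['DD_PROFILING_FORCE_ENABLE_NEW'] == 'true'

collector =
  if force_new
    :cpu_and_wall_time_worker # new CPU Profiling 2.0 collector
  else
    :old_stack_collector # legacy profiler codepath
  end

puts "Selected profiler collector: #{collector}"
```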

Motivation:

Making it possible to test the new profiler.

Additional Notes

This PR sits atop #2208 because without the component introduced in #2208 the profiler cannot be turned on. It is otherwise independent of that change.

How to test the change?:

Here's a simple run of the new profiler:

$ DD_TRACE_DEBUG=true DD_PROFILING_ENABLED=true DD_SERVICE=ivoanjo-testing DD_ENV=staging DD_PROFILING_FORCE_ENABLE_NEW=true bundle exec ddtracerb exec ruby -e sleep
I, [2022-08-05T14:40:38.715368 #24783]  INFO -- ddtrace: [ddtrace] (dd-trace-rb/lib/datadog/profiling/tasks/setup.rb:47:in `check_if_cpu_time_profiling_is_supported') CPU time profiling skipped because native CPU time is not supported: Feature requires Linux; macOS is not supported. Profiles containing 'Wall time' data will still be reported.
D, [2022-08-05T14:40:38.727312 #24783] DEBUG -- ddtrace: [ddtrace] (dd-trace-rb/lib/datadog/core/workers/async.rb:133:in `start_worker') Starting thread for: #<Datadog::Core::Telemetry::Heartbeat:0x00007f825b982080>
D, [2022-08-05T14:40:38.727510 #24783] DEBUG -- ddtrace: [ddtrace] (dd-trace-rb/lib/datadog/core/configuration/components.rb:371:in `startup!') Profiling started
D, [2022-08-05T14:40:38.727647 #24783] DEBUG -- ddtrace: [ddtrace] (dd-trace-rb/lib/datadog/profiling/collectors/cpu_and_wall_time_worker.rb:33:in `block in start') Starting thread for: #<Datadog::Profiling::Collectors::CpuAndWallTimeWorker:0x00007f827b91b950>
D, [2022-08-05T14:40:38.727712 #24783] DEBUG -- ddtrace: [ddtrace] (dd-trace-rb/lib/datadog/core/workers/async.rb:133:in `start_worker') Starting thread for: #<Datadog::Profiling::Scheduler:0x00007f826c07bb60>
D, [2022-08-05T14:41:38.735153 #24783] DEBUG -- ddtrace: [ddtrace] (dd-trace-rb/lib/datadog/profiling/stack_recorder.rb:25:in `serialize') Encoded profile covering 2022-08-05T13:40:38Z to 2022-08-05T13:41:38Z
D, [2022-08-05T14:41:40.534027 #24783] DEBUG -- ddtrace: [ddtrace] (dd-trace-rb/lib/datadog/profiling/http_transport.rb:49:in `export') Successfully reported profiling data

@ivoanjo ivoanjo requested a review from a team August 5, 2022 13:49
@marcotc (Member) left a comment:

ɑ 🥳

Base automatically changed from ivoanjo/prof-5859-sample-triggering to master August 17, 2022 13:39
@ivoanjo ivoanjo merged commit f7fdfb6 into master Aug 17, 2022
@ivoanjo ivoanjo deleted the ivoanjo/prof-5860-new-profiler-configuration branch August 17, 2022 13:40
@github-actions github-actions bot added this to the 0.54.3 milestone Aug 17, 2022
@ivoanjo ivoanjo modified the milestones: 0.54.3, 1.4.0 Aug 19, 2022
ivoanjo added a commit that referenced this pull request Oct 26, 2022
**What does this PR do?**:

(Important note: This feature is only available on the new CPU
Profiling 2.0 profiler which is still in **alpha**, see #2209)

This PR is the last piece of the puzzle started in #2304 and #2323.
With this change, time (both cpu-time and wall-time) spent by threads
doing garbage collection is now accounted for, and shows up in the
flamegraph.

This works by creating a new Ruby VM TracePoint to handle the
`RUBY_INTERNAL_EVENT_GC_ENTER` and `RUBY_INTERNAL_EVENT_GC_EXIT`
events. (These events are only available at the C level;
Ruby-level TracePoints cannot use them.)

Then, whenever Ruby calls the TracePoint, we call the previously-added
`cpu_and_wall_time_collector_on_gc_start` and
`cpu_and_wall_time_collector_on_gc_finish` to track the time spent
in GC, and then insert it as a sample in the profiling output
(using `cpu_and_wall_time_collector_sample_after_gc`).
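
At the Ruby level, the TracePoint mechanism looks like the sketch below; note this is only an analogy, since (as noted above) the internal GC events are not reachable from Ruby-level TracePoints and the profiler registers its TracePoint via the C API (`rb_tracepoint_new`):

```ruby
# Ruby-level TracePoint analogy. The profiler's actual TracePoint is created
# in C, because RUBY_INTERNAL_EVENT_GC_ENTER/GC_EXIT are not exposed to
# Ruby-level TracePoints such as this one.
def work
  42
end

tp = TracePoint.new(:call) { |t| puts "entered #{t.method_id}" }
tp.enable { work } # prints "entered work"
```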

**Motivation**:

Without this work, time spent doing garbage collection is invisible
and blamed on methods directly. By making it visible, we enable
customers to make better informed decisions on what needs to be
optimized (or fixed!).

**Additional Notes**:

As I mentioned below, this only affects the new CPU Profiling 2.0
profiler codepath.

During development, I initially attempted to compare the time spent
in GC gathered via the TracePoint to the one that Ruby exposes via
[`GC::Profiler.total_time`](https://rubyapi.org/3.1/o/gc/profiler).
The results were off; when I looked into why, I discovered (by reading
Ruby's `gc_start` function in `gc.c`) that `GC::Profiler`'s time
tracking only covers a subset of the GC process, not the entire time.

Furthermore, from what I was able to observe, it accounts for
cpu-time process-wide on Linux, and in some cases (macOS),
only cpu-time in usermode, so that's another reason for
discrepancies.

Thus, when comparing values, expect the `GC::Profiler` values to be
way below the values exposed by the profiler.
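
As a quick way to see `GC::Profiler`'s accounting for yourself (plain Ruby, nothing profiler-specific):

```ruby
# Force some allocations and a GC run, then read GC::Profiler's accounting.
# Per the note above, expect this to undercount relative to the profiler's
# ENTER/EXIT-event-based numbers.
GC::Profiler.enable
100_000.times { Object.new }
GC.start
puts "GC::Profiler.total_time: #{GC::Profiler.total_time}s"
GC::Profiler.disable
```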

I believe the tracepoint-based approach and values recorded by the
profiler to be more accurate since Ruby calls the `ENTER` event
quite early in the GC process, and `EXIT` quite late. See also
<ivoanjo/gvl-tracing#6> for a similar
approach proposed by a Ruby core developer.

**How to test the change?**:

Beyond the included code coverage, you should see
`Garbage Collection` frames representing time spent in GC
in flamegraphs (when using the new CPU Profiling 2.0
codepath, see above).

Here's a tiny Ruby script that triggers a lot of allocation:

```ruby
def do_alloc(n)
  alloc(n)
end

def alloc(n)
  n.times do
    Object.new
  end
end

while true
  do_alloc(100_000)
end
```
ivoanjo added a commit that referenced this pull request Oct 26, 2022
**What does this PR do?**:

(Important note: This fix only affects the new CPU Profiling 2.0
profiler which is still in **alpha**, see #2209)

In #2181 we added a concurrency-safe mechanism for `StackRecorder`.
Key to how it works is that we keep two `ddog_Profile` instances
at every point in time, and we alternate between using them.

But this alternation introduced a bug with the `start_time` of
profiles. Previously, the `start_time` got set whenever a profile got
created or reset, and we used the current time for it.

That caused the following effect:

```
t=0
  created slot_one_profile (start_time=0)
  created slot_two_profile (start_time=0)
  make_active slot_one_profile

t=60
  serialized slot_one_profile (start_time=0, finish_time=60)
  reset slot_one_profile (start_time=60)
  make_active slot_two_profile

t=120
  serialized slot_two_profile (start_time=0, finish_time=120)
  reset slot_two_profile (start_time=120)
  make_active slot_one_profile

t=180
  serialized slot_one_profile (start_time=60, finish_time=180)
  reset slot_one_profile (start_time=180)
  make_active slot_two_profile
```

That is, other than the first profile (which is why we previously
missed this bug), every profile got double its intended duration:
we reset a profile right after serialization, but that profile then
sat inactive for a full period before being used again.

To fix this issue, we additionally change the "make_active" step
above (actually implemented in
`serializer_flip_active_and_inactive_slots`) to set the correct
`start_time` on the profile that becomes active.

Thus, we get the correct behavior:

```
t=0
  created slot_one_profile (start_time=0)
  created slot_two_profile (start_time=0) # Ignored, will be changed later
  make_active slot_one_profile (start_time=0)

t=60
  serialized slot_one_profile (start_time=0, finish_time=60)
  reset slot_one_profile (start_time=60) # Ignored, will be changed later
  make_active slot_two_profile (start_time=60) # Correct start_time

t=120
  serialized slot_two_profile (start_time=60, finish_time=120)
  reset slot_two_profile (start_time=120) # Ignored, will be changed later
  make_active slot_one_profile (start_time=120) # Correct start_time

t=180
  serialized slot_one_profile (start_time=120, finish_time=180)
  reset slot_one_profile (start_time=180)
  make_active slot_two_profile (start_time=180)
```
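
To make the fix concrete, here's a toy Ruby model of the two-slot flip; this is purely illustrative -- the real logic lives in the C extension's `serializer_flip_active_and_inactive_slots`:

```ruby
# Toy model: two profile slots; flipping activates the inactive slot and,
# crucially, stamps it with a fresh start_time (the stamping is the fix).
Slot = Struct.new(:name, :start_time)

def flip(slots, now)
  slots.rotate!                # the inactive slot becomes active...
  slots.first.start_time = now # ...and gets the correct start_time
  slots.first
end

slots = [Slot.new(:slot_two, 0), Slot.new(:slot_one, 0)]
active = flip(slots, 0) # slot_one becomes active with start_time=0

[60, 120, 180].each do |now|
  puts "serialized #{active.name} (start_time=#{active.start_time}, finish_time=#{now})"
  active = flip(slots, now)
end
# serialized slot_one (start_time=0, finish_time=60)
# serialized slot_two (start_time=60, finish_time=120)
# serialized slot_one (start_time=120, finish_time=180)
```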

**Motivation**:

Having profiles with the wrong duration breaks profile aggregation.

**Additional notes**:

As I mentioned above, this only affects the new CPU Profiling 2.0
profiler codepath, so I don't expect any customers to have ever run
into this issue.

**How to test the change?**:

Change includes coverage. Furthermore, running the profiler with
`DD_TRACE_DEBUG` prints the start/finish timestamps after
serialization, which can be used to confirm they are correct.

Finally, comparing profiles before/after in the profiler UX will
also show the difference.
ivoanjo added a commit that referenced this pull request Nov 2, 2022
**What does this PR do?**:

This PR extends the `thread id` label that is attached to profiling
samples to include the thread's "native thread id".

Up until now, the `thread id` that we attached was the result of
calling `Thread#object_id`. That approach allows us to easily match
what we see inside the Ruby process (because we can log the
`Thread#object_id` where something happens) with the profile.

BUT if we wanted to match the profile with what the operating system's
tooling showed, it was quite hard, because OS task manager tools show
an OS-level thread identifier, which is not the same as
`Thread#object_id`.

To enable this matching, this PR changes the format of the
`thread id` label from `"#{thread.object_id}"` to
`"#{thread.native_thread_id} (#{thread.object_id})"`, thus providing
both identifiers. (This is OK with the profiling backend.)

Because `Thread#native_thread_id` is a Ruby 3.1+ feature, on older
Rubies we use a fallback identifier instead (`pthread_t`). This
identifier isn't as useful as the native identifier; in the future
we may want to explore a better fallback.
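
For example, on Ruby 3.1+ the new label can be reproduced in plain Ruby:

```ruby
# Build the same label shape the profiler now attaches to samples.
t = Thread.current
label =
  if t.respond_to?(:native_thread_id) # Ruby 3.1+
    "#{t.native_thread_id} (#{t.object_id})"
  else
    t.object_id.to_s # pre-3.1: the profiler uses a pthread_t-based fallback
  end
puts "thread id label: #{label}"
```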

**Motivation**:

I found this helpful during development and when correlating
data from profiler with other tools.

**Additional Notes**:

This includes full support for macOS, even though we don't
officially support macOS.

This feature is only available on the new CPU Profiling 2.0
profiler which is still in alpha, see #2209.

**How to test the change?**:

This is easiest to test by profiling a Ruby app, and then checking
that the native ids for threads match up with what shows up in
`/proc`. (Remember: only on Ruby >= 3.1; for older Rubies it won't match.)
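
A minimal sketch of that check on Linux:

```ruby
# List the OS-level (native) thread ids of the current process; on Ruby 3.1+
# these match the numbers the profiler embeds in the `thread id` label.
native_ids = Dir.children("/proc/#{Process.pid}/task").map(&:to_i).sort
puts "native thread ids: #{native_ids.inspect}"
if Thread.current.respond_to?(:native_thread_id)
  puts "current thread:    #{Thread.current.native_thread_id}"
end
```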