DEBUG-2334 Dynamic Instrumentation code tracker component #3942

p-datadog · 2024-09-24T13:51:06Z

What does this PR do?

Adds the code tracker component. This is responsible for tracking the mapping from source file path to RubyVM::InstructionSequence object used for setting targeted trace points.

Motivation:
Efficient instrumentation of lines.
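
A minimal sketch of why holding on to the instruction sequence helps, assuming Ruby 2.6 or newer: `TracePoint#enable` accepts `target:` and `target_line:`, so a `:line` trace point can be limited to one line of one iseq instead of firing for every line executed in the process. The source string and variable names below are illustrative only and are not taken from this PR.

```ruby
# Illustrative code to instrument; in DI the iseq would come from the
# code tracker's registry rather than from an explicit compile call.
source = <<~RUBY
  x = 1
  y = x + 1
  y * 2
RUBY

iseq = RubyVM::InstructionSequence.compile(source)

hits = []
tp = TracePoint.new(:line) { |t| hits << t.lineno }

# Only line 2 of this particular iseq triggers the trace point.
tp.enable(target: iseq, target_line: 2)
result = iseq.eval
tp.disable
```

Without `target:`, the same trace point would be invoked for every line of Ruby executed while it is enabled, which is the cost the code tracker is meant to avoid.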

Additional Notes:
There will be further functionality added to CodeTracker later to instrument loaded code (requires the instrumentation component that hasn't been PR'ed yet).

How to test the change?
Unit tests at this time.

lib/datadog/di/code_tracker.rb

spec/datadog/di/code_tracker_spec.rb

pr-commenter · 2024-09-24T14:28:14Z

Benchmarks

Benchmark execution time: 2024-10-01 18:12:17

Comparing candidate commit 72d5102 in PR branch di-code-tracker with baseline commit a877d83 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 23 metrics, 2 unstable metrics.

use Hash and Mutex instead

lib/datadog/di/code_tracker.rb

spec/datadog/di/code_tracker_spec.rb

Strech

I have some minor feedback, but nothing critical.

spec/datadog/di/code_tracker_spec.rb

lib/datadog/di/code_tracker.rb

ivoanjo · 2024-09-25T12:55:42Z

lib/datadog/di/code_tracker.rb

+        compiled_trace_point = TracePoint.trace(:script_compiled) do |tp|
+          # Useful attributes of the trace point object here:
+          # .instruction_sequence
+          # .method_id
+          # .path (refers to the code location that called the require/eval/etc.,
+          #   not where the loaded code is; use .path on the instruction sequence
+          #   to obtain the location of the compiled code)
+          # .eval_script
+          #
+          # For now just map the path to the instruction sequence.
+          path = tp.instruction_sequence.path
+          registry_lock.synchronize do
+            registry[path] = tp.instruction_sequence
+          end
+        end


Does script_compiled get emitted for each individual method in a file?

No, it should be emitted once per file.

Wait, in that case, how do we know the correct iseq to target, if 1 file has N iseqs? (Or I may be misunderstanding how this works?)

There is one iseq per file.

Interesting... I think my mental model was slightly off for this one.

As you pointed out, taking for instance

# test.rb
def a
  puts "a!"
end

def b
  puts "b!"
end

and doing

[1] pry(main)> iseq = RubyVM::InstructionSequence.compile(File.read("test.rb"))
[3] pry(main)> puts RubyVM::InstructionSequence.disasm(iseq)
== disasm: #<ISeq:<compiled>@<compiled>:1 (1,0)-(8,3)> (catch: FALSE)
0000 definemethod :a, a ( 2)[Li]
0003 definemethod :b, b ( 6)[Li]
0006 putobject :b
0008 leave

== disasm: #<ISeq:a@<compiled>:2 (2,0)-(4,3)> (catch: FALSE)
0000 putself ( 3)[LiCa]
0001 putstring "a!"
0003 opt_send_without_block <calldata!mid:puts, argc:1, FCALL|ARGS_SIMPLE>
0005 leave ( 4)[Re]

== disasm: #<ISeq:b@<compiled>:6 (6,0)-(8,3)> (catch: FALSE)
0000 putself ( 7)[LiCa]
0001 putstring "b!"
0003 opt_send_without_block <calldata!mid:puts, argc:1, FCALL|ARGS_SIMPLE>
0005 leave ( 8)[Re]
=> nil

means we get it all in one go.

Yet... it seems there are still separate objects for the different iseqs: there's a RubyVM::InstructionSequence#each_child, and the object ids of the objects it returns differ from that of the top-level iseq, so they seem to be separate objects arranged in a tree, not one object presenting different views of itself.

That said, for the usage we'll be making in DI, maybe this extra distinction doesn't matter very much? And I learned something new :)
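
The tree structure described in this thread can be observed directly. The following is a small sketch, assuming Ruby 2.6 or newer (where `RubyVM::InstructionSequence#each_child` is available); the source string mirrors the `test.rb` example above, and the variable names are illustrative.

```ruby
source = <<~RUBY
  def a
    puts "a!"
  end

  def b
    puts "b!"
  end
RUBY

# Compiling the whole file yields one top-level iseq...
top = RubyVM::InstructionSequence.compile(source)

# ...whose children are separate iseq objects, one per method body.
children = []
top.each_child { |child| children << child }

child_labels = children.map(&:label)
all_distinct_from_top = children.none? { |child| child.equal?(top) }
```

So "one iseq per file" holds in the sense that the `script_compiled` event hands over a single top-level object, while the method bodies remain reachable from it as child iseqs.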

ivoanjo · 2024-09-25T12:59:38Z

lib/datadog/di/code_tracker.rb

+          # disable our trace point and do nothing.
+          if @compiled_trace_point
+            # Disable the local variable, leave instance variable as it is.
+            compiled_trace_point.disable


Is it me, or was the tracepoint not enabled before we disable it?

TracePoint.trace enables the trace point. .new does not enable. I agree it can be confusing at first glance.

Aaaaah that's super subtle, missed it! May be worth adding a comment or using an explicit enable to make it easier on the readers? (but just a suggestion)

Added a note.
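
For readers who hit the same confusion, the difference between the two constructors can be checked in a few lines. This is only an illustration of the standard `TracePoint` API; the variable names are made up for the example.

```ruby
# TracePoint.new only creates the trace point; it stays disabled
# until #enable is called.
created = TracePoint.new(:script_compiled) { |tp| }
created_enabled = created.enabled?

# TracePoint.trace creates the trace point and enables it right away.
traced = TracePoint.trace(:script_compiled) { |tp| }
traced_enabled = traced.enabled?

# Clean up so the trace point does not stay active.
traced.disable
```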

lib/datadog/di/code_tracker.rb

lib/datadog/di/code_tracker.rb

codecov-commenter · 2024-09-27T18:48:10Z

Codecov Report

Attention: Patch coverage is 96.92308% with 4 lines in your changes missing coverage. Please review.

Project coverage is 97.87%. Comparing base (a877d83) to head (72d5102).

Files with missing lines	Patch %	Lines
lib/datadog/di/code_tracker.rb	89.74%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3942      +/-   ##
==========================================
- Coverage   97.87%   97.87%   -0.01%     
==========================================
  Files        1305     1313       +8     
  Lines       78224    78352     +128     
  Branches     3876     3886      +10     
==========================================
+ Hits        76559    76684     +125     
- Misses       1665     1668       +3

lib/datadog/di/code_tracker.rb

p-datadog · 2024-09-30T17:31:37Z

I don't know why coverage for the 3 lines is missing. Those lines are exercised by the tests but perhaps they are not counted due to being under a trace point.

lib/datadog/di/code_tracker.rb

ivoanjo

👍 Left a few final notes/suggestions/questions, but overall it LGTM

ivoanjo · 2024-10-01T09:09:01Z

lib/datadog/di/code_tracker.rb

+            path = tp.instruction_sequence.path
+            registry_lock.synchronize do
+              registry[path] = tp.instruction_sequence
+            end


I wonder if there are situations where the path will be nil? E.g. does this tracepoint only fire on actual files, or would it catch evals and other things where the path may not exist? 🤔

The trace point is indeed called for eval'd code, but path will not be nil here because it is synthesized to be (eval at <file>:<line>) in these cases. There is also .absolute_path, which is in fact nil for eval'd code. I changed to using it and now filter out eval'd iseqs, since I do not see how DI would be able to target such code by file & line.

👍 nice, looks good!
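
The attributes discussed here can be compared side by side. The sketch below is an assumption-laden illustration, not the code from this PR: it records `path`, `absolute_path` and `TracePoint#eval_script` for one eval'd snippet and one file loaded from a temporary location. The exact `path` string for eval'd code varies between Ruby versions, so it is recorded but not relied upon.

```ruby
require "tempfile"

observed = []
tp = TracePoint.new(:script_compiled) do |t|
  iseq = t.instruction_sequence
  observed << {
    path: iseq.path,
    absolute_path: iseq.absolute_path,
    eval_script: t.eval_script, # source string for eval'd code, nil for files
  }
end

loaded_realpath = nil
Tempfile.create(["di_sketch", ".rb"]) do |file|
  file.write("1 + 1\n")
  file.flush
  loaded_realpath = File.realpath(file.path)

  # The trace point is enabled only for the duration of the block.
  tp.enable do
    eval("2 + 2")
    load file.path
  end
end

evaled = observed.find { |entry| entry[:eval_script] }
loaded = observed.find { |entry| entry[:absolute_path] == loaded_realpath }
```

A filter along the lines of "skip when `eval_script` is set or `absolute_path` is nil" would keep only `loaded` in a path-keyed registry.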

ivoanjo · 2024-10-01T09:19:24Z

lib/datadog/di/code_tracker.rb

+    # Tracks loaded Ruby code by source file and maintains a map from
+    # source file to the loaded code (instruction sequences).
+    # Also arranges for code in the loaded files to be instrumented by
+    # line probes that have already been received by the library.
+    #
+    # The loaded code is used to target line trace points when installing
+    # line probes which dramatically improves efficiency of line trace points.
+    #
+    # Note that, since most files will only be loaded one time (via the
+    # "require" mechanism), the code tracker needs to be global and not be
+    # recreated when the DI component is created.
+    #
+    # @api private
+    class CodeTracker


Might be worth documenting what parts of this class are concurrent and why.

E.g., if I understand correctly (and I'm speculating/reverse engineering), the locking around registry is there because iseqs_for_path may run concurrently with the tracepoint, but usually there will be no concurrency between multiple invocations of iseqs_for_path (presumably because remote configuration is applied sequentially).

(I don't quite understand when there can be concurrency in start/stop; given that usually the dd-trace-rb components system takes care of enforcing that starting and stopping things is a sequential operation as well)

start & stop should not be running at the same time. start/stop could (I suppose) be running while the trace point is invoked. Since starting and stopping should happen once and never in normal usage, I reused the mutexes.

I'll admit that if you confirm that start/stop should not be concurrent then I don't quite understand what trace_point_lock is protecting 🤔

I saw you set up a meeting to discuss this, so let's continue the discussion there. In any case, I don't think the discussion should be a blocker to merging this PR, as we can always tweak this in a later PR.
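
To make the locking question concrete, here is a hypothetical sketch of the Hash-and-Mutex arrangement discussed in this thread. It is not the dd-trace-rb implementation: the class name `SketchCodeTracker` and the lookup method `iseq_for_path` are invented for illustration, and the choice of what each lock protects is an assumption based on the comments above.

```ruby
# Hypothetical sketch; class and method names are illustrative only.
class SketchCodeTracker
  def initialize
    @registry = {}                 # absolute path => RubyVM::InstructionSequence
    @registry_lock = Mutex.new     # guards @registry
    @trace_point_lock = Mutex.new  # guards @compiled_trace_point
    @compiled_trace_point = nil
  end

  def start
    @trace_point_lock.synchronize do
      # TracePoint.trace creates the trace point and enables it.
      @compiled_trace_point ||= TracePoint.trace(:script_compiled) do |tp|
        next if tp.eval_script # skip eval'd code
        path = tp.instruction_sequence.absolute_path
        next unless path
        # The callback may run while another thread performs a lookup,
        # hence the lock around the plain Hash.
        @registry_lock.synchronize { @registry[path] = tp.instruction_sequence }
      end
    end
  end

  def stop
    @trace_point_lock.synchronize do
      @compiled_trace_point&.disable
      @compiled_trace_point = nil
    end
  end

  def iseq_for_path(path)
    @registry_lock.synchronize { @registry[path] }
  end
end
```

In this sketch the registry lock covers the reader/writer race between lookups and the trace point callback, while the trace point lock only matters if `start` and `stop` can overlap, which is the open question in the thread.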

Strech

well done, my points are resolved 👍🏼

spec/datadog/di/code_tracker_spec.rb

Co-authored-by: datadog-datadog-prod-us1[bot] <88084959+datadog-datadog-prod-us1[bot]@users.noreply.github.com>

* master:
  DEBUG-2334 Dynamic Instrumentation code tracker component (DataDog#3942)
  Fix typo in cleanup step
  Add Rails 4.2 to system-tests GH workflow
  Add all supported weblogs to system-tests GH Workflow file

DEBUG-2334 Dynamic Instrumentation code tracker component

bcf3991

p-datadog requested a review from a team as a code owner September 24, 2024 13:51

datadog-datadog-prod-us1 bot reviewed Sep 24, 2024

Remove Concurrent::Map usage to avoid dependency on concurrent-ruby

12b3d63

use Hash and Mutex instead

datadog-datadog-prod-us1 bot reviewed Sep 24, 2024

marcotc reviewed Sep 24, 2024

marcotc reviewed Sep 24, 2024

p-datadog mentioned this pull request Sep 25, 2024

DEBUG-2334 upgrade steep & rbs #3950

Merged

y9v approved these changes Sep 25, 2024

Strech reviewed Sep 25, 2024

ivoanjo reviewed Sep 25, 2024

Strech reviewed Sep 25, 2024

p-datadog and others added 10 commits September 25, 2024 10:56

Update spec/datadog/di/code_tracker_spec.rb

657f8bf

Co-authored-by: Sergey Fedorov <oni.strech@gmail.com>

Update lib/datadog/di/code_tracker.rb

ed08f40

Co-authored-by: Sergey Fedorov <oni.strech@gmail.com>

Update lib/datadog/di/code_tracker.rb

5e25e1f

Co-authored-by: datadog-datadog-prod-us1[bot] <88084959+datadog-datadog-prod-us1[bot]@users.noreply.github.com>

Update lib/datadog/di/code_tracker.rb

7e5066d

Co-authored-by: datadog-datadog-prod-us1[bot] <88084959+datadog-datadog-prod-us1[bot]@users.noreply.github.com>

Update spec/datadog/di/code_tracker_spec.rb

125d6e4

Co-authored-by: Sergey Fedorov <oni.strech@gmail.com>

fix registry lock type

2e1de86

start docstring

66be4eb

put entire start method under trace point lock

dd15c2e

Merge branch 'master' into di-code-tracker

cd766bb

use patched rbs to get RubyVM::InstructionSequence method type definitions

17952c4

p-datadog force-pushed the di-code-tracker branch from f65506a to 17952c4 Compare September 25, 2024 17:00

p added 2 commits September 25, 2024 13:08

mark tests as di tests because needed trace points do not exist on ruby 2.5

3b39340

add spec helper require

ba20d12

Strech reviewed Sep 26, 2024

p-datadog and others added 3 commits September 26, 2024 09:44

Merge branch 'master' into di-code-tracker

5b1f118

skip DI tests on ruby 2.5

ec3426e

Merge branch 'master' into di-code-tracker

f120f8f

standard

ff0d80c

Update lib/datadog/di/code_tracker.rb

95c976b

Co-authored-by: Sergey Fedorov <oni.strech@gmail.com>

datadog-datadog-prod-us1 bot reviewed Sep 30, 2024

p added 3 commits September 30, 2024 10:40

standard

cef3c4f

delete constructor test

ed60efa

rbs patch was accepted, use latest rbs release

54d04aa

note .trace enables trace point

f1dd12d

datadog-datadog-prod-us1 bot reviewed Oct 1, 2024

ivoanjo approved these changes Oct 1, 2024

Strech approved these changes Oct 1, 2024

p added 2 commits October 1, 2024 12:45

break out load & require test cases

7a8b12d

do not track evaled code

6882a74

datadog-datadog-prod-us1 bot reviewed Oct 1, 2024

p-datadog and others added 9 commits October 1, 2024 12:53

Update spec/datadog/di/code_tracker_spec.rb

c985b2b

Co-authored-by: datadog-datadog-prod-us1[bot] <88084959+datadog-datadog-prod-us1[bot]@users.noreply.github.com>

standard

b8ce73c

separate eval test cases with and without explicit location

d75b446

document eval path situation more

4cae1cc

standard

c77a571

Merge branch 'master' into di-code-tracker

96c94eb

init instance var to squelch ruby 2.6 warning

e2ac427

fix on ruby 3.0 and earlier

0093e62

steep

72d5102

p-datadog merged commit 1f1afc1 into master Oct 2, 2024
209 checks passed

p-datadog deleted the di-code-tracker branch October 2, 2024 12:54

github-actions bot added this to the 2.4.0 milestone Oct 2, 2024
