Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-9821] Fix incorrect code provenance due to broken JSON monkey patch #3695

Merged

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Jun 10, 2024

What does this PR do?

This PR tweaks the
Datadog::Profiling::Collectors::CodeProvenance::Library class so that if it is encoded instance-filed-by-instance-field it results in a correct code provenance JSON file.

Specifically, the path argument is now stored as a paths array, to match what we expect in the JSON file.

Motivation:

We've discovered that in a certain combination of the oj and activesupport libraries, when oj is used to replace the standard library JSON gem, it incorrectly encodes our Library instances instance-field-by-instance-field instead of calling #to_json.

This resulted in incorrect code provenance files, which would be rejected by the backend.

Since the change is trivial (path => paths array), I've opted to change the shape of Library so that it still encodes correctly in the presence of a buggy JSON encoder monkey patch.

Additional Notes:

N/A

How to test the change?

I've added coverage to this issue. I did it in a bit of a roundabout way (using YAML), see details for why, but I claim it's a reasonable proxy for any encoder that encodes field-by-field, including oj.

…atch

**What does this PR do?**

This PR tweaks the
`Datadog::Profiling::Collectors::CodeProvenance::Library` class so that
if it is encoded instance-filed-by-instance-field it results in a
correct code provenance JSON file.

Specifically, the `path` argument is now stored as a `paths` array, to
match what we expect in the JSON file.

**Motivation:**

We've discovered that in a certain combination of the oj and
activesupport libraries, when oj is used to replace the standard library
JSON gem, it incorrectly encodes our `Library` instances
instance-field-by-instance-field instead of calling `#to_json`.

This resulted in incorrect code provenance files, which would be
rejected by the backend.

Since the change is trivial (`path` => `paths` array), I've opted to
change the shape of `Library` so that it still encodes correctly in
the presence of a buggy JSON encoder monkey patch.

**Additional Notes:**

N/A

**How to test the change?**

I've added coverage to this issue. I did it in a bit of a roundabout way
(using YAML), see details for why, but I claim it's a reasonable proxy
for any encoder that encodes field-by-field, including oj.
@ivoanjo ivoanjo requested a review from a team as a code owner June 10, 2024 10:06
@github-actions github-actions bot added the profiling Involves Datadog profiling label Jun 10, 2024
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.11%. Comparing base (eab4b6e) to head (63efb3e).
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3695   +/-   ##
=======================================
  Coverage   98.11%   98.11%           
=======================================
  Files        1225     1225           
  Lines       72743    72755   +12     
  Branches     3479     3479           
=======================================
+ Hits        71369    71382   +13     
+ Misses       1374     1373    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ivoanjo ivoanjo requested a review from r1viollet June 10, 2024 11:22
path: '/example/path/to/datadog/gem',
)

serialized_without_to_json = YAML.dump(instance)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the reasoning here that YAML serialization is performed field by field, same as some JSON libraries would serialize, or that libraries like oj actually use YAML serialization logic? I am guessing it's the former and if so, a comment to this effect in the code would be helpful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I've added a comment above the spec stating this:

    # So instead in this test we use YAML as an example of an encoder that doesn't use #to_json, and does it
    # field-by-field. Thus if the Library class doesn't match exactly what we want in the output, this test will fail.

@ivoanjo ivoanjo merged commit e086475 into master Jun 11, 2024
167 checks passed
@ivoanjo ivoanjo deleted the ivoanjo/prof-9821-fix-code-provenance-broken-encoder branch June 11, 2024 11:07
@ivoanjo ivoanjo added this to the 2.2.0 milestone Jun 11, 2024
@TonyCTHsu TonyCTHsu mentioned this pull request Jul 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants