From 63efb3ed6cc114de913a6bc16edf07154431bbed Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Mon, 10 Jun 2024 10:58:59 +0100 Subject: [PATCH] [PROF-9821] Fix incorrect code provenance due to broken JSON monkey patch **What does this PR do?** This PR tweaks the `Datadog::Profiling::Collectors::CodeProvenance::Library` class so that if it is encoded instance-filed-by-instance-field it results in a correct code provenance JSON file. Specifically, the `path` argument is now stored as a `paths` array, to match what we expect in the JSON file. **Motivation:** We've discovered that in a certain combination of the oj and activesupport libraries, when oj is used to replace the standard library JSON gem, it incorrectly encodes our `Library` instances instance-field-by-instance-field instead of calling `#to_json`. This resulted in incorrect code provenance files, which would be rejected by the backend. Since the change is trivial (`path` => `paths` array), I've opted to change the shape of `Library` so that it still encodes correctly in the presence of a buggy JSON encoder monkey patch. **Additional Notes:** N/A **How to test the change?** I've added coverage to this issue. I did it in a bit of a roundabout way (using YAML), see details for why, but I claim it's a reasonable proxy for any encoder that encodes field-by-field, including oj. --- .../profiling/collectors/code_provenance.rb | 17 ++++- .../collectors/code_provenance_spec.rb | 72 +++++++++++++++++++ 2 files changed, 86 insertions(+), 3 deletions(-) diff --git a/lib/datadog/profiling/collectors/code_provenance.rb b/lib/datadog/profiling/collectors/code_provenance.rb index 73f13d34d8d..239a5b4e990 100644 --- a/lib/datadog/profiling/collectors/code_provenance.rb +++ b/lib/datadog/profiling/collectors/code_provenance.rb @@ -98,19 +98,30 @@ def record_loaded_files(loaded_files) end # Represents metadata we have for a ruby gem + # + # Important note: This class gets encoded to JSON with the built-in JSON gem. But, we've found that in some + # buggy cases, some Ruby gems monkey patch the built-in JSON gem and forget to call #to_json, and instead + # encode this class instance-field-by-instance-field. + # + # Thus, this class was setup to match the JSON output. Take this into consideration if you are adding new + # fields. (Also, we have a spec for this) class Library - attr_reader :kind, :name, :version, :path + attr_reader :kind, :name, :version def initialize(kind:, name:, version:, path:) @kind = kind.freeze @name = name.dup.freeze @version = version.to_s.dup.freeze - @path = path.dup.freeze + @paths = [path.dup.freeze].freeze freeze end def to_json(arg = nil) - { kind: @kind, name: @name, version: @version, paths: [@path] }.to_json(arg) + { kind: @kind, name: @name, version: @version, paths: @paths }.to_json(arg) + end + + def path + @paths.first end end end diff --git a/spec/datadog/profiling/collectors/code_provenance_spec.rb b/spec/datadog/profiling/collectors/code_provenance_spec.rb index 85544df3a9d..08ec41371f5 100644 --- a/spec/datadog/profiling/collectors/code_provenance_spec.rb +++ b/spec/datadog/profiling/collectors/code_provenance_spec.rb @@ -1,5 +1,6 @@ require 'datadog/profiling/collectors/code_provenance' require 'json-schema' +require 'yaml' RSpec.describe Datadog::Profiling::Collectors::CodeProvenance do subject(:code_provenance) { described_class.new } @@ -187,5 +188,76 @@ it 'renders the list of loaded libraries using the expected schema' do JSON::Validator.validate!(code_provenance_schema, code_provenance.generate_json) end + + # In PROF-9821 we run into an issue where some versions of OJ + activesupport + monkey patching the JSON gem + # would result in our Library instance being encoded instance-field-by-instance-field instead of by calling #to_json. + # + # This would obviously result in broken code provenance files. To fix this, we've adjusted the class to make sure + # that if you serialize it field-by-field, you still get a correct result. + # + # Reproducing this exact issue in CI is really annoying -- because it would be one more set of appraisails we'd run + # just to reproduce it and test. + # + # So instead in this test we use YAML as an example of an encoder that doesn't use #to_json, and does it + # field-by-field. Thus if the Library class doesn't match exactly what we want in the output, this test will fail. + # + # In case you want to reproduce the exact JSON issue, here's a reproducer: + # ````ruby + # require 'bundler/inline' + # + # gemfile do + # source 'https://rubygems.org' + # gem 'activesupport', '= 5.0.7.2' + # gem 'oj', '= 2.18.5' + # end + # + # require 'json' + # + # class Example + # def initialize = @hello = 1 + # def to_json(arg = nil) = {world: 2}.to_json(arg) + # end + # + # example = Example.new + # puts JSON.fast_generate(example) + # + # require 'oj' + # require 'active_support/core_ext/object/json' + # Oj.mimic_JSON() + # + # puts JSON.fast_generate(example) + # ``` + # + # Incorrect output: + # {"world":2} + # {"hello":1} + # + describe 'when JSON encoder is broken and skips #to_json' do + let(:library_class_without_to_json) do + Class.new(Datadog::Profiling::Collectors::CodeProvenance::Library) do + undef to_json + end + end + + it 'is still able to correctly encode a library instance' do + instance = library_class_without_to_json.new( + name: 'datadog', + kind: 'library', + version: '1.2.3', + path: '/example/path/to/datadog/gem', + ) + + serialized_without_to_json = YAML.dump(instance) + # Remove class annotation, so it deserializes back as a hash and not an instance of our class + serialized_without_to_json.gsub!(/---.*/, '---') + + expect(YAML.safe_load(serialized_without_to_json)).to eq( + 'name' => 'datadog', + 'kind' => 'library', + 'version' => '1.2.3', + 'paths' => ['/example/path/to/datadog/gem'], + ) + end + end end end