[PROF-9821] Fix incorrect code provenance due to broken JSON monkey patch #3695

Merged
17 changes: 14 additions & 3 deletions lib/datadog/profiling/collectors/code_provenance.rb
@@ -98,19 +98,30 @@ def record_loaded_files(loaded_files)
  end

  # Represents metadata we have for a ruby gem
+ #
+ # Important note: This class gets encoded to JSON with the built-in JSON gem. But, we've found that in some
+ # buggy cases, some Ruby gems monkey patch the built-in JSON gem and forget to call #to_json, and instead
+ # encode this class instance-field-by-instance-field.
+ #
+ # Thus, this class was set up to match the JSON output. Take this into consideration if you are adding new
+ # fields. (Also, we have a spec for this)
  class Library
- attr_reader :kind, :name, :version, :path
+ attr_reader :kind, :name, :version

  def initialize(kind:, name:, version:, path:)
  @kind = kind.freeze
  @name = name.dup.freeze
  @version = version.to_s.dup.freeze
- @path = path.dup.freeze
+ @paths = [path.dup.freeze].freeze
  freeze
  end

  def to_json(arg = nil)
- { kind: @kind, name: @name, version: @version, paths: [@path] }.to_json(arg)
+ { kind: @kind, name: @name, version: @version, paths: @paths }.to_json(arg)
  end
+
+ def path
+ @paths.first
+ end
  end
  end
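
The idea behind this change can be illustrated with a minimal, standalone sketch. Everything named below is an assumption made for illustration only: `LibrarySketch` is a hypothetical stand-in for the `Library` class in the diff, and `encode_field_by_field` is a hypothetical helper that imitates an encoder which ignores `#to_json` and walks the instance variables instead. It is not how Oj or activesupport are actually implemented.

```ruby
require 'json'

# Hypothetical stand-in for the Library class from the diff above.
# Its instance variables are named and shaped exactly like the JSON output.
class LibrarySketch
  def initialize(kind:, name:, version:, path:)
    @kind = kind.freeze
    @name = name.dup.freeze
    @version = version.to_s.dup.freeze
    @paths = [path.dup.freeze].freeze
    freeze
  end

  def to_json(arg = nil)
    { kind: @kind, name: @name, version: @version, paths: @paths }.to_json(arg)
  end
end

# Hypothetical helper: imitates a broken encoder that skips #to_json
# and serializes the object one instance variable at a time.
def encode_field_by_field(object)
  fields = object.instance_variables.to_h do |ivar|
    [ivar.to_s.delete_prefix('@'), object.instance_variable_get(ivar)]
  end
  JSON.generate(fields)
end

library = LibrarySketch.new(
  kind: 'library',
  name: 'datadog',
  version: '1.2.3',
  path: '/example/path/to/datadog/gem',
)

via_to_json = JSON.parse(library.to_json)
via_fields = JSON.parse(encode_field_by_field(library))

# Both encodings carry the same data because the fields mirror the output.
puts via_to_json == via_fields
```

With the old layout (a single `@path` field), the field-by-field encoding would have produced a `path` key instead of the `paths` array that `#to_json` emits, so the two results would differ.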
72 changes: 72 additions & 0 deletions spec/datadog/profiling/collectors/code_provenance_spec.rb
@@ -1,5 +1,6 @@
  require 'datadog/profiling/collectors/code_provenance'
  require 'json-schema'
+ require 'yaml'

  RSpec.describe Datadog::Profiling::Collectors::CodeProvenance do
  subject(:code_provenance) { described_class.new }
@@ -187,5 +188,76 @@
  it 'renders the list of loaded libraries using the expected schema' do
  JSON::Validator.validate!(code_provenance_schema, code_provenance.generate_json)
  end
+
+ # In PROF-9821 we ran into an issue where some versions of OJ + activesupport + monkey patching the JSON gem
+ # would result in our Library instance being encoded instance-field-by-instance-field instead of by calling #to_json.
+ #
+ # This would obviously result in broken code provenance files. To fix this, we've adjusted the class to make sure
+ # that if you serialize it field-by-field, you still get a correct result.
+ #
+ # Reproducing this exact issue in CI is really annoying -- because it would be one more set of appraisals we'd run
+ # just to reproduce it and test.
+ #
+ # So instead in this test we use YAML as an example of an encoder that doesn't use #to_json, and does it
+ # field-by-field. Thus if the Library class doesn't match exactly what we want in the output, this test will fail.
+ #
+ # In case you want to reproduce the exact JSON issue, here's a reproducer:
+ # ```ruby
+ # require 'bundler/inline'
+ #
+ # gemfile do
+ # source 'https://rubygems.org'
+ # gem 'activesupport', '= 5.0.7.2'
+ # gem 'oj', '= 2.18.5'
+ # end
+ #
+ # require 'json'
+ #
+ # class Example
+ # def initialize = @hello = 1
+ # def to_json(arg = nil) = {world: 2}.to_json(arg)
+ # end
+ #
+ # example = Example.new
+ # puts JSON.fast_generate(example)
+ #
+ # require 'oj'
+ # require 'active_support/core_ext/object/json'
+ # Oj.mimic_JSON()
+ #
+ # puts JSON.fast_generate(example)
+ # ```
+ #
+ # Output (the second line is the incorrect one):
+ # {"world":2}
+ # {"hello":1}
+ #
+ describe 'when JSON encoder is broken and skips #to_json' do
+ let(:library_class_without_to_json) do
+ Class.new(Datadog::Profiling::Collectors::CodeProvenance::Library) do
+ undef to_json
+ end
+ end
+
+ it 'is still able to correctly encode a library instance' do
+ instance = library_class_without_to_json.new(
+ name: 'datadog',
+ kind: 'library',
+ version: '1.2.3',
+ path: '/example/path/to/datadog/gem',
+ )
+
+ serialized_without_to_json = YAML.dump(instance)
Contributor

Is the reasoning here that YAML serialization is performed field by field, same as some JSON libraries would serialize, or that libraries like oj actually use YAML serialization logic? I am guessing it's the former and if so, a comment to this effect in the code would be helpful.

Member Author

Yes! I've added a comment above the spec stating this:

    # So instead in this test we use YAML as an example of an encoder that doesn't use #to_json, and does it
    # field-by-field. Thus if the Library class doesn't match exactly what we want in the output, this test will fail.

+ # Remove class annotation, so it deserializes back as a hash and not an instance of our class
+ serialized_without_to_json.gsub!(/---.*/, '---')
+
+ expect(YAML.safe_load(serialized_without_to_json)).to eq(
+ 'name' => 'datadog',
+ 'kind' => 'library',
+ 'version' => '1.2.3',
+ 'paths' => ['/example/path/to/datadog/gem'],
+ )
+ end
+ end
  end
  end
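
The tag-stripping step in the spec may be easier to follow in isolation. The sketch below is only an illustration under stated assumptions: `GemInfoSketch` is a hypothetical class, not part of the library, and the example relies on `YAML.dump` writing a class tag on the first document line for plain Ruby objects. Once that tag is removed, `YAML.safe_load` returns an ordinary hash keyed by instance variable name.

```ruby
require 'yaml'

# Hypothetical class used only for this illustration.
class GemInfoSketch
  def initialize(name:, paths:)
    @name = name
    @paths = paths
  end
end

dumped = YAML.dump(GemInfoSketch.new(name: 'datadog', paths: ['/example/path']))

# The first line of the dump carries the class tag after the "---" marker.
first_line = dumped.lines.first.chomp
puts first_line

# Drop everything after the leading "---" so the document has no class tag.
untagged = dumped.sub(/\A---.*/, '---')

# Without the tag, the document is a plain mapping that safe_load accepts.
loaded = YAML.safe_load(untagged)
puts loaded.inspect
```

Because YAML serializes the object field by field, the keys of `loaded` are exactly the instance variable names, which is why the spec can use it as a proxy for a JSON encoder that skips `#to_json`.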