[SDTEST-523] Expand test impact analysis with allocation tracing #197

anmarchenko · 2024-07-05T15:37:20Z

Problem statement
Using line coverage for test impact analysis has a major limitation in Ruby. Consider the following example:

# this class does not have any executable lines
class MyClass < OtherClass
end

test "instantiate MyClass" do
  assert MyClass.new != nil
end

The test "instantiate MyClass" does not cover my_class.rb because MyClass has no executable lines. If the initializer is inherited from OtherClass, this test will have other_class.rb in its list of covered files, but not my_class.rb.

This leads to a major bug in the intelligent test runner (ITR): suppose we change the initializer of MyClass like this:

class MyClass < OtherClass
  def initialize(arg)
    @arg = arg
  end
end

then the test above starts failing, because MyClass.new now expects an argument. But because my_class.rb is not covered by this test, the intelligent test runner skips the test by default! As a result, broken tests get merged into the default branch.

This example might seem artificial, but unfortunately the same happens with ActiveRecord models and ActiveModel classes:

# perfectly valid Rails model that is not covered by any test
class Account < ApplicationRecord
  belongs_to :user
end
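The limitation can be reproduced with nothing but Ruby's standard coverage library. This is a hedged sketch, not how the DDCov C extension actually collects per-test coverage: it writes the two files from the first example to a temporary directory, loads them, and then snapshots line counters with Coverage.peek_result before and after the test body. The helper files_covered_between is hypothetical.

```ruby
require "coverage"
require "tmpdir"

# Files whose line counters grew between two Coverage.peek_result snapshots,
# i.e. the files "covered" by whatever code ran in between.
def files_covered_between(before, after)
  after.select do |path, data|
    previous = before.dig(path, :lines) || []
    data[:lines].each_with_index.any? { |count, i| count && count > (previous[i] || 0) }
  end.keys
end

Coverage.start(lines: true)

covered = Dir.mktmpdir do |dir|
  other_class_rb = File.join(dir, "other_class.rb")
  my_class_rb = File.join(dir, "my_class.rb")

  File.write(other_class_rb, <<~RUBY)
    class OtherClass
      def initialize
        @initialized = true
      end
    end
  RUBY

  # MyClass has no executable lines of its own.
  File.write(my_class_rb, <<~RUBY)
    class MyClass < OtherClass
    end
  RUBY

  # Files are loaded before the "test" starts, as in a real test suite.
  load other_class_rb
  load my_class_rb

  before = Coverage.peek_result
  MyClass.new # body of the "instantiate MyClass" test
  after = Coverage.peek_result

  files_covered_between(before, after)
end

puts covered.map { |path| File.basename(path) }.inspect
```

Only other_class.rb gains new hits, because the initialize that actually runs lives there; the single executable line of my_class.rb already ran at load time, before the test started.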

Solution
We cannot overcome this limitation with line coverage: line coverage behaves correctly in this case, because the class simply has no lines to execute. We need to go deeper into Ruby VM tracing, using techniques that are already used by the continuous profiler.

To address this limitation, I've chosen the heap allocation tracepoint: it makes it possible to spy on every new object allocated on the Ruby heap. Even if no code from a class is executed during the test, knowing that the test instantiates that class is enough for us to add the class's source file to the list of impacted files.
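As far as I know, Ruby-level TracePoint cannot subscribe to RUBY_INTERNAL_EVENT_NEWOBJ (internal events are only reachable from the C API), so the following is a minimal Ruby sketch of the per-test bookkeeping only. AllocationCollector and on_new_object are hypothetical names; on_new_object stands in for the C tracepoint callback and is called by hand. It records the class of each object allocated while a test is running, deduplicates classes with a Set, and ignores allocations made between tests.

```ruby
require "set"

# Hypothetical stand-in for the per-test state kept by the C extension.
class AllocationCollector
  def initialize
    @active = false
    @classes = Set.new
  end

  # Test starts: forget the classes seen by the previous test.
  def start
    @classes.clear
    @active = true
  end

  # Test ends: return the classes allocated while the test was running.
  def stop
    @active = false
    @classes.dup
  end

  # Simulated NEWOBJ callback: only remember the class, deferring any
  # expensive work (like source location lookup) until the test ends.
  def on_new_object(obj)
    return unless @active

    @classes << obj.class
  end
end

class Account; end
class GlobalCache; end

collector = AllocationCollector.new

# Allocated outside of a test (e.g. cached in global state): not attributed.
collector.on_new_object(GlobalCache.new)

collector.start
collector.on_new_object(Account.new) # allocated inside the test
collector.on_new_object(Account.new) # duplicate class collapsed by the Set
allocated = collector.stop

puts allocated.to_a.inspect
```

The C version additionally filters by object type in the callback (only RUBY_T_OBJECT and RUBY_T_STRUCT allocations are considered), which this sketch does not attempt to model.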

Notes on implementation:

The RUBY_INTERNAL_EVENT_NEWOBJ event type is used
rb_tracepoint_new is used to register the RUBY_INTERNAL_EVENT_NEWOBJ tracepoint
The Module#const_source_location Ruby API (called as Object.const_source_location(klass_name)) is used to get the source location of a constant (every named class is a constant). This API is available since Ruby 2.7, which is exactly the oldest Ruby version supported by the test visibility product. We call it from C with rb_funcall(rb_cObject, rb_intern("const_source_location"), 1, klass_name)
rb_protect is used to ignore exceptions raised while getting the source location of a class (the lookup fails for many internal classes)
Many libraries generate anonymous classes with names like #<Class:0x0ff0eabcde>; we explicitly ignore them because the source location lookup always fails for them
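The lookup steps in the notes above can be approximated in plain Ruby. This is a sketch under assumptions: source_file_for is a hypothetical helper, Module#name returning nil stands in for the anonymous-class check (the C code ended up using rb_mod_name, which also returns nil for anonymous classes), and rescue StandardError only roughly plays the role of rb_protect, which catches more than that.

```ruby
# Hypothetical helper approximating the source location lookup.
def source_file_for(klass)
  # Anonymous classes have no name, so the constant lookup would always fail.
  name = klass.name
  return nil if name.nil?

  # [path, line] for Ruby-defined constants, [] for C-defined ones, nil if undefined.
  location = Object.const_source_location(name)
  location.nil? || location.empty? ? nil : location.first
rescue StandardError
  # Roughly what rb_protect achieves in C: ignore lookup failures.
  nil
end

class Widget; end

# A class whose #name is not a valid constant path: the lookup raises NameError.
weird = Class.new do
  def self.name
    "not a constant"
  end
end

p source_file_for(Widget)    # path of the file that defines Widget
p source_file_for(Class.new) # => nil (anonymous class)
p source_file_for(String)    # => nil (defined in C, no Ruby source)
p source_file_for(weird)     # => nil (error swallowed)
```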

Known limitations

This approach only tracks objects instantiated during the test itself (in a before hook or in the test body). If the test suite caches some models in global state and shares them between tests, and these models have no methods implemented on them, test impact analysis will still miss them. We need to update the Intelligent Test Runner docs for Ruby to reiterate that global state is harmful and can cause flakiness and incompatibility with ITR.
Ruby versions 3.2.0 to 3.2.2 have a bug that causes failures when this tracepoint is used, so allocation tracing is disabled for these versions
rb_tracepoint_new cannot be attached to a specific thread, so allocation tracing is enabled only in multi-threaded coverage mode (this is the default mode and the only one that works for Rails, so it is not a major problem)
Source locations of anonymous classes are not supported
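The Ruby version and threading limitations above boil down to a simple enablement check. A hedged sketch: BROKEN_RUBY_VERSIONS, allocation_tracing_enabled? and the :multi/:single threading-mode symbols are hypothetical names, not the gem's actual configuration API.

```ruby
require "rubygems" # for Gem::Version; normally already loaded

# Hypothetical: Ruby versions affected by the NEWOBJ tracepoint bug (3.2.0-3.2.2).
BROKEN_RUBY_VERSIONS = (Gem::Version.new("3.2.0")...Gem::Version.new("3.2.3"))

# Hypothetical gate: tracing needs multi-threaded coverage mode (the tracepoint
# cannot be bound to one thread) and a Ruby version outside the broken range.
def allocation_tracing_enabled?(ruby_version: RUBY_VERSION, threading_mode: :multi)
  return false unless threading_mode == :multi

  !BROKEN_RUBY_VERSIONS.cover?(Gem::Version.new(ruby_version))
end

p allocation_tracing_enabled?(ruby_version: "3.2.1")                          # => false
p allocation_tracing_enabled?(ruby_version: "3.3.0", threading_mode: :single) # => false
p allocation_tracing_enabled?(ruby_version: "3.3.0")                          # => true
```

Comparing Gem::Version objects rather than raw strings keeps the range check correct for multi-digit components such as 3.2.10.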

How to test the change?
Tested by running the test suites of the following open source projects:

feedbin
rubocop
jekyll
middleman
vagrant
devdocs

Unit tests reproducing the original problem are provided. See performance evaluation below.

Performance evaluation
Median performance overhead for some OSS projects' test suites before this change:

Results from benchmarks after this change:

Project	Overhead after change	Relative increase in overhead
rubocop	111.7%	28%
jekyll	0.4%	0%
middleman	37.6%	32%
devdocs	23.8%	39%

Overall, this change increases the code coverage overhead on test suites by 30-40% in relative terms (compared to the overhead before the change). In absolute terms, depending on the project's size and characteristics, it means from 7% up to 30% more time spent in tests (the maximum overhead is for rubocop, which is particularly challenging for profiling: 21k relatively fast tests).

codecov-commenter · 2024-07-08T13:55:31Z

Codecov Report

Attention: Patch coverage is 98.21429% with 2 lines in your changes missing coverage. Please review.

Project coverage is 98.86%. Comparing base (579867e) to head (aebb1bf).

Files	Patch %	Lines
lib/datadog/ci/configuration/components.rb	77.77%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #197      +/-   ##
==========================================
- Coverage   98.87%   98.86%   -0.01%     
==========================================
  Files         231      235       +4     
  Lines       10368    10477     +109     
  Branches      475      481       +6     
==========================================
+ Hits        10251    10358     +107     
- Misses        117      119       +2

lib/datadog/ci/configuration/components.rb

ivoanjo

Left a few notes! I didn't do a pass very deep into the get_source_location part (ran out of time -- I can do it in the next pass :D) but hopefully this helps

lib/datadog/ci/configuration/components.rb

ext/datadog_cov/datadog_cov.c

devinburnette · 2024-07-12T19:31:43Z

just tested this branch as of the latest commit (0b1618c) and verified the segfault is gone and the test impact analysis is working as expected for models.

anmarchenko · 2024-07-12T20:05:45Z

Thank you for testing and your feedback @devinburnette! I'll do a couple more passes on Monday and will release soon

anmarchenko · 2024-07-15T08:04:57Z

@ivoanjo this is ready for another pass - seems to be working now

ivoanjo

Left another round of suggestions, but I think overall my comments fall in the "extra things" bucket, and this PR seems reasonable even as-is.

spec/ddcov/ddcov_spec.rb

ext/datadog_cov/datadog_cov.c

ivoanjo · 2024-07-15T11:30:37Z

ext/datadog_cov/datadog_cov.c

+  enum ruby_value_type type = rb_type(new_object);
+  if (type != RUBY_T_OBJECT && type != RUBY_T_STRUCT)
  {
    return;
  }

-  // if ignored_path is provided and the current filename is located under the ignored_path, we skip it too
-  // this is useful for ignoring bundled gems location
-  if (dd_cov_data->ignored_path_len != 0 && strncmp(dd_cov_data->ignored_path, filename_ptr, dd_cov_data->ignored_path_len) == 0)
+  VALUE klass = rb_class_of(new_object);
+  if (klass == Qnil || klass == 0)
+  {
+    return;
+  }
+  // Skip anonymous classes starting with "#<Class".
+  // it allows us to skip the source location lookup that will always fail
+  const char *name = rb_obj_classname(new_object);


As a follow-up, on the discussion of what we can do in the new object tracepoint, in some cases rb_obj_classname will definitely cause new objects to be allocated.

Thus doing this check here may not be safe (although arguably we've been doing it in the profiler and I've never seen issues...), and you may perhaps want to delay it.

The good news is maybe there's a few, even better options. I was looking what rb_obj_classname does, and maybe there's an alternative that is even better for our purposes because it doesn't even need to allocate the string to represent the anonymous class -- I'm thinking of rb_mod_name or some of the other ones that exist.

I like your rb_mod_name suggestion - it returns Qnil if klass is anonymous, so it works for filtering out anonymous classes, and works a bit faster

ext/datadog_cov/datadog_cov.c

ivoanjo · 2024-07-15T13:31:25Z

ext/datadog_cov/datadog_cov.c

+// Get source location for a given class name
+static VALUE get_source_location(VALUE klass_name)
+{
+  return rb_funcall(rb_cObject, rb_intern("const_source_location"), 1, klass_name);
+}


So on the topic of performance, it occurs to me that if you're starting and stopping coverage on every individual test case, you may be redoing these kinds of lookups again and again.

Maybe not for this PR, but perhaps it's worth considering having some kind of a cache that would live across coverage start/stops? (Of course with proper sizing, etc)

Yes, interestingly, I tried two kinds of caches:

a hashtable (st.c) that tracks the source file for every class

a hashtable that stores, for every class, a boolean indicating whether the source file is inside the project or in an external gem

Every combination of these two approaches either does not change the overhead for the rubocop test suite or makes it about 4% slower on average! I did not investigate further, so I don't know yet why intuition does not work in this case; if I have time, I'll do another pass.

For now, I am leaving it as is, without additional caches, as I believe every optimisation must be backed by solid data; otherwise it will be just another source of bugs

Sounds reasonable 👍
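For reference, a hedged Ruby sketch of the second cache variant discussed in this thread: a class-to-boolean "is this class defined inside the project?" map that survives coverage start/stop cycles, with a crude size bound. InProjectCache and its root:, ignored_path: and max_size: parameters are hypothetical; the experiment in this PR was done in C with st.c, showed no improvement on rubocop, and the change was merged without a cache. The ignored-path check mirrors the ignored_path handling from the diff (skipping the bundled gems location).

```ruby
# Hypothetical cache: remembers, per class, whether its source file belongs
# to the project, so each class is resolved at most once across tests.
class InProjectCache
  attr_reader :lookups

  def initialize(root:, ignored_path: nil, max_size: 10_000)
    # Trailing separator so "/app" does not match "/application".
    @root = File.join(File.expand_path(root), "")
    @ignored_path = ignored_path && File.expand_path(ignored_path)
    @max_size = max_size
    @entries = {}
    @lookups = 0
  end

  def in_project?(klass)
    return @entries[klass] if @entries.key?(klass)

    @entries.clear if @entries.size >= @max_size # crude eviction
    @entries[klass] = compute(klass)
  end

  private

  # Uncached lookup: anonymous classes, C-defined classes, files under the
  # ignored path and lookup errors all count as "not in project".
  def compute(klass)
    @lookups += 1
    name = klass.name
    return false if name.nil?

    path = Object.const_source_location(name)&.first
    return false if path.nil?

    path = File.expand_path(path)
    return false if @ignored_path && path.start_with?(@ignored_path)

    path.start_with?(@root)
  rescue StandardError
    false
  end
end

class Gadget; end

cache = InProjectCache.new(root: File.dirname(File.expand_path(__FILE__)))
puts cache.in_project?(Gadget)
puts cache.in_project?(String)
```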

ext/datadog_cov/datadog_cov.c

anmarchenko changed the title ~~[SDTEST-523] Expand test impact analysis to the heap allocation~~ [SDTEST-523] Expand test impact analysis with allocation profiler Jul 8, 2024

anmarchenko added 12 commits July 10, 2024 13:33

fix service name for datadog sca

1245ee7

failing test as a base repro for the coverage limitation

d11d74d

printing class names of the objects allocated during the test run

7797a10

print filenames of objects instantiated during the test

f007907

heap allocation analysis: working POC that passes tests

a621640

skip anonymous classes

dcc6f4b

track classes already covered for the current test to skip computing source location multiple times; test that allocations between tests are not attributed to a test

8a4b135

Add unit tests that simulate issues encountered during integration tests. Fix them by safely getting source location with suppressing exceptions.

5c737c0

add setting for allocation tracing

59df766

add use_allocation_tracing parameter for DDCov tool

ca32796

rename some C functions, write more comments

955843c

minor typo

b78f095

anmarchenko force-pushed the anmarchenko/heap_allocation_tracepoint branch from 6e18534 to b78f095 July 10, 2024 11:33

anmarchenko changed the title ~~[SDTEST-523] Expand test impact analysis with allocation profiler~~ [SDTEST-523] Expand test impact analysis with allocation tracing Jul 10, 2024

devinburnette reviewed Jul 10, 2024

lib/datadog/ci/configuration/components.rb

anmarchenko marked this pull request as ready for review July 11, 2024 08:50

anmarchenko requested review from a team as code owners July 11, 2024 08:50

anmarchenko requested a review from liashenko July 11, 2024 08:50

finish comment

2209587

ivoanjo reviewed Jul 11, 2024

anmarchenko added 3 commits July 12, 2024 16:06

do not call CRuby API on NEWOBJ tracepoint, collect allocated classes in st.c instead and process them once when test ends

3fc3ac0

move skipping anonymous classes to the tracepoint, add struct to tests

4007592

minor: comments, remove RUBY_T_DATA from the list of tracked object types

0b1618c

anmarchenko added 2 commits July 15, 2024 09:30

use enum for threading mode

912d9f4

remove allocation_tracing_enabled field from dd_cov_data struct, set object_allocation_tracepoint to nil instead

6e1d2e1

ivoanjo approved these changes Jul 15, 2024

anmarchenko added 7 commits July 15, 2024 17:00

if GC happens during DDCov allocation, it might segfault because klasses_table is NULL

f076e90

insert in hashtable without lookup

df899ce

pass dd_cov_data pointer directly to NEWOBJ tracepoint callback

a020a39

simplify anonymoous class check with rb_mod_name

d62aef2

separate spec for Struct coverage, add spec for Data

d318b3f

remove what should not have been committed

9dde8c1

use rb_protect to ignore exceptions instead of raise/rescue

aebb1bf

anmarchenko merged commit 186ccf6 into main Jul 15, 2024
28 checks passed

anmarchenko deleted the anmarchenko/heap_allocation_tracepoint branch July 15, 2024 17:15

github-actions bot added this to the 1.2.0 milestone Jul 15, 2024

anmarchenko mentioned this pull request Jul 16, 2024

Bump to version 1.2.0 #199

Merged
