[SDTEST-138] code coverage extension fixes and improvements #171

anmarchenko · 2024-05-03T14:12:28Z

What does this PR do?
Applies the following changes and optimizations to per-test code coverage:

removes the lines code coverage mode: it is not used and is not planned for the near future, so right now it is just dead code
caches the last seen filename pointer, so that when we cover consecutive lines in the same file we skip the prefix check and the insertion of the filename into the coverage hash table
dd_cov_data->root and dd_cov_data->ignored_path are now copied (once) into malloc'd memory and stored as C strings
to avoid copying strings, the rb_profile_frames function is now used to get the filename stored in the resulting hash: this improves performance and fixes the issue with UTF-8 filenames

How to test the change?
Tested using the test-environment integration test stand: https://github.com/DataDog/test-environment/pull/283

The following changes in performance overhead were recorded:

Rubocop: from 109% to 86% => 27% improvement
Middleman: from 29.7% to 28.6% => no change
Jekyll: from 0.4% to 0.3% => no change
devdocs: from 16.4% to 16.9% => no change
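For the Rubocop row, the 27% figure is consistent with measuring the 23-point drop against the new overhead of 86%; measured against the old overhead of 109%, the same drop is roughly a 21% reduction. A worked version of that arithmetic, assuming both numbers are overhead percentages relative to a run without code coverage:

```latex
109\% - 86\% = 23 \text{ percentage points}
\frac{109 - 86}{86} \approx 0.267 \approx 27\% \qquad \frac{109 - 86}{109} \approx 0.211 \approx 21\%
```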

Overall result: this change makes no difference for projects where the code coverage overhead was already low, but it provides a meaningful improvement for the project where the code coverage overhead was the largest.

codecov-commenter · 2024-05-03T14:16:53Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.85%. Comparing base (f58bb35) to head (b04133e).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #171      +/-   ##
==========================================
- Coverage   98.86%   98.85%   -0.01%     
==========================================
  Files         228      228              
  Lines       10096    10082      -14     
  Branches      466      465       -1     
==========================================
- Hits         9981     9967      -14     
  Misses        115      115


ivoanjo

Left some notes!

Big props for going ahead and removing unused things -- git always remembers, and less code means fewer bugs :)

ivoanjo · 2024-05-08T13:38:20Z

ext/datadog_cov/datadog_cov.c

-// constants
-#define DD_COV_TARGET_FILES 1
-#define DD_COV_TARGET_LINES 2
+#include "ruby/st.h"


Minor: I spotted we're a bit inconsistent about this on ddtrace as well

Suggested change

#include "ruby/st.h"

#include <ruby/st.h>

ivoanjo · 2024-05-08T13:51:00Z

ext/datadog_cov/datadog_cov.c

+  // skip if we cover the same file again
+  if (dd_cov_data->last_filename_ptr == (uintptr_t)filename)
  {
    return;
  }
+  dd_cov_data->last_filename_ptr = (uintptr_t)filename;
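The early return above is the whole filename cache: consecutive line events from the same file share a filename pointer, so comparing addresses is enough to skip repeated work. A standalone sketch, assuming a hypothetical `cov_state` struct and a `slow_path_calls` counter that stands in for the real prefix check and hash insertion (neither name comes from the extension):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dd_cov_data: only the fields this sketch needs. */
struct cov_state {
  uintptr_t last_filename_ptr; /* address of the last filename we processed */
  size_t slow_path_calls;      /* counts how often the expensive path ran */
};

/* Stand-in for the expensive work (prefix checks + hash table insertion). */
static void record_file_slow_path(struct cov_state *state, const char *filename)
{
  (void)filename;
  state->slow_path_calls++;
}

/* Called on every covered line: if the filename pointer is the same as last
 * time, the file was already recorded, so we return immediately. */
static void on_line_event(struct cov_state *state, const char *filename)
{
  if (state->last_filename_ptr == (uintptr_t)filename) {
    return;
  }
  state->last_filename_ptr = (uintptr_t)filename;
  record_file_slow_path(state, filename);
}
```

In this sketch a cache miss only costs an extra slow-path call; it never changes which files get recorded, assuming the slow path deduplicates the way a hash table would.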


ivoanjo · 2024-05-08T13:58:50Z

ext/datadog_cov/datadog_cov.c

  VALUE root;
-  int mode;
-  VALUE coverage;
+  st_table *coverage;
+  uintptr_t last_filename_ptr;
 };

 static void dd_cov_mark(void *ptr)
 {
  struct dd_cov_data *dd_cov_data = ptr;
-  rb_gc_mark_movable(dd_cov_data->coverage);
  rb_gc_mark_movable(dd_cov_data->root);
 }

 static void dd_cov_free(void *ptr)
 {
  struct dd_cov_data *dd_cov_data = ptr;
-
+  st_free_table(dd_cov_data->coverage);
  xfree(dd_cov_data);
 }

 static void dd_cov_compact(void *ptr)
 {
  struct dd_cov_data *dd_cov_data = ptr;
-  dd_cov_data->coverage = rb_gc_location(dd_cov_data->coverage);
  dd_cov_data->root = rb_gc_location(dd_cov_data->root);
 }


Actually, as I was leaving a few other comments regarding the root below, it occurred to me that one potential optimization is to get rid of the VALUE root entirely and instead copy it into a regular char * that we manage manually.

This would enable a bunch of optimizations and simplifications:

Nothing to mark anymore

Nothing to compact anymore

Set it once when coverage starts and don't check again

Cache length as well
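A minimal sketch of that suggestion, assuming a hypothetical `cov_root` struct that owns a malloc'd copy of the root plus its cached length (the names are illustrative, not the extension's actual fields). Because the copy is plain C memory, there is nothing for the GC to mark or move, and the hot path never calls strlen:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: root kept as a C string we own instead of a VALUE. */
struct cov_root {
  char *root;      /* malloc'd, NUL-terminated copy */
  size_t root_len; /* cached once when coverage starts */
};

/* Copy the root once when coverage starts; returns false on allocation failure. */
static bool cov_root_set(struct cov_root *cfg, const char *root)
{
  size_t len = strlen(root);
  char *copy = malloc(len + 1);
  if (copy == NULL) {
    return false;
  }
  memcpy(copy, root, len + 1); /* includes the terminating NUL */
  free(cfg->root);             /* allows re-setting; free(NULL) is a no-op */
  cfg->root = copy;
  cfg->root_len = len;
  return true;
}

/* Hot path: a bounded prefix comparison against the cached copy. */
static bool cov_root_contains(const struct cov_root *cfg, const char *filename)
{
  return cfg->root != NULL && strncmp(cfg->root, filename, cfg->root_len) == 0;
}

/* Release the copy when the coverage object is freed. */
static void cov_root_free(struct cov_root *cfg)
{
  free(cfg->root);
  cfg->root = NULL;
  cfg->root_len = 0;
}
```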

ivoanjo · 2024-05-08T14:23:59Z

ext/datadog_cov/datadog_cov.c

+static int insert_into_result(st_data_t filename, st_data_t _val, st_data_t coverage)
+{
+  rb_hash_aset((VALUE)coverage, rb_str_new2((char *)filename), Qtrue);
+  return ST_CONTINUE;
+}


So I was peeking at the VM sources and the rb_sourcefile char * seems to come from a Ruby string.

That being the case... I'm not sure it's entirely fine to hold on to it directly 🤔, i.e. to assume it will always be alive during coverage gathering. Perhaps it's fine if Ruby never frees/GCs instruction sequences and never moves those strings, but we'd probably need to validate that.

The alternative would be to keep our own copy of the char *, which we would only need to do the first time we see a file. That is, at insertion time we could do a lookup first, since the happy path is that we've seen the file before; only if the file isn't in the map yet would we need to duplicate the char *.
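The lookup-first idea can be sketched in plain C. In the extension this would be an st_table (e.g. st_lookup followed by st_insert on a miss); here a small fixed-capacity array, `file_set`, stands in so the sketch compiles without Ruby headers. All names are hypothetical:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define FILE_SET_CAPACITY 64 /* arbitrary limit for this sketch */

/* Hypothetical stand-in for the coverage table: it owns copies of its keys. */
struct file_set {
  char *names[FILE_SET_CAPACITY];
  size_t count;
  size_t copies_made; /* how many times a string was actually duplicated */
};

static bool file_set_contains(const struct file_set *set, const char *name)
{
  for (size_t i = 0; i < set->count; i++) {
    if (strcmp(set->names[i], name) == 0) {
      return true;
    }
  }
  return false;
}

/* Happy path: the file was seen before, so only a lookup happens.
 * On a miss we duplicate the borrowed string, so the table never keeps
 * a pointer whose lifetime we don't control. */
static bool file_set_add(struct file_set *set, const char *borrowed_name)
{
  if (file_set_contains(set, borrowed_name)) {
    return true;
  }
  if (set->count == FILE_SET_CAPACITY) {
    return false;
  }
  size_t len = strlen(borrowed_name);
  char *copy = malloc(len + 1);
  if (copy == NULL) {
    return false;
  }
  memcpy(copy, borrowed_name, len + 1);
  set->names[set->count++] = copy;
  set->copies_made++;
  return true;
}

static void file_set_free(struct file_set *set)
{
  for (size_t i = 0; i < set->count; i++) {
    free(set->names[i]);
  }
  set->count = 0;
}
```

Because the table holds its own copies, overwriting or freeing the caller's buffer afterwards does not affect the recorded filenames.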

Another alternative, if we wanted a VALUE but wanted to avoid the copy, would be to use the rb_profile_frames to get the top entry of the stack trace, which is basically the same thing we're getting here. I suspect overhead would be similar to rb_sourcefile(), but haven't checked.

Thanks for pointing it out! Yes, I kind of suspected that this was a bit fishy.

I will try the approach of replacing the st.h table with a Ruby hash again, but storing the VALUE returned by rb_profile_frames to avoid creating new strings: it could give me the same performance win. If this does not work, I will revert and try the memcpy approach.

I also have another small change to the code coverage tool that adds the possibility to ignore the bundle path (if bundled gems are located in the project's folder): #174
There are no major changes there but just in case you would like to look 😄

👍 If you're going in that direction, remember to use the limit arguments in rb_profile_frames to get only the top entry.

ivoanjo · 2024-05-08T15:32:58Z

ext/datadog_cov/datadog_cov.c

@@ -159,6 +120,12 @@ static VALUE dd_cov_start(VALUE self)
  return self;


⬆️ Also, now that we're moving away from line coverage, maybe it would be a good idea to revisit the use of RUBY_EVENT_LINE? E.g. perhaps a combination of coarser-grained events (call, b_call, fiber_switch, etc) would be an overall performance win?

Using RUBY_EVENT_CALL doubled the overhead: I guess this event is not as well optimized as RUBY_EVENT_LINE.

anmarchenko · 2024-06-04T12:58:09Z

ext/datadog_cov/datadog_cov.c


-static int is_prefix(VALUE prefix, const char *str)
+char *ruby_strndup(const char *str, size_t size)


This one is borrowed from the profiling extension.
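For readers unfamiliar with the helper, a plain-C sketch of what a strndup-style function with the `ruby_strndup` signature above does: copy exactly `size` bytes into new heap memory and NUL-terminate the result. The name `strndup_sketch` is hypothetical; the real helper likely uses Ruby's own allocator rather than calloc, which is used here only so the sketch compiles without Ruby headers:

```c
#include <stdlib.h>
#include <string.h>

/* Copies `size` bytes of `str` into a new heap buffer and NUL-terminates it.
 * Precondition: `str` has at least `size` readable bytes (for example a
 * pointer/length pair taken from a Ruby string). Caller frees the result. */
static char *strndup_sketch(const char *str, size_t size)
{
  char *copy = calloc(size + 1, 1); /* zeroed, so copy[size] is already '\0' */
  if (copy == NULL) {
    return NULL;
  }
  memcpy(copy, str, size);
  return copy;
}
```

Taking an explicit length means the copy never needs a strlen on the source, which suits strings whose length is already known.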


ivoanjo

👍 LGTM

This looks great! I've left a few final suggestions, but feel free to do them separately or not at all.

I'm assuming the performance improvements in the description are up to date with the move to rb_profile_frames; if not, I recommend doing another pass, just to make sure that none of the latest changes in the PR regressed the expected improvements.


ivoanjo · 2024-06-10T10:44:41Z

ext/datadog_cov/datadog_cov.c

+  // if ignored_path is provided and the current filename is located under the ignored_path, we skip it too
+  // this is useful for ignoring bundled gems location
+  if (dd_cov_data->ignored_path_len != 0 && strncmp(dd_cov_data->ignored_path, filename_ptr, dd_cov_data->ignored_path_len) == 0)


Minor: It occurred to me that if ignored_path is supposed to include root as its prefix, we could store only that part and compare only that part here ;)
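A sketch of that suggestion, assuming (as the comment does) that ignored_path always starts with root: once the root prefix has matched, only the remainder of ignored_path needs to be compared. `cov_paths`, `ignored_suffix` and `should_track` are hypothetical names:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical config: ignored_path is stored with the root prefix stripped. */
struct cov_paths {
  const char *root;
  size_t root_len;
  const char *ignored_suffix; /* ignored_path minus the leading root */
  size_t ignored_suffix_len;  /* 0 means nothing is ignored */
};

/* Returns true if the file should be recorded as covered. */
static bool should_track(const struct cov_paths *p, const char *filename)
{
  /* Files outside root are never tracked. */
  if (strncmp(p->root, filename, p->root_len) != 0) {
    return false;
  }
  /* The root matched, so filename has at least root_len characters and
   * `rest` is safe to read: compare only the part after the root. */
  const char *rest = filename + p->root_len;
  if (p->ignored_suffix_len != 0 &&
      strncmp(p->ignored_suffix, rest, p->ignored_suffix_len) == 0) {
    return false;
  }
  return true;
}
```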

Thanks, I will make a note of this one for future improvements.


ivoanjo · 2024-06-10T10:54:07Z

spec/ddcov/ddcov_spec.rb

+      it "supports files with non-ASCII characters" do
+        subject.start
+        expect(I❤️Ruby.new.call).to eq("I ❤️ Ruby")
+        coverage = subject.stop
+        expect(coverage.size).to eq(1)
+        expect(coverage.keys).to include(absolute_path("calculator/code_with_❤️.rb"))


❤️ Nice that we support this now! :D

anmarchenko · 2024-06-10T11:02:37Z

@ivoanjo yes, the performance numbers are final and were measured with the current state of this PR. Indeed, the switch to rb_profile_frames added more overhead than rb_sourcefile, but it is safer and solves the issue with UTF-8 filenames, so I consider it worth it.

anmarchenko changed the title ~~remove lines mode from ddcov extension as it is dead code right now~~ [CIVIS-9581] code coverage extension fixes and improvements May 3, 2024

anmarchenko marked this pull request as ready for review May 6, 2024 15:40

anmarchenko requested review from a team as code owners May 6, 2024 15:40

anmarchenko requested a review from nikita-tkachenko-datadog May 6, 2024 15:40

nikita-tkachenko-datadog approved these changes May 7, 2024


ivoanjo reviewed May 8, 2024


ivoanjo mentioned this pull request May 8, 2024

[CIVIS-9951] add settings option to ignore code coverage for bundled gems location #174

Merged

ivoanjo reviewed May 8, 2024


anmarchenko force-pushed the anmarchenko/code_coverage_improvements branch from df72f52 to 9caee22 May 15, 2024 12:19

anmarchenko marked this pull request as draft May 15, 2024 14:55

anmarchenko force-pushed the anmarchenko/code_coverage_improvements branch 2 times, most recently from 99b0c9f to e6c08ed May 21, 2024 11:05

anmarchenko marked this pull request as ready for review May 21, 2024 12:48

anmarchenko changed the title ~~[CIVIS-9581] code coverage extension fixes and improvements~~ [SDTEST-138] code coverage extension fixes and improvements May 28, 2024

anmarchenko force-pushed the anmarchenko/code_coverage_improvements branch from e6c08ed to b3db6ae June 4, 2024 11:52

anmarchenko commented Jun 4, 2024


anmarchenko mentioned this pull request Jun 6, 2024

[SDTEST-408] multi threaded code coverage support for datadog_cov #189

Merged

remove lines coverage, use rb_profile_frames to avoid copying strings…

0da2baa

…, cache last seen filename pointer

anmarchenko force-pushed the anmarchenko/code_coverage_improvements branch from b3db6ae to 0da2baa June 7, 2024 07:09

use filename returned from profile frames for prefix path checks

44e32d4

ivoanjo approved these changes Jun 10, 2024


anmarchenko added 4 commits June 10, 2024 13:09

raise when starting coverage without root

357dedd

use RSTRING_PTR instead of StringValuePtr

e72af0c

replace frame_buffer with a single top_frame for profiling frames

9cdd680

use local top_frame variable for profile frames

b04133e

anmarchenko merged commit 6dd395c into main Jun 10, 2024
28 checks passed

anmarchenko deleted the anmarchenko/code_coverage_improvements branch June 10, 2024 11:37

github-actions bot added this to the 1.1.0 milestone Jun 10, 2024

anmarchenko mentioned this pull request Jun 11, 2024

Bump to version 1.0.1 #192

Merged
