Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDTEST-138] code coverage extension fixes and improvements #171

Merged
merged 6 commits into from
Jun 10, 2024

Conversation

anmarchenko
Copy link
Member

@anmarchenko anmarchenko commented May 3, 2024

What does this PR do?
Applies the following changes and optimizations to per test code coverage:

  • removes lines code coverage mode as it is not used and is not going to be used in the near future, so it is just a dead code right now
  • caches last seen filename pointer so that if we cover sequential lines in the same file we skip prefix check and saving file name in the coverage hash table
  • dd_cov_data->root and dd_cov_data->ignored_path are now copied (once) in malloc'd memory and stored as C-strings
  • to avoid string copying, rb_profile_frames function is now used to store filename in resulting hash: it improves performance and solves UTF-8 filenames issue

How to test the change?
Tested using test-environment integration test stand: https://github.com/DataDog/test-environment/pull/283

The following changes in performance overhead were recorded:

Rubocop
From 109% to 86% => 27% improvement

Middleman
From 29,7% to 28,6% => no change

Jekyll
From 0,4% to 0,3% => no change

devdocs
From 16,4% to 16,9% => no change

Overall result: it does not change anything for projects where code coverage overhead was low enough but provides meaningful improvement for the project where code coverage overhead was the largest.

@anmarchenko anmarchenko changed the title remove lines mode from ddcov extension as it is dead code right now [CIVIS-9581] code coverage extension fixes and improvements May 3, 2024
@codecov-commenter
Copy link

codecov-commenter commented May 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.85%. Comparing base (f58bb35) to head (b04133e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #171      +/-   ##
==========================================
- Coverage   98.86%   98.85%   -0.01%     
==========================================
  Files         228      228              
  Lines       10096    10082      -14     
  Branches      466      465       -1     
==========================================
- Hits         9981     9967      -14     
  Misses        115      115              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@anmarchenko anmarchenko marked this pull request as ready for review May 6, 2024 15:40
@anmarchenko anmarchenko requested review from a team as code owners May 6, 2024 15:40
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some notes!

Big props for going ahead and removing unused things -- git always remembers, and less code means less bugs :)

// constants
#define DD_COV_TARGET_FILES 1
#define DD_COV_TARGET_LINES 2
#include "ruby/st.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I spotted we're a bit inconsistent about this on ddtrace as well

Suggested change
#include "ruby/st.h"
#include <ruby/st.h>

Comment on lines 89 to 125
// skip if we cover the same file again
if (dd_cov_data->last_filename_ptr == (uintptr_t)filename)
{
return;
}
dd_cov_data->last_filename_ptr = (uintptr_t)filename;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

Comment on lines 8 to 51
VALUE root;
int mode;
VALUE coverage;
st_table *coverage;
uintptr_t last_filename_ptr;
};

static void dd_cov_mark(void *ptr)
{
struct dd_cov_data *dd_cov_data = ptr;
rb_gc_mark_movable(dd_cov_data->coverage);
rb_gc_mark_movable(dd_cov_data->root);
}

static void dd_cov_free(void *ptr)
{
struct dd_cov_data *dd_cov_data = ptr;

st_free_table(dd_cov_data->coverage);
xfree(dd_cov_data);
}

static void dd_cov_compact(void *ptr)
{
struct dd_cov_data *dd_cov_data = ptr;
dd_cov_data->coverage = rb_gc_location(dd_cov_data->coverage);
dd_cov_data->root = rb_gc_location(dd_cov_data->root);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually as I was leaving a few other comments regarding the root below, it occurred to me that one potential optimization is to also get rid of the VALUE root entirely, and instead copying it to a regular char * that we manage manually.

This would enable a bunch of optimizations and simplifications:

  • Nothing to mark anymore
  • Nothing to compact anymore
  • Set it once when coverage starts and don't check again
  • Cache length as well

Comment on lines 123 to 169
static int insert_into_result(st_data_t filename, st_data_t _val, st_data_t coverage)
{
rb_hash_aset((VALUE)coverage, rb_str_new2((char *)filename), Qtrue);
return ST_CONTINUE;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I was peeking at the VM sources and the rb_sourcefile char * seems to come from a Ruby string.

Being that the case... I'm not sure it's entirely fine to hold on to it directly 🤔, e.g. that it will always be alive during the lifetime of coverage gathering. E.g. perhaps it's fine if Ruby never frees/GCs instruction sequences + it never moves those strings, but we'd probably need to validate that.

The alternative would be to keep our own copy of the char *, which we only would need to do if it hasn't been seen before. That is, at insertion time, we could do a lookup first, since the happy path will be that we've seen the file before, and only if we find it doesn't exist on the map, would we need to duplicate the char *.


Another alternative, if we wanted a VALUE but wanted to avoid the copy, would be to use the rb_profile_frames to get the top entry of the stack trace, which is basically the same thing we're getting here. I suspect overhead would be similar to rb_sourcefile(), but haven't checked.

Copy link
Member Author

@anmarchenko anmarchenko May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out! yes, I kind of suspected that this is a bit fishy.

I will try the approach with replacing st.h with Ruby hash again but storing VALUE returned by rb_profile_frames to avoid creating new strings: it could provide me the same performance win. If this does not work, will revert and try the memcpy approach.

I also have other small change to code coverage tool here that adds possibility to ignore bundle path (if bundled gems located in the project's folder): #174
There are no major changes there but just in case you would like to look 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 If you're going in that direction, remember to use the limit arguments in rb_profile_frames to get only the top entry.

@@ -159,6 +120,12 @@ static VALUE dd_cov_start(VALUE self)
return self;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⬆️ Also, now that we're moving away from line coverage, maybe it would be a good idea to revisit the use of RUBY_EVENT_LINE? E.g. perhaps a combination of coarser-grained events (call, b_call, fiber_switch, etc) would be an overall performance win?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using RUBY_EVENT_CALL increased overhead twice: I guess this event is not optimized as good as RUBY_EVENMT_LINE

@anmarchenko anmarchenko force-pushed the anmarchenko/code_coverage_improvements branch from df72f52 to 9caee22 Compare May 15, 2024 12:19
@anmarchenko anmarchenko marked this pull request as draft May 15, 2024 14:55
@anmarchenko anmarchenko force-pushed the anmarchenko/code_coverage_improvements branch 2 times, most recently from 99b0c9f to e6c08ed Compare May 21, 2024 11:05
@anmarchenko anmarchenko marked this pull request as ready for review May 21, 2024 12:48
@anmarchenko anmarchenko changed the title [CIVIS-9581] code coverage extension fixes and improvements [SDTEST-138] code coverage extension fixes and improvements May 28, 2024
@anmarchenko anmarchenko force-pushed the anmarchenko/code_coverage_improvements branch from e6c08ed to b3db6ae Compare June 4, 2024 11:52

static int is_prefix(VALUE prefix, const char *str)
char *ruby_strndup(const char *str, size_t size)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is stolen from profiling extension

@anmarchenko anmarchenko force-pushed the anmarchenko/code_coverage_improvements branch from b3db6ae to 0da2baa Compare June 7, 2024 07:09
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM

This looks great! I've left a few final suggestions, but feel free to do them separately or not at all.

I'm assuming the performance improvements in the description are up-to-date with the move to rb_profile_frames, if not, I recommend doing a pass, just to make sure that some of the latest changes in the PR didn't regress the expected improvements.

ext/datadog_cov/datadog_cov.c Outdated Show resolved Hide resolved
ext/datadog_cov/datadog_cov.c Outdated Show resolved Hide resolved
Comment on lines +145 to +147
// if ignored_path is provided and the current filename is located under the ignored_path, we skip it too
// this is useful for ignoring bundled gems location
if (dd_cov_data->ignored_path_len != 0 && strncmp(dd_cov_data->ignored_path, filename_ptr, dd_cov_data->ignored_path_len) == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: It occurred to me that if ignored_path is supposed to include root as its prefix, we could store only that part and compare only that part here ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this one I will make a not of for the future improvements

ext/datadog_cov/datadog_cov.c Outdated Show resolved Hide resolved
Comment on lines +49 to +54
it "supports files with non-ASCII characters" do
subject.start
expect(I❤️Ruby.new.call).to eq("I ❤️ Ruby")
coverage = subject.stop
expect(coverage.size).to eq(1)
expect(coverage.keys).to include(absolute_path("calculator/code_with_❤️.rb"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ Nice that we support this now! :D

@anmarchenko
Copy link
Member Author

@ivoanjo yes, the number for performance improvements is final and was measured with the current state of this PR. Indeed, the switch to rb_profile_frames caused more overhead than rb_sourcefile but it is safer and solves an issue with UTF-8 filenames, so I consider it being worth it.

@anmarchenko anmarchenko merged commit 6dd395c into main Jun 10, 2024
28 checks passed
@anmarchenko anmarchenko deleted the anmarchenko/code_coverage_improvements branch June 10, 2024 11:37
@github-actions github-actions bot added this to the 1.1.0 milestone Jun 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants