-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SDTEST-138] code coverage extension fixes and improvements #171
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #171 +/- ##
==========================================
- Coverage 98.86% 98.85% -0.01%
==========================================
Files 228 228
Lines 10096 10082 -14
Branches 466 465 -1
==========================================
- Hits 9981 9967 -14
Misses 115 115 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some notes!
Big props for going ahead and removing unused things -- git always remembers, and less code means less bugs :)
ext/datadog_cov/datadog_cov.c
Outdated
// constants | ||
#define DD_COV_TARGET_FILES 1 | ||
#define DD_COV_TARGET_LINES 2 | ||
#include "ruby/st.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I spotted we're a bit inconsistent about this on ddtrace as well
#include "ruby/st.h" | |
#include <ruby/st.h> |
ext/datadog_cov/datadog_cov.c
Outdated
// skip if we cover the same file again | ||
if (dd_cov_data->last_filename_ptr == (uintptr_t)filename) | ||
{ | ||
return; | ||
} | ||
dd_cov_data->last_filename_ptr = (uintptr_t)filename; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 👍
ext/datadog_cov/datadog_cov.c
Outdated
VALUE root; | ||
int mode; | ||
VALUE coverage; | ||
st_table *coverage; | ||
uintptr_t last_filename_ptr; | ||
}; | ||
|
||
static void dd_cov_mark(void *ptr) | ||
{ | ||
struct dd_cov_data *dd_cov_data = ptr; | ||
rb_gc_mark_movable(dd_cov_data->coverage); | ||
rb_gc_mark_movable(dd_cov_data->root); | ||
} | ||
|
||
static void dd_cov_free(void *ptr) | ||
{ | ||
struct dd_cov_data *dd_cov_data = ptr; | ||
|
||
st_free_table(dd_cov_data->coverage); | ||
xfree(dd_cov_data); | ||
} | ||
|
||
static void dd_cov_compact(void *ptr) | ||
{ | ||
struct dd_cov_data *dd_cov_data = ptr; | ||
dd_cov_data->coverage = rb_gc_location(dd_cov_data->coverage); | ||
dd_cov_data->root = rb_gc_location(dd_cov_data->root); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually as I was leaving a few other comments regarding the root
below, it occurred to me that one potential optimization is to also get rid of the VALUE root
entirely, and instead copying it to a regular char *
that we manage manually.
This would enable a bunch of optimizations and simplifications:
- Nothing to mark anymore
- Nothing to compact anymore
- Set it once when coverage starts and don't check again
- Cache length as well
ext/datadog_cov/datadog_cov.c
Outdated
static int insert_into_result(st_data_t filename, st_data_t _val, st_data_t coverage) | ||
{ | ||
rb_hash_aset((VALUE)coverage, rb_str_new2((char *)filename), Qtrue); | ||
return ST_CONTINUE; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I was peeking at the VM sources and the rb_sourcefile
char *
seems to come from a Ruby string.
Being that the case... I'm not sure it's entirely fine to hold on to it directly 🤔, e.g. that it will always be alive during the lifetime of coverage gathering. E.g. perhaps it's fine if Ruby never frees/GCs instruction sequences + it never moves those strings, but we'd probably need to validate that.
The alternative would be to keep our own copy of the char *
, which we only would need to do if it hasn't been seen before. That is, at insertion time, we could do a lookup first, since the happy path will be that we've seen the file before, and only if we find it doesn't exist on the map, would we need to duplicate the char *
.
Another alternative, if we wanted a VALUE
but wanted to avoid the copy, would be to use the rb_profile_frames
to get the top entry of the stack trace, which is basically the same thing we're getting here. I suspect overhead would be similar to rb_sourcefile()
, but haven't checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it out! yes, I kind of suspected that this is a bit fishy.
I will try the approach with replacing st.h
with Ruby hash again but storing VALUE returned by rb_profile_frames
to avoid creating new strings: it could provide me the same performance win. If this does not work, will revert and try the memcpy approach.
I also have other small change to code coverage tool here that adds possibility to ignore bundle path (if bundled gems located in the project's folder): #174
There are no major changes there but just in case you would like to look 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 If you're going in that direction, remember to use the limit arguments in rb_profile_frames
to get only the top entry.
@@ -159,6 +120,12 @@ static VALUE dd_cov_start(VALUE self) | |||
return self; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⬆️ Also, now that we're moving away from line coverage, maybe it would be a good idea to revisit the use of RUBY_EVENT_LINE
? E.g. perhaps a combination of coarser-grained events (call, b_call, fiber_switch, etc) would be an overall performance win?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using RUBY_EVENT_CALL increased overhead twice: I guess this event is not optimized as good as RUBY_EVENMT_LINE
df72f52
to
9caee22
Compare
99b0c9f
to
e6c08ed
Compare
e6c08ed
to
b3db6ae
Compare
|
||
static int is_prefix(VALUE prefix, const char *str) | ||
char *ruby_strndup(const char *str, size_t size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one is stolen from profiling extension
…, cache last seen filename pointer
b3db6ae
to
0da2baa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 LGTM
This looks great! I've left a few final suggestions, but feel free to do them separately or not at all.
I'm assuming the performance improvements in the description are up-to-date with the move to rb_profile_frames
, if not, I recommend doing a pass, just to make sure that some of the latest changes in the PR didn't regress the expected improvements.
// if ignored_path is provided and the current filename is located under the ignored_path, we skip it too | ||
// this is useful for ignoring bundled gems location | ||
if (dd_cov_data->ignored_path_len != 0 && strncmp(dd_cov_data->ignored_path, filename_ptr, dd_cov_data->ignored_path_len) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: It occurred to me that if ignored_path
is supposed to include root
as its prefix, we could store only that part and compare only that part here ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this one I will make a not of for the future improvements
it "supports files with non-ASCII characters" do | ||
subject.start | ||
expect(I❤️Ruby.new.call).to eq("I ❤️ Ruby") | ||
coverage = subject.stop | ||
expect(coverage.size).to eq(1) | ||
expect(coverage.keys).to include(absolute_path("calculator/code_with_❤️.rb")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ Nice that we support this now! :D
@ivoanjo yes, the number for performance improvements is final and was measured with the current state of this PR. Indeed, the switch to |
What does this PR do?
Applies the following changes and optimizations to per test code coverage:
dd_cov_data->root
anddd_cov_data->ignored_path
are now copied (once) in malloc'd memory and stored as C-stringsrb_profile_frames
function is now used to store filename in resulting hash: it improves performance and solves UTF-8 filenames issueHow to test the change?
Tested using test-environment integration test stand: https://github.com/DataDog/test-environment/pull/283
The following changes in performance overhead were recorded:
Rubocop
From 109% to 86% => 27% improvement
Middleman
From 29,7% to 28,6% => no change
Jekyll
From 0,4% to 0,3% => no change
devdocs
From 16,4% to 16,9% => no change
Overall result: it does not change anything for projects where code coverage overhead was low enough but provides meaningful improvement for the project where code coverage overhead was the largest.