-
Notifications
You must be signed in to change notification settings - Fork 554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix merging race conditions #570
Fix merging race conditions #570
Conversation
hah, looks like rubocop needs to be pinned even more precisely ... master fails on 0.48.1 but passes on 0.48.0 😭 |
5f5ed37
to
7d4d8cc
Compare
Hi there! Thanks a ton for this 🎉 Might take me a little while to get around to a proper review, feel free to ping me if I haven't gotten to it within a week (goal is sooner, but you know... things). I just noticed and fixed the rubocop thing... changes in patch levels 😦 Master should be good after this so you can rebase/merge whatever. Sorry for the inconvenience! |
7d4d8cc
to
78c1171
Compare
Rebased to get the rubocop fix 🤞 |
@@ -75,23 +75,21 @@ def add_not_loaded_files(result) | |||
# from cache using SimpleCov::ResultMerger if use_merging is activated (default) | |||
# | |||
def result | |||
# Ensure the variable is defined to avoid ruby warnings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was ostensibly added for rubocop, but rubocop and ruby don't complain about it now, so ¯_(ツ)_/¯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably was solved some time when result?
was introduced/use or so
spec/result_merger_spec.rb
Outdated
|
||
it "returns nil for SimpleCov.result" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This spec was misleading in that it only returned nil if .running == false
, and it also happened independently of use_merging
... so it strikes me as not very useful, and a better spec can be found in spec/simplecov_spec.rb:22
Note that the behavior has now changed slightly; previously .result
always returned nil
if SimpleCov wasn't running. Now it only returns nil
if it's not running and use_merging == false
. If use_merging == true
, you always get something useful from result
(whether or not it merges the current Coverage.result
depends on .running
). This is useful for times when you have a post-CI script for formatting results and coverage thresholds, and the script itself doesn't need coverage (say, you have a script that aggregates and merges .resultset.json
from many distributed workers, and want to format and enforce coverage on the final result).
It occurs to me it might be worth reworking the e.g. could do something like def synchronize_resultset
# make it reentrant
return yield if @resultset_locked
begin
@resultset_locked = true
File.open(resultset_writelock, "w+") do |f|
f.flock(File::LOCK_EX)
result = yield
end
result
ensure
@resultset_locked = false
end
end
def stored_data
synchronize_resultset do
return unless File.exist?(resultset_path)
data = File.read(resultset_path)
return if data.nil? || data.length < 2
data
end
end
def store_result(result)
synchronize_resultset do
new_set = resultset(true)
command_name, data = result.to_hash.first
new_set[command_name] = data
File.open(resultset_path, "w+") do |f|
f.puts JSON.pretty_generate(new_set)
end
end
true
end That said, that would be a bigger change than the simple method caching I'm introducing in this pull request, so I'm happy to do that as a separate issue/PR. |
would be awesome to have this merged 🌟 |
I agree. sorry I've been rather busy :| I know I probably said this before but I'll try to make time this weekend to have a look :) Sorry for the delay! |
ping ⭐️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋
Hello, thanks for digging into this and taking the time to fix this. Really appreciated! 🌟
At the same time, sorry it took me so long to get to this. Been quite busy in all of private life, work life and open source/conference life. This was always a bit daunting to look at as I'm not so familiar with that part of the code base myself yet 😅
If it's too long ago for you and you won't get around to making the adjustments here or whatever then I totally understand and will look to do them myself. Want to have this in the next release.
Looks good overall, my points are mostly minor and mostly about testing.
Just to check my understanding of the problem: As we didn't cache the reultset (always read it from file) it might have happened that we access the file and read an empty one while another one is busy trying to store the result?
Thank you a lot!!! 🚀 💚
edit: also thanks for the pings 👍
@@ -75,23 +75,21 @@ def add_not_loaded_files(result) | |||
# from cache using SimpleCov::ResultMerger if use_merging is activated (default) | |||
# | |||
def result | |||
# Ensure the variable is defined to avoid ruby warnings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably was solved some time when result?
was introduced/use or so
lib/simplecov/result_merger.rb
Outdated
def resultset(force_reload = false) | ||
@resultset = nil if force_reload | ||
|
||
@resultset ||= if (data = stored_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer for the assignment not to be inside the if but rather outside it. Alternatively stored_data
could also be memoized imo
spec/result_merger_spec.rb
Outdated
it "caches by default" do | ||
set = SimpleCov::ResultMerger.resultset | ||
new_set = SimpleCov::ResultMerger.resultset | ||
expect(new_set).to equal(set) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, but if we don't change the files etc. those should be the same whether we use merging or not, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the key point here is the rspec equal
matcher calls #equal?
, which checks if they are the same ruby object, not just if they are equivalent. with caching they are the same object, without they are not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I prefer be
over equal
for that because equal
is so close to eq
and eql
that it's way too confusing that one checks equality, the other identity. I think be
communicates that better (+ the docs mention it as primary for object identity), I can also change it myself though :)
expect(actual).to be(expected) # passes if actual.equal?(expected)
expect(actual).to eq(expected) # passes if actual == expected
expect(actual).to eql(expected) # passes if actual.eql?(expected)
expect(actual).to equal(expected) # passes if actual.equal?(expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh hah, I had always assumed be
did an ==
check, not .equal?
... i'll change these equal
matchers to be
instead 👍
spec/simplecov_spec.rb
Outdated
end | ||
|
||
it "doesn't store the current coverage" do | ||
expect(SimpleCov::ResultMerger).to receive(:store_result).never |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason against doing not_to receive
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, i can change it. either way sounds fine to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer not_to
but can also do the change myself if needbe. Thanks a lot :)
spec/simplecov_spec.rb
Outdated
end | ||
|
||
it "doesn't merge the result" do | ||
expect(SimpleCov::ResultMerger).to receive(:merged_result).never |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing about not_to
:)
spec/simplecov_spec.rb
Outdated
end | ||
|
||
it "merges the result" do | ||
expect(SimpleCov.result).to equal(the_merged_result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not quite sure but an empty double might be equal to any other empty double. Better give it a name or something like double "some merged result"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#equal?
:)
|
||
it "uses the result from Coverage" do | ||
expect(Coverage).to receive(:result).once.and_return({}) | ||
SimpleCov.result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like we might wanna assert the return value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little weird here because .result
just gives us the the_merged_result
it got from SimpleCov::ResultMerger
... so I don't know how useful it is to check the return value here.
That said, I tweaked the similar spec on line 31 (non merging scenario) to show that the Coverage data does indeed get used
end | ||
|
||
it "adds not-loaded-files" do | ||
expect(SimpleCov).to receive(:add_not_loaded_files).once.and_return({}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as add_not_loaded_files is on the thing under test I thinkw e mgiht opt to test what it does not that it gets called, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, well i'd say SimpleCov.result
is the thing under test, not SimpleCov
, so i kind of like this test ¯_(ツ)_/¯ ... maybe add_not_loaded_files
could/should by handled by a separate object (though that's a bigger refactor) ? at any rate, it feels weird for a .result
test to have to stub/fake a bunch of File
/Dir
stuff. though i do agree some sort of test around add_not_loaded_files
' behavior would be nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not wanting to stub a bunch of file stuff is a perfectly fine reason for stubbing early 👍 I somehow thought the effects would be easier to test, but with this I much prefer the current stub. True, maybe a refactoring for another object but I'd have to dig deeper to see what scope would make sense there.
lib/simplecov.rb
Outdated
# | ||
# Clear out the previously cached .result. Primarily useful in testing | ||
# | ||
def clear_result! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo there should only be !
versions of methods that exist without !
to show slightly more dangerous ones so I'd opt to go without this :)
spec/simplecov_spec.rb
Outdated
expect(SimpleCov.result).to equal(result) | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for speccing all this out!
The other idea seems good btw. - thank you. I'm just not sure how the |
the idea is that it's possible that at any rate, i'll try to push up a tweaked PR in the next few days |
@jenseng thank you a lot, let me know if you need some help or something so that we can get this released 🎉 |
78c1171
to
0e1d6e8
Compare
Ok, I believe it should be good to go... I think I addressed all the feedback, and commented where it didn't make sense to change it IMO. I also went ahead and made this change so that reads are also synchronized. Let me know if I missed anything, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a bunch of questions around the file write lock :) Thanks for all your work! 🎉
# given time | ||
def synchronize_resultset | ||
# make it reentrant | ||
return yield if defined?(@resultset_locked) && @resultset_locked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need the defined?
check? It should default to nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, but it issues a ruby warning instance variable @resultset_locked not initialized
, which fails the build, thanks to spec/support/fail_rspec_on_ruby_warning.rb
# given time | ||
def synchronize_resultset | ||
# make it reentrant | ||
return yield if defined?(@resultset_locked) && @resultset_locked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also still not sure what making it reentrant here means and accomplishes, but reentrancy also has been a long time ago for me. The way I see it, the block we are given is now also called when the result set is already locked. Doesn't this effectively circumvent the lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so the idea is that if we call synchronize_resultset
when we are already inside a synchronize_resultset
call, we just let it through, since we know we already have a lock from the outer call. So we're not circumventing the lock, so much as saying "we already have a lock higher in the call stack, so just run the block". Due to how flock works, we would get a deadlock if we didn't, since the inner flock would wait forever for the outer one to release.
The reason we need to do this is we want to make both reads and writes acquire a lock so that neither one does anything unsafe. And in order to safely do a write, we 1. first need to read and 2. we need to ensure nobody else writes between our read and our write, otherwise we might lose data. So to accomplish that, write does synchronize_resultset
around the whole thing so that any other processes have to wait. And since its nested read will also do synchronize_resultset
, we use @resultset_locked
to let it know it doesn't actually need a lock since we already got one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which works as we are just one process, got it, thanks for taking the time to explain! :)
# given time | ||
def synchronize_resultset | ||
# make it reentrant | ||
return yield if defined?(@resultset_locked) && @resultset_locked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of that said, do you happen to know of a way to test this? We could probably extract synchronize_resultset and yse that to write some specs around it which might be worthwhile, not sure. I just see code that I have questions about and often feel better, the better that code is tested :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I have a couple ideas, I'll tweak the PR with some more tests
2e37743
to
a70e27b
Compare
Ok, the latest tweak adds 5 new specs to spec/result_merger_spec.rb (4 around synchronization, 1 around persisting to disk), bringing the total new specs in this pull request up to 19 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a ton once again :D
Now, we just need to get JRuby working again and then this can get MERGED :D Thanks for your patience and all the spec writing! 🙏
spec/result_merger_spec.rb
Outdated
fork do | ||
# ensure the parent process has enough time to get a | ||
# lock before we do | ||
sleep 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fork is not implemented on JRuby... :'( can we go and launch a separate one off script as a replacement?
With JRuby, we'd also need to wait way longer for that but that might be switched on whether we are running JRuby or not... or we could just switch the whole spec if it's too much of a hassle.
1) SimpleCov::ResultMerger.synchronize_resultset blocks other processes
Failure/Error: fork do
NotImplementedError:
fork is not available on this platform
# ./spec/result_merger_spec.rb:137:in `block in (root)'
spec/result_merger_spec.rb
Outdated
|
||
Process.wait | ||
|
||
expect(file.read).to eq("process 1\nprocess 2\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
Ensure we cache and synchronize appropriately so we don't merge more than once or do unsafe reads of the resultset. This has the added benefit of reducing the runtime of at_exit hooks by over 66%, since we now do less than 1/3 the work. See https://gist.github.com/jenseng/62465f674f8c02de09ef776f23d4dca4 for simple repro script. Basically there are 3 main problems when using merging: 1. `SimpleCov::ResultMerger.stored_data` doesn't synchronize its read, so it can see an empty file if another process is writing it. 2. `SimpleCov::ResultMerger.resultset` calls `.stored_data` twice and never caches the result. 3. `SimpleCov.result` doesn't cache the `@result`, so the default `at_exit` behavior causes it to store and merge 3 times. Due to 1. and 2., this is extra bad This can cause the formatter to miss out on coverage data and/or get the wrong values for covered percentages. Furthermore, if you use OJ, `JSON.parse("") -> nil`, which means `.resultset` can be nil, so this race condition causes exceptions as seen in simplecov-ruby#406. In addition to fixing the race condition, also add ` || {}` to make `.stored_data` a bit more robust and protect against an empty .resultset.json.
a70e27b
to
6f12dd6
Compare
Ok hopefully that should do it 🤞 ... spec works more or less the same as before, except now it uses |
Woohoo, thanks! 🎉 🍰 🍺 |
* [ci skip] * reordering in bug fixes shows my subjective perception of how many people this might affect/relevance :)
0.15.0 (2017-08-14) ([changes](simplecov-ruby/simplecov@v0.14.1...v0.15.0)) ======= ## Enhancements * Ability to use regex filters for removing files from the output. See [#589](simplecov-ruby/simplecov#589) (thanks @jsteel) ## Bugfixes * Fix merging race condition when running tests in parallel and merging them. See [#570](simplecov-ruby/simplecov#570) (thanks @jenseng) * Fix relevant lines for unloaded files - comments, skipped code etc. are correctly classigied as irrelevant. See [#605](simplecov-ruby/simplecov#605) (thanks @odlp) * Allow using simplecov with frozen-string-literals enabled. See [#590](simplecov-ruby/simplecov#590) (thanks @pat) * Make sure Array Filter can use all other filter types. See [#589](simplecov-ruby/simplecov#589) (thanks @jsteel) * Make sure file names use `Simplecov.root` as base avoiding using full absolute project paths. See [#589](simplecov-ruby/simplecov#589) (thanks @jsteel)
Ensure we cache appropriately so we don't merge more than once or do unsafe reads of the resultset.
See https://gist.github.com/jenseng/62465f674f8c02de09ef776f23d4dca4 for simple repro script.
Fixes #406 and may partially address #559 (though enforcing covered_percentage still requires workarounds à la this comment, since any process before the last one won't have all the coverage data)
Basically there are 3 main problems when using merging:
SimpleCov.result
doesn't cache the@result
, so the defaultat_exit
behavior causes it to store and merge 3 times.SimpleCov::ResultMerger.resultset
calls.stored_data
twice.SimpleCov::ResultMerger.merged_result
doesn't synchronize or use a cached.resultset
, so a concurrent.store_result
call can cause us to read an empty file.This can cause the formatter to miss out on coverage data in our formatters and/or get the wrong values for covered percentages.
Furthermore, if you use OJ,
JSON.parse("") -> nil
, which means.resultset
can be nil, causing exceptions as seen in #406. So I went ahead and rolled that fix into this, since|| {}
is trivial, and the bug can easily be verified with my repro script above.