Remove data race caused by doing sample on rum thread #1177
@cltnschlosser Thanks for contributing and explaining the issue so well! We decided to follow the other approach you suggested and update the unit tests afterwards. We'll release this as a hotfix as soon as possible. I'll keep the open issue updated.
@cltnschlosser Thanks for contributing and explaining the issue so well! We decided to follow the other approach you suggested and update the unit tests afterwards. This will ensure we start gathering the vitals as soon as we instantiate the object. We'll release this as a hotfix as soon as possible.
Looks like #1181 is actually crashing due to the overflow, so you'll want to fix both issues. (Maybe the memory issue is causing the overflow, not sure.)
@maciejburda I updated this diff to take the initial sample, fixed the broken test, and added a new test as well. I also changed the internals of VitalCPUReader to use […]
EDIT: It could also just be the memory corruption / data race issue causing […]
I was about to jump on this issue, but I can see you've got it all right!
Thanks a ton for the great contribution (again!).
I feel like it's more likely that we are dealing with memory corruption / data race.
I'll make sure it's merged and released asap.
Unable to reproduce this locally:
When I run it locally the value is
Same, passes both locally and when using local CI CLI. Maybe we can try increasing the wait time? 🤔 I can take a closer look later today.
Made some experiments and this change seems to do the trick:
Looks good 👍, but please make sure the test is not flaky before merging it.
What and why?
Seeing this crash after upgrading from 1.12.1 to 1.15.0:
My initial suspicion was integer overflow with the UInt32s being used here (natural_t), but I tried to reproduce that and I think it would result in a slightly different stacktrace. So then I noticed this comment in VitalCPUReader:
And I realized that it's crashing on the rum thread.
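For context on the overflow hypothesis above, here is a minimal sketch of how UInt32 arithmetic behaves in Swift. The names `userTicks` and `systemTicks` and their values are hypothetical and are not taken from VitalCPUReader; the point is only that a plain `+` on UInt32 traps at runtime when the result does not fit, while `addingReportingOverflow` or widening to UInt64 does not.

```swift
// Hypothetical tick counters, chosen so that their sum exceeds UInt32.max.
// These are illustration values, not fields from the SDK.
let userTicks: UInt32 = UInt32.max - 10
let systemTicks: UInt32 = 100

// Writing `userTicks + systemTicks` here would trap at runtime
// with an arithmetic overflow, which crashes the process.

// Option 1: detect the overflow explicitly instead of trapping.
let (partial, overflowed) = userTicks.addingReportingOverflow(systemTicks)

// Option 2: widen both operands before adding so the sum always fits.
let total = UInt64(userTicks) + UInt64(systemTicks)
```

Whether the reported crash comes from this kind of overflow or from the data race is not settled in this thread; the sketch only shows what an overflow trap would involve.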
How?
Remove initial sample call that was happening on rum thread.
This passes current unit tests.
Alternatively this call can be moved to Runloop.main.perform {}, but that caused a test failure (could just be a test issue), so I went with this approach for now.
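The alternative mentioned above, deferring the call to the main run loop, could look roughly like the sketch below. `MainThreadSampler`, `takeSample()` and the queue label are hypothetical stand-ins and not the SDK's types; Foundation spells the API `RunLoop.main.perform`. The sketch assumes the only requirement is that sampling always happens on the main thread.

```swift
import Foundation

// Hypothetical stand-in for a reader whose state must only be touched
// on the main thread. Marked @unchecked Sendable because this sketch
// enforces thread confinement manually via the precondition below.
final class MainThreadSampler: @unchecked Sendable {
    private(set) var sampleCount = 0

    func takeSample() {
        // Fails loudly if sampling ever happens off the main thread.
        precondition(Thread.isMainThread, "sampling must run on the main thread")
        sampleCount += 1
    }
}

let sampler = MainThreadSampler()

// Hypothetical background queue standing in for the RUM thread.
let worker = DispatchQueue(label: "hypothetical.rum.queue")

worker.async {
    // Instead of sampling here, hand the work to the main run loop.
    RunLoop.main.perform {
        sampler.takeSample()
    }
}

// Spin the main run loop briefly so the scheduled block can execute.
// An app would already be running its main run loop.
RunLoop.main.run(until: Date().addingTimeInterval(0.5))
```

Because the block runs asynchronously, a test that checks the sample right after construction has to wait for the main run loop to process it, which may be related to the test failure described above.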