Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve multicore SDR #1501

Merged
merged 3 commits into from
Sep 9, 2021
Merged

Improve multicore SDR #1501

merged 3 commits into from
Sep 9, 2021

Conversation

dignifiedquire
Copy link
Contributor

Based on #1477 this updates the parent selection logic in multicore sdr.

It does not implement the bitmask update, as that is not thread safe, and only the thread that fills the buffer is allowed to set this value. This I missed in my earlier review.

@qy3u
Copy link
Contributor

qy3u commented Sep 7, 2021

It seems that

// Don't overrun the buffer
while cur_node > (parents_cache.get_consumer() + lookahead - 1) {
thread::sleep(Duration::from_micros(10));
}
makes that update the bitmask before increase consumer in the hash loop thread safe.
Otherwise, the current fill& getimplemention should not be thread safe too? 🤔

@cryptonemo cryptonemo mentioned this pull request Sep 7, 2021
10 tasks
@dignifiedquire
Copy link
Contributor Author

The original code was

*base_parent_missing = 0x20; // Mark base parent 5 as missing

instead of

base_parent_missing.set(5);

which only sets the bit, without clearing. Which resulted in the bug you discovered.

@dignifiedquire
Copy link
Contributor Author

The last change speeds up replication from 12min to 11min on my machine.

Copy link
Collaborator

@porcuquine porcuquine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dignifiedquire and I went over this, and I believe it now clears the bitmask correctly (and in correct place). This is consistent with the observation that original code (no clearing) was slowest, but first attempted fix was much faster but failed tests. Given that we also have an understanding of the original transcription error that led to the original code, I'm pretty confident in this iteration.

Nevertheless, especially since we've been bitten once, we should make sure this code is tested at least as much as the first attempted fix was. Remember, the discovered test failures were non-deterministic and not expected to always manifest. We need to test this enough times to have reasonable expectation that any failures introduced will have showed up. If we see any such failures, we need to take them seriously (as we did with the previous).

Even though I'm more confident in this than I was in the previous, I'm blocking merging here until I hear that we have successfully tested this (as thoroughly as we eventually did the previous). The reason I am requesting changes rather than approving with 'conditions' is that the latter approach didn't prevent the code from being merged last time. So this is a slight process adjustment based on what we learned in the last round: we need a bit of extra diligence here.

cryptonemo
cryptonemo previously approved these changes Sep 9, 2021
Copy link
Collaborator

@cryptonemo cryptonemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran the small lifecycle tests looped overnight without issue. This is where the failure showed up both on CI and on my machine in the past.

@dignifiedquire
Copy link
Contributor Author

Rebased into clear commits, the different changes.

@cryptonemo
Copy link
Collaborator

Did you want me to re-test here, or are we good to go?

Copy link
Collaborator

@porcuquine porcuquine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on @cryptonemo's testing — which he believes is sufficient to have caught the previous error but uncovered no problems here — I am approving this now.

@cryptonemo cryptonemo merged commit ff7daa9 into master Sep 9, 2021
@dignifiedquire dignifiedquire deleted the fix-bitmask-multicore-sdr branch September 9, 2021 19:05
cryptonemo added a commit that referenced this pull request Sep 16, 2021
cryptonemo added a commit that referenced this pull request Sep 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants