-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fix] Updates conditions to advance from odd and even rounds #3119
Conversation
@mdelle1 can you address this PR's issues? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
c522e9a
to
044ff0c
Compare
Signed-off-by: Raymond Chu <14917648+raychu86@users.noreply.github.com>
Note: This PR was tested on a network with a malicious validator that was spoofing rounds and did not run into block production issues. |
I ran this PR (c9e9d01) with 5 nodes on my laptop and every node periodically gets an error, after this PR it should be a debug statement because we won't be ready until we see quorum.
I ran mainnet-latest (3b47375) and it indeed didn't have the same issue. |
return false; | ||
} | ||
// Retrieve the leader certificate. | ||
let Some(leader_certificate) = self.leader_certificate.read().clone() else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this moved down? We can return early if the leader certificate is not found, and not incur the quorum threshold check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the leader was not found, you should advance to the next round only if you've seen a quorum of certificates. Otherwise, nodes may prematurely jump to the next round on a faulty leader before seeing quorum.
Co-authored-by: Howard Wu <9260812+howardwu@users.noreply.github.com> Signed-off-by: Raymond Chu <14917648+raychu86@users.noreply.github.com>
Co-authored-by: Howard Wu <9260812+howardwu@users.noreply.github.com> Signed-off-by: Raymond Chu <14917648+raychu86@users.noreply.github.com>
Co-authored-by: Howard Wu <9260812+howardwu@users.noreply.github.com> Signed-off-by: Raymond Chu <14917648+raychu86@users.noreply.github.com>
// Retrieve the authors of the current certificates. | ||
let authors = current_certificates.clone().into_iter().map(|c| c.author()).collect(); | ||
// Check if quorum threshold is reached. | ||
if !committee_lookback.is_quorum_threshold_reached(&authors) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this particular quroum_threshold_reached
check needed? Does this not invalidate the subsequent checks below?
`stake_with_leader >= committee_lookback.availability_threshold()
|| stake_without_leader >= committee_lookback.quorum_threshold()`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wouldn't invalidate the previous check because you can still see 'f+1' votes for the leader but not see '2f+1' certificates in the round. Here you should commit the leader, but wait to see '2f+1' certificates before advancing to the next round. Previously, the code was also returning 'true' if the timer had expired. But again, with an expired timer we should still include a quorum check before advancing.
Motivation
This PR updates the conditions to advance from odd and even rounds by adding a quorum check. Previously, in odd rounds the code would return ‘true’ if the previous even round leader was unseen [1]. In some scenarios, this would cause nodes to pre-maturely increment rounds before seeing ‘2f+1’ certificates, the quorum threshold to increment rounds in Bullshark. For example, if the leader in even round ‘r’ was unseen, in odd round ‘r+1’ a malicious node could send a batch header with a round number beyond the GC limit. When nodes try to sync with the batch header [2], they will increment their BFT to the next round since the previous leader was unseen.
[1] https://github.com/AleoHQ/snarkOS/blob/mainnet/node/bft/src/bft.rs#L370
[2] https://github.com/AleoHQ/snarkOS/blob/mainnet/node/bft/src/primary.rs#L1249