Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] Updates conditions to advance from odd and even rounds #3119

Merged
merged 19 commits into from
Mar 22, 2024

Conversation

mdelle1
Copy link
Contributor

@mdelle1 mdelle1 commented Feb 23, 2024

Motivation

This PR updates the conditions to advance from odd and even rounds by adding a quorum check. Previously, in odd rounds the code would return ‘true’ if the previous even round leader was unseen [1]. In some scenarios, this would cause nodes to pre-maturely increment rounds before seeing ‘2f+1’ certificates, the quorum threshold to increment rounds in Bullshark. For example, if the leader in even round ‘r’ was unseen, in odd round ‘r+1’ a malicious node could send a batch header with a round number beyond the GC limit. When nodes try to sync with the batch header [2], they will increment their BFT to the next round since the previous leader was unseen.

[1] https://github.com/AleoHQ/snarkOS/blob/mainnet/node/bft/src/bft.rs#L370
[2] https://github.com/AleoHQ/snarkOS/blob/mainnet/node/bft/src/primary.rs#L1249

@vicsn vicsn self-requested a review February 23, 2024 11:34
@raychu86 raychu86 marked this pull request as draft February 23, 2024 17:38
@howardwu
Copy link
Contributor

howardwu commented Mar 9, 2024

@mdelle1 can you address this PR's issues?

Copy link

@mimoo mimoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

@raychu86 raychu86 marked this pull request as ready for review March 11, 2024 16:15
@raychu86 raychu86 force-pushed the fix/bft-increment-rounds branch 2 times, most recently from c522e9a to 044ff0c Compare March 11, 2024 23:55
Signed-off-by: Raymond Chu <14917648+raychu86@users.noreply.github.com>
@raychu86
Copy link
Contributor

Note: This PR was tested on a network with a malicious validator that was spoofing rounds and did not run into block production issues.

@vicsn
Copy link
Contributor

vicsn commented Mar 12, 2024

I ran this PR (c9e9d01) with 5 nodes on my laptop and every node periodically gets an error, after this PR it should be a debug statement because we won't be ready until we see quorum.

2024-03-12T10:52:50.035313Z ERROR BFT - A leader certificate was found, but 'is_ready' is false is_ready=false

I ran mainnet-latest (3b47375) and it indeed didn't have the same issue.

node/bft/src/bft.rs Outdated Show resolved Hide resolved
return false;
}
// Retrieve the leader certificate.
let Some(leader_certificate) = self.leader_certificate.read().clone() else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this moved down? We can return early if the leader certificate is not found, and not incur the quorum threshold check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the leader was not found, you should advance to the next round only if you've seen a quorum of certificates. Otherwise, nodes may prematurely jump to the next round on a faulty leader before seeing quorum.

node/bft/src/bft.rs Outdated Show resolved Hide resolved
node/bft/src/bft.rs Outdated Show resolved Hide resolved
raychu86 and others added 2 commits March 12, 2024 16:46
Co-authored-by: Howard Wu <9260812+howardwu@users.noreply.github.com>
Signed-off-by: Raymond Chu <14917648+raychu86@users.noreply.github.com>
Co-authored-by: Howard Wu <9260812+howardwu@users.noreply.github.com>
Signed-off-by: Raymond Chu <14917648+raychu86@users.noreply.github.com>
Co-authored-by: Howard Wu <9260812+howardwu@users.noreply.github.com>
Signed-off-by: Raymond Chu <14917648+raychu86@users.noreply.github.com>
// Retrieve the authors of the current certificates.
let authors = current_certificates.clone().into_iter().map(|c| c.author()).collect();
// Check if quorum threshold is reached.
if !committee_lookback.is_quorum_threshold_reached(&authors) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this particular quroum_threshold_reached check needed? Does this not invalidate the subsequent checks below?

 `stake_with_leader >= committee_lookback.availability_threshold()
            || stake_without_leader >= committee_lookback.quorum_threshold()`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't invalidate the previous check because you can still see 'f+1' votes for the leader but not see '2f+1' certificates in the round. Here you should commit the leader, but wait to see '2f+1' certificates before advancing to the next round. Previously, the code was also returning 'true' if the timer had expired. But again, with an expired timer we should still include a quorum check before advancing.

@howardwu howardwu merged commit 09c14db into mainnet-staging Mar 22, 2024
23 of 24 checks passed
@howardwu howardwu deleted the fix/bft-increment-rounds branch March 22, 2024 22:08
@howardwu howardwu changed the title Updates conditions to advance from odd and even rounds [Fix] Updates conditions to advance from odd and even rounds Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants