Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lr/double quorum #3922

Merged
merged 66 commits into from
Dec 18, 2024
Merged

Lr/double quorum #3922

merged 66 commits into from
Dec 18, 2024

Conversation

lukaszrzasik
Copy link
Contributor

@lukaszrzasik lukaszrzasik commented Nov 26, 2024

Closes #<ISSUE_NUMBER>

This PR:

  • Modifies the QuorumProposal and Leaf2 types to include the optional next epoch justify QC.
  • Adds a new type NextEpochQuorumData2. It's a thin wrapper around QuorumData2. The additional type makes it possible to distinguish between vote and certificate type aliases.
  • Adds a new hotshot event NextEpochQc2Formed to signal that a QC for has been formed using votes from the nodes in the next epoch.
  • Adds next epoch vote collectors in the consensus task. The collectors collect votes from the nodes that belong to the next epoch. Given enough votes an additional (next epoch) QC is formed.
  • Adds logic to save the payload early. The payload is stored as soon as the block is formed and the node is able send the DA proposal. Up until now the payload was stored only after a node received and validated the DA proposal. The payload might be needed earlier to be able to calculate VID shares for the nodes in the next epoch.
  • Adds logic to attach the next epoch QC to the proposal if the proposal's normal justify QC is for the last block in the epoch.
  • Modifies the proposal validating logic. Adds additional checks:
    • if it's an epoch transition proposal, make sure the next epoch justify QC exists
    • make sure the justify QC and the next epoch justify QC are for the same leaf
    • validate the signatures on the next epoch justify QC
  • Modifies the voting logic so that the nodes from the next epoch vote for the last block in the previous epoch.
  • Adds logic to send the VID shares for the next epoch nodes when sending a proposal for the last block in the epoch.
  • Adds a field in the shared consensus state that stores the highest known next epoch QC. Adds a similar field to the storage.
  • Adds an epoch parameter to the threshold methods in the Membership trait. This is required because the thresholds depend on the number of nodes in the given epoch.
  • Modifies OverallSafetyTask. It now uses the total nodes number and threshold directly from the Membership object. This is required to properly test the types with the variable stake table.

This PR does not:

  • Modify the DA logic. It means that the next epoch nodes do not participate in the DAC formation.
  • Modify the timeout and view sync logic. These will be adjusted in the next PRs.

Key places to review:

if block_number == 0 || epoch_height == 0 {
false
} else {
block_number % epoch_height == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned on Zulip, I think this should be block_number + 1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nvm now I understand that block 0 is a special case.

Base automatically changed from lr/variable-stake-table to main December 12, 2024 16:29
Copy link
Contributor

@ss-es ss-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense overall, just some comments

Comment on lines 1004 to 1005
/// Next epoch highest QC that was seen
next_epoch_high_qc: Option<NextEpochQuorumCertificate2<TYPES>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit unclear to me (also it's not clear we need it in the hotshot initializer; does it cause problems to default it to None?)

can the comment be updated to be more descriptive about where this should come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need it if a node restarted during epoch transition and is about to propose. I've adjusted the comment.

// If we haven't upgraded to Epochs just return None right away
if self
.upgrade_lock
.version(self.view_number)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use version_infallible here I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, thank you!

Comment on lines +208 to +213
if let Some(next_epoch_qc) = self.consensus.read().await.next_epoch_high_qc() {
if next_epoch_qc.data.leaf_commit == high_qc.data.leaf_commit {
// We have it already, no reason to wait
return Some(next_epoch_qc.clone());
}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this, can't we pass this directly to this function call as an Option<> (like we pass high_qc)? Do we have to lock consensus state here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean here. I've created an issue to follow up on the unresolved comments from this PR: #3978

Comment on lines +184 to +189
if justify_qc.view_number() != next_epoch_justify_qc.view_number()
|| justify_qc.data.epoch != next_epoch_justify_qc.data.epoch
|| justify_qc.data.leaf_commit != next_epoch_justify_qc.data.leaf_commit
{
bail!("Next epoch justify qc exists but it's not equal with justify qc.");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we replace this with ensure!()? I feel it's easier to read the check that way

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the one right after this as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I agree. I used bail! here to be consistent with the surrounding code. We might consider creating a task for rewriting all the bail / ensure statements in this function. For now I've added your comment to the follow up issue: #3978

@@ -157,6 +166,65 @@ impl<TYPES: NodeType, I: NodeImplementation<TYPES>> VidTaskState<TYPES, I> {
return None;
}

HotShotEvent::QuorumProposalSend(proposal, _) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason we can't do this on BlockRecv? (is it because we aren't currently passing the block number there?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is a reason. This is related to the discussion we had about the epoch acquired at the moment when BlockRecv is issued being unreliable. In the next PR I've slightly changed the view change logic. It seems that thanks to that the BlockRecv's epoch has become reliable. I need to test it some more and, if everything works as expected, I will adjust the code here as well.

&mut self,
high_qc: NextEpochQuorumCertificate2<TYPES>,
) -> Result<()> {
ensure!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rewrite this to avoid the unwrap call even if we know it's fine?

e.g. let Some(qc) = self.next_epoch... else { return Ok(()); }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, I've changed it.

@@ -493,6 +494,7 @@ impl<TYPES: NodeType, I: NodeImplementation<TYPES>, V: Versions> SystemContext<T

let api = self.clone();
let view_number = api.consensus.read().await.cur_view();
let epoch = api.consensus.read().await.cur_epoch();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let consensus_reader = api.consensus.read().await;
let view_number = consensus_reader.cur_view();
let epoch = consensus_reader.cur_epoch();
drop(consensus_reader);

Copy link
Contributor

@pls148 pls148 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. I suggested one change regarding locking, but I wouldn't consider it too critical as I'll be making a cleanup pass later on for locking behavior.

I'd also address @ss-es comments before merging!

@pls148 pls148 self-assigned this Dec 17, 2024
@lukaszrzasik
Copy link
Contributor Author

Generally looks good. I suggested one change regarding locking, but I wouldn't consider it too critical as I'll be making a cleanup pass later on for locking behavior.

I'd also address @ss-es comments before merging!

I actually answered Salman's comments earlier today but forgot to publish my answer. I'll adjust the lock / drop part.

@lukaszrzasik lukaszrzasik merged commit a927746 into main Dec 18, 2024
17 checks passed
@lukaszrzasik lukaszrzasik deleted the lr/double-quorum branch December 18, 2024 16:16
@sveitser sveitser mentioned this pull request Dec 19, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants