Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3966 add_epoch_root and sync_l1 on Membership #3984

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

pls148
Copy link
Contributor

@pls148 pls148 commented Dec 19, 2024

Closes #3966

This PR:

Adds add_epoch_root and sync_l1 to Membership trait, calling them from within quorum_vote.

This PR does not:

Key places to review:

/// Handles notifications that a new epoch root has been created
/// Is called under a read lock to the Membership. Return a callback
/// with Some to have that callback invoked under a write lock.
fn add_epoch_root(
Copy link
Contributor

@sveitser sveitser Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we say we need this to be async?

Currently I imagine we would want to do external requests here or even if we de-couple it we may want to take out some async locks or read from channels. Then make the closure with &mut self minimal, so only update the internal state and do nothing else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be non async but sync_l1 should be async?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really want to use sync_l1 unless it can't be avoided. It shouldn't leak into consensus that we are using the L1 if we can avoid it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have only one function, then yes, it should be async due to calls to L1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I kept sync_l1 as well but both are async. If those are never implemented then it's free to drop them from hotshot at a later time.

.block_number();

// Skip if this is not the expected block.
if task_state.epoch_height != 0 && (decided_block_number + 3) % task_state.epoch_height == 0 {
Copy link
Contributor

@sveitser sveitser Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently what would happen if a node misses the block when the update is triggered? Would it never have add_epoch_root called for that epoch?

I'm wondering if we the Membership trait should have a method like is_epoch_initialized(&self, epoch) -> bool so that if the node is not online for the beginning of the epoch we can still initialize the membership for the epoch. But then we also have to dig out the correct header from history which I think might be cumbersome.

So maybe this should be handled with catchup in the confirmation layer? What do you think @imabdulbasit ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doing a catchup is better, but I am wondering how we can do that now with Hotshot managing the locks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For our existing catchup it's hotshot that triggers the catchup (for example by calling validate_and_apply_header) and the confirmation layer that performs the catchup. Maybe it should be the same here: hotshot needs to call something to trigger catchup for the membership and the sequencer performs the catchup.

If we were to do it analoguous for membership we would have to make all the reading functions async and &mut self which seems terrible. I think we can think of something better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example hotshot could call add_epoch_root (probably with another name) for every view then the sequencer could decide if it needs to do catchup for this epoch (and in most cases do (almost) nothing and return None as write_callback). This way we ensure the catchup is actually triggered.

I think our overall design is a bit sub-optimal because it's an invariant that the membership is static for an epoch but that invariant isn't really reflected in the code. I think eventually we should enforce this in hotshot and the confirmation layer should somehow provide a constructor for a "EpochMembership" type that takes an epoch as input that is called by hotshot where needed. @ss-es mentioned this would be difficult to do in hotshot but I'm not sure pushing the complexity elsewhere is actually better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah but then hotshot has to decide when to call it? I think it would calling it for every view would be too much maybe. It makes sense to do catch up when the membership get functions() do not return any data?

Copy link
Contributor

@imabdulbasit imabdulbasit Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we do catchup at the node startup and pass the Committee after catchup so hotshot does not need to do anything?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think catchup at startup might not work if a node misses some views for some other reason, for example due to a temporary network outage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah but then hotshot has to decide when to call it? I think it would calling it for every view would be too much maybe.

If it doesn't do anything except for once per epoch then what's the problem with calling it every view?

Copy link
Contributor

@tbro tbro Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if the Membership trait should have a method like is_epoch_initialized(&self, epoch) -> bool so that if the node is not online for the beginning of the epoch we can still initialize the membership for the epoch. But then we also have to dig out the correct header from history which I think might be cumbersome.

Why do we need this? When node comes online it should go through same routine as all nodes, which includes calling add_epoch_root for the block being proposed/validated. Hotshot should ensure mechanisms are in place to concert the late joining node. I don't see the need for an extra mechanism.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be reading it wrongly but I think the current implementation in this PR only calls the function if decided_block_number + 3) % task_state.epoch_height == 0 so if it joins at a later point and the decided block has moved on it won't call it. Maybe I'm misunderstanding.

epoch_from_block_number(decided_block_number, task_state.epoch_height) + 1,
);

let membership_reader = task_state.membership.read().await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly personal preference, but I think it is more idiomatic to prefer introducing a scoped block over explicit drops. Something like:

let write_callback = {
    let membership_reader = task_state.membership.read().await;
    membership_reader.add_epoch_root(next_epoch_number, proposal.block_header.clone())
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@sveitser sveitser mentioned this pull request Dec 20, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Membership trait changes to support PoS
6 participants