3966 add_epoch_root and sync_l1 on Membership #3984
base: main
Conversation
crates/types/src/traits/election.rs
Outdated
/// Handles notifications that a new epoch root has been created
/// Is called under a read lock to the Membership. Return a callback
/// with Some to have that callback invoked under a write lock.
fn add_epoch_root(
Didn't we say we need this to be async?
Currently I imagine we would want to make external requests here, or, even if we decouple it, we may want to take out some async locks or read from channels. Then keep the &mut self closure minimal, so it only updates the internal state and does nothing else.
I think this can be non-async, but sync_l1 should be async?
I don't really want to use sync_l1 unless it can't be avoided. It shouldn't leak into consensus that we are using the L1 if we can avoid it.
If we have only one function, then yes, it should be async due to calls to L1.
Done. I kept sync_l1 as well, but both are async. If they are never implemented, we are free to drop them from hotshot at a later time.
.block_number();

// Skip if this is not the expected block.
if task_state.epoch_height != 0 && (decided_block_number + 3) % task_state.epoch_height == 0 {
Currently, what would happen if a node misses the block when the update is triggered? Would it never have add_epoch_root called for that epoch?
I'm wondering if the Membership trait should have a method like is_epoch_initialized(&self, epoch) -> bool, so that if the node is not online at the beginning of the epoch we can still initialize the membership for that epoch. But then we also have to dig out the correct header from history, which I think might be cumbersome.
So maybe this should be handled with catchup in the confirmation layer? What do you think @imabdulbasit ?
I think doing a catchup is better, but I am wondering how we can do that now, with Hotshot managing the locks.
For our existing catchup, it's hotshot that triggers the catchup (for example by calling validate_and_apply_header) and the confirmation layer that performs it. Maybe it should be the same here: hotshot calls something to trigger catchup for the membership, and the sequencer performs the catchup.
If we were to do it analogously for membership, we would have to make all the reading functions async and &mut self, which seems terrible. I think we can come up with something better.
For example, hotshot could call add_epoch_root (probably with another name) for every view; then the sequencer could decide whether it needs to do catchup for this epoch (and in most cases do (almost) nothing and return None as the write_callback). This way we ensure the catchup is actually triggered.
I think our overall design is a bit sub-optimal because it's an invariant that the membership is static for an epoch, but that invariant isn't really reflected in the code. I think eventually we should enforce this in hotshot: the confirmation layer should somehow provide a constructor for an "EpochMembership" type that takes an epoch as input and is called by hotshot where needed. @ss-es mentioned this would be difficult to do in hotshot, but I'm not sure pushing the complexity elsewhere is actually better.
Yeah, but then hotshot has to decide when to call it? I think calling it for every view might be too much. Maybe it makes sense to do catchup when the membership getter functions do not return any data?
Maybe we do catchup at node startup and pass the Committee after catchup, so hotshot does not need to do anything?
I think catchup at startup might not work if a node misses some views for some other reason, for example due to a temporary network outage.
Yeah, but then hotshot has to decide when to call it? I think calling it for every view might be too much.
If it doesn't do anything except once per epoch, then what's the problem with calling it every view?
I'm wondering if the Membership trait should have a method like
is_epoch_initialized(&self, epoch) -> bool
so that if the node is not online at the beginning of the epoch we can still initialize the membership for the epoch. But then we also have to dig out the correct header from history, which I think might be cumbersome.
Why do we need this? When a node comes online it should go through the same routine as all nodes, which includes calling add_epoch_root for the block being proposed/validated. Hotshot should ensure mechanisms are in place to bring the late-joining node up to date. I don't see the need for an extra mechanism.
I might be reading it wrong, but I think the current implementation in this PR only calls the function if (decided_block_number + 3) % task_state.epoch_height == 0, so if a node joins at a later point and the decided block has moved on, it won't call it. Maybe I'm misunderstanding.
epoch_from_block_number(decided_block_number, task_state.epoch_height) + 1,
);

let membership_reader = task_state.membership.read().await;
Possibly personal preference, but I think it is more idiomatic to introduce a scoped block rather than explicit drops. Something like:
let write_callback = {
    let membership_reader = task_state.membership.read().await;
    membership_reader.add_epoch_root(next_epoch_number, proposal.block_header.clone())
};
done
Closes #3966
This PR:
Adds add_epoch_root and sync_l1 to the Membership trait, calling them from within quorum_vote.
This PR does not:
Key places to review: