-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a CONF round #44
Conversation
The non-termination was due to the direct use of `bin_values` when invoking the common coin. The fix amounts to making the CONF round depend only on the subset of `bin_values` sent in the CONF message.
I cannot repeat the CI error on my machine. All tests pass. |
It was the broadcast test that failed with a |
src/agreement.rs
Outdated
/// subsets of `bool`. | ||
#[cfg_attr(feature = "serialization-serde", derive(Serialize))] | ||
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)] | ||
pub enum BinValues { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a lot of code; wouldn't it be easier to keep this as a set of bool
s, or maybe as a [bool; 2]
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried [bool; 2] at first but replaced it with a specialised enum because you need to define operations anyway, insert
, union
, and also from_iter
for count_conf
, and those actually take the most of code. Maybe move the enum and the methods to a separate module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if we need this I'd also put it in a separate module. (Maybe a submodule of agreement
if it's only used here.)
src/agreement.rs
Outdated
/// the set of values carried by these messages, vals, are a subset of | ||
/// bin_values_r (note that bin_values_r may continue to change as BVAL_r | ||
/// messages are received, thus this condition may be triggered upon arrival | ||
/// of either an AUX_r or a BVAL_r message). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use the same notation as the code in our comments (rather than the one from the paper), i.e. "Aux(_, b)
" and "Value(_, b)
" instead of "AUX_r
" and "BVAL_r
", to be consistent. At least in doc comments (///
— but maybe also in other comments?) all expressions that refer to the code should be in backticks. (That's a Clippy lint, but Clippy can't always detect it.)
To avoid having the underscore everywhere (and be able to just write e.g. "Aux(b)
") we could change the types:
AgreementMessage {
epoch: u32,
content: AgreementMessageContent,
}
(Anyway, just brainstorming here; none of this is required to merge this PR.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. This paragraph was actually copied from the Honey Badger paper. I can lift the epoch in this PR. It's no biggie.
src/agreement.rs
Outdated
} | ||
|
||
/// Counts the number of received CONF messages. The argument `sent_conf` contains the values | ||
/// committed committed previously at the start of the CONF round. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should have committed "committed" instead of "committed committed". 😁
self.sent_bval.clear(); | ||
self.received_aux.clear(); | ||
self.epoch += 1; | ||
self.start_next_epoch(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, that's much clearer! 👍
src/agreement.rs
Outdated
fn handle_conf(&mut self, sender_id: &NodeUid, v: BinValues) -> AgreementResult<()> { | ||
self.received_conf.insert(sender_id.clone(), v); | ||
if let Some(sent_conf) = self.sent_conf { | ||
let (count_vals, vals) = self.count_conf(sent_conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a bit confused about how the fix is supposed to work: Do we need to count the other nodes' Conf
s that contain a subset of our own Conf
, or of our current bin_values
, which could have grown since we sent the Conf
? (Or couldn't it?)
Is it guaranteed to terminate this way? I imagine that we could have sent a Conf(True)
but everyone else sent a Conf(Both)
? In that case we would never get past this round, I think. Or can't that actually happen?
If it can, we probably need to use bin_values
here instead. (I.e. count_conf
wouldn't need an argument, and sent_conf
could just be a bool
. We'd need to check the condition whenever we receive a Conf
as well as when we receive a Value
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
A few minor nit-picks, but nothing blocking.
src/agreement/mod.rs
Outdated
#[cfg_attr(feature = "serialization-serde", derive(Serialize))] | ||
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)] | ||
pub enum AgreementMessage { | ||
pub enum AgreementContent { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs of the variants need to be updated: they still say "with an epoch".
src/agreement/mod.rs
Outdated
received_aux: HashMap<NodeUid, bool>, | ||
/// Values received in `Aux` messages. Reset on every epoch update. | ||
received_aux: BTreeMap<NodeUid, bool>, | ||
/// Received CONF messages. Reset on every epoch update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's still a "CONF
" here, and a "BVAL
" four lines above.
let our_uid = self.uid.clone(); | ||
self.handle_bval(&our_uid, b) | ||
} | ||
|
||
fn send_conf(&mut self) -> AgreementResult<()> { | ||
if self.conf_round { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be called sent_conf
, analogously to the other two sent_*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might imply a wrong reading of sent_conf
as the actual sent value. Instead, conf_round
is a property whether Agreement
is currently performing a Conf
round.
src/agreement/mod.rs
Outdated
/// FIXME: Clarify whether the values of AUX messages should be the same or | ||
/// not. It is assumed in `count_aux` that they can differ. | ||
/// FIXME: Clarify whether the values of `Aux` messages should be the same or not. It is assumed | ||
/// in `count_aux` that they can differ. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can that FIXME
be removed? The following paragraph seems to answer this question.
In order to guarantee termination, the common coin is invoked in the CONF round only for the subset of
bin_values
committed to in the CONF message. The Cobalt fix used the whole set ofbin_values
, which led to non-termination in our tests.