Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a CONF round #44

Merged
merged 7 commits into from
May 28, 2018
Merged

Added a CONF round #44

merged 7 commits into from
May 28, 2018

Conversation

vkomenda
Copy link
Contributor

In order to guarantee termination, the common coin is invoked in the CONF round only for the subset of bin_values committed to in the CONF message. The Cobalt fix used the whole set of bin_values, which led to non-termination in our tests.

vkomenda added 2 commits May 23, 2018 18:38
The non-termination was due to the direct use of `bin_values` when invoking the
common coin. The fix amounts to making the CONF round depend only on the subset
of `bin_values` sent in the CONF message.
@vkomenda vkomenda requested a review from afck May 23, 2018 22:09
@vkomenda
Copy link
Contributor Author

I cannot repeat the CI error on my machine. All tests pass.

@afck
Copy link
Collaborator

afck commented May 24, 2018

It was the broadcast test that failed with a signal: 4, SIGILL: illegal instruction; so it probably doesn't have anything to do with your change. First I suspected that some dependency broke, but cargo update didn't reproduce the problem on my machine either. Maybe let's just restart the Travis build and see whether it happens again?

src/agreement.rs Outdated
/// subsets of `bool`.
#[cfg_attr(feature = "serialization-serde", derive(Serialize))]
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum BinValues {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a lot of code; wouldn't it be easier to keep this as a set of bools, or maybe as a [bool; 2] or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried [bool; 2] at first but replaced it with a specialised enum because you need to define operations anyway, insert, union, and also from_iter for count_conf, and those actually take the most of code. Maybe move the enum and the methods to a separate module?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we need this I'd also put it in a separate module. (Maybe a submodule of agreement if it's only used here.)

src/agreement.rs Outdated
/// the set of values carried by these messages, vals, are a subset of
/// bin_values_r (note that bin_values_r may continue to change as BVAL_r
/// messages are received, thus this condition may be triggered upon arrival
/// of either an AUX_r or a BVAL_r message).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use the same notation as the code in our comments (rather than the one from the paper), i.e. "Aux(_, b)" and "Value(_, b)" instead of "AUX_r" and "BVAL_r", to be consistent. At least in doc comments (/// — but maybe also in other comments?) all expressions that refer to the code should be in backticks. (That's a Clippy lint, but Clippy can't always detect it.)

To avoid having the underscore everywhere (and be able to just write e.g. "Aux(b)") we could change the types:

AgreementMessage {
    epoch: u32,
    content: AgreementMessageContent,
}

(Anyway, just brainstorming here; none of this is required to merge this PR.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. This paragraph was actually copied from the Honey Badger paper. I can lift the epoch in this PR. It's no biggie.

src/agreement.rs Outdated
}

/// Counts the number of received CONF messages. The argument `sent_conf` contains the values
/// committed committed previously at the start of the CONF round.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should have committed "committed" instead of "committed committed". 😁

self.sent_bval.clear();
self.received_aux.clear();
self.epoch += 1;
self.start_next_epoch();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, that's much clearer! 👍

src/agreement.rs Outdated
fn handle_conf(&mut self, sender_id: &NodeUid, v: BinValues) -> AgreementResult<()> {
self.received_conf.insert(sender_id.clone(), v);
if let Some(sent_conf) = self.sent_conf {
let (count_vals, vals) = self.count_conf(sent_conf);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit confused about how the fix is supposed to work: Do we need to count the other nodes' Confs that contain a subset of our own Conf, or of our current bin_values, which could have grown since we sent the Conf? (Or couldn't it?)
Is it guaranteed to terminate this way? I imagine that we could have sent a Conf(True) but everyone else sent a Conf(Both)? In that case we would never get past this round, I think. Or can't that actually happen?
If it can, we probably need to use bin_values here instead. (I.e. count_conf wouldn't need an argument, and sent_conf could just be a bool. We'd need to check the condition whenever we receive a Conf as well as when we receive a Value.)

Copy link
Collaborator

@afck afck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!
A few minor nit-picks, but nothing blocking.

#[cfg_attr(feature = "serialization-serde", derive(Serialize))]
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum AgreementMessage {
pub enum AgreementContent {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs of the variants need to be updated: they still say "with an epoch".

received_aux: HashMap<NodeUid, bool>,
/// Values received in `Aux` messages. Reset on every epoch update.
received_aux: BTreeMap<NodeUid, bool>,
/// Received CONF messages. Reset on every epoch update.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still a "CONF" here, and a "BVAL" four lines above.

let our_uid = self.uid.clone();
self.handle_bval(&our_uid, b)
}

fn send_conf(&mut self) -> AgreementResult<()> {
if self.conf_round {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be called sent_conf, analogously to the other two sent_*?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might imply a wrong reading of sent_conf as the actual sent value. Instead, conf_round is a property whether Agreement is currently performing a Conf round.

/// FIXME: Clarify whether the values of AUX messages should be the same or
/// not. It is assumed in `count_aux` that they can differ.
/// FIXME: Clarify whether the values of `Aux` messages should be the same or not. It is assumed
/// in `count_aux` that they can differ.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can that FIXME be removed? The following paragraph seems to answer this question.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants