Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] #1716: Fix consensus failure with f=0 cases #2124

Merged
merged 5 commits into from
Apr 22, 2022

Conversation

s8sato
Copy link
Contributor

@s8sato s8sato commented Apr 20, 2022

Description of the Change

  • The topology reshuffles when the shift goes full circle
  • Either of the followings:
    • Give an true solution to the failure
    • Give an mitigation by exposing the number of the last view changes as the network sanity indicator

Minor changes

  • As a bonus, expose the number of the view changes in the current round as the network sanity indicator
  • Refactor kura_inspector with clap version bump

Issue

Benefits

Possible Drawbacks

@github-actions github-actions bot added the iroha2-dev The re-implementation of a BFT hyperledger in RUST label Apr 20, 2022
@s8sato s8sato marked this pull request as ready for review April 20, 2022 18:25
Comment on lines 367 to 368
pub fn reshuffle_after(&self) -> u64 {
self.sorted_peers.len() as u64
Copy link
Contributor Author

@s8sato s8sato Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this fits to the view change strategy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think it's better not to have it be a configuration parameter.

Here are some of my preliminary thoughts.

I think you are assuming that the peers are shifted by one. I think they are but I'm not entirely certain. Can you confirm this?

Now, if we analyze the problem here, the end goal is to find the next valid topology, if there is a valid topology, in a minimum number of iterations in the average case (or maybe in the worst case?). Since we don't have hijiri yet and we assume all peers to be equally trustworthy I think we could gain some benefit from making it a bit more stochastic, i.e. reshuffling even before we've exhausted all the permutations with the current relative ordering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also notice, that we don't bring down view_change_proofs to 0 after reshuffling. Is that ok? I think this indicates that we'll be reshuffling all the time after reshuffle_after_n_view_changes is reached

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm this?

Yeah, confirmed by a new unit test -- network_topology::tests::topology_shifts_by_one_at_view_change

find the next valid topology

Interesting problem.
I guess one of your saying is that just shifting by one is very likely to result in similar faulty peers.
So if every peer were uniformly faulty, I'd minimize the overlapped section -- shift by f.
In reality, however, things are more complicated because

  • the leader and the proxy tail are privileged
  • there can be also artificial faults

reshuffling all the time after reshuffle_after_n_view_changes is reached

You are right. This is a mis-implementation unless the parameter increases with every reshuffle.
https://github.com/hyperledger/iroha/blob/81f46bb1a14e16d6eb4cf33f691b673237ec371d/core/src/sumeragi/network_topology.rs#L208
Well, I think we can check if view_change_proofs.len() is congruent to 0 modulo sorted_peers.len()
-- as long as we take the current shift-by-one strategy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I think we can check if view_change_proofs.len() is congruent to 0 modulo sorted_peers.len()

yeah, this would do in the current implementation. If you're going to fix this I would appreciate a test as well, if not, please open an issue

Copy link
Contributor

@mversic mversic Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In reality, however, things are more complicated because

It's hard to say what would be the best method of shifting/shuffling peers. Maybe we should open a discussion about this in a separate issue and have it shift a full circle here as you did

Copy link
Contributor Author

@s8sato s8sato Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the shuffle/shift judge I'll fix it and make a test in this PR.
As for the comprehensive discussion about the view change method, created #2133

@s8sato s8sato changed the title [fix] #1716: Rescue the consensus failure when f decrements to 0 [fix] #1716: Fix consensus failure with f=0 cases Apr 20, 2022
mversic
mversic previously approved these changes Apr 21, 2022
core/src/sumeragi/fault.rs Show resolved Hide resolved
core/src/sumeragi/network_topology.rs Outdated Show resolved Hide resolved
core/src/sumeragi/network_topology.rs Show resolved Hide resolved
Signed-off-by: s8sato <49983831+s8sato@users.noreply.github.com>
Signed-off-by: s8sato <49983831+s8sato@users.noreply.github.com>
Signed-off-by: s8sato <49983831+s8sato@users.noreply.github.com>
…work sanity indicator

Signed-off-by: s8sato <49983831+s8sato@users.noreply.github.com>
Signed-off-by: s8sato <49983831+s8sato@users.noreply.github.com>
@appetrosyan appetrosyan merged commit da64704 into hyperledger-iroha:iroha2-dev Apr 22, 2022
@s8sato s8sato added the api-changes Changes in the API for client libraries label Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-changes Changes in the API for client libraries iroha2-dev The re-implementation of a BFT hyperledger in RUST
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants