[fix] #1716: Fix consensus failure with f=0 cases #2124
Conversation
```rust
pub fn reshuffle_after(&self) -> u64 {
    self.sorted_peers.len() as u64
}
```
I think this fits the view change strategy.
I also think it's better not to have it be a configuration parameter.
Here are some of my preliminary thoughts.
I think you are assuming that the peers are shifted by one at each view change. I believe they are, but I'm not entirely certain. Can you confirm this?
Now, if we analyze the problem, the end goal is to find the next valid topology, if one exists, in a minimum number of iterations in the average case (or maybe in the worst case?). Since we don't have hijiri yet and we assume all peers to be equally trustworthy, I think we could gain some benefit from making the strategy a bit more stochastic, i.e. reshuffling even before we've exhausted all the permutations of the current relative ordering (a sketch follows).
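A minimal sketch of that stochastic variant, assuming plain integer peer ids and the rand crate; the real topology type is more involved, and `p` is an illustrative parameter, not an iroha one:

```rust
use rand::seq::SliceRandom;
use rand::Rng;

/// With probability `p`, reshuffle into a brand-new ordering instead of
/// rotating, so a bad relative ordering can be escaped before all of its
/// rotations are exhausted.
fn next_topology(peers: &mut Vec<u64>, p: f64) {
    let mut rng = rand::thread_rng();
    if rng.gen_bool(p) {
        peers.shuffle(&mut rng); // stochastic reshuffle
    } else {
        peers.rotate_left(1); // current shift-by-one behavior
    }
}
```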
I also notice that we don't reset view_change_proofs to 0 after reshuffling. Is that OK? I think this means we'll be reshuffling on every view change once reshuffle_after_n_view_changes is reached, as the sketch below illustrates.
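A toy illustration of the concern, with simplified types (the real check operates on the proof chain in network_topology.rs):

```rust
/// If the accumulated proof chain is never reset, a naive threshold check
/// like this fires on every subsequent view change once the threshold is
/// first reached -- i.e. we reshuffle forever after that point.
fn naive_should_reshuffle(view_change_proofs: usize, reshuffle_after: usize) -> bool {
    view_change_proofs >= reshuffle_after
}
```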
> Can you confirm this?

Yeah, confirmed by a new unit test -- network_topology::tests::topology_shifts_by_one_at_view_change.
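Roughly, the test checks something like the following (simplified placeholder ids, not the actual iroha types):

```rust
#[test]
fn topology_shifts_by_one_at_view_change() {
    // One view change should rotate the sorted peer list left by one.
    let mut peers = vec![0_u64, 1, 2, 3];
    peers.rotate_left(1);
    assert_eq!(peers, vec![1, 2, 3, 0]);
}
```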
> find the next valid topology

Interesting problem.
I guess part of what you're saying is that just shifting by one is very likely to end up with a similar set of faulty peers.
So if every peer were uniformly faulty, I'd minimize the overlapped section -- shift by f (sketched after the list below).
In reality, however, things are more complicated because
- the leader and the proxy tail are privileged
- there can also be artificial faults
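For reference, the shift-by-f idea under the uniform-fault assumption could look like this (placeholder ids, illustrative only):

```rust
/// Rotating by `f` instead of 1 minimizes the overlap between the old
/// and new occupants of each position in the ordering.
fn shift_by_f(peers: &mut [u64], f: usize) {
    if !peers.is_empty() {
        peers.rotate_left(f % peers.len());
    }
}
```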
> reshuffling all the time after reshuffle_after_n_view_changes is reached

You are right. This is an implementation mistake, unless the parameter increases with every reshuffle.
https://github.com/hyperledger/iroha/blob/81f46bb1a14e16d6eb4cf33f691b673237ec371d/core/src/sumeragi/network_topology.rs#L208
Well, I think we can check whether view_change_proofs.len() is congruent to 0 modulo sorted_peers.len() -- as long as we keep the current shift-by-one strategy.
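Something like this, with simplified signatures (the real code works on the proof chain and the peer list):

```rust
/// Reshuffle only when the shift-by-one strategy has come full circle,
/// i.e. the number of view-change proofs is a positive multiple of the
/// number of peers.
fn should_reshuffle(view_change_proofs: usize, n_peers: usize) -> bool {
    n_peers > 0 && view_change_proofs > 0 && view_change_proofs % n_peers == 0
}
```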
> Well, I think we can check if view_change_proofs.len() is congruent to 0 modulo sorted_peers.len()

Yeah, this would do in the current implementation. If you're going to fix this, I would appreciate a test as well; if not, please open an issue.
> In reality, however, things are more complicated because

It's hard to say what the best method of shifting/shuffling peers would be. Maybe we should open a discussion about this in a separate issue, and have it shift a full circle here as you did.
As for the shuffle/shift check, I'll fix it and add a test in this PR.
As for the comprehensive discussion about the view change method, I've created #2133.
Description of the Change
Provide a mitigation by exposing the number of the latest view changes as a network sanity indicator (a sketch of the idea follows).
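A sketch of the kind of indicator this describes; the field and type names here are illustrative, not the actual iroha API:

```rust
/// Expose how many view changes preceded the latest committed block so
/// operators can gauge network sanity: a persistently growing value
/// suggests the network is struggling to reach consensus.
#[derive(Debug)]
pub struct Status {
    pub blocks: u64,
    pub latest_view_changes: u64,
}
```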
Minor changes
- kura_inspector: clap version bump

Issue
- f decrements to 0 #1716

Benefits

Possible Drawbacks