This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

better convergence when peers restart #1641

Merged: 1 commit, Nov 9, 2015


rade
Member

@rade rade commented Nov 5, 2015

Previously, when a peer restarted, other peers rejected information about the new incarnation (i.e. one with a different UID), and dropped connections, until all knowledge of the previous incarnation had been purged. This could cause a lot of connection churn and hence connectivity disruption and, in some pathological cases, very slow convergence, delaying acceptance of the new incarnation into the network.

We no longer drop connections when encountering a different incarnation of a peer. There are two situations in which that can happen:

  1. On connection establishment: we simply proceed.

  2. On receipt of gossip: to ensure convergence we

     a) treat the UID as an additional discriminator when deciding whether to update our information about a peer with that which was gossiped. Specifically, we update the information we hold when (i) the gossiped version is greater, or (ii) the version is the same and the gossiped UID is greater;

     b) include the UID in the information we update;

     c) move our own version number beyond any we receive for ourselves, if the received UID differs from ours.

With (a) we establish a total order of peer information across several incarnations of the same peer, i.e. we consider information to be fresher if it has a higher version, or the same version and a higher UID. This may seem somewhat counterintuitive, since it will generally treat information about new incarnations as older than information about old incarnations, because incarnations always start life with version 1. But to do better we'd need to establish a total order of incarnations that matches their temporal occurrence, which requires some sort of durable state.
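A minimal sketch of the ordering from (a) and the update from (b), for illustration only. The `PeerInfo` type and the `fresher` and `merge` functions are hypothetical names, not the identifiers used in this PR, and UIDs are assumed to be comparable integers.

```go
package main

import "fmt"

// PeerInfo is a hypothetical, pared-down record of what we hold about a peer.
type PeerInfo struct {
	Name    string
	UID     uint64 // changes on every restart (new incarnation)
	Version uint64 // starts at 1 for each incarnation
}

// fresher reports whether gossiped information should replace held
// information: higher version wins, and on equal versions higher UID wins.
func fresher(gossiped, held PeerInfo) bool {
	if gossiped.Version != held.Version {
		return gossiped.Version > held.Version
	}
	return gossiped.UID > held.UID
}

// merge applies rule (a) and, per rule (b), copies the UID along with
// the rest of the record. It reports whether held was updated.
func merge(held *PeerInfo, gossiped PeerInfo) bool {
	if !fresher(gossiped, *held) {
		return false
	}
	*held = gossiped
	return true
}

func main() {
	old := PeerInfo{Name: "peer-a", UID: 7, Version: 12}
	restarted := PeerInfo{Name: "peer-a", UID: 9, Version: 1}

	// A freshly restarted incarnation looks older than its predecessor.
	fmt.Println(fresher(restarted, old)) // false

	// Equal versions are ordered by UID, so the order is total.
	tie := PeerInfo{Name: "peer-a", UID: 9, Version: 12}
	fmt.Println(fresher(tie, old)) // true

	held := old
	fmt.Println(merge(&held, tie), held.UID) // true 9
}
```

The first result shows the counterintuitive case: on its own, rule (a) ranks a new incarnation at version 1 below the old one.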

So instead we have (c). Through that we learn the highest version number of any old incarnation of ourselves that other peers still hold, and then make sure that our version is greater than that. Essentially we continue where the old incarnations left off. It's as if instead of restarting we had simply changed UIDs. And due to (a) and (b) the information about the new incarnation of ourselves, now with a higher version, will supersede that of the old incarnations.
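Rule (c) could be sketched as follows. The method name `setVersionBeyond` is taken from the diff excerpt in this conversation, but its body here is an assumption, as are the `Peer` type and the `onGossipAboutSelf` helper.

```go
package main

import "fmt"

// Peer is a hypothetical stand-in for our own peer record.
type Peer struct {
	UID     uint64
	Version uint64
}

// setVersionBeyond moves our version just past v if it is not already
// greater, and reports whether anything changed. (Assumed behaviour.)
func (p *Peer) setVersionBeyond(v uint64) bool {
	if v >= p.Version {
		p.Version = v + 1
		return true
	}
	return false
}

// onGossipAboutSelf handles gossip that describes a peer with our name.
// Only information about a different incarnation (different UID)
// triggers the version bump described in (c).
func onGossipAboutSelf(ourself *Peer, received Peer) bool {
	if received.UID == ourself.UID {
		return false
	}
	return ourself.setVersionBeyond(received.Version)
}

func main() {
	ourself := &Peer{UID: 9, Version: 1} // freshly restarted

	// Another peer still holds our previous incarnation at version 12.
	fmt.Println(onGossipAboutSelf(ourself, Peer{UID: 7, Version: 12}), ourself.Version) // true 13

	// An even older record no longer moves us.
	fmt.Println(onGossipAboutSelf(ourself, Peer{UID: 5, Version: 4}), ourself.Version) // false 13
}
```

After the bump, our record at version 13 outranks every stale record under the ordering from (a), so it is accepted wherever it is gossiped.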

Fixes #1554.

@rade
Member Author

rade commented Nov 5, 2015

I have run the test from #1554 (comment) against this. No connection drops at all, even after resuming all the gossiping. And all peers end up with the correct info about the restarted peer (new UID, etc).
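For readers following along, the toy trace below walks one remote peer and one restarted peer through rules (a) to (c). It is not the test from #1554; the `info` type and the `fresher`, `merge`, and `trace` functions are hypothetical and only illustrate why both sides end up with the new UID.

```go
package main

import "fmt"

// info is a hypothetical (UID, version) pair held about one peer name.
type info struct {
	UID, Version uint64
}

// fresher implements the ordering from (a).
func fresher(g, h info) bool {
	if g.Version != h.Version {
		return g.Version > h.Version
	}
	return g.UID > h.UID
}

// merge replaces held with g, UID included, when g is fresher.
func merge(held *info, g info) bool {
	if fresher(g, *held) {
		*held = g
		return true
	}
	return false
}

// trace returns the remote view and the restarted peer's own record
// after one round of gossip in each direction.
func trace() (remote, self info) {
	remote = info{UID: 7, Version: 12} // remote still knows the old incarnation
	self = info{UID: 9, Version: 1}    // restarted peer, new UID

	merge(&remote, self) // rejected: version 1 is below 12

	// The remote's stale record reaches the restarted peer: rule (c).
	if remote.UID != self.UID && remote.Version >= self.Version {
		self.Version = remote.Version + 1
	}

	merge(&remote, self) // accepted: version 13 beats 12
	return remote, self
}

func main() {
	remote, self := trace()
	fmt.Println(remote == self, remote.UID, remote.Version) // true 9 13
}
```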

// received by other peers.
pending.localPeerModified = peers.ourself.setVersionBeyond(newPeer.Version)
}
default:

This comment was marked as abuse.

This comment was marked as abuse.

@bboreham
Contributor

bboreham commented Nov 9, 2015

Apart from my comment on the comment, LGTM

@bboreham bboreham assigned rade and unassigned bboreham Nov 9, 2015
@rade rade force-pushed the 1554-converge-on-peer-uid-change branch from 301d97c to b33b208 Compare November 9, 2015 14:50
@rade rade force-pushed the 1554-converge-on-peer-uid-change branch from b33b208 to bfb050c Compare November 9, 2015 14:58
@rade rade assigned bboreham and unassigned rade Nov 9, 2015
bboreham added a commit that referenced this pull request Nov 9, 2015
…ange

better convergence when peers restart; fixes #1554.
@bboreham bboreham merged commit 53ac432 into master Nov 9, 2015
@awh awh deleted the 1554-converge-on-peer-uid-change branch November 9, 2015 16:39
@rade rade modified the milestone: 1.3.0 Nov 11, 2015
@awh awh modified the milestones: 1.4.0, 1.3.0 Nov 12, 2015