
prevent stalling all broadcasts when single connection is blocked #1695

Closed
wants to merge 3 commits into master from gossip-broadcast-stall

Conversation

@rade (Member) commented Nov 17, 2015

See commit messages for details.

This also fixes #1793.

@rade (Member, Author) commented Nov 17, 2015

I have tested this with

WEAVE_NO_FASTDP=1 NUM_WEAVES=<N> bin/multiweave launch --conn-limit 100

measuring how long it takes from the end of that until

WEAVE_CONTAINER_NAME=weave1 weave status

shows all connections between peers as established.

For N=40 there is no difference between master and this branch - both take about 90 seconds. For N=50, to my surprise, this branch takes a lot longer than master. The logs show a fair amount of connection churn due to dropped heartbeats. I think this is simply due to higher CPU load on this branch since it propagates broadcasts better.

The very issue we are fixing here actually reduces CPU load: a single stalled connection causes all subsequent broadcasts to accumulate (and merge) locally until the connection unblocks. Overall this means fewer broadcasts are sent, and hence fewer are received and processed. A "flapping" connection may also get removed from the topology broadcast stream through the merging, which in turn means fewer connection establishments get triggered through peer discovery.

@rade rade force-pushed the gossip-broadcast-stall branch from 190a133 to faa2713 on November 17, 2015 at 16:43
@rade rade self-assigned this Nov 17, 2015
@rade rade force-pushed the gossip-broadcast-stall branch 8 times, most recently from 988464b to 155f651 on November 25, 2015 at 14:56
@rade rade force-pushed the gossip-broadcast-stall branch from 155f651 to 3c33007 on December 1, 2015 at 15:25
@rade rade force-pushed the gossip-broadcast-stall branch 7 times, most recently from dd1df78 to ece7f6a on December 8, 2015 at 17:18
@rade rade removed their assignment Dec 8, 2015
@rade (Member, Author) commented Dec 8, 2015

I've spent some more time analysing the performance with multiweave and believe my above analysis is correct. I have filed #1761 and #1762 to look into the root causes.

Sorry this is just one commit and may look a bit daunting. But there are really just four things going on here:

  • ditching of GossipData - this accounts for the bulk of changes across the many files and is largely mechanical
  • ditching of the per-broadcaster GossipSenders - this affects gossip_channel.go only
  • buffering logic in GossipSender - this is all in gossip.go
  • shutting down of GossipSender on send failure - this is a small API change on ProtocolSender and some extra channel signalling in GossipSender

Unfortunately they cannot be teased apart into separate commits while keeping the code working at each step.
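
For illustration of the first and last points above, here is a minimal sketch of what gossip interfaces dealing purely in encoded gossip might look like; every name and signature below is invented for this sketch and is not the actual weave API.

```go
package gossip

// Gossiper is implemented by users of a gossip channel. With GossipData gone,
// updates and state travel as already-encoded bytes.
type Gossiper interface {
	// OnGossip handles received gossip and returns any newly-learnt data,
	// already encoded, that should be propagated further.
	OnGossip(update []byte) (newData []byte, err error)
	// Gossip returns the complete encoded state for this channel.
	Gossip() []byte
}

// ProtocolSender sends one encoded message down a single connection.
// Returning an error lets the associated GossipSender shut itself down.
type ProtocolSender interface {
	SendProtocolMsg(msg []byte) error
}
```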

@rade rade force-pushed the gossip-broadcast-stall branch 2 times, most recently from a0ae52e to 3756d0a on December 9, 2015 at 12:39
@bboreham bboreham added this to the 1.5.0 milestone Dec 9, 2015
@rade rade self-assigned this Dec 10, 2015
@rade (Member, Author) commented Dec 10, 2015

I'd like to get #1773 merged first and then run another set of performance tests here.

@rade rade force-pushed the gossip-broadcast-stall branch 2 times, most recently from 7d19dd7 to ae3b0a0 on December 15, 2015 at 20:38
@rade rade force-pushed the gossip-broadcast-stall branch 2 times, most recently from 870cc00 to c942ae1 on December 20, 2015 at 18:24
@rade rade force-pushed the gossip-broadcast-stall branch 11 times, most recently from 9ba07bf to 909e1e6 on December 21, 2015 at 19:16
rade added 3 commits December 21, 2015 22:50
Previously a GossipChannel had one GossipSender per broadcast source
peer. That sender would stall when any of the next-hop connections
were blocked, causing broadcasts from the same source peer to
accumulate (via GossipData.Merge) locally even though they could be
sent down the remaining unblocked connections.

We can't just have a GossipSender per connection, because broadcast
messages carry the source peer and hence only messages from the same
source peer can be merged.

The obvious fix would be for GossipChannels to have a GossipSender per
combination of source peer and connection. But that is potentially
O(n_peers * n_connections) GossipSenders, which in a fully connected
mesh is O(n_peers^2).

So instead we do the following:

1. We abandon the concept of GossipData. The gossip interfaces now
deal in encoded gossip. This simplifies the interface and its usage
quite a lot and also makes it more amenable to possible future
exposure via, say, HTTP.

2. We get rid of the per-broadcast-source GossipSenders and instead
send broadcast messages via the per-connection GossipSenders.

3. Instead of a single GossipData cell, GossipSenders contain a buffer
of ProtocolMsgs. This buffer has a dynamic bound: the aggregate size
of the encoded gossip must not exceed the size of the last complete
gossip times a certain factor (currently set to 2). When the size is
exceeded we clear the buffer, and when the GossipSender is next ready
to send it obtains a complete gossip and sends that. More on that
below.

4. GossipSenders shut down when sending fails. This ensures that
we waste no cycles obtaining complete gossip data for dead
connections.

The buffering logic is best considered as a compression scheme. There
are three types of gossip: complete, differential ("send everything
new I've learnt from receiving some gossip") and incremental
(GossipBroadcast). There is no point buffering more differential
gossip than the size of the complete gossip - we might as well just
send the complete gossip. The situation is the same for incremental
gossip, except that replacing a bunch of incremental gossip with
complete gossip makes propagation less efficient, and can delay
convergence, since incremental gossip propagates via optimal broadcast
routing rather than neighbour gossiping. So we do give up something
here. But it is better than having unbounded buffers, quadratic space
complexity, or simply throwing away gossip when we cannot send it
immediately.
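
As a rough illustration of the size-bounded buffer described in point 3 above, the rule might look something like the following sketch; the constant, type, and field names here are invented for the example and do not correspond to the real code.

```go
package gossip

const bufferFactor = 2 // buffer at most 2x the last complete gossip size

type sendBuffer struct {
	msgs             [][]byte // pending encoded gossip, oldest first
	bufferedBytes    int      // aggregate size of msgs
	lastCompleteSize int      // size of the most recent complete gossip sent
	sendComplete     bool     // set once the buffer has been dropped
}

// add queues an encoded gossip message. If the aggregate size exceeds the
// bound, the buffer is cleared and the sender is marked to obtain and send
// complete gossip at its next opportunity instead.
func (b *sendBuffer) add(msg []byte) {
	b.msgs = append(b.msgs, msg)
	b.bufferedBytes += len(msg)
	if b.lastCompleteSize > 0 && b.bufferedBytes > bufferFactor*b.lastCompleteSize {
		b.msgs = nil
		b.bufferedBytes = 0
		b.sendComplete = true
	}
}
```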

Note that we are assuming here that GossipBroadcast and Gossip talk
about the same information. That is the case in all current uses, and
documented in the Gossiper API by reference to a "state".

There is one wart here: SurrogateGossiper. It has no idea what
complete gossip sizes are, since it does not generate any complete
gossip. So it just buffers everything. Which is exactly what it
did previously, since it didn't know how to merge GossipData
either. So no change, really.

With the per-broadcast-source GossipSenders gone, we can get rid of
GossipSender gc...

Instead of maintaining a Connection->GossipSender map in the
GossipChannels - which we then need to gc when connections get closed
- we keep gossip_channel_name->GossipSender maps in LocalConnections.

Access to the map is guarded by a separate lock, rather than re-using
the LocalConnection lock. This reduces lock contention and risk of
deadlock.
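
A rough sketch of that arrangement, with invented names, is below: each connection owns its per-channel senders behind a dedicated lock, and hands each sender the connection's finished channel (discussed in the next paragraph) so the senders die with the connection and no gc of a Connection->GossipSender map is needed.

```go
package gossip

import "sync"

type gossipSender struct{ /* buffer and send loop elided */ }

type localConnection struct {
	sendersLock sync.Mutex
	senders     map[string]*gossipSender // gossip channel name -> sender
	finished    chan struct{}            // closed when the connection terminates
}

// gossipSenderFor returns the sender for the named channel, creating it on
// first use; the connection's finished channel doubles as the sender's stop
// signal, so no explicit cleanup is required.
func (conn *localConnection) gossipSenderFor(channel string, newSender func(stop <-chan struct{}) *gossipSender) *gossipSender {
	conn.sendersLock.Lock()
	defer conn.sendersLock.Unlock()
	if conn.senders == nil {
		conn.senders = make(map[string]*gossipSender)
	}
	s, found := conn.senders[channel]
	if !found {
		s = newSender(conn.finished)
		conn.senders[channel] = s
	}
	return s
}
```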

In the GossipSenders, we replace the Stop() method with a signalling
channel, i.e. senders stop when that channel gets closed. Handily, the
LocalConnection.finished channel fits that description. So when a
LocalConnection's run loop terminates, any existing and future
associated GossipSenders will end up terminating too.
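
A minimal sketch of that shutdown arrangement, again with invented names: the sender's run loop watches a stop channel (e.g. the connection's finished channel) instead of exposing a Stop() method, and a send failure terminates the loop as well; sendPending stands in for draining the buffer or obtaining complete gossip when required.

```go
package gossip

type sender struct {
	more        chan struct{}   // signalled when there is something new to send
	stop        <-chan struct{} // e.g. the owning LocalConnection's finished channel
	sendPending func() error    // sends buffered or complete gossip down the connection
}

func (s *sender) run() {
	for {
		select {
		case <-s.stop:
			return // the connection has gone; no explicit Stop() call needed
		case <-s.more:
			if err := s.sendPending(); err != nil {
				return // send failed: stop rather than keep buffering for a dead connection
			}
		}
	}
}
```
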
The code now does provide all the necessary guarantees. In particular,
the GossipSender.sendAll mark ensures that any complete gossip sent is
always at least as up to date as it was when any previous message was
sent (or dropped).

Fixes #1793.
@rade rade force-pushed the gossip-broadcast-stall branch from 909e1e6 to 5d1705b on December 21, 2015 at 22:50
@rade rade modified the milestones: 1.5.0, n/a Jan 15, 2016
@rade (Member, Author) commented Jan 15, 2016

A lot of the changes in here were supplanted by #1826, #1855 and #1856. The remaining substantial change here is the removal of GossipData. This is a substantial change in the API and behaviour. It makes the gossip API much neater - and more amenable to remoting - but getting it to behave reasonably under load has proved challenging.

I've pushed the branch to my own repo, in case we ever want to revive this.

@rade rade closed this Jan 15, 2016
Development

Successfully merging this pull request may close these issues.

re-ordering of topology gossip can delay convergence