
prevent stalling all broadcasts when single connection is blocked #1695

Closed
wants to merge 3 commits into master from gossip-broadcast-stall

Conversation

@rade (Member) commented Nov 17, 2015

See commit messages for details.

This also fixes #1793.

@rade (Member, Author) commented Nov 17, 2015

I have tested this with

WEAVE_NO_FASTDP=1 NUM_WEAVES=<N> bin/multiweave launch --conn-limit 100

measuring how long it takes from the end of that until

WEAVE_CONTAINER_NAME=weave1 weave status

shows all connections between peers as established.

For N=40 there is no difference between master and this branch - both take about 90 seconds. For N=50, to my surprise, this branch takes a lot longer than master. The logs show a fair amount of connection churn due to dropped heartbeats. I think this is simply due to higher CPU load on this branch since it propagates broadcasts better.

The very issue we are fixing here actually reduces CPU load: a single stalled connection causes all subsequent broadcasts to accumulate (and merge) locally until the connection unblocks. Overall this means fewer broadcasts are sent, and hence fewer are received and processed. A "flapping" connection may also get removed from the topology broadcast stream through the merging, which in turn means fewer connection establishments get triggered through peer discovery.

@rade rade force-pushed the gossip-broadcast-stall branch from 190a133 to faa2713 on November 17, 2015 at 16:43
@rade rade self-assigned this Nov 17, 2015
@rade rade force-pushed the gossip-broadcast-stall branch 8 times, most recently from 988464b to 155f651 on November 25, 2015 at 14:56
@rade rade force-pushed the gossip-broadcast-stall branch from 155f651 to 3c33007 on December 1, 2015 at 15:25
@rade rade force-pushed the gossip-broadcast-stall branch 7 times, most recently from dd1df78 to ece7f6a on December 8, 2015 at 17:18
@rade rade removed their assignment Dec 8, 2015
@rade (Member, Author) commented Dec 8, 2015

I've spent some more time analysing the performance with multiweave and believe my above analysis is correct. I have filed #1761 and #1762 to look into the root causes.

Sorry this is just one commit and may look a bit daunting. But there are really just four things going on here:

  • ditching of GossipData - this accounts for the bulk of changes across the many files and is largely mechanical
  • ditching of the per-broadcaster GossipSenders - this affects gossip_channel.go only
  • buffering logic in GossipSender - this is all in gossip.go
  • shutting down of GossipSender on send failure - this is a small API change on ProtocolSender and some extra channel signalling in GossipSender

Unfortunately they cannot be teased apart into separate commits while keeping the code working at each step.
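
For illustration of the first and last points above, here is a minimal sketch of what gossip interfaces dealing purely in encoded gossip might look like; every name and signature below is invented for this sketch and is not the actual weave API.

```go
package gossip

// Gossiper is implemented by users of a gossip channel. With GossipData gone,
// updates and state travel as already-encoded bytes.
type Gossiper interface {
	// OnGossip handles received gossip and returns any newly-learnt data,
	// already encoded, that should be propagated further.
	OnGossip(update []byte) (newData []byte, err error)
	// Gossip returns the complete encoded state for this channel.
	Gossip() []byte
}

// ProtocolSender sends one encoded message down a single connection.
// Returning an error lets the associated GossipSender shut itself down.
type ProtocolSender interface {
	SendProtocolMsg(msg []byte) error
}
```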

@rade rade force-pushed the gossip-broadcast-stall branch 2 times, most recently from a0ae52e to 3756d0a on December 9, 2015 at 12:39
@bboreham bboreham added this to the 1.5.0 milestone Dec 9, 2015
@rade rade self-assigned this Dec 10, 2015
@rade (Member, Author) commented Dec 10, 2015

I'd like to get #1773 merged first and then run another set of performance tests here.

@rade rade force-pushed the gossip-broadcast-stall branch 2 times, most recently from 7d19dd7 to ae3b0a0 on December 15, 2015 at 20:38
@rade rade force-pushed the gossip-broadcast-stall branch 2 times, most recently from 870cc00 to c942ae1 on December 20, 2015 at 18:24
@rade rade force-pushed the gossip-broadcast-stall branch 11 times, most recently from 9ba07bf to 909e1e6 on December 21, 2015 at 19:16
rade added 3 commits December 21, 2015 22:50
Previously a GossipChannel had one GossipSender per broadcast source
peer. That sender would stall when any of the next-hop connections
were blocked, causing broadcasts from the same source peer to
accumulate (via GossipData.Merge) locally even though they could be
sent down the remaining unblocked connections.

We can't just have a GossipSender per connection, because broadcast
messages carry the source peer and hence only messages from the same
source peer can be merged.

The obvious fix would be for GossipChannels to have a GossipSender per
combination of source peer and connection. But that is potentially
O(n_peers * n_connections) GossipSenders, which in a fully connected
mesh is O(n_peers^2).

So instead we do the following:

1. We abandon the concept of GossipData. The gossip interfaces now
deal in encoded gossip. This simplifies the interface and its usage
quite a lot and also makes it more amenable to possible future
exposure via, say, HTTP.

2. We get rid of the per-broadcast-source GossipSenders and instead
send broadcast messages via the per-connection GossipSenders.

3. Instead of a single GossipData cell, GossipSenders contain a buffer
of ProtocolMsgs. This buffer has a dynamic bound: the aggregate size
of the encoded gossip must not exceed the size of the last complete
gossip times a certain factor (currently set to 2). When the size is
exceeded we clear the buffer, and when the GossipSender is next ready
to send it obtains a complete gossip and sends that. More on that
below.

4. GossipSenders shut down when sending fails. This ensures that
we waste no cycles obtaining complete gossip data for dead
connections.

The buffering logic is best considered as a compression scheme. There
are three types of gossip: complete, differential ("send everything
new I've learnt from receiving some gossip") and incremental
(GossipBroadcast). There is no point buffering more differential
gossip than the size of the complete gossip - we might as well just
send the complete gossip. The situation is the same for incremental
gossip, except that replacing a bunch of incremental gossip with
complete gossip makes propagation less efficient, and can delay
convergence, since incremental gossip propagates via optimal broadcast
routing rather than neighbour gossiping. So we do give up something
here. But it is better than having unbounded buffers, quadratic space
complexity, or simply throwing away gossip when we cannot send it
immediately.
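
As a rough illustration of the size-bounded buffer described in point 3 above, the rule might look something like the following sketch; the constant, type, and field names here are invented for the example and do not correspond to the real code.

```go
package gossip

const bufferFactor = 2 // buffer at most 2x the last complete gossip size

type sendBuffer struct {
	msgs             [][]byte // pending encoded gossip, oldest first
	bufferedBytes    int      // aggregate size of msgs
	lastCompleteSize int      // size of the most recent complete gossip sent
	sendComplete     bool     // set once the buffer has been dropped
}

// add queues an encoded gossip message. If the aggregate size exceeds the
// bound, the buffer is cleared and the sender is marked to obtain and send
// complete gossip at its next opportunity instead.
func (b *sendBuffer) add(msg []byte) {
	b.msgs = append(b.msgs, msg)
	b.bufferedBytes += len(msg)
	if b.lastCompleteSize > 0 && b.bufferedBytes > bufferFactor*b.lastCompleteSize {
		b.msgs = nil
		b.bufferedBytes = 0
		b.sendComplete = true
	}
}
```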

Note that we are assuming here that GossipBroadcast and Gossip talk
about the same information. That is the case in all current uses, and
documented in the Gossiper API by reference to a "state".

There is one wart here: SurrogateGossiper. It has no idea what
complete gossip sizes are, since it does not generate any complete
gossip. So it just buffers everything. Which is exactly what it
did previously, since it didn't know how to merge GossipData
either. So no change, really.

With the per-broadcast-source GossipSenders gone, we can get rid of
GossipSender gc...

Instead of maintaining a Connection->GossipSender map in the
GossipChannels - which we then need to gc when connections get closed
- we keep gossip_channel_name->GossipSender maps in LocalConnections.

Access to the map is guarded by a separate lock, rather than re-using
the LocalConnection lock. This reduces lock contention and risk of
deadlock.
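
A rough sketch of that arrangement, with invented names, is below: each connection owns its per-channel senders behind a dedicated lock, and hands each sender the connection's finished channel (discussed in the next paragraph) so the senders die with the connection and no gc of a Connection->GossipSender map is needed.

```go
package gossip

import "sync"

type gossipSender struct{ /* buffer and send loop elided */ }

type localConnection struct {
	sendersLock sync.Mutex
	senders     map[string]*gossipSender // gossip channel name -> sender
	finished    chan struct{}            // closed when the connection terminates
}

// gossipSenderFor returns the sender for the named channel, creating it on
// first use; the connection's finished channel doubles as the sender's stop
// signal, so no explicit cleanup is required.
func (conn *localConnection) gossipSenderFor(channel string, newSender func(stop <-chan struct{}) *gossipSender) *gossipSender {
	conn.sendersLock.Lock()
	defer conn.sendersLock.Unlock()
	if conn.senders == nil {
		conn.senders = make(map[string]*gossipSender)
	}
	s, found := conn.senders[channel]
	if !found {
		s = newSender(conn.finished)
		conn.senders[channel] = s
	}
	return s
}
```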

In the GossipSenders, we replace the Stop() method with a signalling
channel, i.e. senders stop when that channel gets closed. Handily, the
LocalConnection.finished channel fits that description. So when a
LocalConnection's run loop terminates, any existing and future
associated GossipSenders will end up terminating too.
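
A minimal sketch of that shutdown arrangement, again with invented names: the sender's run loop watches a stop channel (e.g. the connection's finished channel) instead of exposing a Stop() method, and a send failure terminates the loop as well; sendPending stands in for draining the buffer or obtaining complete gossip when required.

```go
package gossip

type sender struct {
	more        chan struct{}   // signalled when there is something new to send
	stop        <-chan struct{} // e.g. the owning LocalConnection's finished channel
	sendPending func() error    // sends buffered or complete gossip down the connection
}

func (s *sender) run() {
	for {
		select {
		case <-s.stop:
			return // the connection has gone; no explicit Stop() call needed
		case <-s.more:
			if err := s.sendPending(); err != nil {
				return // send failed: stop rather than keep buffering for a dead connection
			}
		}
	}
}
```
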
The code now does provide all the necessary guarantees. In particular,
the GossipSender.sendAll mark ensures that any complete gossip sent is
always at least as up to date as it was when any previous message was
sent (or dropped).

Fixes #1793.
@rade rade force-pushed the gossip-broadcast-stall branch from 909e1e6 to 5d1705b on December 21, 2015 at 22:50
@rade rade modified the milestones: 1.5.0, n/a Jan 15, 2016
@rade (Member, Author) commented Jan 15, 2016

A lot of the changes in here were supplanted by #1826, #1855 and #1856. The remaining substantial change here is the removal of GossipData. This is a substantial change in the API and behaviour. It makes the gossip API much neater - and more amenable to remoting - but getting it to behave reasonably under load has proved challenging.

I've pushed the branch to my own repo, in case we ever want to revive this.

@rade rade closed this Jan 15, 2016
Development

Successfully merging this pull request may close these issues.

re-ordering of topology gossip can delay convergence