prevent stalling all broadcasts when single connection is blocked #1695
Conversation
I have tested this with
measuring how long it takes from the end of that until
shows all connections between peers as
For N=40 there is no difference between master and this branch - both take about 90 seconds. For N=50, to my surprise, this branch takes a lot longer than master. The logs show a fair amount of connection churn due to dropped heartbeats. I think this is simply due to higher CPU load on this branch, since it propagates broadcasts better. The very issue we are fixing here does actually reduce CPU load, since a single stalled connection will cause all subsequent broadcasts to accumulate locally before being sent when the connection unblocks. Overall this causes fewer broadcasts to be sent, which in turn means fewer broadcasts are received and processed. And some "flapping" connections may well get removed from the topology broadcast stream through the merging, which in turn means fewer connection establishments get triggered through peer discovery.
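To make the merging argument concrete, here is a minimal, hypothetical Go sketch (not the actual weave code; `gossipData` and `merge` are made-up stand-ins for GossipData and its Merge) of how a stalled connection collapses several broadcasts into a single message, which is why the unfixed behaviour results in fewer broadcasts being sent and processed overall:

```go
package main

import "fmt"

// gossipData is a stand-in for weave's GossipData: broadcasts that
// cannot be sent immediately are folded into a pending value.
type gossipData struct{ updates []string }

func (g *gossipData) merge(other *gossipData) {
	g.updates = append(g.updates, other.updates...)
}

func main() {
	var pending *gossipData
	// While the connection is blocked, each new broadcast is merged
	// locally rather than sent.
	for i := 0; i < 3; i++ {
		next := &gossipData{updates: []string{fmt.Sprintf("update-%d", i)}}
		if pending == nil {
			pending = next
		} else {
			pending.merge(next)
		}
	}
	// When the connection unblocks, one merged message goes out
	// instead of three separate ones.
	fmt.Println("sending one merged broadcast:", pending.updates)
}
```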
force-pushed from 190a133 to faa2713
force-pushed from 988464b to 155f651
force-pushed from 155f651 to 3c33007
force-pushed from dd1df78 to ece7f6a
I've spent some more time analysing the performance with multiweave and believe my above analysis is correct. I have filed #1761 and #1762 to look into the root causes.

Sorry this is just one commit and may look a bit daunting. But there are really just four things going on here:
Unfortunately they cannot be teased apart into separate commits while keeping the code in a working state.
force-pushed from a0ae52e to 3756d0a
I'd like to get #1773 merged first and then run another set of performance tests here.
force-pushed from 7d19dd7 to ae3b0a0
force-pushed from 870cc00 to c942ae1
force-pushed from 9ba07bf to 909e1e6
Previously a GossipChannel had one GossipSender per broadcast source peer. That sender would stall when any of the next-hop connections were blocked, causing broadcasts from the same source peer to accumulate (via GossipData.Merge) locally even though they could have been sent down the remaining unblocked connections.

We can't just have a GossipSender per connection, because broadcast messages carry the source peer and hence only messages from the same source peer can be merged. The obvious fix would be for GossipChannels to have a GossipSender per combination of source peer and connection. But that is potentially O(n_peers * n_connections) GossipSenders, which in a fully connected mesh is O(n_peers^2). So instead we do the following:

1. We abandon the concept of GossipData. The gossip interfaces now deal in encoded gossip. This simplifies the interface and its usage quite a lot, and also makes it more amenable to possible future exposure via, say, HTTP.
2. We get rid of the per-broadcast-source GossipSenders and instead send broadcast messages via the per-connection GossipSenders.
3. Instead of a single GossipData cell, GossipSenders contain a buffer of ProtocolMsgs. This buffer has a dynamic bound: the aggregate size of the encoded gossip must not exceed the size of the last complete gossip times a certain factor (currently set to 2). When that size is exceeded we clear the buffer, and when the GossipSender is next ready to send it obtains a complete gossip and sends that. More on that below.
4. GossipSenders shut down when sending fails. This ensures that we waste no cycles obtaining complete gossip data for dead connections.

The buffering logic is best thought of as a compression scheme. There are three types of gossip: complete, differential ("send on everything new I've learnt from receiving some gossip") and incremental (GossipBroadcast). There is no point in buffering more differential gossip than the size of the complete gossip - we might as well just send the complete gossip. The situation is the same for incremental gossip, except that replacing a bunch of incremental gossip with complete gossip makes propagation less efficient and can delay convergence, since incremental gossip propagates via optimal broadcast routing rather than neighbour gossiping. So we do give up something here. But it's better than having unbounded buffers, quadratic space complexity, or simply throwing away gossip when we cannot send it immediately.

Note that we are assuming here that GossipBroadcast and Gossip talk about the same information. That is the case in all current uses, and is documented in the Gossiper API by reference to a "state".

There is one wart here: SurrogateGossiper. It has no idea what complete gossip sizes are, since it does not generate any complete gossip. So instead it just buffers everything. Which is exactly what it did previously, since it didn't know how to merge GossipData either. So no change, really.

With the per-broadcast-source GossipSenders gone, we can get rid of GossipSender gc. Instead of maintaining a Connection->GossipSender map in the GossipChannels - which we then need to gc when connections get closed - we keep gossip_channel_name->GossipSender maps in LocalConnections. Access to the map is guarded by a separate lock, rather than re-using the LocalConnection lock. This reduces lock contention and the risk of deadlock.

In the GossipSenders, we replace the Stop() method with a signalling channel, i.e. senders stop when that channel gets closed.
Handily, the LocalConnection.finished channel fits that description. So when a LocalConnection's run loop terminates, any existing and future associated GossipSenders will end up terminating too.
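For illustration, here is a minimal sketch of the enqueue side of such a per-connection sender. All names (gossipSender, Send, overflowFactor, newGossipSender) are hypothetical and not the actual weave API; it only shows the buffer with the dynamic bound, the fall-back-to-complete-gossip flag, and the stop channel modelled on LocalConnection.finished:

```go
package gossip

import "sync"

const overflowFactor = 2 // buffer bound relative to the last complete gossip size

// gossipSender is a hypothetical per-connection sender. It buffers
// encoded gossip; if the buffered bytes exceed overflowFactor times the
// size of the last complete gossip, the buffer is dropped and a flag is
// set so that the next send fetches complete gossip instead.
type gossipSender struct {
	sync.Mutex
	buf          [][]byte        // pending encoded gossip messages
	bufSize      int             // aggregate size of buf in bytes
	lastFullSize int             // size of the last complete gossip sent
	sendAll      bool            // set when the buffer overflowed
	more         chan struct{}   // wakes up the send loop (capacity 1)
	finished     <-chan struct{} // closed when the owning connection dies
}

func newGossipSender(finished <-chan struct{}) *gossipSender {
	return &gossipSender{more: make(chan struct{}, 1), finished: finished}
}

// Send queues one encoded gossip message, enforcing the dynamic bound.
func (s *gossipSender) Send(msg []byte) {
	s.Lock()
	s.buf = append(s.buf, msg)
	s.bufSize += len(msg)
	if s.lastFullSize > 0 && s.bufSize > overflowFactor*s.lastFullSize {
		// Too much buffered: cheaper to send complete gossip later.
		s.buf, s.bufSize, s.sendAll = nil, 0, true
	}
	s.Unlock()
	// Non-blocking wake-up; the capacity-1 channel coalesces signals.
	select {
	case s.more <- struct{}{}:
	default:
	}
}
```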
The code now does provide all the necessary guarantees. In particular, the GossipSender.sendAll mark ensures that any complete gossip sent is always at least as up to date as it was when any previous message was sent (or dropped). Fixes #1793.
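Continuing the hypothetical sketch above, the drain side would look roughly like this. The fullGossip callback is an assumed stand-in for asking the Gossiper for freshly encoded complete gossip; because it is invoked at send time, whatever goes out is at least as up to date as anything previously buffered, sent or dropped, and the loop exits on send failure or when finished is closed, matching the shutdown behaviour described above:

```go
// run drains the buffer, falling back to complete gossip when the
// sendAll flag was set. sendFn and fullGossip are assumed callbacks:
// sendFn writes one encoded message to the connection, fullGossip asks
// the gossiper for a freshly encoded complete gossip.
func (s *gossipSender) run(sendFn func([]byte) error, fullGossip func() []byte) {
	for {
		select {
		case <-s.finished:
			return // owning connection has terminated
		case <-s.more:
		}
		s.Lock()
		pending, all := s.buf, s.sendAll
		s.buf, s.bufSize, s.sendAll = nil, 0, false
		s.Unlock()
		if all {
			full := fullGossip() // obtained now, hence maximally fresh
			s.Lock()
			s.lastFullSize = len(full)
			s.Unlock()
			pending = [][]byte{full}
		}
		for _, msg := range pending {
			if err := sendFn(msg); err != nil {
				return // shut down on failure: no work for dead connections
			}
		}
	}
}
```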
force-pushed from 909e1e6 to 5d1705b
A lot of the changes in here were supplanted by #1826, #1855 and #1856. The remaining substantial change here is the removal of
I've pushed the branch to my own repo, in case we ever want to revive this.
See commit messages for details.
This also fixes #1793.