FloodSub is not working as of b8a312f7d512c25f86bc16989cb588900483946d #843

echupriyanov · 2019-01-10T11:48:37Z

At present master (commit b8a312f)
FloodSub doesn't seem to work properly.

When I run provided chat example, I don't see messages being delivered between nodes.
Running example with debug enabled, I see, that dialed TCP connection is dropped immediately after successful negotiation.

Changing FloodsubHandler::connection_keep_alive() to always return true resolves this issue.

My guess is that this happens because:

libp2p closes idle (useless?) TCP connections
floodsub closes the substream after successful send.

So, we end up with no TCP connection between nodes.

The text was updated successfully, but these errors were encountered:

tomaka · 2019-01-10T12:50:44Z

Oh, I didn't think of that in #816
This is also strongly related to #713

tomaka · 2019-01-11T09:04:25Z

One possible fix that would make sense to me is to have connection_keep_alive() always return true if the remote is subscribed to a topic that we are also subscribed to.

echupriyanov · 2019-01-11T09:08:23Z

But do we have such information on ProtocolHandler level?

tomaka · 2019-01-11T09:22:13Z

We can have the NetworkBehaviour indicate this to the protocol handler.

echupriyanov · 2019-01-11T10:00:50Z

Maybe it makes sense to move connection_keep_alive() to the NetworkBehaviour? It seems to me more logical

tomaka · 2019-01-11T10:11:09Z

At the lower level each connection to a remote gets its own tokio task, supposedly asynchronous from the network behaviours.

If we put connection_keep_alive() in the NetworkBehaviour, then we need additional roundtrips between the behaviour and the protocol handler. Performance-wise, this is the same as if we do this manually specifically for floodsub.

echupriyanov · 2019-01-11T10:21:16Z

I see. In that case, I'd suggest to add something like keep_alive flag to the FloodsubHandler, which is initialized to true on start, and can be turned off by NetworkBehaviour with SendEvent action

drozdziak1 · 2019-01-15T10:56:59Z

@tomaka Couldn't this perhaps be implemented at the TcpConfig level? I mean, all TCP connections need to be kept alive no matter the layer above, is there a use case where someone might want the TCP transport to just destroy one?

tomaka · 2019-01-15T11:03:09Z

@drozdziak1 The issues right now is that we purposefully destroy TCP connections when they become unused. We don't want to stay connected to everyone forever, especially if you are in a network that is made of thousands of nodes.

drozdziak1 · 2019-01-15T11:14:57Z

Yeah, I mean having the TCP transport implement some form of smarter connection GC? Like a "last_used" timestamp with a configurable time window, like 15 minutes? Or a connection cap where you only keep up to N connections alive at a time and destroy the one with oldest "last_used" when you want to open another (N+1)th? Perhaps we could have some different modes for this, and maybe let the user specify a filter closure that could sort connections by a priority metric of their choosing?

tomaka · 2019-01-15T12:25:11Z

The "last used" thing is already implemented, except that it's configured to 5 seconds right now.

drozdziak1 · 2019-01-16T10:44:25Z

So when I run the chat example with RUST_LOG=debug, the connection appears to be broken up regardless of activity. Is there a way I could try and maybe bump that timeout upon Floodsub sending something over?

My general impression is that this is hard to fix mainly because of the mental overhead of the architecture. Is there any refactor options you find worth considering, which could help this issue?

echupriyanov · 2019-01-16T10:49:41Z

@drozdziak1
For a time being, until better solution is designed, I'm using fork of libp2p where FloodsubHandler::connection_keep_alive() always returns true.

Here s the fork - https://github.com/stegos/rust-libp2p/tree/stegos

drozdziak1 · 2019-01-16T10:57:58Z

Thanks @echupriyanov! I'm more looking to help fix this properly for everyone else though.

tomaka · 2019-01-16T12:16:05Z

I think that a proper fix goes in pair with #854.
The reason why the chat example works at all is because Mdns asks the swarm to connect to the nodes it discovers. The fact that mDNS automatically connects is debatable by itself, and should definitely not be the backbone of the example.

drozdziak1 · 2019-01-16T12:31:32Z

@tomaka Thanks! I'll try to understand that when I find the time.

tomaka · 2019-03-22T17:20:33Z

Should work fine now.

cbarraford · 2019-11-14T04:04:35Z

@tomaka what is the solution to this problem?? The code has changed quite a bit since stegos's fork. Thanks in advance!

tomaka added bug difficulty:moderate labels Jan 10, 2019

tomaka added the priority:important The changes needed are critical for libp2p, or are blocking another project label Jan 10, 2019

tomaka added this to the publish-v0.3 milestone Jan 16, 2019

tomaka mentioned this issue Jan 17, 2019

connection_keep_alive() should return an Instant #865

Closed

tomaka mentioned this issue Jan 25, 2019

Embed the topology in the NetworkBehaviour #889

Merged

tomaka closed this as completed Mar 22, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FloodSub is not working as of b8a312f7d512c25f86bc16989cb588900483946d #843

FloodSub is not working as of b8a312f7d512c25f86bc16989cb588900483946d #843

echupriyanov commented Jan 10, 2019

tomaka commented Jan 10, 2019

tomaka commented Jan 11, 2019

echupriyanov commented Jan 11, 2019

tomaka commented Jan 11, 2019

echupriyanov commented Jan 11, 2019

tomaka commented Jan 11, 2019

echupriyanov commented Jan 11, 2019

drozdziak1 commented Jan 15, 2019

tomaka commented Jan 15, 2019

drozdziak1 commented Jan 15, 2019

tomaka commented Jan 15, 2019

drozdziak1 commented Jan 16, 2019

echupriyanov commented Jan 16, 2019

drozdziak1 commented Jan 16, 2019

tomaka commented Jan 16, 2019

drozdziak1 commented Jan 16, 2019

tomaka commented Mar 22, 2019

cbarraford commented Nov 14, 2019

FloodSub is not working as of b8a312f7d512c25f86bc16989cb588900483946d #843

FloodSub is not working as of b8a312f7d512c25f86bc16989cb588900483946d #843

Comments

echupriyanov commented Jan 10, 2019

tomaka commented Jan 10, 2019

tomaka commented Jan 11, 2019

echupriyanov commented Jan 11, 2019

tomaka commented Jan 11, 2019

echupriyanov commented Jan 11, 2019

tomaka commented Jan 11, 2019

echupriyanov commented Jan 11, 2019

drozdziak1 commented Jan 15, 2019

tomaka commented Jan 15, 2019

drozdziak1 commented Jan 15, 2019

tomaka commented Jan 15, 2019

drozdziak1 commented Jan 16, 2019

echupriyanov commented Jan 16, 2019

drozdziak1 commented Jan 16, 2019

tomaka commented Jan 16, 2019

drozdziak1 commented Jan 16, 2019

tomaka commented Mar 22, 2019

cbarraford commented Nov 14, 2019