Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

After idle, messages in web app are not sent #185

Closed
D4nte opened this issue May 31, 2021 · 13 comments
Closed

After idle, messages in web app are not sent #185

D4nte opened this issue May 31, 2021 · 13 comments
Labels
bug Something isn't working

Comments

@D4nte
Copy link
Contributor

D4nte commented May 31, 2021

Problem

If the web chat app tab is left open for a while, then it loses the ability to send new messages via relay.

In the console, the following errors appear:

libp2p:mplex:stream initiator stream 2 reset +33m common.js:111:9
libp2p:mplex:stream initiator stream 2 reset +0ms common.js:111:9
libp2p:mplex:stream initiator stream 2 reset +1ms

It fails exactly 10min in. I am guessing it is a TCP timeout, I also have seen scenarios where nginx was used as a reverse proxy and it would fail earlier due to some configurable nginx setting.

I was not able to reproduce the issues in libp2p-gossipubsub test. I tried to see if it was possible to change the TCP timeout in node.http but did not seem the case. Could not reproduce by waiting 10min in the test either.

However, when running web-chat locally, I can systematically reproduce the issue: stream closes after 10min, then sending messages over relay fails because it attempts to write on a closed stream.

Solution

The issue is that when the connection times out, the streams are closed but the closure is not handled.
Then when trying to send data, it tries to write on the closed stream instead of opening a new one.

Note that there are 2 streams per protocol per peer (one inbound, one outbound).
As stream closure is not handled, the peer connection status isn't updated either.

libp2p/js-libp2p#744 could help with the keep alive feature. However I am not convinced as the keep alive is merely workaround (to my current understanding) as the fact that relay should simply not try to write on a closed stream.

some work for this second point has been ongoing on js-libp2p side since April:

I am also not sure I fully grasped the issue so I am asking more questions:

In the mean time a ping over relay helps as it stops the connection and hence stream to be closed: #243.

ipfs/ping is implemented and been tried but has each protocol use a different stream it did not help prevent the closure of the relay outbound stream.

@D4nte
Copy link
Contributor Author

D4nte commented Jun 2, 2021

A strategy would be to send mepty messages as a keep alice like nim-waku does waku-org/nwaku#588

@D4nte D4nte added the blocked This issue is blocked by some other work label Jun 10, 2021
@D4nte
Copy link
Contributor Author

D4nte commented Jun 15, 2021

See waku-org/nwaku#621

@jm-clius
Copy link

This should be unblocked now with the merging of waku-org/nwaku#621. Note, however, that ping protocol is currently only available on test fleet nodes.

@D4nte D4nte removed the blocked This issue is blocked by some other work label Jun 16, 2021
@D4nte
Copy link
Contributor Author

D4nte commented Jun 20, 2021

Let's see if #210 fixes it.

@D4nte D4nte added the bug Something isn't working label Jun 20, 2021
@D4nte
Copy link
Contributor Author

D4nte commented Jun 24, 2021

Still happening, need to discuss with chainsafe and understand if they have same issues. I believe it to be a symptom of libp2p/js-libp2p#939

@jm-clius
Copy link

jm-clius commented Jun 24, 2021

Note, however, that ping protocol is currently only available on test fleet nodes.

Just double-checking that you're still aware of this - prod is running the release (v0.4) which does not mount or use ping.

@D4nte
Copy link
Contributor Author

D4nte commented Jun 25, 2021

Note, however, that ping protocol is currently only available on test fleet nodes.

Just double-checking that you're still aware of this - prod is running the release (v0.4) which does not mount or use ping.

oooooooooooooh!!!! Thanks! I need to test on test fleet then.

@D4nte
Copy link
Contributor Author

D4nte commented Jun 28, 2021

This should be unblocked now with the merging of status-im/nim-waku#621. Note, however, that ping protocol is currently only available on test fleet nodes.

Confirmed on test fleet, it does not help with the relay issue. Which is kind of expected as the relay issue is due to the relay stream closing, a ping would use a different stream (a stream is for one protocol only).

@D4nte
Copy link
Contributor Author

D4nte commented Jul 22, 2021

It fails exactly 10min in. I am guessing it is a TCP timeout, I also have seen scenarios where nginx was used as a reverse proxy and it would fail earlier due to some configurable nginx setting.

I was not able to reproduce the issues in libp2p-gossipubsub test. I tried to see if it was possible to change the TCP timeout in node.http but did not seem the case. Could not reproduce by waiting 10min in the test either.

However, when running web-chat locally, I can systematically reproduce the issue: stream closes after 10min, then sending messages over relay fails because it attempts to write on a closed stream.

IMO, the issue is that when the connection times out, the streams are closed but the closure is not handled.
Then when trying to send data, it tries to write on the closed stream instead of opening a new one.

Note that there are 2 stream per protocol per peer (one inbound, one outbound).
As stream closure is not handled, the peer connection status isn't updated either.

libp2p/js-libp2p#744 could help with the keep alive feature. However I am not convinced as the keep alive is merely workaround (to my current understanding) as the fact that relay should simply not try to write on a closed stream.

some work for this second point has been ongoing on js-libp2p side since April:

I am also not sure I fully grasped the issue so I m asking more questions:

In the mean time a ping over relay helps as it stops the connection and hence stream to be closed: #243.

ipfs/ping is implemented and been tried but has each protocol use a different stream it did not help prevent the closure of the relay outbound stream.

@jm-clius
Copy link

In the mean time I am attempting to implement a ping over relay.

Sounds like this might work. Crossing fingers! :) Assuming this will be viewed as a "normal" relay message by other clients? If there's a need we may have to expand the relay spec to accommodate relay pings.

@D4nte
Copy link
Contributor Author

D4nte commented Jul 25, 2021

SafeChain does not seem to encounter the issue due to a lack of idling:

https://discord.com/channels/593655374469660673/606584108449005598/867794263945510942

hmmm I don't think we've ever seen this bc our network is constantly sending gossip
iirc our stream handling is quite simple and doesn't do anything in the way of re-establishing connections
most of the stream handling stuff happens here: https://github.com/libp2p/js-libp2p-interfaces/tree/master/packages/interfaces/src/pubsub

@D4nte
Copy link
Contributor Author

D4nte commented Jul 27, 2021

Resolved with a relay keep alive: #243.
Keeping open to watch out for upstream solution libp2p/js-libp2p#744 .

@fryorcraken
Copy link
Collaborator

Not happening anymore, same than #751

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

No branches or pull requests

3 participants