addObserver eventually stops observing #751
Comments
FYI, backgrounding tabs exacerbates the issue, as foregrounding does not appear to gracefully recover the connection.
FYI, this may be related to PING/PONG frames on the WebSocket connection. PING alone is not enough; the other end must reply with a PONG frame as well. Also note that PING/PONG frames are not required to keep a WebSocket alive unless one end is depending on it: the other end must recognize a PING frame and respond with a PONG frame. https://stackoverflow.com/questions/46111656/must-websockets-have-heartbeats
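For context, a typical server-side heartbeat with Node's `ws` package looks like the sketch below (this is generic WebSocket code, not js-waku internals). The `ws` library answers incoming PINGs with PONGs automatically; the server side tracks which sockets replied and drops the silent ones.

```js
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  socket.isAlive = true;
  // A PONG reply marks the socket as alive for the next sweep.
  socket.on("pong", () => { socket.isAlive = true; });
});

// Every 30s, terminate sockets that missed the previous PING.
const interval = setInterval(() => {
  for (const socket of wss.clients) {
    if (!socket.isAlive) { socket.terminate(); continue; }
    socket.isAlive = false;
    socket.ping();
  }
}, 30_000);

wss.on("close", () => clearInterval(interval));
```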
FYI, addObserver is stable when used with Node.
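For readers unfamiliar with the API under discussion, this is roughly how an observer is registered in the js-waku 0.x of this era (names recalled from the old API; treat it as a sketch, not a reference):

```js
import { Waku } from "js-waku";

// Bootstrap options are an assumption; the reporter used explicit peers.
const waku = await Waku.create({ bootstrap: { default: true } });

// The callback should keep firing for every relay message on the topic;
// this issue is about it silently going quiet after ~10 minutes.
waku.relay.addObserver(
  (wakuMessage) => console.log(wakuMessage.payloadAsUtf8),
  ["/my-app/1/example/utf8"]
);
```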
Thx, will investigate.
I suspect that this is the same issue we're having in https://railway.xyz: at some point, clients stop receiving fees from relayers and have to refresh the page to get things going again. @D4nte, if you have a workaround we can implement on the frontend in the meantime (e.g. a timer to check if observers are observing and force-reinitialize them if not), I'd love to hear it.
@dao I have been pondering a workaround myself. One way would be to use the queryHistory function on a timer that keeps track of a start date as a pointer (see the sketch below). The nodes also apparently support RPC calls and there are some docs on that, but I haven't done a deep dive on it yet.
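Sketching that idea concretely: poll the store on an interval and advance a `startTime` pointer only when a query succeeds. The exact `queryHistory` option names here are assumptions from the js-waku docs of the time, not a verified signature.

```js
let lastSeen = new Date();

setInterval(async () => {
  const now = new Date();
  try {
    await waku.store.queryHistory(["/my-app/1/example/utf8"], {
      timeFilter: { startTime: lastSeen, endTime: now },
      callback: (messages) => messages.forEach(handleMessage), // app helper
    });
    lastSeen = now; // advance the pointer only on success
  } catch (err) {
    console.error("store query failed, keeping pointer", err);
  }
}, 60_000);
```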
FYI, I discovered this also applies to relay.send when used in a browser. After 10 minutes it too stops working. The issue does not appear to manifest when used in Node.
Note: I was surprised by this bug because we used to have this issue with websockify and/or nginx, and I resolved it using keep-alive.
Yes, this fix in nim-libp2p addresses the 10 min timeout issue for native WebSocket connections: vacp2p/nim-libp2p#721. We still need to update the submodule for it to take effect in nwaku.
I am only specifying a connection for bootstrap when invoking create. I got it from the fleets JSON (testnet). I'm not specifying a port anywhere else after that (I'm assuming the js-waku library handles that).
We are doing the very same thing.
OK, fleets.status.im currently contains the 8000 port, so it means it's nim-websocket. Also note that we discourage using fleets.status.im; it is just a dev tool that can go down at any time.
OK, that's not great, because it means you are using websockify (port 443) and getting the timeout. I'll need to investigate.
OK gents, happy to report send and receive have been stable in two separate tabs for over 30 mins now (also surviving tab backgrounding). Not sure what changed on your end, but keep up the good work! (Maybe mem leak reboots?) Current env:

```html
<script src='https://unpkg.com/js-waku@latest/build/umd/js-waku.min.bundle.js'></script>
```

```js
Waku.create({
  bootstrap: {
    peers: [
      "/dns4/node-01.gc-us-central1-a.wakuv2.test.statusim.net/tcp/8000/wss/p2p/16Uiu2HAmJb2e28qLXxT5kZxVUUoJt72EMzNGXB47Rxx5hw3q4YjS",
    ],
  },
});
```
@dmh1974 excellent news! There was a fix in an underlying library that had been causing timeouts in WebSocket connections. Glad to hear it's working as expected.
One important note: the connection does not survive "sleeping". If I reopen my laptop after some time, the connection does not re-establish gracefully; the page needs refreshing.
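A hypothetical frontend workaround for the laptop-sleep case: when the page becomes visible again (or the network comes back), tear the node down and recreate it. `createNode` here is an assumed app-level helper wrapping `Waku.create`; only `waku.stop()` is from the actual js-waku API.

```js
let waku = await createNode(); // app helper that calls Waku.create(...)

async function reinit() {
  try { await waku.stop(); } catch (_) { /* node already dead */ }
  waku = await createNode();
}

// Re-initialize when the tab is foregrounded or connectivity returns.
document.addEventListener("visibilitychange", () => {
  if (document.visibilityState === "visible") reinit();
});
window.addEventListener("online", reinit);
```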
The timeout was on nwaku's side (e.g. on the fleets); it has been fixed on the latest.
@jrmeurer I haven't. Hopefully that will be handled gracefully by js-waku in the future.
@jm-clius I know it's all related, but just for completeness' sake I'll mention on this ticket that even currently, with observers stable, there is still a degree of message loss.
This might be a fix for this issue: https://github.com/status-im/status-web/pull/288/files
Good potential. Will this change be added to js-waku?
I have encouraged it to be pushed directly upstream in the right libp2p dependency, which seems to be the right thing to do 🙂
Maybe keep this issue open until the problem is resolved; we have been researching why we have this error, and it's useful to see this issue.
Hi @michelleplur, yes, the process is to keep issues open until the problem is resolved. Note that the final solution would be to backport the quoted patch to the upstream libp2p dependency. This is next on the list after the refactoring/doc update is done: #802
Let me know if you have any questions.
Autodial actually happens when the connection is lost after 10 min, but the wrong multiaddr is used.
Need to investigate why the multiaddr is not being stored in the peer store.
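A quick way to check what the address book actually holds for a given peer (the js-libp2p ~0.40-era peerStore API is assumed here; `peerId` is the bootstrap peer's PeerId, obtained elsewhere):

```js
const addresses = await waku.libp2p.peerStore.addressBook.get(peerId);
console.log(addresses.map((addr) => addr.multiaddr.toString()));
// An empty array here would confirm the multiaddr never made it into
// the peer store, leaving autodial with nothing valid to dial.
```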
Investigating, and reported to libp2p too: libp2p/js-libp2p#1484
Issue found: libp2p/js-libp2p#1484 (comment). The upstream fix is being tested.
Upstream fix proposed: libp2p/js-libp2p#1485

Note that the latest js-libp2p automatically attempts to reconnect to the remote node when the connection drops, which means that the workaround https://github.com/status-im/status-web/pull/288/files is not needed any more. This is done in the form of the autodialer (https://github.com/libp2p/js-libp2p/blob/a3847f2d1725b1c92d5e0ef7bcdf840ea8428a75/src/connection-manager/auto-dialler.ts), which only dials when the number of connections is under a threshold.

Unfortunately, another issue was corrupting the address book after the…

One workaround for the latest issue would be to do…
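Purely as an illustration, a manual re-dial in the spirit of the status-web workaround might look like this. Event and API names are assumed from ~0.40-era js-libp2p; this is not the workaround elided from the comment above.

```js
waku.libp2p.connectionManager.addEventListener("peer:disconnect", (evt) => {
  // evt.detail is the closed Connection in this libp2p version.
  const peerId = evt.detail.remotePeer;
  waku.libp2p
    .dial(peerId)
    .catch((err) => console.error("re-dial failed", err));
});
```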
Actually, the issue is in nwaku/nim-libp2p; see waku-org/nwaku#1427
To keep in mind: status-im/status-web#320
Playing locally with the eth-pm example, by killing a local nwaku node. Next step:
I have seen some issues when reconnecting after internet loss: the Noise handshake goes well but the identify protocol fails. waku-org/nwaku#1485 is open, but I will investigate more locally.
I have run the eth-pm example locally (it uses relay) for over 10 minutes in Firefox. The example could still send and receive messages after that time had elapsed. It seems that this specific bug is now resolved. Keeping the issue open until the reconnection problem is solved.
No problem either when running the hosted example https://examples.waku.org/eth-pm in Brave, keeping the tab open for over 10 min.
A new error reported by @therealjmj is that the timeout happens after 30-40 min. If these are two different issues, the 30-40 min timeout is more important because it's a more generic use case and should be prioritized. If it's the same issue, then let's look at waku-org/nwaku#1485 (comment) and understand how js-waku and nwaku can handle idle connections etc.
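If idle connections being reaped is the culprit, js-waku exposes keep-alive knobs at creation time. The option names below are as I recall them from js-waku 0.x and should be verified against your installed version before relying on them.

```js
const waku = await Waku.create({
  pingKeepAlive: 30,  // seconds between libp2p pings; 0 disables
  relayKeepAlive: 59, // seconds between relay keep-alive messages
});
```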
Not sure if it is related, but the inconsistency with relay messages is seen here: waku-org/examples.waku.org#199
As discussed with @danisharora099, this can be closed with guidelines for users who experience it again (enable debug logs, share full logs).
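For anyone following those guidelines: js-waku logs through the `debug` package, so in a browser you can enable verbose logging from the console before reloading. The namespace patterns below are assumptions about the prefixes js-waku and js-libp2p use.

```js
localStorage.setItem("debug", "waku*,libp2p*");
// Reload the page, reproduce the silence, then share the full console output.
```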
⬆️ Closing this for now.
So far I have only seen this happen with js-waku used in a browser (I have not tested Node), and I am currently using the testnet (since prod has been fritzy). I can reproduce the results here; it usually stops around the 10 min mark:
https://gateway.pinata.cloud/ipfs/QmUS84DM9wLhmXnVf1EB41uoGc4HobTPc9W2CMAUWQRMFs?topic=BTCUSDT