Skip to content
This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

inconsistent node startup behavior depending on attempted ws-star swarm binding #1619

Closed
parkan opened this issue Oct 4, 2018 · 18 comments · Fixed by #1793
Closed

inconsistent node startup behavior depending on attempted ws-star swarm binding #1619

parkan opened this issue Oct 4, 2018 · 18 comments · Fixed by #1793
Labels
exp/expert Having worked on the specific codebase is important kind/bug A bug in existing code (including security flaws) P1 High: Likely tackled by core team if no one steps up

Comments

@parkan
Copy link
Contributor

parkan commented Oct 4, 2018

  • Version: 0.32.3
  • Platform: node
  • Subsystem: js-libp2p-websockets

Type: Bug

Severity: High

Description:

In the browser, we generally have to use websocket-star swarm addresses (since binding TCP is not available), however these addresses are temperamental and depend on the signaling server being available. Node startup will abort if the specified swarm address cannot be bound:

>  const options = {
...    "config": {
.....      "Addresses": {
.......        "Swarm": [
.......                 "/dns4/example.com/tcp/443/wss/p2p-websocket-star/" // bad
.......        ]
.......      }
.....    },
...    "EXPERIMENTAL": {
.....      "pubsub": true
.....    }
...  }
undefined
> const node = new IPFS(options)
undefined
> Error: websocket error
    at WS.Transport.onError (/home/arkadiy/tmp/dweb-transport/node_modules/engine.io-client...
> node.isOnline()
false

or even if ANY ws failures occur before an address could be bound:

>  const options = {
...    "config": {
.....      "Addresses": {
.......        "Swarm": [
.......                 "/dns4/example.com/tcp/443/wss/p2p-websocket-star/", // bad
.......                 "/dns4/ws-star.discovery.libp2p.io/tcp/443/wss/p2p-websocket-star/" // good
.......
.......        ]
.......      }
.....    },
...    "EXPERIMENTAL": {
.....      "pubsub": true
.....    }
...  }
undefined
>
> const node = new IPFS(options)
undefined
> Error: websocket error
    at WS.Transport.onError (/home/arkadiy/tmp/dweb-transport/node_modules/engine.io-client/lib/transport.js:64:13)
    at WebSocket.ws.onerror (/home/arkadiy/tmp/dweb-transport/node_modules/engine.io-client/lib/transports/websocket.js:150:10)
    at WebSocket.onError (/home/arkadiy/tmp/dweb-transport/node_modules/engine.io-client/node_modules/ws/lib/EventTarget.js:109:16)
    at WebSocket.emit (events.js:182:13)
    at WebSocket.EventEmitter.emit (domain.js:442:20)
    at WebSocket.finalize (/home/arkadiy/tmp/dweb-transport/node_modules/engine.io-client/node_modules/ws/lib/WebSocket.js:182:41)
    at ClientRequest._req.on (/home/arkadiy/tmp/dweb-transport/node_modules/engine.io-client/node_modules/ws/lib/WebSocket.js:653:12)
    at ClientRequest.emit (events.js:182:13)
    at ClientRequest.EventEmitter.emit (domain.js:442:20)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:546:21)
> node.isOnline()
false

however, if we succeed before we fail, the node starts up fine(-ish) and even lies about what addresses it's listening on:

>  const options = {
...    "config": {
.....      "Addresses": {
.......        "Swarm": [
.......                 "/dns4/ws-star.discovery.libp2p.io/tcp/443/wss/p2p-websocket-star/", // good
.......                 "/dns4/example.com/tcp/443/wss/p2p-websocket-star/", // bad
.......
.......        ]
.......      }
.....    },
...    "EXPERIMENTAL": {
.....      "pubsub": true
.....    }
...  }
undefined
> const node = new IPFS(options)
undefined
> Swarm listening on /dns4/example.com/tcp/443/wss/p2p-websocket-star/ipfs/QmSmwDi3AmMm3pFbyvzmRZ3FfLtNAtYv5ie7ispER1kGUB // are you really?
Swarm listening on /dns4/ws-star.discovery.libp2p.io/tcp/443/wss/p2p-websocket-star/ipfs/QmSmwDi3AmMm3pFbyvzmRZ3FfLtNAtYv5ie7ispER1kGUB

> node.isOnline()
true
> node.swarm.addrs(a => console.log(a))
null

finally, if we specify an empty swarm address set the node happily goes online:

>  const options = {
...    "config": {
.....      "Addresses": {
.......        "Swarm": [
.......        ]
.......      }
.....    },
...    "EXPERIMENTAL": {
.....      "pubsub": true
.....    }
...  }
undefined
> const node = new IPFS(options)
undefined
> node.isOnline()
true

(the latter is useful if you want to dial another node directly, for example)

there seem to be at least two problems here:

  • assumptions about node startup success based on TCP (where failure to bind a port is, reasonably, a fatal error) do not apply in more fragile environments like ws relay
  • no swarm addresses and no successfully bound swarm addresses are not equivalent
  • logic with multiple ws-star addresses is broken

some related discussion in ipfs-shipyard/peer-pad-core#13, though the fix is probably at js-ipfs level

@parkan
Copy link
Contributor Author

parkan commented Oct 4, 2018

I should add that this can be particularly insidious in that if you've ever configured your node with a "delegated" ws-star address, it will fail to start up if that signaling server goes down! (as @mitra42 discovered)

@parkan
Copy link
Contributor Author

parkan commented Oct 15, 2018

not sure who the most relevant reviewers would be for this? @mkg20001? @diasdavid?

/cc @pgte

@mkg20001
Copy link
Contributor

mkg20001 commented Oct 15, 2018

Not sure if https://www.npmjs.com/package/libp2p-websocket-star-multi still works but that thing should fix the issue in case of down-time of the ws-star server (assuming it still works)
(This should also really be an offical module and used in js-ipfs instead of using ws-star directly. I have already mentioned this in a few places but didn't invest much effort into advertising this module because, well, there is other stuff on my todo list that needs to be done, too)
Edit: Turns out it does no longer work (ws-star had some semver-major updates) and I can't even work on a fix because my 64GB SSD doesn't have space for yet another aegir installation (though I could clear up space, which I'm also likely going to do this week, making aegir smaller would be a great help with that, too)

@parkan
Copy link
Contributor Author

parkan commented Oct 22, 2018

thanks @mkg20001, that looks like it addresses the issue! this should definitely be built in (and probably the default)

can you give me a ballpark estimate for amount of work needed to bring this up to date? I'm a bit reluctant to sink a ton of effort into this because we're moving away from ws-star in the long term, but the current behavior is very broken

if we decide not to invest in bringing this up to date/into js-ipfs, I propose we:

  • make attempting to bind more than one address throw (I've definitely seen code that tries to do this)
  • consider whether there's an easy way to configure handling WS.Transport errors

@mkg20001
Copy link
Contributor

I will likely have time for that this week, but can't give any ETAs as I'm pretty busy right now

This should be relatively easy to integrate. Basically: Filter out any ws-star addresses beforehand, pass them as argument to ws-star-multi instance, add a /p2p-ws-star address to peer-info.

Updating ws-star-multi should be fairly easy as it's just spawning a bunch of ws-star instances and then proxying the dials to the right instance. Can't be this much in terms of changes needed (I hope)

@mkg20001
Copy link
Contributor

I've updated libp2p-websocket-star-multi, it should now work again

@parkan
Copy link
Contributor Author

parkan commented Oct 30, 2018

you rock @mkg20001

I will open a new issue on getting libp2p-websocket-star-multi into core

@satazor
Copy link
Contributor

satazor commented Dec 21, 2018

I just want to reinforce that it would be really awesome to get this resolved. In workshop/talks scenarios, it would be awesome to be able to point to have 2 swarm entries: one for a local rendezvous server, and another for the hosted one. Therefore, people would be able to seamlessly start a IPFS node where the connection to the internet is either flaky or non-existent.

At the moment, the IPFS node refuses to start when one of the swarm entries in unreachable, which is unfortunate.

@daviddias
Copy link
Member

@satazor I believe what you are looking for is just to add peers to the Bootstrap list -- https://github.com/ipfs/js-ipfs/blob/master/src/core/runtime/config-nodejs.js#L21-L41 --

@mkg20001
Copy link
Contributor

@satazor This is planned with #61, for now I can only recommend replacing ws-star with ws-star-multi (https://www.npmjs.com/package/libp2p-websocket-star-multi)

With a few code changes ws-star could be swapped for ws-star-multi transparently in a libp2p node, thus only failing if none of the provided servers can be reached or optionally never at all (That would be a few lines of hacky code more, but considering that ws-star - aka the holy grail of a quick hack - is still around makes me think that it can't make things a lot worse)

@parkan
Copy link
Contributor Author

parkan commented Dec 21, 2018

@daviddias connecting to bootstrap nodes is not the same as binding ws-star signaling server addresses, which took me a while to wrap my head around initially -- we need this change for that scenario (or most real-world usage where a signaling server could fail) to work

discussion on how to accomplish that in libp2p/js-libp2p-websocket-star#61, as @mkg20001 mentioned

I guess we can update our examples/docs to point people to libp2p-websocket-star-multi, but I really don't see a good reason for that behavior not to be the default, it definitely cannot be any more broken than what currently happens

@mkg20001
Copy link
Contributor

I propose...

diff --git a/package.json b/package.json
index 7a4c204..0d94408 100644
--- a/package.json
+++ b/package.json
@@ -140,6 +140,7 @@
     "libp2p-tcp": "~0.13.0",
     "libp2p-webrtc-star": "~0.15.5",
     "libp2p-websocket-star": "~0.10.0",
+    "libp2p-websocket-star-multi": "^0.4.0",
     "libp2p-websockets": "~0.12.0",
     "lodash": "^4.17.11",
     "mafmt": "^6.0.2",
diff --git a/src/core/runtime/libp2p-browser.js b/src/core/runtime/libp2p-browser.js
index fec12fa..88fa7da 100644
diff --git a/src/core/runtime/libp2p-nodejs.js b/src/core/runtime/libp2p-nodejs.js
index 1b59c43..f9886f8 100644
--- a/src/core/runtime/libp2p-nodejs.js
+++ b/src/core/runtime/libp2p-nodejs.js
@@ -3,17 +3,20 @@
 const TCP = require('libp2p-tcp')
 const MulticastDNS = require('libp2p-mdns')
 const WS = require('libp2p-websockets')
-const WebSocketStar = require('libp2p-websocket-star')
+const WebSocketStarMulti = require('libp2p-websocket-star-multi')
 const Bootstrap = require('libp2p-bootstrap')
 const KadDHT = require('libp2p-kad-dht')
 const Multiplex = require('libp2p-mplex')
 const SECIO = require('libp2p-secio')
 const libp2p = require('libp2p')
 const defaultsDeep = require('@nodeutils/defaults-deep')
+const multiaddr = require('multiaddr')
 
 class Node extends libp2p {
   constructor (_options) {
-    const wsstar = new WebSocketStar({ id: _options.peerInfo.id })
+    const wsstarServers = _options.peerInfo.multiaddrs.toArray().map(String).filter(addr => ~addr.indexOf('p2p-websocket-star'))
+    _options.peerInfo.multiaddrs.replace(wsstarServers.map(multiaddr), '/p2p-websocket-star') // the ws-star-multi module will replace this with the chosen ws-star servers
+    const wsstar = new WebSocketStarMulti({ servers: wsstarServers, id: _options.peerInfo.id, ignore_no_online: !wsstarServers.length || !_options.wsStarIgnoreErrors })
 
     const defaults = {
       modules: {

...will likely create a pr tomorrow after catching some sleep 💤

@mkg20001
Copy link
Contributor

PR is here #1793 🎉

@alanshaw alanshaw added kind/bug A bug in existing code (including security flaws) status/in-progress In progress exp/expert Having worked on the specific codebase is important P1 High: Likely tackled by core team if no one steps up and removed status/ready Ready to be worked labels Jan 18, 2019
@ghost ghost removed the status/in-progress In progress label Jan 21, 2019
@parkan
Copy link
Contributor Author

parkan commented Jan 21, 2019

@mitra42 this is fixed, which means we should be able to avoid the "unable to connect" errors on startup!

@mitra42
Copy link

mitra42 commented Jan 22, 2019

:-) though we haven't seen them since unconfiguring that address and rebuilding repo as suggested.

@parkan
Copy link
Contributor Author

parkan commented Jan 22, 2019

@mitra42 yes, binding 0 addresses as in the current config does in fact work regardless of this, but moving away from the direct dialing will require binding 1+ websocket addresses, which was previously fragile (i.e. this is a needed step to move away from the direct client<>IA connection)

@mitra42
Copy link

mitra42 commented Jan 22, 2019

Ok thanks - lets sync up (not on this issue) when we are both on decent bandwidth about what the current best config would be given what is/isn't working in IPFS currently and our setup.

1 similar comment
@mitra42
Copy link

mitra42 commented Jan 22, 2019

Ok thanks - lets sync up (not on this issue) when we are both on decent bandwidth about what the current best config would be given what is/isn't working in IPFS currently and our setup.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
exp/expert Having worked on the specific codebase is important kind/bug A bug in existing code (including security flaws) P1 High: Likely tackled by core team if no one steps up
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants