Websocket connection errors on startup should be clearer and not crash IPFS #1326
Comments
I just found this morning that the ws-star server was hung. It responded to normal HTTP requests properly, meaning our health checks thought everything was fine even though the actual websocket endpoints didn't work. I've restarted the server and confirmed it to be working again, and I'm adding better health checks to it now. Could you please retry and report back whether it's working now?
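For illustration, a websocket-level health check might look something like the sketch below; the `ws` npm package, the helper name, and the endpoint URL are assumptions here, not the server's actual checks:

// Minimal sketch of a websocket-level health check (the `ws` npm package
// is an assumption). A plain HTTP 200 is not enough: the hung ws-star
// server above answered HTTP while its websocket endpoint was dead, so
// the check has to complete a real websocket handshake.
const WebSocket = require('ws')

function checkWebsocketEndpoint (url, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const ws = new WebSocket(url)
    // Fail the check if the handshake doesn't finish in time.
    const timer = setTimeout(() => { ws.terminate(); resolve(false) }, timeoutMs)
    ws.on('open', () => { clearTimeout(timer); ws.close(); resolve(true) })
    ws.on('error', () => { clearTimeout(timer); resolve(false) })
  })
}

// e.g. checkWebsocketEndpoint('wss://ws-star.discovery.libp2p.io').then(console.log)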
@victorbjelkholm It is resolved now thanks to that fix, thank you so much! Much appreciated.
It does seem like we could surface a better error here, though, saying something like…
@Mr0grog That's a good point; it has been raised before in #804 (comment). Basically, the current implementation crashes when the node can't start listening on a swarm address (since the endpoint does not respond), which leads to this crash. While I agree that the error message could be better, I would also say that it should be a warning instead of an error, and the daemon should continue booting even if a swarm address fails to open.
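For concreteness, here is a rough sketch of the suggested behavior against the js-ipfs 0.x constructor API; the multiaddr is just the usual ws-star default, and this shows the desired outcome from the outside, not how the daemon is actually wired internally:

// Rough sketch using the js-ipfs 0.x events ('error', 'ready'). Today a
// failed swarm listen surfaces on 'error' and the node never reaches
// 'ready'; the proposal is to log a warning naming the failed address
// and keep booting instead.
const IPFS = require('ipfs')

const node = new IPFS({
  config: {
    Addresses: {
      Swarm: [
        '/dns4/ws-star.discovery.libp2p.io/tcp/443/wss/p2p-websocket-star'
      ]
    }
  }
})

node.on('error', (err) => {
  // Desired: a warning, not a crash, when a swarm address fails to open.
  console.warn('could not open swarm address:', err.message)
})

node.on('ready', () => console.log('node started'))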
I think I’d agree with all those points 😄
@victorbjelkholm I am getting this same error again in all my applications. You may need to restart your ws-star server again.
Coworker of shessenauer here: we're still getting this error. Are there other servers we can add that would be more stable?

Even better: if there were an option that automatically added swarm nodes connected to the nodes listed in options.config.Addresses.Swarm, the list of servers would be dynamic and thus maybe more resilient. That would be fantastic. Is that congruent with the idea of a swarm?

Question: will this happen if any one of the endpoints in options.config.Addresses.Swarm is down? Another way of asking: do all of the servers in options.config.Addresses.Swarm have to be up to avoid this error? Thanks, guys.
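One possible stopgap is sketched below; it is purely illustrative (the helper names are made up), and, as the hung-server incident above showed, an HTTP probe can pass while the websocket endpoint is broken:

const https = require('https')

// Hypothetical helper: resolve true if the host answers HTTPS at all.
// Caveat: an HTTP response does not guarantee the websocket endpoint works.
function isReachable (host) {
  return new Promise((resolve) => {
    const req = https.get({ host, path: '/', timeout: 3000 }, (res) => {
      res.resume()
      resolve(true)
    })
    req.on('timeout', () => { req.destroy(); resolve(false) })
    req.on('error', () => resolve(false))
  })
}

// Build options.config.Addresses.Swarm from whichever hosts respond,
// so a known-dead relay never makes it into the config.
async function buildSwarmAddrs (hosts) {
  const up = await Promise.all(hosts.map(isReachable))
  return hosts
    .filter((_, i) => up[i])
    .map((h) => '/dns4/' + h + '/tcp/443/wss/p2p-websocket-star')
}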
OK, so after looking at it a little more closely, the problem is that IPFS never gets out of the 'starting' state after a websocket connection error is thrown. Is there, or could there be, a reconnect option for this sort of thing?
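As a userland workaround, a retry wrapper along these lines might help; it assumes the js-ipfs 0.x 'ready'/'error' events and that a fresh instance has to be created after a failed start:

const IPFS = require('ipfs')

// Sketch of a retry wrapper: if startup fails, wait and try again with a
// fresh instance rather than leaving a node stuck in 'starting'.
function startWithRetry (options, delayMs = 5000) {
  return new Promise((resolve) => {
    const attempt = () => {
      const node = new IPFS(options)
      node.once('ready', () => resolve(node))
      node.once('error', (err) => {
        console.warn('startup failed, retrying in', delayMs, 'ms:', err.message)
        setTimeout(attempt, delayMs)
      })
    }
    attempt()
  })
}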
Re-opening (and re-titling) this to track things a little more clearly. There are two things to address here:
1. The error raised when a swarm address fails to open should clearly say what went wrong and which address failed.
2. A failed swarm address should produce a warning rather than a crash: the node should keep booting instead of staying stuck in the 'starting' state.
These are tied in with #1325, but they are concrete enough that we should be able to address them sooner and more directly.
This was resolved by #1793
Type: Bug
Severity: Critical - System crash, application panic.
Description:
Upon start, the application boots normally; then, as soon as the IPFS js node gets initialized, the application throws a critical error with the following:
For configuration, I have a Node.js application that uses IPFS-js. The application is dockerized, as is a full IPFS node that peers with the IPFS-js app over websockets. This worked perfectly fine, and I had no problems until last night, when every single one of my environments started throwing the same error out of nowhere. Now none of my team can spin up the API because it hits the same issue. One odd thing: in the gap of time before it crashes, we can spam the API with curl and get back IPFS results for a quick second; otherwise it fails. I have confirmed it is not a websocket issue on my end and it is in fact something with the IPFS-js package in the Node.js application. Guidance is much appreciated.
Steps to reproduce the error:
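The options object isn't included in the report; based on the swarm addresses discussed above, it presumably looked something like this (values illustrative):

// Illustrative only: the reporter's actual options are not shown in the
// issue; the thread implies a ws-star address in config.Addresses.Swarm.
const IPFS = require('ipfs')
const options = {
  config: {
    Addresses: {
      Swarm: [
        '/dns4/ws-star.discovery.libp2p.io/tcp/443/wss/p2p-websocket-star'
      ]
    }
  }
}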
const ipfsNode = new IPFS(options)