Skip to content
This repository has been archived by the owner on Feb 26, 2021. It is now read-only.

Bring js-libp2p-websocket-star-multi into core #61

Open
parkan opened this issue Nov 8, 2018 · 14 comments
Open

Bring js-libp2p-websocket-star-multi into core #61

parkan opened this issue Nov 8, 2018 · 14 comments

Comments

@parkan
Copy link

parkan commented Nov 8, 2018

mkg20001/js-libp2p-websocket-star-multi substantially improves rendezvous server handling by allowing multiple connections and not suiciding startup if one of them is down. This dramatically improves stability and usability and should be the default. The current behavior is broken in multiple not entirely obvious ways, see ipfs/js-ipfs#1619 for details.

Looks like we can just transplant the logic from here onto the existing createListener method, or else the -multi version can be shipped alongside the regular one at <100 loc (in which case we should make the non-multi method throw if passed > 1 server)

Big thank you to @mkg20001 for bringing that module up to date with libp2p-websocket-star!

@jacobheun
Copy link
Contributor

I think the right thing to do here is to update interface-transport to allow listener.listen to accept an array of multiaddrs. We already support listening on multiple addresses for a given transport, but that transport doesn't have the ability to make decisions and handle errors for multiple addresses in the way that libp2p-websocket-star-multi does, and they should.

Here's what I am thinking:

This will also help keep all addresses in the peerId object without needing to specify addresses in the options.servers property that libp2p-websocket-star-multi currently does.

@mkg20001 @vasco-santos thoughts on this approach?

@mkg20001
Copy link
Member

mkg20001 commented Nov 9, 2018

@jacobheun Integrating ws-star-multi's features into libp2p core sounds like a great idea, though I'm unsure how the "0/1 servers are up" case should be handled. Should there be an option to make it throw and otherwise ignore ws-star in that case? (e.g. ws-star.throwIfServerOffline, default false)

@jacobheun
Copy link
Contributor

jacobheun commented Nov 9, 2018

Maybe the transport should have an option for retrying with a basic backoff and jitter. If retrying is not specified and we have 0 servers then an error is passed back. Otherwise, if we fail to listen on any addresses we will do a graceful backoff to retry listening. This could be repeated in the event the server goes down while we're listening.

We could make the retry logic on by default instead of having to specify it.

@mkg20001
Copy link
Member

mkg20001 commented Nov 9, 2018

Maybe the transport should have an option for retrying with a basic backoff and jitter.

Great idea

If retrying is not specified and we have 0 servers then an error is passed back.

I intended to solve the current "ws-star being down breaks my app" kindof problem by making the errors silent, non-fatal by default and allowing users to set an option to make it fatal. Maybe also more ws-star servers should be provided by default.
I could provide a fallback server again as I've got some unutilised resources on my server again (the prize you've got to pay is to spread the word that the server exists - the other servers got shutdown as nobody really used them)

@jacobheun
Copy link
Contributor

I think this might be a good option to pass to the switch and let it handle errors passed back by the listeners. We could also set the level of fatal, (no fatal errors, fatal if all transports failed to listen, fatal if one transport failed to listen), so users could get further control. We could do no fatal by default.

If the transport has retry turned on, it would never return an error, so the switch options wouldn't matter in this case. But with other transports in use that becomes a lot more useful.

@mkg20001
Copy link
Member

mkg20001 commented Nov 9, 2018

We could also set the level of fatal, (no fatal errors, fatal if all transports failed to listen, fatal if one transport failed to listen), so users could get further control

Even better

We could do no fatal by default.

I'd rather set it to "fatal if all failed to listen" by default

I could provide a fallback server

I created a new one at ws-star.mkg20001.io (config see https://github.com/mkg20001/infrastructure/blob/master/deploy.d/ws-star.mkg20001.io.yaml)

@jacobheun
Copy link
Contributor

jacobheun commented Nov 9, 2018

I'd rather set it to "fatal if all failed to listen" by default

That sounds good to me.

I created a new one at ws-star.mkg20001.io

Great! I'll try to spread the word.

@mkg20001
Copy link
Member

mkg20001 commented Nov 9, 2018

Great! I'll try to spread the word.

Best way is to throw it into READMEs and examples once the multi-server support is added

Edit: Also report issues on https://github.com/mkg20001/infrastructure/ if it's down or somehow not working

@parkan
Copy link
Author

parkan commented Nov 14, 2018

+1 for

  • adding (and documenting) additional public infra -- we should really have multiple infra@ maintained nodes, rather than private contributions (as much as it is appreciated @mkg20001)
  • retrying with backoff (however I would advocate for getting a basic yes/no multi-server support merged first)

a key issue here appears to be whether particular transports should default to fatal binding failure (local TCP) or forgiving errors (ws-star signaling servers) so that part should probably go in right away -- does that make sense?

@parkan
Copy link
Author

parkan commented Nov 14, 2018

also, I should re-iterate this in terms of priority: right now the use of a single currently-faulty (no matter how transient!) remote ws-star binding completely suicides the node startup for a js-ipfs app (even if the app doesn't actually require that binding to be alive), which is a showstopper bug that rapidly alienates potential users, so getting a fix for that behavior is a relatively high priority

@parkan
Copy link
Author

parkan commented Nov 14, 2018

I'd rather set it to "fatal if all failed to listen" by default

That sounds good to me.

I created a new one at ws-star.mkg20001.io

Great! I'll try to spread the word.

looks like we actually have 3 servers, so we can add these:

"ws-star0.ams.dwebops.pub ws-star.discovery.libp2p.io"
"ws-star1.par.dwebops.pub"
"ws-star2.sjc.dwebops.pub"

@jacobheun
Copy link
Contributor

looks like we actually have 3 servers, so we can add these:
Oh nice!

retrying with backoff (however I would advocate for getting a basic yes/no multi-server support merged first)

Retry logic should be fairly straightforward, but I agree the focus should be to get the multi-server support first.

@mkg20001
Copy link
Member

Anyone willing to take this one on?

@daviddias
Copy link
Member

Parallel proposal: Sunset the *-star protocols 🌅

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants