Skip to content
This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

offline boot: does not (re)bind #1837

Closed
pgte opened this issue Jan 22, 2019 · 13 comments
Closed

offline boot: does not (re)bind #1837

pgte opened this issue Jan 22, 2019 · 13 comments
Labels
exp/expert Having worked on the specific codebase is important kind/bug A bug in existing code (including security flaws) P1 High: Likely tackled by core team if no one steps up

Comments

@pgte
Copy link
Contributor

pgte commented Jan 22, 2019

When using js-ipfs v0.34.2 and websocket-star-multi with the ignore_no_online option, I'm able to start an js-ipfs embedded node while I have no internet connection.

But, if I regain connectivity, the transport doesn't try to bind to any of the given ws-star servers, rendering the current process unreachable.

What I would like to happen is to be able to bind (and rebind) if connectivity is intermitted, in spite of the node starting offline.

Is this something that should be handled by a) the app (and how?) b) libp2p or c) the ws-star-multi transport?

/cc @parkan

@parkan
Copy link
Contributor

parkan commented Jan 22, 2019

seems like this would need some combination of an ononline event listener and a periodic connectivity check with backoff

for whatever reason, healing connections for web apps seems to be a Hard Problem -- gmail, slack, etc all rely heavily on the manual checks (Retrying in 3s... Try now), which makes me suspect the navigator.isOnline/online events don't work very well

@parkan
Copy link
Contributor

parkan commented Jan 22, 2019

the API that I would like would be a libp2p level facility that imitates the browser online status, which the app can rely on regardless of the transport

overall, this feels like a libp2p concern, and the real fix probably requires implementing the optional addresses at that level as per #1793 (comment)

@parkan
Copy link
Contributor

parkan commented Jan 22, 2019

another idea would be to just periodically manually retry binding from inside the app as an interim solution, though I don't think there's an ipfs swarm command for doing that online right now

@alanshaw
Copy link
Member

cc @mkg20001

@alanshaw alanshaw added kind/bug A bug in existing code (including security flaws) exp/expert Having worked on the specific codebase is important status/ready Ready to be worked P1 High: Likely tackled by core team if no one steps up labels Jan 24, 2019
@mkg20001
Copy link
Contributor

mkg20001 commented Jan 25, 2019

I'm unsure whether the underlying logic for this should go into libp2p or ws-star-multi itself, but I feel like the app would still need to trigger it instead of it automatically happening but the app would still need a way of "forcing" some behaviours such as a manual .reconnect() function if the user is clicking a Reconnect Now button or a .cancelFurtherReconnects() for a Offline-Mode [×] toggle

Possible solutions:

  • ws-star-multi instance could emit reconnect(nullOrError, nullOrNextReconnectTime) events so the app could display disconnections from the ws-star network while automatically being handled by ws-star-multi itself
  • the above just in libp2p. existing close and listening events could be reused: libp2p would simply start reconnection if a listener would send a close event and libp2p isn't expecting this (when it's not trying to close the listener anyways) it would trigger the above re-connection cycle.
  • the above just in IPFS?

I'd go for either putting it into ws-star-multi (this could enable re-listening on servers that previously were down when the initial listen happened, but would require a hack to update the addresses in the peer-info) or putting it into libp2p core (possibly by mixing this with the proposal to make some addresses optional, so if a TCP address for e.x. is already used by another instance it would automatically start to listen on that address once the other instance is closed)

Opinions?

@mkg20001
Copy link
Contributor

So the pseudo-interface I made up so far:

class libp2p {
  // event(per-listener) reconnect(Error?: error, Number?: nextReconnectTime)
  // event(global) reconnect(Object?{[listener.address]: Error?}: errors, Number?: nextReconnectTime)
  set reconnectsEnabled (bool) {
    this._reconnectsEnabled = bool = Boolean(bool)
    if (bool) {
      recheckAndReconnect()
    } else {
      stopAllReconnects()
    }
  }

  get reconnectsEnabled() {
    return this._reconnectsEnabled
  }

  function recheckAndReconnect() {
    if (this._scheudledReconnect) {
      clearTimeout(this._scheudledReconnect)
      this._scheudledReconnect = null
      this._scheudledReconnectAt = null
    }
    iterateOverListeners(listener => {
      if (listener.needReconnect) {
        // somehow iterate over all listeners as key-value where address is key and do async listen
        // so it yields an object in the form of {"/ip4": Error("Something...")}
      }
    })
  }

  function _reconnectListener(listener) {
    listener.listen(listener.address, (err) => {
          if (err) {
            this._reconnectErrors[listener.address] = err // store errors for ui updates
            if (!this._scheudledReconnect) {
              const reconnectAt = Date.now() + this._reconnectCooldown // exponential growth? static? user specifiable growth function?
              this._scheudledReconnect = setTimeout(this.recheckAndReconnect.bind(this), reconnectAt) // scheudle reconnect
              this._scheudledReconnectTime = reconnectAt
            }

            this.emit('reconnect', this._reconnectErrors, this._scheudledReconnectTime) // notify app
            listener.emit('reconnect', err, this._scheudledReconnectTime) // notify per-listener watcher
          } else {
            delete this._reconnectErrors[listener.address]
            this.emit('reconnect', this._reconnectErrors, this._scheudledReconnectTime) // notify app
            listener.emit('reconnect') // notify per-listener watcher
          }
        })
  }

  /*
  listener.on('close', () => {
    if (!libp2pIsStopping) {
      if (this._reconnectsEnabled && !this._scheudledReconnect) {
        listener.needReconnect = true
      } else {
        this._reconnectListener(listener)
      }
    }
  })
  */
}

An app would hook it like this (but with actual react code, and not some random JSX-returning functions):

function offlineMode (enabled) {
  libp2p._reconnectsEnabled = enabled
}

libp2p.on('reconnect', (errors, nextReconnect) {
  if (errors) {
    return (<div>
      <h2>Reconnecting in {(nextReconnect - Date.now()) / 1000}...</h2>
      {Object.keys(errors).map(address => (<h4>{address} couldn't be reconnected: {errors[address]}))}
    </div>)
  } else {
    return (<div>Connected</div>)
  }
})

(Noticed that I missed events for when it begins reconnecing. This could either be a seperate event (reconnecting) or just a reconnect without nextReconnect variable set, addressess that are currently being dialed could have the error variable set to true to indicate a dial in progress - Also note that this was written without any type-checking just as a quick idea)

@pgte
Copy link
Contributor Author

pgte commented Jan 27, 2019

I think libp2p should try to do a good effort of keeping the peer connected:

  1. Delegating this to user-land makes app-development harder.
  2. Delegating this to each transport yields each transport more complex and error-prone.

Implementing this in libp2p would avoid these 2 problems, implementing reconnect logic in a generic way.

Also, I think that the reconnecting behaviour should be the default behaviour and not require special configuration. IMO, opting out of this behaviour should be the exception.

Here is to better offline / intermittent-connectivity support from libp2p! :)

@mkg20001
Copy link
Contributor

Implementing this in libp2p would avoid these 2 problems, implementing reconnect logic in a generic way.

Also, I think that the reconnecting behaviour should be the default behaviour and not require special configuration. IMO, opting out of this behaviour should be the exception.

Here is to better offline / intermittent-connectivity support from libp2p! :)

@pgte That's sort of what I've implemented in #1837 (comment)

Btw, could I get some feedback on that idea before I start implementing it?

@mkg20001
Copy link
Contributor

Also I've got a few questions that I need answered before continuing with this:

  • I assume that exponential backoff will be used on failures.
    • Should the user be able to specify their own backoff function?
    • Should the backoff be counted per-listener or globally?
  • Should there be a maximum amount of retries? Should there be an option for that?
  • How is reconnection state best communicated to an application that consumes the libp2p swarm? Via the reconnecting event as described above? Different method?
  • Should there be APIs like the ones described above to allow the application to change the reconnect behavior on the fly (such as forcing a reconnection before the end of the cooldown?)

@pgte
Copy link
Contributor Author

pgte commented Jan 28, 2019

@mkg20001 I'm not sure about the internals or the API (perhaps it's better to ask @jacobheun et al.), but in my opinion, this shouldn't require any new user-land API, reconnects should work out of the box.

One question (also to @jacobheun et al.) is what should be the transport interface that allows libp2p to manage dis/reconnects? I'm guessing the de-facto standard in node.js land would be emitting the events "connect" (once connected), "error", "reconnecting" and "end". I think that "connecting" can also be useful. Not sure how this plays with the current interface provided by a libp2p transport...

Internally, I'd like to see an exponential backoff (in the past I think I've used this package to achieve that), without a permanent failure (backoff ceils at about a few seconds).

IMO, all these backoff settings should be overridable.

@mkg20001
Copy link
Contributor

@pgte

Not sure how this plays with the current interface provided by a libp2p transport...

The equivalent for end would be close and connect would be listening for a libp2p transport

but in my opinion, this shouldn't require any new user-land API, reconnects should work out of the box.

In my proposal it did, but there are also extra APIs to force a reconnect at any point such as for adding a "Reconnect Now" button next to the countdown. IMO adding a few APIs to steer behavior is definitively a good idea.

Other than that I'd agree on the mentioned things.

Btw, @jacobheun could I get some feedback on my initial idea (See above)?

@jacobheun
Copy link
Contributor

I've started an issue in libp2p, libp2p/js-libp2p#312, to track this since it's been discussed in several places for various reasons and will require updates to several modules. I added my thoughts to the issue there.

@mkg20001 the main thing I can see becoming an issue there is with multiple transports and multiple addresses. Consuming reconnect type events from libp2p might make it hard for apps to understand the state. I think keeping the listening state for each listener in the switch and bubbling up the overall state of the switch would be easier to consume. Then apps would only need to care about the listening state of libp2p. We could still expose the ability to check the state of each transport, but I think in the majority of cases apps would only care if we are listening on at least 1 address.

@alanshaw
Copy link
Member

Please follow progress on this issue here libp2p/js-libp2p#312

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
exp/expert Having worked on the specific codebase is important kind/bug A bug in existing code (including security flaws) P1 High: Likely tackled by core team if no one steps up
Projects
None yet
Development

No branches or pull requests

5 participants