Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swarm discovery on the gateway (readonly) interface #3540

Closed
wants to merge 2 commits into from
Closed

Swarm discovery on the gateway (readonly) interface #3540

wants to merge 2 commits into from

Conversation

wigy-opensource-developer

As part of adding the light client feature, a new command discover is added to the readonly interface that returns a subset of the information that would be returned by ipfs swarm peers. This command could be used to discover or monitor the IPFS network without starting up an IPFS node with a valid peer ID.

  • This saves resources on the whole network, because a crawler does not need to register itself as a node on the network.
  • This allows mobile or web applications to bootstrap the network from gateway.ipfs.io and then spread the load of downloading content to the whole network.

The implementation of the command returns at most 5 peers picked randomly from the connected peers.

License: MIT
Signed-off-by: Wigy <wigy_opensource_developer@yahoo.com>
License: MIT
Signed-off-by: Wigy <wigy_opensource_developer@yahoo.com>
Copy link
Member

@whyrusleeping whyrusleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! This seems pretty useful, but before we merge this I think we should discuss the security and privacy implications of having such a command.

@@ -476,6 +507,21 @@ func peersWithAddresses(addrs []string) (pis []pstore.PeerInfo, err error) {
return pis, nil
}

func getNode(req cmds.Request, res cmds.Response) *core.IpfsNode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably be getOnlineNode or something, and lets go ahead and return the errors and handle them in the caller.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I see copy-paste, I have the urge to extract and name the thing. Since all commands in this file wanted an online node and handled the error the same way, this seemed to be the logical step to reduce coupling. I even played with the idea to look further into other commands and introduce a middle-man on the req itself, but I stopped myself.

I can get rid of the res parameter, but then I would return (*core.IpfsNode, *commands.Error) and add a func (*commands.response) SetError(*Error) if that is fine with you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use a regular error for that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot. The 2 errors need different commands.ErrorType set. So I either

  • create my own error type,
  • reuse the commands.Error or
  • have 3 return arguments (*core.IpfsNode, commands.ErrorType, error).

Alternatively I inline this method back and wash my hands in innocence.

@whyrusleeping whyrusleeping requested a review from Kubuxu January 5, 2017 15:38
@wigy-opensource-developer
Copy link
Author

To my understanding, the swarm interface provides the same information already.

If there is an attack that needs this query on the readonly API, it can be done today using the swarm interface. This new query would make it clean what is the intent of the client and would actually save resources compared to the query on the swarm interface.

@whyrusleeping
Copy link
Member

@wigy-opensource-developer The API does provide roughly this information yes, the concern is exposing it through the readonly interface, leaving a way to gain extra information about any node running a public gateway.

@Kubuxu
Copy link
Member

Kubuxu commented Jan 5, 2017

Or any web site for that matter.

@wigy-opensource-developer
Copy link
Author

wigy-opensource-developer commented Jan 5, 2017

  1. I can create a new peerID and connect to port 4001 of any gateway to gather this information. Am I wrong?
  2. The webpages you mentioned could actually get further nodes and spread the load from this public gateway into the whole IPFS network. It is not a secret currently who runs IPFS nodes. Also, running a js-ipfs or go-ipfs node you can already spread the load on the network. It would actually improve the network if all clients could use any nodes on the network. Like the SPV wallets in BitCoin. @jbenet?

@whyrusleeping
Copy link
Member

@wigy-opensource-developer Only if that peer has their dht enabled, which can be disabled currently with --routing=dhtclient

@Kubuxu
Copy link
Member

Kubuxu commented Jan 6, 2017

Right, it gives random addresses of peers, but it is still potentially leaking meta, as the request can be done X times to get all unique peers and also it will show local peers. I don't know how to feel about it, it is delicate.

@wigy-opensource-developer
Copy link
Author

wigy-opensource-developer commented Jan 6, 2017

OK. So if you were asked to design a feature where mobile/website clients who just want to download stuff from the network (ipfs cat, ipfs resolve), and they wanted to spread this load to all consenting nodes, how would you do that?

Today the only way is to rely on gateway.ipfs.io or run an ipfs node on the mobile / in the browser. The 1st is quite error and censorship prone. The 2nd is inefficient for both development complexity and client bandwidth.

(I need to reiterate that to my best knowledge IPFS DHT is a public infrastructure, where by running a node you accept to give favours to the network in order to get favours from others)

@whyrusleeping
Copy link
Member

@wigy-opensource-developer Heres how i think we should move forward with this feature:

Add a config option to opt out of this endpoint.

This will be for people who want to run a public gateway, but don't want their immediate peers to be discovered. Think of users who want to run a private network, but want to expose a gateway for people to view their content in a manner they can control (libraries often fall into this category).

Clearly document the privacy implications of this feature

Enabling this makes it easy for people to get a snapshot of who you are connected to, which might not always be information you want to share. Users should be aware of this.

Add a few sharness tests

We should test that this actually works by testing it in an iptb cluster sharness test (see test/sharness/t0130-multinode.sh for an example). We should test that it returns no peers for a node not connected to anyone, returns N peers when M are requested when N is less than M. Just wanna make sure we cover edge cases.

@hsanjuan
Copy link
Contributor

Hey, thanks for this, but after giving it some thought, I don't think our API should support a random shuffle on swarm peers and expose it in the gateway:

  • We can discuss if Gateway should expose swarm peers. I think Gateway is an endpoint for IPFS content, not for IPFS network topology, but I might be wrong. In private networks this should definitely be turned-off by default.
  • If it did, I see no reason to have an endpoint that limits output to 5 (why 5, why not 6, or 2) randomly selected entries. It seems like given a full list of peers (like swarm peers does), the client could do the shuffle themselves if they need to (and we wouldn't have to maintain the extra code). In any case, if we were to offer this, it should be generalized: length limit, shuffle, paging.... should be options for the new endpoint (or for the existing swarm peers endpoint). I can't say how big the peers list can grow, but I don't expect it to be such a big number that not giving a full list hurts the client so much (vs. giving just 5).

The webpages you mentioned could actually get further nodes and spread the load from this public gateway into the whole IPFS network

  • Nothing assures you that a peer in the network is also acting as gateway so that your mobile application can use it. In fact, they don't by default, so you probably cannot assume that you can request content from a random subset of peers connected to the gateways.

I do like the purpose of this PR (to make it easier for applications to integrate with IPFS), I'm just not sure this is the best approach. At least, it seems too use-case specific for my like and I'm not even convinced it addresses the issue.

@wigy-opensource-developer
Copy link
Author

wigy-opensource-developer commented Jan 13, 2017

Thanks for the feedback, guys. Based on your comments I have the following suggestion:

Let a node decide with a config flag whether it wants to be available for these light clients or not. By default, newly installed nodes would participate in this responsibility, but they can opt-out. Those nodes who allow light clients to use them need to know whether their peers allow light clients and list only those.

The limit of 5 is quite arbitrary that protects bootstrap nodes to spend too much bandwidth on listing hundreds of peers and this limit sends light clients faster away into further from the neighborhood of bootstraps. The random shuffling helps naive client implementations being distributed more evenly on the network as a whole. Exhaustive traversal of the network is made more difficult this way, but I anticipate that as the less frequent use-case. Maybe an --all option could list all peers that allow light clients to connect.

@whyrusleeping
Copy link
Member

@wigy-opensource-developer I do think we should provide a way for light clients to be able to more easily crawl the network. I think its probably worth refining our usecases here a bit more though. The light client only talks to gateways through the http api correct? In that case, why does it want to get swarm information from the gateway? Is it going to hope that those nodes might have a gateway running on port 8080?

@wigy-opensource-developer
Copy link
Author

Exactly. The client knows some seed nodes that run a gateway. Those answer this new command and reveal other nodes that run a gateway. And so on. This way a client is able to bootstrap a whole network of gateways that are able to serve readonly requests. This improves reliability, more resistant to censorship, hides metadata, and gives a break to gateway.ipfs.io to save on bandwidth.

For this to happen, as I understand, the gateway needs to know, which peers are open gateways and need to return their endpoints. Having hundreds of nodes in that response does not seem to be efficient, so I shuffled only 5 of them from all the peers.

The current code assumes that all peers are open gateways, which I learned in this PR is not true.

@whyrusleeping
Copy link
Member

@wigy-opensource-developer yeah, most nodes actually don't run a public gateway (its off by default). I definitely like the motivation here though and want to make this happen (light clients are an important thing to get).

Other than adding some way to signal capabilities outside of libp2p services, i'm not really sure what the best way forward here is.

@wigy-opensource-developer
Copy link
Author

wigy-opensource-developer commented Mar 22, 2017

Any gateway that wants to support this command must be able to decide whether its peers support it. I have 2 ideas how to know whether a peer supports it:

@ghost ghost self-requested a review March 25, 2017 02:22
Copy link
Member

@Kubuxu Kubuxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how to move it forward.

@ghost
Copy link

ghost commented Oct 9, 2017

Hey, sorry to jump in very late here. The use case for adding this from-the-outside-discovery command is not very convincing to me, especially since there's a lot of overlap with the existing ipfs dht findpeer command. I also think the swarm is an area where we clearly don't need or want interoperability with outside systems.

To summarize, the cost of this feature outweighs the gains, and given discussions at the last team meeting about only adding new features that absolutely essential (or trivially cheap), I'll put a 👎 on this PR.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👎 Not convinced we should add this feature. See my comment above for more info.

@wigy-opensource-developer
Copy link
Author

wigy-opensource-developer commented Oct 9, 2017

Well, okay. So the way you imagine it is to use a DHT client-only client running on the mobile phone and mapping out the P2P network from the inside as a peer? Because that is still possible. It is also possible with reinitializing the peer ID for every crawling.

On the other hand, if you want to close down the "crawlability" of the whole DHT (which kind of also make sense), then ipfs dht findpeer and ipfs swarm peers need a better trust-model.

@ghost
Copy link

ghost commented Oct 9, 2017

I don't even understand why you'd want to scan the network from a mobile client :)

We can discuss exposing ipfs swarm peers and ipfs dht findpeer under :8080/api though, as @hsanjuan suggested.

@wigy-opensource-developer
Copy link
Author

wigy-opensource-developer commented Oct 9, 2017

Exposing ipfs swarm peers on the gateway port would completely solve my use-case, although it would waste bandwidth around the bootstrap nodes. To save bandwidth, I would introduce a --max-count parameter to that command, which would do the random shuffle as in this PR.

As the light-client proposal stated, we want to avoid centralized gateways between the mobile phones and the storage nodes. Each mobile would start bootstrapping the IPFS network and would crawl to random storage nodes that it would use as a read-only gateway. This way if any gateway goes down, the mobile clients would still be able to read the contents on the network.

Currently the status quo is that mobile apps just rely on gateway.ipfs.io to read content, which is really bad for a decentralized storage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants