Swarm discovery on the gateway (readonly) interface #3540
Conversation
License: MIT Signed-off-by: Wigy <wigy_opensource_developer@yahoo.com>
Thanks for the PR! This seems pretty useful, but before we merge this I think we should discuss the security and privacy implications of having such a command.
@@ -476,6 +507,21 @@ func peersWithAddresses(addrs []string) (pis []pstore.PeerInfo, err error) {
	return pis, nil
}

func getNode(req cmds.Request, res cmds.Response) *core.IpfsNode {
this should probably be `getOnlineNode` or something, and let's go ahead and return the errors and handle them in the caller.
When I see copy-paste, I have the urge to extract and name the thing. Since all commands in this file wanted an online node and handled the error the same way, this seemed like the logical step to reduce coupling. I even played with the idea of looking further into other commands and introducing a middle-man on the req itself, but I stopped myself.
I can get rid of the `res` parameter, but then I would return `(*core.IpfsNode, *commands.Error)` and add a `func (*commands.response) SetError(*Error)`, if that is fine with you.
just use a regular error for that.
I cannot. The 2 errors need a different `commands.ErrorType` set. So I either:
- create my own error type,
- reuse `commands.Error`, or
- have 3 return arguments: `(*core.IpfsNode, commands.ErrorType, error)`.
Alternatively I inline this method back and wash my hands in innocence.
To my understanding, the swarm interface provides the same information already. If there is an attack that needs this query on the readonly API, it can be done today using the swarm interface. This new query would make the client's intent explicit and would actually save resources compared to a query on the swarm interface.
@wigy-opensource-developer The API does provide roughly this information, yes; the concern is exposing it through the readonly interface, leaving a way to gain extra information about any node running a public gateway.
Or any web site, for that matter.
@wigy-opensource-developer Only if that peer has their DHT enabled, which can be disabled currently with
Right, it gives random addresses of peers, but it is still potentially leaking metadata, as the request can be made X times to get all unique peers, and it will also show local peers. I don't know how to feel about it; it is delicate.
OK. So if you were asked to design a feature for mobile/website clients who just want to download stuff from the network (ipfs cat, ipfs resolve) and want to spread this load across all consenting nodes, how would you do that? Today the only way is to rely on gateway.ipfs.io or to run an ipfs node on the mobile / in the browser. The first is quite error and censorship prone. The second is inefficient in both development complexity and client bandwidth. (I need to reiterate that, to my best knowledge, the IPFS DHT is public infrastructure, where by running a node you accept to do favours for the network in order to get favours from others.)
@wigy-opensource-developer Here's how I think we should move forward with this feature:

1. Add a config option to opt out of this endpoint. This will be for people who want to run a public gateway, but don't want their immediate peers to be discovered. Think of users who want to run a private network, but want to expose a gateway for people to view their content in a manner they can control (libraries often fall into this category).
2. Clearly document the privacy implications of this feature. Enabling this makes it easy for people to get a snapshot of who you are connected to, which might not always be information you want to share. Users should be aware of this.
3. Add a few sharness tests. We should test that this actually works by testing it in an iptb cluster sharness test (see
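The opt-out from point 1 could look something like the fragment below in the node's JSON config. The key name `Gateway.DisablePeerDiscovery` is purely hypothetical here, just to illustrate the shape; no such option existed in go-ipfs at the time of this discussion.

```json
{
  "Gateway": {
    "DisablePeerDiscovery": true
  }
}
```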
Hey, thanks for this, but after giving it some thought, I don't think our API should support a random shuffle on
I do like the purpose of this PR (to make it easier for applications to integrate with IPFS), I'm just not sure this is the best approach. At least, it seems too use-case specific for my liking, and I'm not even convinced it addresses the issue.
Thanks for the feedback, guys. Based on your comments I have the following suggestion: Let a node decide with a config flag whether it wants to be available for these light clients or not. By default, newly installed nodes would participate in this responsibility, but they can opt out. Those nodes that allow light clients to use them need to know whether their peers allow light clients, and list only those. The limit of 5 is quite arbitrary; it protects bootstrap nodes from spending too much bandwidth on listing hundreds of peers, and it pushes light clients away from the neighborhood of the bootstrap nodes faster. The random shuffling helps naive client implementations spread more evenly across the network as a whole. Exhaustive traversal of the network is made more difficult this way, but I anticipate that to be the less frequent use-case. Maybe an
@wigy-opensource-developer I do think we should provide a way for light clients to more easily crawl the network. I think it's probably worth refining our use cases here a bit more, though. The light client only talks to gateways through the HTTP API, correct? In that case, why does it want to get swarm information from the gateway? Is it going to hope that those nodes might have a gateway running on port 8080?
Exactly. The client knows some seed nodes that run a gateway. Those answer this new command and reveal other nodes that run a gateway, and so on. This way a client is able to bootstrap a whole network of gateways that can serve readonly requests. This improves reliability, is more resistant to censorship, hides metadata, and gives gateway.ipfs.io a break to save on bandwidth. For this to happen, as I understand it, the gateway needs to know which of its peers are open gateways and return their endpoints. Having hundreds of nodes in that response does not seem efficient, so I shuffled only 5 of them from all the peers. The current code assumes that all peers are open gateways, which I learned in this PR is not true.
@wigy-opensource-developer Yeah, most nodes actually don't run a public gateway (it's off by default). I definitely like the motivation here though and want to make this happen (light clients are an important thing to get right). Other than adding some way to signal capabilities outside of libp2p services, I'm not really sure what the best way forward here is.
Any gateway that wants to support this command must be able to decide whether its peers support it. I have 2 ideas for how to know whether a peer supports it:
I am not sure how to move it forward.
Hey, sorry to jump in very late here. The use case for adding this from-the-outside discovery command is not very convincing to me, especially since there's a lot of overlap with the existing

To summarize, the cost of this feature outweighs the gains, and given the discussions at the last team meeting about only adding new features that are absolutely essential (or trivially cheap), I'll put a 👎 on this PR.
👎 Not convinced we should add this feature. See my comment above for more info.
Well, okay. So the way you imagine it is to use a DHT client-only client running on the mobile phone, mapping out the P2P network from the inside as a peer? Because that is still possible. It is also possible by reinitializing the peer ID for every crawl. On the other hand, if you want to close down the "crawlability" of the whole DHT (which also kind of makes sense), then
I don't even understand why you'd want to scan the network from a mobile client :) We can discuss exposing
Exposing

As the light-client proposal stated, we want to avoid centralized gateways between the mobile phones and the storage nodes. Each mobile would bootstrap into the IPFS network and crawl to random storage nodes that it would use as a read-only gateway. This way, if any gateway goes down, the mobile clients would still be able to read content on the network. The status quo is that mobile apps just rely on gateway.ipfs.io to read content, which is really bad for decentralized storage.
As part of adding the light client feature, a new command `discover` is added to the readonly interface that returns a subset of the information that would be returned by `ipfs swarm peers`. This command could be used to discover or monitor the IPFS network without starting up an IPFS node with a valid peer ID. The implementation of the command returns at most 5 peers picked randomly from the connected peers.