Option to restrict gateway to only serve locally available / cluster content #5513

Open
rotemdan opened this issue Sep 23, 2018 · 12 comments
Labels
topic/gateway Topic gateway

Comments

@rotemdan

rotemdan commented Sep 23, 2018

I'm investigating the suitability of IPFS as a server side file storage and distribution medium.

I can manage, upload and pin files through the HTTP REST API (:5001). I would like to have the stored files available both through the IPFS network and through HTTP. The gateway seems like an easy, simple solution to provide the files directly through HTTP with good latency to the user (and would possibly be reverse proxied through NGINX and/or a third-party CDN as well).

The only issue is, I couldn't find a way to limit it to only provide locally pinned content. Making a custom intermediate server to filter out requests seems unnecessary and would require maintaining a duplicate (and probably inefficient) index into the IPFS datastore. Since it might serve millions of files, maintaining a gigantic ipns-based ipld document to index the files also seems wasteful and inefficient (and a possible privacy issue if directory content is exposed).

I'm not interested (at this time) in creating a private network, using --offline or using custom bootstrap nodes, since I want the data to be available through the public IPFS network as well.

I believe this "dual-stack" approach might be reasonably classified as "plausible" (given that IPFS matures to the point where it provides sufficient value to be used in mainstream projects), so I decided to publish it here as a feature request (unless it is already available, in which case I'd be really happy to know how to achieve this!)

(Edit: as a natural extension, it would probably also be useful to have an option to only serve content pinned by a cluster of servers -- thus any node in the cluster could act as a restricted IPFS gateway - that only serves content hosted within the cluster itself)

@magik6k
Member

magik6k commented Sep 23, 2018

So I don't think there is an easy way to tell if a block is pinned without loading the whole pinned tree into memory (which seems to be what GC currently does).

For a simpler solution, we could add an option that makes the gateway not look for blocks in the network and instead use only what is already in the blockstore (so only pinned data and cached content, which can be removed with ipfs repo gc), essentially making the gateway offline. Would that work for you?

@magik6k magik6k added the topic/gateway Topic gateway label Sep 23, 2018
@rotemdan
Author

For the most part, the kind of node I'm describing would be dedicated to storing content, not retrieving it from the larger network (aside, possibly, from replicating other cluster nodes, which I'll describe next). Almost all local content would be pinned anyway, so simply checking for local availability, regardless of pin status, could work. If the performance were significantly better, I would probably choose this less restrictive option anyway.

For a cluster, I guess it would mean that the gateway would be effectively "local" in relation to the cluster. I'm not very familiar with IPFS internals, but I could imagine the DHT being queried in a way that constrains the results to "whitelist" only sources originating from within the cluster. In any case, in the vast majority of requests the hashes would be resolved very quickly, since the nodes would have very good network connectivity to each other, even if geographically dispersed. In the rare cases when clients try to "abuse" the gateway by using it as a proxy to the larger IPFS network, the request would simply stall and time out (I'm not sure if there's a DHT timeout setting for this type of query, but it could possibly be set reasonably low to mitigate this scenario).

There are some interesting prospects to having something like this. It seems like a relatively simple/cheap way to run a highly-available CDN, where, since each node also acts as gateway, popular content is automatically fetched and cached by other cluster nodes (in addition to client nodes from outside the cluster) (of course all this would only be truly feasible given that the datastore is scalable and performant enough, and the software stable/mature enough etc.)

@rotemdan rotemdan changed the title Option to restrict gateway to only serve pinned content Option to restrict gateway to only serve locally available / cluster content Sep 23, 2018
@magik6k
Member

magik6k commented Sep 24, 2018

For the cluster case, this would probably have to be implemented at the bitswap level, where we'd filter which peers we want to fetch content from. We could do that at a lower level, but:

  • If we did this at the multiplexer level (allow bitswap to connect only to some allowed set of peers), we wouldn't be able to send blocks to peers outside the allowed group / cluster
  • If we were to do this at the swarm level, this could just be a private network.

@rotemdan
Author

rotemdan commented Sep 24, 2018

Thanks for looking at this. I've found an alternative approach that filters URLs using cryptography instead (mainly for the cluster case, since the local-only case is trivial to implement efficiently), though it requires an additional intermediary filtering server (unless IPFS supported it as part of the CID/URI spec) and has various other limitations:

Instead of links being, say:

https://my-cdn.com/ipfs/<IPFS-CID>

Have them as

https://my-cdn.com/signed-ipfs/<IPFS-CID>-<HMAC(KEY, IPFS-CID)>

So every request would be required to include a signature that would be verified by an intermediary HTTP server (possibly running on each node), or, as I mentioned, the gateway itself.
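
The signed-link scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: HMAC-SHA256 with a hex digest, `-` as the separator, and the names `KEY`, `sign_cid` and `verify_signed` are all choices made for this sketch, not anything defined by IPFS or its gateway.

```python
import hashlib
import hmac

# Hypothetical shared secret, known only to the filtering servers.
KEY = b"replace-with-a-real-secret"


def sign_cid(cid: str, key: bytes = KEY) -> str:
    """Return '<CID>-<HMAC(KEY, CID)>' for use in a /signed-ipfs/ path."""
    tag = hmac.new(key, cid.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{cid}-{tag}"


def verify_signed(token: str, key: bytes = KEY):
    """Return the CID if the tag is valid, otherwise None."""
    # The tag is hex, so the last '-' always separates the CID from the tag.
    cid, sep, tag = token.rpartition("-")
    if not sep or not cid:
        return None
    expected = hmac.new(key, cid.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    ok = hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8"))
    return cid if ok else None
```

An intermediary server would call `verify_signed` on the last path segment and forward the request to the gateway only when it returns a CID.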

Limitations:

  • CIDs must be known and signed ahead of time.
  • In case of deleted content, links cannot be revoked unless some sort of expiration time is included in the link (e.g. <IPFS-CID>-<EXP-TIME>-<HMAC(KEY, IPFS-CID, EXP-TIME)>), in which case the link needs to be periodically renewed. This might not be feasible in some cases.
  • The gateways would possibly retrieve content from any node that has it, including nodes running on client machines. This yields the somewhat paradoxical scenario where servers pull content from their own clients, which may or may not be desirable: why would the servers waste their own or their clients' bandwidth if they have great connectivity to their peer nodes (with possibly cheaper or free bandwidth)? It could also reduce performance if the connections to clients are too slow and IPFS needlessly chose to pull from them.
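
The expiring variant from the second limitation could look roughly like the sketch below. As before, everything concrete here is an assumption made for illustration: the `|` used to join the signed fields, Unix-second timestamps, and the names `sign_with_expiry` and `verify_with_expiry`.

```python
import hashlib
import hmac
import time

KEY = b"replace-with-a-real-secret"  # hypothetical shared secret


def _tag(key: bytes, cid: str, exp: int) -> str:
    # Sign the CID and the expiry together so neither can be swapped alone.
    msg = f"{cid}|{exp}".encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def sign_with_expiry(cid: str, ttl_seconds: int, key: bytes = KEY, now=None) -> str:
    """Return '<CID>-<EXP-TIME>-<HMAC(KEY, CID, EXP-TIME)>'."""
    exp = int(time.time() if now is None else now) + ttl_seconds
    return f"{cid}-{exp}-{_tag(key, cid, exp)}"


def verify_with_expiry(token: str, key: bytes = KEY, now=None):
    """Return the CID if the tag is valid and the link has not expired."""
    parts = token.rsplit("-", 2)
    if len(parts) != 3:
        return None
    cid, exp_text, tag = parts
    if not cid or not (exp_text.isascii() and exp_text.isdigit()):
        return None
    exp = int(exp_text)
    ok = hmac.compare_digest(tag.encode("utf-8"), _tag(key, cid, exp).encode("utf-8"))
    current = time.time() if now is None else now
    return cid if ok and current <= exp else None
```

The `now` parameter exists only to make the expiry check easy to exercise; a real deployment would rely on the server clock.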

@ozars

ozars commented Jan 17, 2019

A restricted gateway would be a quite useful feature for mirroring large datasets receiving frequent updates as well.

Is there any API endpoint that reports whether an object path is pinned (e.g. /api/v0/pin/get)? If so, a reverse proxy in front of the gateway could be configured to accept a request only if the requested object is pinned. It would be a lot better if this were implemented in IPFS itself, but this might be a simple workaround until then.
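
As far as I know, the closest existing endpoint is /api/v0/pin/ls, which accepts an `arg` (the CID or path) and a `type` filter. A filtering proxy could use it roughly as sketched below. The response shape (a `Keys` object) and the assumption that an unpinned object is reported as an HTTP error should be checked against the version in use; `pin_ls` and `is_pinned` are hypothetical helper names, and POST is used because newer releases appear to reject GET on the API.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

API = "http://127.0.0.1:5001"  # the API port mentioned earlier in this thread


def pin_ls(cid: str, api: str = API) -> dict:
    """Ask the node about one CID; return parsed JSON, or {} on an HTTP error."""
    query = urllib.parse.urlencode({"arg": cid, "type": "all"})
    req = urllib.request.Request(
        f"{api}/api/v0/pin/ls?{query}", data=b"", method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.load(resp)
    except urllib.error.HTTPError:
        # Assumption: "not pinned" comes back as an HTTP error status.
        return {}


def is_pinned(cid: str, fetch=pin_ls) -> bool:
    """True if the node lists at least one pin entry for the CID."""
    # Assumed response shape: {"Keys": {"<cid>": {"Type": "recursive"}}}
    return bool(fetch(cid).get("Keys"))
```

The `fetch` parameter lets the decision logic be exercised without a running node; connection failures are deliberately left to propagate so the proxy can fail closed.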

@Stebalien
Member

The next release (which I need to get out the door ASAP...) will have a Gateway.NoFetch option. However, that may not be sufficient for the cluster use-case.
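
For readers landing here later: Gateway.NoFetch is a boolean in the node's JSON config, and it can usually be set with `ipfs config --json Gateway.NoFetch true`. The sketch below shows the same change applied to a config dictionary; `enable_no_fetch` is a hypothetical helper, and how the config is loaded from and written back to the repo is left out.

```python
import json


def enable_no_fetch(config: dict) -> dict:
    """Return a copy of an IPFS config dict with Gateway.NoFetch set to true."""
    patched = json.loads(json.dumps(config))  # cheap deep copy for JSON data
    patched.setdefault("Gateway", {})["NoFetch"] = True
    return patched


# Show the resulting fragment for an otherwise empty Gateway section.
print(json.dumps(enable_no_fetch({"Gateway": {}}), indent=2))
```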

See: #5649

@kyledrake

This is a very good feature to have and I'm glad it's being released soon.

There's going to be a lot of use cases where people want to have an HTTP convenience gateway for their own pinned sites/content, but are unwilling to allow all content from everyone to be served from their HTTP servers as a side consequence.

@magik6k
Member

magik6k commented Jan 18, 2019

Note that a few read-only /api endpoints aren't yet covered by this option - see #5649 (comment) for the list

@Stebalien
Member

@magik6k speaking of which, can you file an issue for that so we don't forget?

@magik6k
Member

magik6k commented Jan 18, 2019

#5929

@KevinYum

KevinYum commented Sep 7, 2022

Gateway.NoFetch has already been a great move for the gateway!

For the cluster use case, how about extending Gateway.NoFetch to something like Gateway.FetchOnlyFromSpecificPeers? My current practice is to use a load balancer in front of the cluster nodes' gateways.

@Jorropo
Contributor

Jorropo commented Sep 7, 2022

@ywk248248 this can be done by setting Routing:none, removing the bootstrap peers and peering with your specific peers.
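
In config terms, that suggestion appears to map onto three keys: Routing.Type, Bootstrap and Peering.Peers. The sketch below applies them to a config dictionary. The peer ID and address are placeholders, `restrict_to_peers` is a hypothetical helper, and it is worth verifying on a test node whether disabling routing also stops the node from advertising its content to the public network.

```python
import json


def restrict_to_peers(config: dict, peers: list) -> dict:
    """Return a copy of an IPFS config dict limited to the given peers."""
    patched = json.loads(json.dumps(config))  # cheap deep copy for JSON data
    patched.setdefault("Routing", {})["Type"] = "none"  # no content routing lookups
    patched["Bootstrap"] = []  # do not dial the public bootstrap nodes
    patched.setdefault("Peering", {})["Peers"] = peers  # keep these connections up
    return patched


# Placeholder values: substitute the real peer IDs and multiaddrs of the cluster.
CLUSTER_PEERS = [
    {"ID": "<peer-id-of-node-b>", "Addrs": ["/ip4/10.0.0.2/tcp/4001"]},
]

print(json.dumps(restrict_to_peers({}, CLUSTER_PEERS), indent=2))
```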
