Content Resolution And Gateway Performance #6383

Stebalien · 2019-05-28T08:28:20Z

Finding content on the IPFS network can be slow at times. This meta-issue tracks the problem, the reasons behind it, and the work being done to fix it.

There are several main causes for not being able to find content on the IPFS network:

1. There are either no IPFS peers providing the content or those peers are unreachable (e.g., they're behind a NAT).
2. ~~The DHT can be slow.~~
   - While not as fast as it could get, the DHT now resolves content in seconds (as of writing this, 50% = 1s, 95% = ~6s).
3. The DHT is forgetful.
4. The gateway is a shared resource.

Unreachable Peers

The first issue is being addressed with better NAT traversal:

- ~~AutoRelay (Enable AutoRelay by default #6290): Automatically use a relay if the IPFS node is behind a NAT.~~
  - Status: AutoRelay is still experimental but doesn't appear to scale well. We're exploring alternatives.
- Hole Punching (TCP, UDP). (no tracking issue yet)
- WebRTC transport: Support for WebRTC transport libp2p/go-libp2p#188
  - There's now a pure-Go implementation of WebRTC so we're hoping this will start making progress again.
- Dialback protocol: Dialback Protocol libp2p/go-libp2p#925
- Project Flare: Project Flare(decentralised Hole Punching) Phase1 Meta Issue libp2p/go-libp2p#1039

Slow Content Resolution on the DHT

This issue was addressed in go-ipfs 0.5.0.

The primary issue was that most of the nodes in the DHT were unreachable (behind NATs). That meant every DHT query spent a significant amount of time trying to contact unreachable peers. This situation was improved by preventing NATed nodes from joining the DHT (libp2p/go-libp2p-kad-dht#216, libp2p/go-libp2p-kad-dht#330). At the moment, a large portion of the DHT is still behind NATs because our solution relies on nodes upgrading to go-ipfs 0.5.0, but the situation will improve as more and more nodes upgrade.

The second issue was that the previous DHT implementation didn't correctly implement the Kademlia protocol (libp2p/go-libp2p-kad-dht#291) and instead continued querying the DHT past the point where it should have stopped. This made queries take even longer than they should have.
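For readers unfamiliar with the termination rule, here is a minimal sketch of an iterative Kademlia-style lookup over a toy in-memory network. It is not the go-libp2p-kad-dht code: the 6-bit toy network, the `query` callback, and the `k` and `alpha` values are all assumptions made for illustration. The point is the stopping condition: the lookup ends as soon as the k closest peers it knows about have all been queried, instead of continuing to walk the network.

```python
from typing import Callable, Iterable, List, Tuple


def iterative_lookup(
    target: int,
    start_peers: Iterable[int],
    query: Callable[[int], Iterable[int]],
    k: int = 20,
    alpha: int = 3,
) -> Tuple[List[int], int]:
    """Return the k closest known peers to target and how many peers were queried."""
    known = set(start_peers)
    queried = set()
    while True:
        # The k peers closest to the target by XOR distance, nearest first.
        closest = sorted(known, key=lambda peer: peer ^ target)[:k]
        pending = [peer for peer in closest if peer not in queried]
        if not pending:
            # Termination rule: all of the k closest known peers have answered.
            return closest, len(queried)
        # Ask up to alpha of the closest unqueried peers for the peers they know.
        # (Sequential here for clarity; a real implementation would do this concurrently.)
        for peer in pending[:alpha]:
            queried.add(peer)
            known.update(query(peer))


# Toy network (assumption): 64 peers with 6-bit IDs; each knows the peers one bit flip away.
NETWORK = {p: [p ^ (1 << bit) for bit in range(6)] for p in range(64)}

closest, contacted = iterative_lookup(42, [0], NETWORK.__getitem__, k=4, alpha=2)
print(closest, contacted)  # prints [42, 43, 40, 41] 8
```

In this toy run the lookup contacts 8 of the 64 peers and then stops. A lookup that ignores the termination rule would keep querying peers that cannot improve the result.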

Both of these issues have been addressed in go-ipfs 0.5.0; please update.

Slow Content Publishing on the DHT

If you're adding data to go-ipfs and it takes a while to "show up", this might be because your node hasn't yet advertised the content. At the moment, advertising content is a slow sequential process where each block advertised can take many seconds.
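To get a feel for why sequential advertising hurts, here is a back-of-the-envelope sketch. The block count and the per-block cost are assumed inputs, not measurements; the function only multiplies them, which is what "sequential" implies.

```python
def estimate_advertise_seconds(num_blocks: int, seconds_per_block: float) -> float:
    """Total time to advertise blocks one after another (no parallelism)."""
    if num_blocks < 0 or seconds_per_block < 0:
        raise ValueError("inputs must be non-negative")
    return num_blocks * seconds_per_block


# Hypothetical example: 1,000 blocks at an assumed 5 seconds per advertisement.
total = estimate_advertise_seconds(1_000, 5.0)
print(f"{total / 60:.0f} minutes")  # prints "83 minutes"
```

Under these assumed numbers, even a modest amount of data takes over an hour to be fully advertised.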

There's an experimental "accelerated DHT" feature that aims to address this issue. However, this feature is experimental for a reason and will likely not be stabilized in its current form (it performs a resource-intensive operation in the background to maintain a global view of the entire DHT).

If this feature is successful, we hope to ship a lighter-weight version in the future.

Unreliable DHT

Many of the nodes in the DHT are ephemeral so the DHT forgets information over time. While provider records are republished every 12 hours and published to multiple (20) peers, network churn (nodes joining/leaving) may still cause the network to forget these values.
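As a rough illustration of how churn can defeat replication, here is a deliberately simplified model. It assumes each of the peers holding a record stays online until the next republish independently with the same probability, and it ignores other effects such as new peers joining closer to the key. The survival probabilities below are hypothetical inputs, not measurements.

```python
def record_loss_probability(replicas: int, peer_survival: float) -> float:
    """Probability that every replica holder is gone before the next republish.

    Assumes holders leave independently, each surviving with probability peer_survival.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 0.0 <= peer_survival <= 1.0:
        raise ValueError("peer_survival must be between 0 and 1")
    return (1.0 - peer_survival) ** replicas


# 20 replicas, as in the text; the survival values are assumptions.
for survival in (0.9, 0.5, 0.1):
    print(survival, record_loss_probability(20, survival))
```

Under this model, 20 replicas make loss very unlikely when most peers stay online, but if only 10% of holders survive a republish interval, roughly 12% of records would be lost.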

A future release will fix this by implementing libp2p/go-libp2p-kad-dht#323.

The gateway is a shared resource

The gateway is a shared resource used by many parties. It's not designed to be a reliable service for building high-load web services. If you need such a gateway, we recommend that you run one yourself or pay an "IPFS pinning" service to host your content.

Stebalien · 2019-05-28T08:28:37Z

cc @vyzo, @raulk am I missing anything here?

jasonzhouu · 2019-05-28T09:52:27Z

Maybe the suggestion in https://discuss.ipfs.io/t/proposal-peer-hint-uri-scheme/4649 should be considered, i.e. giving users an option to specify the location of a file if the file is not popular yet, so as to make DHT resolution much faster.

> Instead, a more rational way for this would be to optionally add to the URI a hint where the file is available, i.e. which peer is the originator. This would hasten the download while the file isn’t popular yet.

It can solve the issue of #6382, as the user @voxsoftware knows exactly where the file is located.

anshbansal · 2019-07-16T02:43:39Z

> Maybe the suggestion in https://discuss.ipfs.io/t/proposal-peer-hint-uri-scheme/4649 should be considered, i.e. giving users an option to specify the location of a file if the file is not popular yet, so as to make DHT resolution much faster.
>
> > Instead, a more rational way for this would be to optionally add to the URI a hint where the file is available, i.e. which peer is the originator. This would hasten the download while the file isn’t popular yet.
>
> It can solve the issue of #6382, as the user @voxsoftware knows exactly where the file is located.

There is this pinning service called Pinata https://pinata.cloud/documentation#AddHashToPinQueue which allows me to specify up to 5 addresses where my content is already available. Maybe they have a partial solution to this problem already?

I have not worked at the protocol level so maybe I am completely wrong here. But I messaged the team in their Slack in case that helps.

Stebalien · 2019-07-16T03:00:22Z

That just punts the problem. IPFS is supposed to be a decentralized, content-addressed network. I agree we should consider adding, e.g., a header to gateway requests that says "please connect to this specific peer when trying to find this content" but that doesn't really fix the underlying issue.

Basically, we won't consider this fixed until a user can ipfs add a file and then fetch it from ~~another node~~ the gateway.
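For concreteness, the header idea mentioned above could look roughly like the sketch below. The header name `X-Ipfs-Peer-Hint` is purely hypothetical (nothing in this thread defines one), and the CID, address, and peer ID are placeholders. The function only builds the URL and headers; it does not send anything.

```python
from typing import Dict, Tuple

# Hypothetical header name; it is an assumption made for this sketch only.
PEER_HINT_HEADER = "X-Ipfs-Peer-Hint"


def build_hinted_request(gateway: str, cid: str, peer_multiaddr: str) -> Tuple[str, Dict[str, str]]:
    """Return the gateway URL for cid plus headers carrying a provider hint."""
    if not peer_multiaddr.startswith("/"):
        raise ValueError("multiaddrs start with '/'")
    url = f"{gateway.rstrip('/')}/ipfs/{cid}"
    return url, {PEER_HINT_HEADER: peer_multiaddr}


url, headers = build_hinted_request(
    "https://ipfs.io",
    "QmExampleCID",  # placeholder CID
    "/ip4/203.0.113.7/tcp/4001/p2p/QmExamplePeerID",  # placeholder address and peer ID
)
print(url, headers)
```

A gateway that understood such a header could dial the hinted peer directly before falling back to the DHT, which is why this speeds things up without fixing the underlying discovery problem.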

jasonzhouu · 2019-07-16T03:00:49Z

> There is this pinning service called Pinata https://pinata.cloud/documentation#AddHashToPinQueue which allows me to specify up to 5 addresses where my content is already available. Maybe they have a partial solution to this problem already?

@anshbansal Thank you for the reply. Yeah, specifying "multiaddresses" where the file already exists when pinning a file in Pinata is very useful. But it only serves to help Pinata find your content faster, which is not general enough. It would be better if this feature could be added to IPFS.

Stebalien · 2019-07-16T03:05:46Z

@jasonzhouu fixed that. My point is that the suggestion doesn't fix the case where an ordinary user adds a file to their IPFS node and then tries to view it on the gateway.

jasonzhouu · 2019-07-16T03:12:20Z

> the suggestion doesn't fix the case where an ordinary user adds a file to their IPFS node and then tries to view it on the gateway.

@Stebalien Yeah, it's true. If we can specify a "multiaddress for the node that already has the content" when requesting a file from an IPFS gateway, the gateway can fetch it faster, like what Pinata does.

kivutar · 2019-07-16T04:15:59Z

I can confirm this issue. Testing with 3 nodes:

- an EC2 VM based in US (port 4001 open)
- a dedicated server based in Europe (port 4001 open)
- Pinata (they could pin my hash and I can resolve the hash on their gateway)

I'm unable to access my hash on ipfs.io or ipfs.infura.io hours later.

The hash is QmNNCcCF4ZRyuHutumcP9GSAgPXbzjjx1m4uddLwNsoAFg (this is public domain content)

anshbansal · 2019-07-16T04:19:00Z

> That just punts the problem. IPFS is supposed to be a decentralized, content-addressed network. I agree we should consider adding, e.g., a header to gateway requests that says "please connect to this specific peer when trying to find this content" but that doesn't really fix the underlying issue.
>
> Basically, we won't consider this fixed until a user can ipfs add a file and then fetch it from ~~another node~~ the gateway.

That would help make things faster. A slight variation could be to add an option to the IPFS Companion extension that allows us to specify the value of the header for requests sent to the IPFS gateway. Or perhaps an API in the local IPFS daemon that the browser extension can use to see if the hash is present and add my address to the header automatically, instead of the routing option currently present? If this were standardised, it would also help gateways: searching for content would be easier, which might make operating gateways cheaper due to reduced bandwidth costs.

"we won't consider this fixed" sounds great. Ultimately having it at the protocol level is the perfect solution.

> @jasonzhouu fixed that. My point is that the suggestion doesn't fix the case where an ordinary user adds a file to their IPFS node and then tries to view it on the gateway.

It does fix it to some extent. If the browser extension adds the header to automatically include my address and disables routing to the local node, then it should be faster as the gateway knows where the hash is present.

It makes it easier for the first impressions to be better. If I can tell the gateway that my content is on this IP then the gateway can possibly cache it. That would make it easier for anyone using that gateway to see my content much faster. Now, if there was a standard way for the gateways to have links between each other (some trackers?) the content discovery should be faster between all gateways.

It is not a perfect solution since, as you mentioned, "IPFS is supposed to be a decentralized, content-addressed network". What we are doing is allowing central authorities to serve content more easily. I guess it does not help the decentralized part.

DonaldTsang · 2020-09-10T03:43:28Z

I am thinking of some kind of "reputation/trust algorithm" relevant to #6097?
https://arxiv.org/ftp/arxiv/papers/1411/1411.3294.pdf

aarshkshah1992 · 2020-10-28T17:36:57Z

@Stebalien What's the "status" of libp2p/go-libp2p#188? Is there anything blocking us from going ahead with it now that pion has full-fledged WebRTC support?

aarshkshah1992 · 2020-10-28T17:50:08Z

@jacobheun Assigning this to myself as this mostly deals with NATs/connectivity.

Stebalien · 2020-10-28T18:18:32Z

I don't know but @jacobheun probably does.

dokterbob · 2020-11-04T09:42:45Z

As I've stated in #5541, at ipfs-search.com we've been consistently seeing around 40-60% timeout rates for fresh hashes. Part of this is due to us not (yet) using streaming listing, but a big part does seem to relate to go-ipfs (still) not being able to find content (although, for a while, after switching to 0.5.0 we suddenly had a much better rate; I'm still trying to figure out why that was, but we need time to gather reliable data).

jacobheun · 2020-11-04T12:32:29Z

> What's the "status" of libp2p/go-libp2p#188? Is there anything blocking us from going ahead with it now that pion has full-fledged WebRTC support?

Circling back on this, we'll be starting with QUIC hole punching work and then expanding from there. There are several things we need to do in libp2p to make this work well. Once we have other things in place, like direct connection upgrading and QUIC hole punching, we'll start looking at WebRTC to expand our ability to connect to nodes not using QUIC.

> although, for a while, after switching to 0.5.0 we suddenly had a much better rate

Query times and success rates for finding content on the network were drastically improved in 0.5 and later. Being able to retrieve that content is the next step we're working on, which should give another large boost to getting content.

eminence · 2020-11-26T02:29:35Z

Some additional debugging tools would be useful, I think. I have two nodes running 0.7.0, and both are directly connectable (i.e. they are either not behind a NAT, or they are behind a NAT with port forwarding). Running ipfs ls on one node (to get content available on the other node) has been stuck for hours. If I manually connect the nodes (with ipfs swarm connect), the content is fetched within seconds. This makes me think something funny is going on in the DHT, but I don't really have the tools to debug it.

dokterbob · 2020-12-01T09:31:24Z

@eminence Direct peering should provide a workaround for such issues: https://github.com/ipfs/go-ipfs/blob/master/docs/config.md#peering
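As a sketch of what that workaround involves: as far as I understand the linked config docs, the `Peering` section holds a `Peers` list whose entries have an `ID` and a list of `Addrs`. The helper below (its name is made up for this example) builds such a fragment as JSON. The peer ID and address are placeholders; substitute the values reported by `ipfs id` on the other node, and check the linked docs before merging anything into a real config.

```python
import json
from typing import Dict, List


def peering_config(peers: Dict[str, List[str]]) -> dict:
    """Build a Peering config fragment from a {peer ID: [multiaddrs]} mapping."""
    return {
        "Peering": {
            "Peers": [{"ID": peer_id, "Addrs": addrs} for peer_id, addrs in peers.items()]
        }
    }


# Placeholder peer ID and address (assumptions for illustration).
fragment = peering_config({"QmExamplePeerID": ["/ip4/203.0.113.7/tcp/4001"]})
print(json.dumps(fragment, indent=2))
```

Unlike a one-off ipfs swarm connect, a peering entry asks the node to keep the connection to the listed peer alive, so content on that peer stays reachable without going through the DHT.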

SomajitDey · 2021-06-08T13:32:16Z

I experienced a significant improvement in content resolution and gateway performance using the TCP reverse proxy provided by ngrok. After exposing my laptop's tcp://localhost:4001 through ngrok, two gateways (https://ipfs.io/ipfs and https://gateway.pinata.cloud/ipfs/) found my locally hosted file within minutes of running ipfs add.

The method is detailed at https://gist.github.com/SomajitDey/25f2f7f2aae8ef722f77a7e9ea40cc7c

Stebalien added the topic/perf and topic/meta labels May 28, 2019

This was referenced May 28, 2019

What is the reason for very slow #6382

Closed

Performace, or How IPFS will be better than BitTorrent #6342

Closed

Stebalien mentioned this issue Jun 4, 2019

why gateway timeout constantly? ipfs/ipfs#403

Closed

Stebalien mentioned this issue Jun 13, 2019

Content can sometimes only be found after hours or days #6385

Closed

lanzafame added the topic/gateway label Jul 16, 2019

Stebalien mentioned this issue Jul 16, 2019

Content routing hint via HTTP headers #6515

Open

This was referenced Jul 25, 2019

ipfs swarm2 got error on dial, etc #6357

Closed

Bug: Send objects example doesn't work ipfs-inactive/docs#139

Closed

Stebalien mentioned this issue Aug 21, 2019

Direct file transfer is slow #6599

Open

Stebalien mentioned this issue Sep 9, 2019

go-ipfs on gateways gets extremely slow #6564

Closed

benhylau mentioned this issue Sep 15, 2019

Slow IPFS content resolution tomeshnet/ipfs-live-streaming#99

Open

Stebalien pinned this issue Mar 2, 2020

Stebalien mentioned this issue Mar 26, 2020

Bootstrapping IPFS takes over 1 minute on a fresh repository #6658

Closed

jacobheun unpinned this issue Oct 6, 2020

aarshkshah1992 self-assigned this Oct 28, 2020

Stebalien mentioned this issue Mar 30, 2021

My file is not found after a while, after a restart it works again. #7905

Closed

Stebalien unassigned aarshkshah1992 Apr 22, 2021

Stebalien mentioned this issue Oct 6, 2021

is dnslink and ipfs is supposed to be slow at the first load time? #8476

Closed

vielmetti mentioned this issue Dec 27, 2021

superhighway84 hangs on first launch / can't seem to retrieve default database mrusme/superhighway84#1

Closed
