Massive DHT traffic on adding big datasets #2828
Comments
The solution I'm investigating here is batching together outgoing provide operations, combining findpeer queries and outgoing 'puts'. The issue is that it might require making some backwards-incompatible changes to the DHT protocol, which isn't really a huge problem since we have multistream |
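To illustrate why batching could help, here is a toy sketch, not the go-ipfs implementation. It assumes a node already knows every peer ID (a real DHT node has to discover the closest peers through lookups), and the names `node_id`, `closest_peers`, and `batch_provides` are hypothetical. Keys are grouped by target peer under the Kademlia XOR metric, so each peer could receive one combined message instead of one message per key.

```python
import hashlib
from collections import defaultdict


def node_id(label: str) -> int:
    """Hypothetical ID scheme for this sketch: SHA-256 of a label as an integer."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest(), "big")


def closest_peers(key: int, peers: list, k: int) -> list:
    """Return the k peers nearest to key under the XOR distance metric."""
    return sorted(peers, key=lambda peer: peer ^ key)[:k]


def batch_provides(keys: list, peers: list, k: int) -> dict:
    """Group keys by target peer so each peer gets a single batched 'put'."""
    batches = defaultdict(list)
    for key in keys:
        for peer in closest_peers(key, peers, k):
            batches[peer].append(key)
    return dict(batches)


# Illustrative numbers only: 50 known peers, 1000 blocks, replication factor 3.
peers = [node_id(f"peer-{i}") for i in range(50)]
keys = [node_id(f"block-{i}") for i in range(1000)]
batches = batch_provides(keys, peers, k=3)

unbatched_messages = len(keys) * 3   # one message per (key, peer) pair
batched_messages = len(batches)      # one message per distinct target peer
print(unbatched_messages, batched_messages)
```

In this simplified model the number of messages is bounded by the number of distinct target peers rather than by the number of blocks, which is the saving that batching is after.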
|
@magik6k Thanks! I will keep you posted as we make improvements to this. |
Similar report, while adding ~70,000 objects (~100k ea) we were maxing out our instance's traffic (70MBit in, 160MBit out) for over 14 hours (!!!), at which point someone killed the IPFS daemon. Back of the envelope, this is at least 1TB outgoing for ~10GB files. One particular problem: because the daemon was killed before the process completed, it seems that almost none of the files were pinned. Running
(side note, the chunk counts seem way too high too, almost 40x per image instead of expected 4x) |
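The back-of-the-envelope figure above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the quoted rates were sustained for the full 14 hours and uses decimal units (1 Mbit = 10^6 bits, 1 GB = 10^9 bytes); the function names are illustrative.

```python
def transferred_bytes(rate_mbit_per_s: float, hours: float) -> float:
    """Bytes moved at a sustained rate given in decimal megabits per second."""
    bits = rate_mbit_per_s * 1_000_000 * hours * 3600
    return bits / 8


def amplification(traffic_bytes: float, payload_bytes: float) -> float:
    """Ratio of network traffic to the size of the data actually added."""
    return traffic_bytes / payload_bytes


out_bytes = transferred_bytes(160, 14)   # outgoing, 160 Mbit/s for 14 h
in_bytes = transferred_bytes(70, 14)     # incoming, 70 Mbit/s for 14 h
payload_bytes = 10 * 10**9               # ~10 GB of files, as reported

print(f"outgoing: {out_bytes / 1e12:.2f} TB")
print(f"incoming: {in_bytes / 1e12:.2f} TB")
print(f"outgoing amplification: {amplification(out_bytes, payload_bytes):.0f}x")
```

Under these assumptions the outgoing total comes to roughly 1 TB, about 100 times the size of the data being added, which matches the estimate in the report.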
@parkan which version of ipfs are you using? This is definitely a known issue, but it has been improved slightly since 0.4.2 |
@whyrusleeping 0.4.3-rc3 |
@whyrusleeping my intuition is that there's some kind of context that's not being released before a queue is drained, which is supported by none of the files being pinned successfully after killing the daemon. Does this seem plausible? |
@parkan are you killing the daemon before the add call is complete? |
@whyrusleeping no, these are all individual add calls that complete reasonably quickly, load/network traffic happens w/o client interaction |
That's a very serious bug to figure out. Consistency is paramount here and that's much more important to get right.
This is caused by providing random access to all content, as discussed via email. For the large use case, we should look at recompiling go-ipfs without creating a provider record for each object, and leverage direct connections. (FWIW, orbit on js-ipfs works without the DHT entirely, and I imagine your use case here could do very well if you make sure to connect the relevant nodes together so there's no discovery to happen. This is the way to do it before pubsub connects them for you.) |
You can check which objects are pinned after an add with: It also seems really weird that your repo size is listed as |
@parkan also, what is the process for adding files? is it just a single call to |
@parkan on chunk size, i'd be interested (separately) how well Rabin Fingerprint based chunking ( |
@whyrusleeping ipfs pin ls was showing <500 objects pinned successfully; I don't think any of them are from this add attempt. Also, small correction to my previous statement: not all
|
@whyrusleeping these are many separate adds, we basically write a temporary file (incidentally, being able to pin from stdin would be great), |
@parkan Ah, yeah. The pins aren't written until the add is complete. An ipfs add call will only pin the root hash of the result of any given call to |
|
When the adds slow down that far could you get me some of the performance debugging info from https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md ? |
@whyrusleeping sorry, maybe I was being unclear: for 10k+ files the "add" part was completed (i.e. the call to re: stdin, I meant |
So, you did:
Right? I'm also confused about what is meant by 'adds slow down' if the add call completed by that point. Was this a subsequent call to re: stdin: you can indeed just pipe binary data to ipfs add :)
|
here's the parsed dump:
|
@whyrusleeping no, we are calling so, again:
The slowdown happens about 10k images into the loop |
Thanks for the stack dump, that just confirms for me that the problems are caused by the overproviding. The 2000 goroutines stuck in runtime_Semacquire are somewhat odd though, could I get the full stack dump to investigate that one further? As for the adds, after each |
@whyrusleeping I'm trying to repro this behavior, but I am quite certain that we ended up with 10,000+ added but not pinned files in yesterday's run. Full stacks: https://ipfs.io/ipfs/QmP344ugRiyZRKA2djLgE371MzW2FhwbghnJyH7hmbmAeQ |
@casey you don't have to incur any traffic when adding data to ipfs. Adds can be done with the Also, the outbound traffic usage without that should be far lower than when this issue was created. cc @magik6k and @Stebalien for other input |
It should also get much lower in the upcoming release due to libp2p/go-libp2p-kad-dht#182. |
@whyrusleeping Thanks for the response! If I add data with |
Not unless they're already connected to your node, no. |
How do you invoke the |
It's a global flag, which, unfortunately, isn't added to the command-local help. You can find the help by running |
Unfortunately this doesn't help our use case. Nodes must be able to discover one another in the DHT. |
@Stebalien ah thank you! @casey you could look at using a circuit relay that might be able to solve the connectivity issue |
@postables Unfortunately, running relays between nodes is not feasible. We need connectivity between arbitrary nodes, and don't have the resources to expend on dedicated relays. |
With relay:
Without relay:
Haven't tried this, but it should theoretically work in the meantime until these issues are resolved. |
As stated, relay nodes are not feasible. Without relay isn't feasible either, since all nodes are the same. This is a bittorrent-like use case, where all nodes are homogeneous peers, with no nodes having any particular role. As such, it's unclear how to determine which nodes would not use |
Not sure how this is unfeasible, all you would need is a single node that runs with If none of these options are desirable for you, then IMO IPFS is not suitable for your use case at this time, and it sounds like you should stick with BitTorrent. |
I think that's probably about right. It's unfortunate, since we'd really like to be compatible with the IPFS ecosystem. |
IPFS is still fairly young, and didn't start gaining widespread traction until this year, so you might just have to wait a bit 👍 IPFS has been progressing pretty nicely lately, so definitely keep an eye out. You could also perhaps look into leveraging LibP2P |
Note: We are working on improving the provider (that's what this is called) logic as it obviously doesn't scale. However, it's on the back-burner at the moment while we finish up some of our in-progress endeavors. |
Do you have a link to the go code for the provider? Would be interesting to take a look at it. |
It's kind of all over the place. This PR tries to improve that a bit: #4333 |
Has there been any progress on this? Is this considered to be an issue, or is inserting/looking up each block of every file in the DHT going to be the behavior going forward? |
This is considered an issue. The short-term plan is to just announce fewer blocks (requires some changes to bitswap to avoid issues with restarting downloads). The long-term plan is to introduce additional, more efficient content routing mechanisms. |
I think a simple fix would be to mirror BitTorrent, in the way that the hash of the info dict serves as the "topic" marker. BT clients look up peers via the infohash, and then expect those peers to have blocks. IPFS could do the same thing, use the hash of a "topic" block that contains the hashes, directly or transitively, and expect peers that they find with the topic block hash will have content blocks. |
That's the "announce fewer blocks" plan. However, unlike bittorrent, there isn't really a true "root" hash. The best we can do is use the file root and the root of the request (e.g., use
|
My understanding of this is shaky, but if files are split into chunks automatically, then could the hash of the file serve as the topic hash for all the chunks? That would get us down to one DHT lookup/insert per file. |
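As a rough illustration of what announcing only file roots would save, the sketch below counts provider records under both schemes. It is a simplified model rather than how go-ipfs builds its DAG: it assumes fixed-size chunking, a single root node over the leaf chunks (no intermediate nodes), and a 256 KiB chunk size, and the dataset (1000 files of 10 MiB each) is hypothetical.

```python
import math


def blocks_for_file(size_bytes: int, chunk_size: int) -> int:
    """Leaf chunks, plus one root node when the file spans several chunks."""
    leaves = max(1, math.ceil(size_bytes / chunk_size))
    return leaves if leaves == 1 else leaves + 1


def provider_records(file_sizes: list, chunk_size: int, roots_only: bool) -> int:
    """Number of DHT provider records to announce for a set of files."""
    if roots_only:
        return len(file_sizes)  # one record per file root
    return sum(blocks_for_file(size, chunk_size) for size in file_sizes)


CHUNK_SIZE = 256 * 1024               # assumed chunk size for this sketch
files = [10 * 1024 * 1024] * 1000     # hypothetical dataset

per_block = provider_records(files, CHUNK_SIZE, roots_only=False)
per_root = provider_records(files, CHUNK_SIZE, roots_only=True)
print(per_block, per_root)
```

In this model the record count drops from one per block to one per file, so the saving grows with file size; for files that fit in a single chunk there is no difference.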
@magik6k was trying to add cdnjs (something I did a few months ago). He hit #2823, but also reported that adding the dataset, which was about 21GB, created about 3TB of traffic.
@magik6k could you give more precise data on this?