-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bitswap Sessions #3786
Comments
If we already assume that the "active set" peers likely have all children of the wanted root, couldn't we tell them to "consider all children of $hash wanted by me"? This would be a super light wantlist update :) And in case they don't have all children of the wanted root, we can adjust reactively just like you're saying. If we don't receive any more wanted blocks from the session (all children considered wanted), then we remove it from the active set, and try |
@lgierth That should be possible later with ipld selectors, but for now it would be a breaking change to the protocol (and would make the duplicate blocks problem likely worse) |
Is the duplicate blocks problem the primary issue? Are there other considerations to weigh? (Writing from an uninformed perspective and curious about the scalability challenges.) |
@btc It depends on the scenario, There are really two main scalability issues that i'm looking to address here. The first is duplicate blocks, which are a pretty bad problem when you are fetching content that multiple peers in the network have. The second problem is that of broadcasting every wantlist update to every connected peer, that bit gets pretty expensive when fetching large items, even from a single other peer. My proposal (and WIP implementation in #3867 ) reduce both issues, at a small cost to the usecase of fetching a single large graph from a disparate set of peers. |
@whyrusleeping the idea sound solid, did you implement: "split up requests amongst peers"? |
@kevina no, not in my live PR. That bits coming next |
I think this is a great idea which should reduce the current overhead issue by leaps-and-bounds. One improvement I'd make would be to gossip more. It should generally be possible to gossip without breaking anything as adding fields to protocol buffers shouldn't break anything (and gossip should always be safe to ignore).
One way to (sort-of) do this without sending blocks up-front would be to gossip one of:
Bitswap would then make sure to ask peers that have already indicated that they have a block before spamming the rest of the peers in the session. This doesn't help pipeline deep and narrow trees as IPLD selectors would but it should help further reduce the number of unnecessary requests. Another improvement I'd try would be to add in a bit of cross-session gossip. Basically, whenever one talks to a bitswap peer, disregarding sessions, gossip a compressed wantlist (e.g., truncated hashes of the oldest N blocks on our want-list) and the set of blocks we have that we think they want (based on the last compressed wantlist they sent us). If we tune the parameters correctly, this may help ensure we have some potential peers lined up in case we can't find a block within a session (hopefully without introducing too much overhead). However, this could be rather tricky to get right. Unrelated to gossip, we may also want to (persistently?) cache some of this session information to speed up future requests. For example, given:
If I receive |
@Stebalien Thanks for the feedback! Some thoughts: on gossip, I really like the idea, though implementing this would be difficult. The issue is that checking 'do i have all the objects in this graph?' isnt cheap, and its hard to cache as it ends up being a lot of data. A simpler tactic (related to your suggestions) would be to simply list the indexes of the links of this object that you also have. you could even extend this to do your third suggestion by doing index paths. This would work because the set of all links for a given node is deterministically ordered. The compressed wantlist idea is interesting, though sounds like it would get complicated fast. I've already put a significant amount of effort into making sure we keep track of who we tell about wanting what (to help avoid receiving duplicate blocks), spreading the wantlist more generally ups that complexity quite a bit. |
I agree.
The compressed wantlist idea is about telling peers about what we may want from them, not what we want right now, so that they can gossip what they have do have in case we we want it in the future; we wouldn't expect them to send the actual blocks up-front. Basically, the general idea of all this gossiping is to learn of where to find what we may want in the near future. That is, we gossip:
This way, when we go to fetch a block, we will likely already have a list of candidate peers. Note: By gossip, I really do mean gossip. This shouldn't trigger any additional background messages, just some extra bytes tacked onto existing messages. |
|
Hi guys, I just want to know if bitswap session cloud solve my problem? From the git log I'm afraid not. But this issue is opened, so... https://discuss.ipfs.io/t/big-file-transferring-problem/3819 Thanks! |
So do I understand this correctly that at the moment, when some piece of data is available on a lot of peers, that actually makes things worse because the poor requester will be flooded with answers from all the providers? This would seem to be a serious problem. Any recent work on this? |
@rklaehn Yes. From all connected providers that have block. Work is here: ipfs/go-bitswap#8 |
@hannahhoward is this one of the things you were planning to start on next in Bitswap land? |
I think this issue has been addressed by bitswap session improvements or superseded by discussion of GraphSync. |
Bitswap is pretty inefficient right now. There is a lot we can do to make it better (i would say it would be hard to make it worse, but i know for a fact that statement is false). I've spent a lot of time thinking of ways forward, and have come to a decent (in my opinion) 'next step'.
This proposal is to add the concept of sessions to bitswap. The idea expands on an existing optimization we do, where we only find providers for the first block in a
GetMany
request, and make moreFindProviders
calls later if nobody is sending us blocks.When starting a request for a portion of a graph (
ipfs refs -r
,ipfs pin
,ipfs cat
, etc) we would create a new session with the hash in question as the session 'root'. Bitswap would then broadcast that ref to every bitswap partner. Any node that responds with the block (even if its a duplicate) gets added to that sessions 'active set'. Further hashes requested that are children of the session root will be broadcasted just to those peers (we can even split up requests amongst peers). If a requested hash isnt received in a given time frame, we start aFindProviders
call for that hash, and adds the resultant peers to the sessions 'active set' (or perhaps add them to a 'to try set' and to the 'active set' when a block is received from them).Doing things this way will allow us to not spam uninterested bitswap peers with our wantlist updates, and also allow us to reduce duplicate blocks by splitting up request amongst peers we know are likely to respond.
Feedback and requests for elaboration will be much appreciated.
The text was updated successfully, but these errors were encountered: