
Bitswap Sessions #3786

Closed

whyrusleeping opened this issue Mar 15, 2017 · 15 comments
Labels
kind/enhancement A net-new feature or improvement to an existing feature P0 Critical: Tackled by core team ASAP status/in-progress In progress

Comments

@whyrusleeping
Member

Bitswap is pretty inefficient right now. There is a lot we can do to make it better (I would say it would be hard to make it worse, but I know for a fact that statement is false). I've spent a lot of time thinking of ways forward, and have come to a decent (in my opinion) 'next step'.

This proposal is to add the concept of sessions to bitswap. The idea expands on an existing optimization we do, where we only find providers for the first block in a GetMany request, and make more FindProviders calls later if nobody is sending us blocks.

When starting a request for a portion of a graph (ipfs refs -r, ipfs pin, ipfs cat, etc.) we would create a new session with the hash in question as the session 'root'. Bitswap would then broadcast that ref to every bitswap partner. Any node that responds with the block (even if it's a duplicate) gets added to that session's 'active set'. Further hashes requested that are children of the session root will be broadcast just to those peers (we can even split up requests amongst peers). If a requested hash isn't received in a given time frame, we start a FindProviders call for that hash and add the resultant peers to the session's 'active set' (or perhaps add them to a 'to try' set, moving them to the 'active set' when a block is received from them).
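
Roughly, the per-request session state could be modeled like this (a minimal sketch with placeholder Cid/PeerID types and made-up names, not the actual go-bitswap implementation):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Cid string    // placeholder for a content identifier
type PeerID string // placeholder for a libp2p peer ID

// Session tracks the peers that have proven useful for one graph traversal.
type Session struct {
	mu        sync.Mutex
	root      Cid                 // the session 'root' hash
	activeSet map[PeerID]struct{} // peers that have sent us blocks in this session
	timeout   time.Duration       // how long to wait before falling back to FindProviders
}

func NewSession(root Cid) *Session {
	return &Session{
		root:      root,
		activeSet: make(map[PeerID]struct{}),
		timeout:   10 * time.Second,
	}
}

// RecordBlockFrom adds a peer to the active set when it answers a want,
// even if the block it sent turned out to be a duplicate.
func (s *Session) RecordBlockFrom(p PeerID) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.activeSet[p] = struct{}{}
}

// Peers returns a snapshot of the active set; wants for children of the root
// are sent only to these peers instead of being broadcast to every partner.
func (s *Session) Peers() []PeerID {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make([]PeerID, 0, len(s.activeSet))
	for p := range s.activeSet {
		out = append(out, p)
	}
	return out
}

func main() {
	s := NewSession("QmRoot")
	s.RecordBlockFrom("peerA")
	fmt.Println("active set for", s.root, "=>", s.Peers())
}
```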

Doing things this way will allow us to avoid spamming uninterested bitswap peers with our wantlist updates, and also to reduce duplicate blocks by splitting up requests amongst peers we know are likely to respond.

Feedback and requests for elaboration will be much appreciated.

@whyrusleeping whyrusleeping added the kind/enhancement A net-new feature or improvement to an existing feature label Mar 15, 2017
@ghost

ghost commented Mar 15, 2017

If we already assume that the "active set" peers likely have all children of the wanted root, couldn't we tell them to "consider all children of $hash wanted by me"? This would be a super light wantlist update :)

And in case they don't have all children of the wanted root, we can adjust reactively just like you're saying. If we don't receive any more wanted blocks from a peer in the session (all children considered wanted), then we remove that peer from the active set and try FindProviders.

@whyrusleeping
Member Author

@lgierth That should be possible later with ipld selectors, but for now it would be a breaking change to the protocol (and would make the duplicate blocks problem likely worse)

@btc
Contributor

btc commented Apr 25, 2017

Is the duplicate blocks problem the primary issue? Are there other considerations to weigh?

(Writing from an uninformed perspective and curious about the scalability challenges.)

@whyrusleeping
Member Author

@btc It depends on the scenario. There are really two main scalability issues that I'm looking to address here. The first is duplicate blocks, which are a pretty bad problem when you are fetching content that multiple peers in the network have. The second is broadcasting every wantlist update to every connected peer; that bit gets pretty expensive when fetching large items, even from a single other peer.

My proposal (and WIP implementation in #3867) reduces both issues, at a small cost to the use case of fetching a single large graph from a disparate set of peers.

@kevina
Contributor

kevina commented May 2, 2017

@whyrusleeping the idea sounds solid. Did you implement "split up requests amongst peers"?

@whyrusleeping
Member Author

@kevina no, not in my live PR. That bit's coming next.

@Stebalien
Member

Stebalien commented Jul 3, 2017

I think this is a great idea which should reduce the current overhead issue by leaps and bounds.

One improvement I'd make would be to gossip more. It should generally be possible to do this without breaking anything, since adding fields to protocol buffers is backwards-compatible (and gossip should always be safe to ignore).

That should be possible later with ipld selectors, but for now it would be a breaking change to the protocol (and would make the duplicate blocks problem likely worse)

One way to (sort-of) do this without sending blocks up-front would be to gossip one of:

  1. An assertion that this peer has none of the children of the sent block.
  2. An assertion that this peer has all of the children of the sent block.
  3. A short list of children blocks that the peer has (for space efficiency, we can even just send a bitmask indicating which children we have).

Bitswap would then make sure to ask peers that have already indicated that they have a block before spamming the rest of the peers in the session. This doesn't help pipeline deep and narrow trees as IPLD selectors would but it should help further reduce the number of unnecessary requests.
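
For the third option, a rough sketch of building such a bitmask over a block's children (the haveLocally check and all names here are hypothetical, not the real go-bitswap API):

```go
package main

import "fmt"

// childHaveBitmask sets bit i when we already have child i of a block,
// relying on the fixed ordering of the block's child list.
func childHaveBitmask(children []string, haveLocally func(string) bool) []byte {
	mask := make([]byte, (len(children)+7)/8)
	for i, c := range children {
		if haveLocally(c) {
			mask[i/8] |= 1 << uint(i%8)
		}
	}
	return mask
}

func main() {
	children := []string{"QmA", "QmB", "QmC"}
	have := map[string]bool{"QmA": true, "QmC": true}
	mask := childHaveBitmask(children, func(c string) bool { return have[c] })
	fmt.Printf("have-bitmask: %08b\n", mask[0]) // bits 0 and 2 set
}
```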

Another improvement I'd try would be to add in a bit of cross-session gossip. Basically, whenever one talks to a bitswap peer, disregarding sessions, gossip a compressed wantlist (e.g., truncated hashes of the oldest N blocks on our want-list) and the set of blocks we have that we think they want (based on the last compressed wantlist they sent us). If we tune the parameters correctly, this may help ensure we have some potential peers lined up in case we can't find a block within a session (hopefully without introducing too much overhead). However, this could be rather tricky to get right.
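
For concreteness, a compressed maybe-wantlist entry could be as simple as a short prefix of each hashed want; a sketch with made-up parameters (N and the prefix length would need the tuning mentioned above):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// compressWantlist returns short digest prefixes for the oldest n wants.
func compressWantlist(wants []string, n, prefixLen int) [][]byte {
	if n > len(wants) {
		n = len(wants)
	}
	out := make([][]byte, 0, n)
	for _, w := range wants[:n] {
		sum := sha256.Sum256([]byte(w))
		out = append(out, sum[:prefixLen])
	}
	return out
}

func main() {
	wants := []string{"QmOld1", "QmOld2", "QmNewer"}
	for _, p := range compressWantlist(wants, 2, 4) {
		fmt.Printf("%x\n", p) // a few bytes per maybe-want
	}
}
```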


Unrelated to gossip, we may also want to (persistently?) cache some of this session information to speed up future requests. For example, given:

   A
 /   \
B     C

If I receive A in a session with peers {S}, I can record that {S} may have B and C and start any sessions for fetching B and/or C with {S} ∩ {connected} (skipping the "ask everyone" step). This specific optimization would be really useful when lazily streaming files/directories.
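
A tiny sketch of that cache lookup, with illustrative string types standing in for CIDs and peer IDs:

```go
package main

import "fmt"

type peerCache map[string][]string // cid -> peers that served it (or its parent)

// candidatePeers intersects the cached peers for cid with currently connected
// peers, letting a child fetch skip the initial "ask everyone" broadcast.
func candidatePeers(cache peerCache, cid string, connected map[string]bool) []string {
	var out []string
	for _, p := range cache[cid] {
		if connected[p] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	cache := peerCache{"QmB": {"peer1", "peer2"}}
	connected := map[string]bool{"peer1": true}
	fmt.Println(candidatePeers(cache, "QmB", connected)) // [peer1]
}
```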

@whyrusleeping
Member Author

@Stebalien Thanks for the feedback!

Some thoughts:

On gossip: I really like the idea, though implementing this would be difficult. The issue is that checking 'do I have all the objects in this graph?' isn't cheap, and it's hard to cache as it ends up being a lot of data. A simpler tactic (related to your suggestions) would be to simply list the indexes of the links of this object that you also have. You could even extend this to do your third suggestion by doing index paths. This would work because the set of all links for a given node is deterministically ordered.

The compressed wantlist idea is interesting, though it sounds like it would get complicated fast. I've already put a significant amount of effort into making sure we keep track of who we tell about wanting what (to help avoid receiving duplicate blocks); spreading the wantlist more generally ups that complexity quite a bit.

@Stebalien
Member

The compressed wantlist idea is interesting, though it sounds like it would get complicated fast.

I agree.

I've already put a significant amount of effort into making sure we keep track of who we tell about wanting what (to help avoid receiving duplicate blocks); spreading the wantlist more generally ups that complexity quite a bit.

The compressed wantlist idea is about telling peers what we may want from them, not what we want right now, so that they can gossip back what they do have in case we want it in the future; we wouldn't expect them to send the actual blocks up-front. Basically, the general idea of all this gossiping is to learn where to find what we may want in the near future.

That is, we gossip:

  • What we predict we may want from our peers (in the near future): the compressed maybe-wantlist, etc.
  • What we predict each of our peers may want (in the near future): cids of children of blocks on their wantlist/maybe-wantlist, things on their maybe-wantlist that we have, etc.

This way, when we go to fetch a block, we will likely already have a list of candidate peers.

Note: By gossip, I really do mean gossip. This shouldn't trigger any additional background messages, just some extra bytes tacked onto existing messages.

@whyrusleeping whyrusleeping added P0 Critical: Tackled by core team ASAP status/ready Ready to be worked labels Oct 17, 2017

@peteryj

peteryj commented Sep 12, 2018

Hi guys, I just want to know whether bitswap sessions could solve my problem.

From the git log, I'm afraid not. But this issue is still open, so...

https://discuss.ipfs.io/t/big-file-transferring-problem/3819

Thanks!

@rklaehn

rklaehn commented Sep 21, 2018

So, do I understand correctly that at the moment, when some piece of data is available on a lot of peers, things actually get worse because the poor requester gets flooded with answers from all the providers? This would seem to be a serious problem. Any recent work on this?

@ivan386
Contributor

ivan386 commented Sep 21, 2018

@rklaehn Yes. From all connected providers that have the block. Work is here: ipfs/go-bitswap#8

@momack2
Contributor

momack2 commented Dec 3, 2018

@hannahhoward is this one of the things you were planning to start on next in Bitswap land?

@Stebalien Stebalien removed the status/ready Ready to be worked label Dec 18, 2018
@Stebalien Stebalien added status/deferred Conscious decision to pause or backlog status/in-progress In progress and removed status/deferred Conscious decision to pause or backlog labels Dec 18, 2018
@eingenito
Contributor

I think this issue has been addressed by bitswap session improvements or superseded by discussion of GraphSync.
