This repository has been archived by the owner on Aug 28, 2021. It is now read-only.
Idea: Simplify Sync #3384
cmasone-attic added a commit to cmasone-attic/noms that referenced this issue on May 22, 2017:

"In an NBS world, bulk 'has' checks are way cheaper than they used to be. In light of this, we can toss out the complex logic we were using in Pull() -- which basically existed for no reason other than to avoid doing 'has' checks. Now, the code simply descends a tree of chunks breadth-first, using HasMany() at each level to figure out, all at once, which chunks are not yet in the sink, and GetMany() to pull them from the source in bulk. Fixes attic-labs#3182, Towards attic-labs#3384"
Merged
cmasone-attic added a commit with the same message that referenced this issue on May 22, 2017 (Fixes #3182, Towards #3384).
This is done \o/
This issue was closed.
In light of #3177, I was thinking about the basic approach to syncing, which has worked reasonably well but was designed around some assumptions that aren't true any more.
The general form of the sync algorithm is:
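The algorithm listing itself didn't survive into this archive, but the general form can be sketched as follows. This is a minimal illustration, not the real noms API: `Hash`, `Chunk`, and `Store` are simplified stand-ins.

```go
package main

import "fmt"

// Hash identifies a chunk; Chunk carries its child refs.
// These are simplified stand-ins for the real noms types.
type Hash string

type Chunk struct {
	Hash     Hash
	Children []Hash
}

type Store map[Hash]Chunk

func (s Store) Has(h Hash) bool { _, ok := s[h]; return ok }

// Sync walks the graph rooted at root, copying any chunk the sink is
// missing. Note that every chunk visited costs one Has() check against
// the sink -- the cost the heuristic below tries to avoid.
func Sync(source, sink Store, root Hash) int {
	copied := 0
	var walk func(h Hash)
	walk = func(h Hash) {
		if sink.Has(h) {
			return // sink has this chunk and, transitively, all its children
		}
		c := source[h]
		sink[h] = c
		copied++
		for _, child := range c.Children {
			walk(child)
		}
	}
	walk(root)
	return copied
}

func main() {
	source := Store{
		"root": {Hash: "root", Children: []Hash{"a", "b"}},
		"a":    {Hash: "a"},
		"b":    {Hash: "b"},
	}
	sink := Store{"a": source["a"]}
	fmt.Println(Sync(source, sink, "root")) // copies root and b, skips a; prints 2
}
```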
However, there's a heuristic optimization that essentially tries to reduce the number of `has` checks performed. That optimization tries to "walk" the "old" value more or less in parallel with the "new" one, using chunk height to keep the two walks in sync.

The optimization made sense in a world where each Has() check is basically a full block read. In the simple case of updating the spine of a deep, wide prolly tree, the trade-off is reading Log(N) blocks from the "sink" instead of doing Log(N)*~200 Has() checks -- something like 200x more reads.
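To make that trade-off concrete, here is an illustrative cost model for the spine-update case. The fanout of ~200 and the example tree size are assumptions for illustration, not measurements:

```go
package main

import "fmt"

// spineCosts models updating the spine of a prolly tree with n leaves
// and the given fanout: the parallel-walk heuristic reads one block per
// level, while the naive approach does ~fanout Has() checks per level.
func spineCosts(n, fanout int) (blockReads, hasChecks int) {
	depth := 0
	for capacity := 1; capacity < n; capacity *= fanout {
		depth++
	}
	return depth, depth * fanout
}

func main() {
	reads, checks := spineCosts(1_000_000_000, 200)
	fmt.Println(reads, checks) // 4 levels: 4 block reads vs. 800 Has() checks
}
```

When each Has() costs a full block read, 4 reads vs. 800 is a clear win; the point of this issue is that NBS changes that cost.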
However, with NBS, bulk Has() checks are dramatically less expensive and basically never hit disk. Even in the case of a very large index, the suffix index is generally in locality order, so there's a good chance that the same N refs contained in the block that would have been read can instead be resolved from a single block of the on-disk index.
I think this means that for "pulling" (the sink is local), the chunk-sync heuristic optimization can't ever really be a win any more.
In the case of a "push" (the sink is remote), it's a bit fuzzier: the bulk `has` checks need to go across the wire, and over a thin pipe it might be preferable to do N local reads rather than 200N `has` checks, because the number of hashes that need to be sent might be large.

One option to address this: when we implement batched writing, you can imagine the `writeValue` endpoint returning the set of dangling refs which aren't already contained in the target database after the write. Given this, "push" can just `writeValue` each "level", field the set of dangling refs which comes back, and write those in the next `writeValue`. This would effectively achieve the thing we observed in the past: that the ideal thing for "push" is to ask the other side to "pull".

An additional benefit of changing things to work this way is that it gets rid of sync's pattern of writing all chunks of a given height in lockstep. The problem with that pattern is that it doesn't generate the locality that we'd like; the "correct" locality arises out of doing a BFS from the root value. Removing the height-order processing would make it correct and straightforward to have sync basically be:
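A sketch of that level-by-level BFS, matching the HasMany()/GetMany() shape the commit above describes. `Hash`, `Chunk`, and `Store` are simplified stand-ins for the real noms types, not the actual API:

```go
package main

import "fmt"

type Hash string

type Chunk struct {
	Hash     Hash
	Children []Hash
}

type Store map[Hash]Chunk

// HasMany reports which of the given hashes are absent from the store,
// modeling a single bulk 'has' check per tree level.
func (s Store) HasMany(hashes []Hash) (absent []Hash) {
	for _, h := range hashes {
		if _, ok := s[h]; !ok {
			absent = append(absent, h)
		}
	}
	return absent
}

// GetMany fetches a batch of chunks in one call.
func (s Store) GetMany(hashes []Hash) []Chunk {
	chunks := make([]Chunk, 0, len(hashes))
	for _, h := range hashes {
		chunks = append(chunks, s[h])
	}
	return chunks
}

// Pull descends the chunk graph breadth-first: one HasMany() per level
// to find what the sink is missing, one GetMany() to fetch it in bulk.
// BFS order also yields the write locality discussed above.
func Pull(source, sink Store, root Hash) int {
	copied := 0
	level := []Hash{root}
	for len(level) > 0 {
		var next []Hash
		for _, c := range source.GetMany(sink.HasMany(level)) {
			sink[c.Hash] = c
			copied++
			next = append(next, c.Children...)
		}
		level = next
	}
	return copied
}

func main() {
	source := Store{
		"root": {Hash: "root", Children: []Hash{"a", "b"}},
		"a":    {Hash: "a", Children: []Hash{"leaf"}},
		"b":    {Hash: "b"},
		"leaf": {Hash: "leaf"},
	}
	sink := Store{"b": source["b"]}
	fmt.Println(Pull(source, sink, "root")) // copies root, a, leaf; prints 3
}
```

Note there is no per-chunk Has() anywhere: the sink is consulted exactly once per level, which is the whole simplification this issue proposes.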