This repository has been archived by the owner on Aug 28, 2021. It is now read-only.
Idea: Simplify Sync #3384
cmasone-attic added a commit to cmasone-attic/noms that referenced this issue on May 22, 2017:

"In an NBS world, bulk 'has' checks are way cheaper than they used to be. In light of this, we can toss out the complex logic we were using in Pull() -- which basically existed for no reason other than to avoid doing 'has' checks. Now, the code simply descends a tree of chunks breadth-first, using HasMany() at each level to figure out, all at once, which chunks are not yet in the sink, and GetMany() to pull them from the source in bulk. Fixes attic-labs#3182, Towards attic-labs#3384"
Merged
cmasone-attic added a commit with the same message that referenced this issue on May 22, 2017 (Fixes #3182, Towards #3384).
This is done \o/
This issue was closed.
In light of #3177, I was thinking about the basic approach to syncing, which has worked reasonably well but was designed around some assumptions that aren't true any more.
The general form of the sync algorithm is:
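The algorithm listing itself didn't survive into this archive, but the general form can be sketched as follows. This is a minimal illustration, not the real noms API: `Hash`, `Chunk`, and `Store` are simplified stand-ins.

```go
package main

import "fmt"

// Hash identifies a chunk; Chunk carries its child refs.
// These are simplified stand-ins for the real noms types.
type Hash string

type Chunk struct {
	Hash     Hash
	Children []Hash
}

type Store map[Hash]Chunk

func (s Store) Has(h Hash) bool { _, ok := s[h]; return ok }

// Sync walks the graph rooted at root, copying any chunk the sink is
// missing. Note that every chunk visited costs one Has() check against
// the sink -- the cost the heuristic below tries to avoid.
func Sync(source, sink Store, root Hash) int {
	copied := 0
	var walk func(h Hash)
	walk = func(h Hash) {
		if sink.Has(h) {
			return // sink has this chunk and, transitively, all its children
		}
		c := source[h]
		sink[h] = c
		copied++
		for _, child := range c.Children {
			walk(child)
		}
	}
	walk(root)
	return copied
}

func main() {
	source := Store{
		"root": {Hash: "root", Children: []Hash{"a", "b"}},
		"a":    {Hash: "a"},
		"b":    {Hash: "b"},
	}
	sink := Store{"a": source["a"]}
	fmt.Println(Sync(source, sink, "root")) // copies root and b, skips a; prints 2
}
```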
However, there's a heuristic optimization that essentially tries to reduce the number of `has` checks performed. That optimization tries to "walk" the "old" value more or less in parallel with the "new" one, using chunk height to keep the two walks in sync.

The optimization made sense in a world where each Has() check is basically a full block read. In the simple case of updating the spine of a deep, wide prolly tree, the trade-off is reading Log(N) blocks from the "sink" instead of doing Log(N)*~200 Has() checks -- something like 200x more reads.
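To make that trade-off concrete, here is an illustrative cost model for the spine-update case. The fanout of ~200 and the example tree size are assumptions for illustration, not measurements:

```go
package main

import "fmt"

// spineCosts models updating the spine of a prolly tree with n leaves
// and the given fanout: the parallel-walk heuristic reads one block per
// level, while the naive approach does ~fanout Has() checks per level.
func spineCosts(n, fanout int) (blockReads, hasChecks int) {
	depth := 0
	for capacity := 1; capacity < n; capacity *= fanout {
		depth++
	}
	return depth, depth * fanout
}

func main() {
	reads, checks := spineCosts(1_000_000_000, 200)
	fmt.Println(reads, checks) // 4 levels: 4 block reads vs. 800 Has() checks
}
```

When each Has() costs a full block read, 4 reads vs. 800 is a clear win; the point of this issue is that NBS changes that cost.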
However, with NBS, bulk Has() checks are dramatically less expensive and basically never hit disk. Even in the case of a very large index, the suffix index is generally in locality order, so there's a good chance that the same N refs contained in the block that would have been read can instead be resolved from a single block of the on-disk index.
I think this means that for "pulling" (the sink is local), the chunk-sync heuristic optimization can't ever really be a win any more.
In the case of a "push" (the sink is remote), it's a bit fuzzier: the bulk `has` checks need to go across the wire, and over a thin pipe it might be preferable to do N local reads rather than 200N `has` checks, because the number of hashes that need to be sent might be large.

One option to address this: when we implement batched writing, you can imagine the `writeValue` endpoint returning the set of dangling refs which aren't already contained in the target database after the write. Given this, "push" can just `writeValue` each "level", field the set of dangling refs which comes back, and write those in the next `writeValue`. This would effectively achieve the thing we observed in the past: that the ideal thing for "push" is to ask the other side to "pull".

An additional benefit of changing things to work this way is that it gets rid of sync's pattern of writing all chunks of a given height in lockstep. The problem with that pattern is that it doesn't generate the locality that we'd like; the "correct" locality arises out of doing a BFS from the root value. Removing the height-order processing would make it correct and straightforward to have sync basically be:
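A sketch of that level-by-level BFS, matching the HasMany()/GetMany() shape the commit above describes. `Hash`, `Chunk`, and `Store` are simplified stand-ins for the real noms types, not the actual API:

```go
package main

import "fmt"

type Hash string

type Chunk struct {
	Hash     Hash
	Children []Hash
}

type Store map[Hash]Chunk

// HasMany reports which of the given hashes are absent from the store,
// modeling a single bulk 'has' check per tree level.
func (s Store) HasMany(hashes []Hash) (absent []Hash) {
	for _, h := range hashes {
		if _, ok := s[h]; !ok {
			absent = append(absent, h)
		}
	}
	return absent
}

// GetMany fetches a batch of chunks in one call.
func (s Store) GetMany(hashes []Hash) []Chunk {
	chunks := make([]Chunk, 0, len(hashes))
	for _, h := range hashes {
		chunks = append(chunks, s[h])
	}
	return chunks
}

// Pull descends the chunk graph breadth-first: one HasMany() per level
// to find what the sink is missing, one GetMany() to fetch it in bulk.
// BFS order also yields the write locality discussed above.
func Pull(source, sink Store, root Hash) int {
	copied := 0
	level := []Hash{root}
	for len(level) > 0 {
		var next []Hash
		for _, c := range source.GetMany(sink.HasMany(level)) {
			sink[c.Hash] = c
			copied++
			next = append(next, c.Children...)
		}
		level = next
	}
	return copied
}

func main() {
	source := Store{
		"root": {Hash: "root", Children: []Hash{"a", "b"}},
		"a":    {Hash: "a", Children: []Hash{"leaf"}},
		"b":    {Hash: "b"},
		"leaf": {Hash: "leaf"},
	}
	sink := Store{"b": source["b"]}
	fmt.Println(Pull(source, sink, "root")) // copies root, a, leaf; prints 3
}
```

Note there is no per-chunk Has() anywhere: the sink is consulted exactly once per level, which is the whole simplification this issue proposes.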