This repository has been archived by the owner on Aug 28, 2021. It is now read-only.

(NBS) *Prevent* noms sync from actively *hurting* chunk locality #2968

Closed

ghost opened this issue Dec 21, 2016 · 5 comments · Fixed by #2983

Comments


ghost commented Dec 21, 2016

#2880 focused on offering guarantees about good chunk locality (related chunks being stored near each other in NBS), but it's now clear that two mechanisms in noms are actively working against this goal: (1) writing over http and (2) sync.

While we work on a design for retaining and enforcing locality throughout noms (including over http & sync), I think there's some low-hanging fruit that's compatible with the existing wire protocol: start by preventing writes over http from destroying locality.

If we can do this, we'll at least cover the common case: the first write to a remote server will end up with decent locality.

Right now, writing to a remote server involves temporarily storing chunks in an ldb instance and then shipping them all over on commit. The trouble is that the code currently stores them in ldb ordered by ascending ref height and then ascending hash, which essentially shuffles all chunks of a given height on their way over to the server.

So here's my idea: just for the case of writing to a remote noms server, buffer chunks in the order they were written (e.g. to a file), and then send them up to the server in exactly that order. The in-process validation already retains the one required wire invariant: that no chunk shall be written before the server has seen all referenced chunks.

Implementation idea for this: essentially use a local nbs store rather than ldb. As we write, keep an in-memory list of hashes written. On commit, just read them out in that order (which will be ideal read order for nbs) as we ship them over to the server.
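A minimal sketch of that idea, assuming a hypothetical `orderedBuffer` type in place of the real nbs/httpBatchStore machinery (the names here are illustrative, not the actual noms API):

```go
// Hypothetical sketch: buffer chunks locally and remember insertion order so
// they can be shipped to the server in exactly the order they were written.
package main

import "fmt"

type hash [20]byte

type orderedBuffer struct {
	order  []hash          // hashes in the order they were Put
	chunks map[hash][]byte // chunk bodies keyed by hash
}

func newOrderedBuffer() *orderedBuffer {
	return &orderedBuffer{chunks: map[hash][]byte{}}
}

// Put stores a chunk and records its hash so write order is preserved.
func (b *orderedBuffer) Put(h hash, data []byte) {
	if _, ok := b.chunks[h]; ok {
		return // already buffered; keep the first position
	}
	b.order = append(b.order, h)
	b.chunks[h] = data
}

// Flush sends chunks upstream in insertion order, which would also be the
// ideal read order for an nbs-backed buffer.
func (b *orderedBuffer) Flush(send func(hash, []byte) error) error {
	for _, h := range b.order {
		if err := send(h, b.chunks[h]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	buf := newOrderedBuffer()
	buf.Put(hash{1}, []byte("root child"))
	buf.Put(hash{2}, []byte("leaf"))
	_ = buf.Flush(func(h hash, data []byte) error {
		fmt.Printf("send %x (%d bytes)\n", h[:2], len(data))
		return nil
	})
}
```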

If this works, I think something similar, though slightly more complicated, could work for the case of sync.

cc @cmasone-attic

@ghost ghost added the NBS, P0, and Perf labels Dec 21, 2016
@cmasone-attic

hmmm...I think this works, yes. There's a nagging concern about this in-memory list of hashes blowing up RAM, but it's probably OK to TODO that for now, yes?


ghost commented Dec 21, 2016

Yup, I think a TODO is fine. Also, it might be literally just as much work to add a feature to nbs to iterate all chunks in "store" (insertion) order.


ghost commented Dec 21, 2016

Now that I think about this more, it might actually have the happy side effect of improving the perf of writing over http (even if the server is backed by ldb) because the tmp ldb store is currently doing a ton of pointless compactions as we write.

@ghost ghost closed this as completed Dec 21, 2016
@ghost ghost reopened this Dec 21, 2016
@cmasone-attic cmasone-attic self-assigned this Dec 21, 2016
cmasone-attic added a commit to cmasone-attic/noms that referenced this issue Dec 23, 2016
NBS benefits from related chunks being near one another. Initially,
let's use write-order as a proxy for "related".

This patch contains a pretty heinous hack to allow sync to continue
putting chunks into httpBatchStore top-down without breaking
server-side validation. Work to fix this is tracked in attic-labs#2982

This patch fixes attic-labs#2968, at least for now
cmasone-attic added a commit that referenced this issue Dec 23, 2016
…2983)

NBS benefits from related chunks being near one another. Initially,
let's use write-order as a proxy for "related".

This patch contains a pretty heinous hack to allow sync to continue
putting chunks into httpBatchStore top-down without breaking
server-side validation. Work to fix this is tracked in #2982

This patch fixes #2968, at least for now

* Introduces PullWithFlush() to allow noms sync to explicitly
pull chunks over and flush directly after. This allows UpdateRoot
to behave as before.

Also clears out all the legacy batch-put machinery. Now, Flush()
just directly calls sendWriteRequests().
@ghost ghost reopened this Dec 23, 2016
@ghost ghost changed the title from "(NBS) *Prevent* noms from actively *hurting* chunk locality" to "(NBS) *Prevent* noms sync from actively *hurting* chunk locality" Dec 23, 2016

ghost commented Dec 23, 2016

Re-opened and modified the bug title to reflect what's left of this. I think sync still effectively randomizes chunks within each height level.


ghost commented Jan 21, 2017

Here's half an idea for this. Sync progressively writes chunks of lower and lower heights. That means as soon as a chunk of a given height X is seen, we can learn nothing more about the locality of chunks of greater height. Thus, it seems reasonable to:

- Keep a buffer of incoming chunks
- Record the lowest height seen
- When a chunk of height < lowest is encountered, flush all buffered chunks.

This doesn't explain how we organize the buffered chunks yet, but it's a start (rough sketch below).
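A rough sketch of that flush-on-height-drop idea, with hypothetical `chunk` and `heightBuffer` types standing in for the real sync code:

```go
// Hypothetical sketch: buffer incoming chunks, track the lowest height seen,
// and flush everything buffered whenever a chunk of lower height arrives.
package main

import "fmt"

type chunk struct {
	height int
	data   []byte
}

type heightBuffer struct {
	lowest  int // lowest height seen so far; 0 means nothing seen yet
	pending []chunk
	send    func([]chunk)
}

// Add buffers a chunk. When a chunk arrives below the lowest height seen so
// far, nothing more can be learned about locality at the higher levels, so
// everything buffered is flushed first.
func (b *heightBuffer) Add(c chunk) {
	if b.lowest != 0 && c.height < b.lowest {
		b.Flush()
	}
	if b.lowest == 0 || c.height < b.lowest {
		b.lowest = c.height
	}
	b.pending = append(b.pending, c)
}

func (b *heightBuffer) Flush() {
	if len(b.pending) == 0 {
		return
	}
	b.send(b.pending)
	b.pending = nil
}

func main() {
	b := &heightBuffer{send: func(cs []chunk) {
		fmt.Printf("flushing %d chunk(s)\n", len(cs))
	}}
	for _, h := range []int{3, 3, 2, 2, 1} {
		b.Add(chunk{height: h})
	}
	b.Flush() // flush whatever remains at the end
}
```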

@ghost ghost mentioned this issue Feb 11, 2017
@ghost ghost added the Sync label Feb 11, 2017
cmasone-attic added a commit to cmasone-attic/noms that referenced this issue Aug 31, 2017
The new version of this tool now estimates the locality of a DB
written using the "grandchild" strategy implemented by
types.ValueStore. It does so by dividing each level of the graph
up into groups that are roughly the size of the branching factor
of that level, and then calculating how many physical reads are
needed to read each group.

In the case of perfect locality, each group could be read in a
single physical read, so that's what the tool uses as its estimate
of the optimal case.

Toward attic-labs#2968
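As a rough illustration of the grouping and read-counting estimate described in that commit message (the `chunkLoc` type and grouping here are hypothetical, not the real tool):

```go
// Hypothetical sketch: split the chunks of one graph level into groups of
// roughly the branching factor, then count how many contiguous byte ranges
// (physical reads) are needed to cover each group. With perfect locality,
// every group costs one read, so the optimal cost is the number of groups.
package main

import (
	"fmt"
	"sort"
)

type chunkLoc struct {
	offset uint64 // byte offset of the chunk in the table file
	length uint64
}

// physicalReads counts contiguous runs within one group of chunks.
func physicalReads(group []chunkLoc) int {
	if len(group) == 0 {
		return 0
	}
	sorted := append([]chunkLoc(nil), group...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].offset < sorted[j].offset })
	reads := 1
	end := sorted[0].offset + sorted[0].length
	for _, c := range sorted[1:] {
		if c.offset > end { // gap: another physical read is needed
			reads++
		}
		if c.offset+c.length > end {
			end = c.offset + c.length
		}
	}
	return reads
}

// estimate sums physical reads over groups of size branchingFactor.
func estimate(level []chunkLoc, branchingFactor int) (actual, optimal int) {
	for i := 0; i < len(level); i += branchingFactor {
		j := i + branchingFactor
		if j > len(level) {
			j = len(level)
		}
		actual += physicalReads(level[i:j])
		optimal++
	}
	return
}

func main() {
	// Chunks of one level in read order, but scattered in the table file.
	level := []chunkLoc{{0, 100}, {500, 100}, {100, 100}, {600, 100}}
	a, o := estimate(level, 2)
	fmt.Printf("estimated reads: %d, optimal: %d\n", a, o)
}
```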
cmasone-attic added a commit to cmasone-attic/noms that referenced this issue Aug 31, 2017
This patch implements a new strategy for Pull() that pulls the chunks
from a given level of the graph over in the order they'll be
encountered by clients reading the graph.

Fixes attic-labs#2968
cmasone-attic added a commit that referenced this issue Sep 11, 2017
The new version of this tool now estimates the locality of a DB
written using the "grandchild" strategy implemented by
types.ValueStore. It does so by dividing each level of the graph
up into groups that are roughly the size of the branching factor
of that level, and then calculating how many physical reads are
needed to read each group.

In the case of perfect locality, each group could be read in a
single physical read, so that's what the tool uses as its estimate
of the optimal case.

Toward #2968
cmasone-attic added a commit that referenced this issue Sep 11, 2017
This patch implements a new strategy for Pull() that pulls the chunks
from a given level of the graph over in the order they'll be
encountered by clients reading the graph.

Fixes #2968
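
A hypothetical sketch of pulling a level in reading order, using a toy in-memory graph rather than the real Pull() implementation:

```go
// Hypothetical sketch: walk the graph breadth-first so that, within each
// height level, chunks are sent in the same left-to-right order a client
// reading the graph top-down would encounter them.
package main

import "fmt"

type node struct {
	name     string
	children []*node
}

// pullInReadOrder visits one level at a time, left to right.
func pullInReadOrder(root *node, send func(*node)) {
	level := []*node{root}
	for len(level) > 0 {
		var next []*node
		for _, n := range level {
			send(n)
			next = append(next, n.children...)
		}
		level = next
	}
}

func main() {
	leafA, leafB, leafC := &node{name: "a"}, &node{name: "b"}, &node{name: "c"}
	left := &node{name: "L", children: []*node{leafA, leafB}}
	right := &node{name: "R", children: []*node{leafC}}
	root := &node{name: "root", children: []*node{left, right}}
	pullInReadOrder(root, func(n *node) { fmt.Println("send", n.name) })
}
```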