Modify httpBatchStore so that writing values maintains some locality #2983
Conversation
NBS benefits from related chunks being near one another. Initially, let's use write-order as a proxy for "related". This patch contains a pretty heinous hack to allow sync to continue putting chunks into httpBatchStore top-down without breaking server-side validation. Work to fix this is tracked in attic-labs#2982. This patch fixes attic-labs#2968, at least for now.
PTAL.
Now that I've gone all the way through the patch, I'm not sure that having NomsBlockCache be distinct from NomsBlockStore makes a lot of sense. On one hand, it's a nice place to put the code that creates and deletes the temp directory that the cache data lives in.
I'm attempting a little bit of further cleanup, but I broke something while doing so. I'll continue trying to chase that down, but please go ahead and review.
BTU.
In general, I think this is good. If you want to address some of these issues subsequently, I'm ok with that.
I think there's the opportunity to simplify things here quite a bit.
Unless I'm missing something, I don't think the Insert-returns-whether-it-inserted semantics are needed any more, since we only ever send one write batch.
Also, it's a minor bummer that local sync (i.e. sync) still uses the ordered_put_cache.
I guess my temptation would be to go ahead and just replace the ordered_put_cache implementation with what is here called the NomsBlockCache, which just uses the normal NBS APIs (the extract functions would need to become public).
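To make the suggestion concrete, here's a rough sketch of what such an NBS-backed put cache could look like if the extract machinery were public. All names below (chunk, blockStore, ExtractChunks, etc.) are placeholders, not the actual nbs API; the only real point is the temp-directory ownership plus insertion-order extraction.

```go
package nbscache

import (
	"io/ioutil"
	"os"
)

// chunk and blockStore are placeholders standing in for chunks.Chunk and a
// (hypothetically public) subset of the NBS API; they are not the real types.
type chunk []byte

type blockStore interface {
	Put(c chunk)
	ExtractAll(ch chan<- chunk) // assumed: emits chunks in insertion order
	Close() error
}

// chunkCache owns a temp directory plus an NBS-backed store inside it, which
// is roughly what NomsBlockCache currently adds on top of NomsBlockStore.
type chunkCache struct {
	store blockStore
	dir   string
}

func newChunkCache(open func(dir string) blockStore) (*chunkCache, error) {
	dir, err := ioutil.TempDir("", "noms-block-cache")
	if err != nil {
		return nil, err
	}
	return &chunkCache{store: open(dir), dir: dir}, nil
}

// Insert stores c; callers no longer learn whether c was already present.
func (cc *chunkCache) Insert(c chunk) {
	cc.store.Put(c)
}

// ExtractChunks streams cached chunks out in insertion order, preserving the
// write-order locality this patch is after.
func (cc *chunkCache) ExtractChunks(ch chan<- chunk) {
	cc.store.ExtractAll(ch)
}

// Destroy closes the store and removes the cache's backing files.
func (cc *chunkCache) Destroy() error {
	if err := cc.store.Close(); err != nil {
		return err
	}
	return os.RemoveAll(cc.dir)
}
```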
}
return
}
for i := len(ts.chunkSources) - 1; i >= 0; i-- { |
suggestion: i find reverse iteration easier to visually parse as
count := len(thing)
for count-- > 0 {
// do something at index |count|
}
so would I!
Go, however, apparently doesn't. Increment and decrement are statements in Go, not expressions, so you can't use them inline like that :-/
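For the record, a minimal sketch contrasting the suggested form with what Go actually allows:

```go
package main

import "fmt"

func main() {
	thing := []string{"a", "b", "c"}

	// The suggested form doesn't compile: in Go, `count--` is a statement,
	// not an expression, so it can't appear in a loop condition.
	//
	//   count := len(thing)
	//   for count-- > 0 { ... } // syntax error
	//
	// The closest legal spelling keeps the decrement in the post statement:
	for i := len(thing) - 1; i >= 0; i-- {
		fmt.Println(thing[i]) // do something at index i
	}
}
```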
:-)
d.Chk.True(uint64(n) == chunkLen)

sendChunk := func(i uint32) {
localOffset := tr.offsets[i] - tr.offsets[0] |
this is an interesting point. i think we have code elsewhere which assumes that chunk 0 is at offset zero in the file. this is kind of a nice defensive move. should we look for other places that are calculating physical offsets in the table?
Opened #2986
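For illustration, a tiny sketch of the defensive calculation under discussion: offsets within the data that was actually read are computed relative to the first chunk's offset, rather than assuming chunk 0 lives at physical offset 0. This is a hypothetical helper, not the real tableReader code.

```go
package main

import "fmt"

// localOffsetFor returns where chunk i begins within the byte range that was
// read, given each chunk's absolute offset in the table file. Subtracting
// offsets[0] avoids baking in the assumption that chunk 0 starts at byte 0.
func localOffsetFor(offsets []uint64, i int) uint64 {
	return offsets[i] - offsets[0]
}

func main() {
	// Suppose the table has a header, or the read started mid-file:
	offsets := []uint64{128, 640, 1024}
	fmt.Println(localOffsetFor(offsets, 2)) // 896, not 1024
}
```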
| Prefix Map | Sizes | Suffixes |
+------------+-------+----------+
+------------+---------+----------+
| Prefix Map | Lengths | Suffixes | |
good catch
@@ -79,6 +79,20 @@ func (mt *memTable) getMany(reqs []getRecord, wg *sync.WaitGroup) (remaining bool) {
return
}

func (mt *memTable) extract(order EnumerationOrder, chunks chan<- extractRecord) {
sort.Sort(hasRecordByOrder(mt.order)) // ensure "insertion" order for extraction |
this is a little surprising. how could mt.order not be in insertion order on invocation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be in insertion order, but we don't document that as a requirement, so I was programming defensively. I can just require that that be an invariant
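A simplified sketch of what sorting by insertion order amounts to, if the invariant stays explicit rather than assumed. The record type here is a stand-in, not the real nbs hasRecord, and the comparator mirrors the role of hasRecordByOrder.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a stand-in for the nbs hasRecord type; order is the position at
// which the chunk was inserted into the memTable.
type record struct {
	hash  string
	order int
}

// byInsertionOrder restores records to the order they were inserted in,
// making the extraction-order invariant explicit.
type byInsertionOrder []record

func (r byInsertionOrder) Len() int           { return len(r) }
func (r byInsertionOrder) Less(i, j int) bool { return r[i].order < r[j].order }
func (r byInsertionOrder) Swap(i, j int)      { r[i], r[j] = r[j], r[i] }

func main() {
	recs := []record{{"c", 2}, {"a", 0}, {"b", 1}}
	sort.Sort(byInsertionOrder(recs))
	fmt.Println(recs) // [{a 0} {b 1} {c 2}]
}
```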
dbDir string
}

// Insert stores c in the cache. If c is successfully added to the cache, |
I don't actually think these semantics are provided. addChunk returns true if the chunk was added to or already contained in the current memTable, but doesn't tell you anything about whether the chunk was contained in the store.
you're right. Maybe I don't actually need these semantics if I can get rid of all the goroutine machinery
return
}

bhcs.requestWg.Add(1)
bhcs.writeQueue <- writeRequest{c.Hash(), hints, false}
bhcs.writeQueue <- writeRequest{hints, false} |
I'm wondering whether the machinery around the write queue (the channel and the persistent goroutines servicing the write queue and the flush queue) is just unneeded complexity now. this stuff is a holdover from the before time, when we didn't have any validation and thus were writing in larger batches, but we haven't done that in a while.
i think it's possible now to just have SchedulePut put into the nbs (not caring about whether the chunk was contained) and add its hints to the hint set, all inside the mutex, and have Flush just send everything over.
i'm probably forgetting about some critical thing, right?
No, I think this will work, as long as AddHints() uses that mutex also.
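A stripped-down sketch of the shape being proposed. Placeholder types and plain strings stand in for real chunks and hints; the point is only that puts, hint bookkeeping, and flushing all happen under one mutex, with no write queue or persistent goroutines in between.

```go
package main

import (
	"fmt"
	"sync"
)

// batchStore is a toy stand-in for httpBatchStore, not the real thing.
type batchStore struct {
	mu      sync.Mutex
	pending []string            // chunks written since the last flush
	hints   map[string]struct{} // hints accumulated since the last flush
}

func newBatchStore() *batchStore {
	return &batchStore{hints: map[string]struct{}{}}
}

// SchedulePut records the chunk and its hints under the mutex; it doesn't
// care whether the chunk was already present anywhere.
func (bs *batchStore) SchedulePut(chunk string, hints ...string) {
	bs.mu.Lock()
	defer bs.mu.Unlock()
	bs.pending = append(bs.pending, chunk)
	for _, h := range hints {
		bs.hints[h] = struct{}{}
	}
}

// AddHints uses the same mutex, which is what makes the scheme safe.
func (bs *batchStore) AddHints(hints ...string) {
	bs.mu.Lock()
	defer bs.mu.Unlock()
	for _, h := range hints {
		bs.hints[h] = struct{}{}
	}
}

// Flush sends everything accumulated so far in one shot (the equivalent of
// calling sendWriteRequests directly) and resets the buffers.
func (bs *batchStore) Flush() {
	bs.mu.Lock()
	defer bs.mu.Unlock()
	fmt.Printf("sending %d chunks with %d hints\n", len(bs.pending), len(bs.hints))
	bs.pending = bs.pending[:0]
	bs.hints = map[string]struct{}{}
}

func main() {
	bs := newBatchStore()
	bs.SchedulePut("chunk-a", "hint-1")
	bs.SchedulePut("chunk-b")
	bs.AddHints("hint-2")
	bs.Flush() // sending 2 chunks with 2 hints
}
```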
@@ -448,8 +470,7 @@ func (bhcs *httpBatchStore) Root() hash.Hash {
// UpdateRoot flushes outstanding writes to the backing ChunkStore before updating its Root, because it's almost certainly the case that the caller wants to point that root at some recently-Put Chunk.
func (bhcs *httpBatchStore) UpdateRoot(current, last hash.Hash) bool {
// POST http://<host>/root?current=<ref>&last=<ref>. Response will be 200 on success, 409 if current is outdated.
bhcs.Flush()

bhcs.flushInternal(nbs.InsertOrder) |
So is it the case that the sync user always sends over chunks by calling Flush, and the simple write user always calls UpdateRoot()?
If so, I guess I'm ok with this, but it is a bit mysterious. Let's comment it somewhere and maybe put more detail in the bug you opened, noting that the specific problem is that the API of http_batch_store is now kind of wonky.
As an alternative, maybe BatchStore gets a temporary method called "MarkFlushReverseOrder", or something, that only sync calls before it starts writing top-down?
This is the thing I was working on changing after I sent this patch, because I also don't like that this relies on making noms sync the only thing that calls BatchStore.Flush(). This patch made that so, but there's no guarantee at all that it would remain that way.
Something mysteriously broke when I changed things that way, so I'll need to chase that down.
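For what it's worth, a sketch of the MarkFlushReverseOrder alternative floated above. Everything except the method name the reviewer suggested is an assumption about how it might hang together, not existing noms API.

```go
package sketch

// flushOrder mirrors the idea behind choosing between insertion order and
// its reverse when draining the put cache.
type flushOrder int

const (
	insertOrder flushOrder = iota
	reverseOrder
)

// batchStore is a placeholder for httpBatchStore.
type batchStore struct {
	nextFlushOrder flushOrder
}

// MarkFlushReverseOrder would be called only by sync, just before it starts
// writing chunks top-down, so that the next Flush sends them in reverse
// insertion order and server-side validation still sees referenced chunks
// before the chunks that reference them.
func (bs *batchStore) MarkFlushReverseOrder() {
	bs.nextFlushOrder = reverseOrder
}

// Flush honors the mark once, then reverts to the default insertion order.
func (bs *batchStore) Flush() {
	order := bs.nextFlushOrder
	bs.nextFlushOrder = insertOrder
	bs.flushInternal(order)
}

func (bs *batchStore) flushInternal(order flushOrder) {
	// extract chunks from the put cache in the requested order and send them
}
```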
Nice. LGTM
}
return
}
for i := len(ts.chunkSources) - 1; i >= 0; i-- { |
:-)
Force-pushed from c8b9c58 to c076224:
pull chunks over and flush directly after. This allows UpdateRoot to behave as before. Also clears out all the legacy batch-put machinery. Now, Flush() just directly calls sendWriteRequests().
Force-pushed from c076224 to 88712ee
Actually, can you look again? I've squashed so that the second commit includes all the changes since yesterday, which include getting rid of the old batch-put machinery.
Still LG