Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use bulk.SSTWriter to write SSTs on the receiver side #39802

Closed
wants to merge 3 commits into from

Conversation

jeffrey-xiao
Copy link
Contributor

@jeffrey-xiao jeffrey-xiao commented Aug 21, 2019

The first commit implements the engine.Writer interface for bulk.SSTWriter and also adapts it to accept an writeCloseSyncer. This change allows users to specify if they want to write directly to the disk or to an in-memory buffer.

The second commit creates a new package in engine called sst that stores sst.Iterator and sst.Writer.

The third commit uses bulk.SSTWriter to write SSTs when receiving snapshots.

Using the same recovery evaluation metric in #38932, I found that there was a nice win in terms of recovery time for an average range size of 128 MiB.

name         old s/op   new s/op   delta
AvgBytes121  38.8 ±19%  32.6 ± 7%  -16.03%  (p=0.011 n=8+5)

This change should be beneficial across the board since we are avoiding the cost of a Cgo call per Put. We are also reducing the memory cost since we are eliminating the in-memory buffer when creating the SST chunks incrementally.

Depends on petermattis/pebble#240.

Fixes #39748.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jeffrey-xiao jeffrey-xiao changed the title [DNM[ Use bulk.SSTWriter to write SSTs on the receiver side [DNM] Use bulk.SSTWriter to write SSTs on the receiver side Aug 21, 2019
@jeffrey-xiao jeffrey-xiao requested a review from dt August 21, 2019 18:54
@jeffrey-xiao jeffrey-xiao force-pushed the sst-writer branch 7 times, most recently from 4adb77a to f96efa2 Compare August 22, 2019 19:58
Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I communicated a couple high-level suggestions offline (making a new sst package instead of adding more to the engine package and a couple naming nits), but general approach LGTM (for the bulk parts -- I didn't look at the core parts)

@jeffrey-xiao jeffrey-xiao force-pushed the sst-writer branch 3 times, most recently from 8841696 to e3e048e Compare August 26, 2019 04:26
@jeffrey-xiao jeffrey-xiao changed the title [DNM] Use bulk.SSTWriter to write SSTs on the receiver side Use bulk.SSTWriter to write SSTs on the receiver side Aug 26, 2019
@jeffrey-xiao jeffrey-xiao force-pushed the sst-writer branch 2 times, most recently from 011c511 to 8a5396e Compare August 26, 2019 16:10
@jeffrey-xiao
Copy link
Contributor Author

Rebased and implemented @dt's suggestions. RFAL.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Thanks @jeffrey-xiao. It's worth mentioning that another benefit of this change is that you avoid an extra memcpy during the SST writing process because there's no longer an intermediate buffer.

Reviewed 1 of 3 files at r1, 2 of 2 files at r2, 12 of 12 files at r3, 2 of 3 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jeffrey-xiao and @nvanbenschoten)


pkg/storage/store_snapshot.go, line 133 at r4 (raw file):

}

type sstFile struct {

Explain what the purpose of this type is. Something about it being an adapter to allow SSTSnapshotStorageFile to implement


pkg/storage/store_snapshot.go, line 157 at r4 (raw file):

}

func (ssf *sstFile) Sync() error {

I'm probably missing it, but when is this called? Also, how are we ensuring that the file is synced before closing?


pkg/storage/store_snapshot.go, line 273 at r4 (raw file):

	defer func() {
		// Closing the same SST multiple times is idempotent.
		msstw.Close()

defer msstw.Close()


pkg/storage/bulk/sst_writer.go, line 157 at r2 (raw file):

}

// ApplyBatchRepr implements the Writer interface.

Here and below: engine.Writer interface, because we're not in the engine package.


pkg/storage/bulk/sst_writer.go, line 189 at r2 (raw file):

	}
	fw.DataSize += int64(len(start.Key)) + int64(len(end.Key))
	fw.scratch1 = engine.EncodeKeyToBuf(fw.scratch1[:0], start)

You might as well just use the same scratch buffer instead of introducing a new one:

fw.scratch = engine.EncodeKeyToBuf(fw.scratch[:0], start)
startScratch := fw.scratch
fw.scratch = engine.EncodeKeyToBuf(fw.scratch, end)
endScratch := fw.scratch[len(startScratch):]
return fw.fw.DeleteRange(startScratch, endScratch)

pkg/storage/engine/sst/writer.go, line 29 at r3 (raw file):

)

// MakeIntTableKVs returns MVCC key values for a number of keys with given

Does this need to be part of the package's external interface? What about the next two functions? It looks like they came from a test file, so I don't see a reason why they need to be exported or even included in this file instead of writer_test.go.


pkg/storage/engine/sst/writer.go, line 93 at r4 (raw file):

}

type writeCloseSyncer interface {

I'm a little surprised govet isn't complaining that this isn't exported even though it's part of the external interface of the package. Either way, you'll want to export it.

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jeffrey-xiao and @nvanbenschoten)


pkg/storage/engine/sst/writer.go, line 29 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this need to be part of the package's external interface? What about the next two functions? It looks like they came from a test file, so I don't see a reason why they need to be exported or even included in this file instead of writer_test.go.

they're reused bulk's tests as well. I get that _test files aren't linked in non-test, but how I wish go would let other test binaries link them. I think they could get Testing added somewhere in their names though.


pkg/storage/engine/sst/writer.go, line 93 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm a little surprised govet isn't complaining that this isn't exported even though it's part of the external interface of the package. Either way, you'll want to export it.

This us just re-stating an un-exported interface that is used in the Pebble public API. I think maybe govet lets you get away with it for interfaces since they're structurally instead of nominatively typed anyway?

Copy link
Contributor Author

@jeffrey-xiao jeffrey-xiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/store_snapshot.go, line 157 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm probably missing it, but when is this called? Also, how are we ensuring that the file is synced before closing?

This is passed into Pebble's sstable.Writer and it's called when the sstable is closed. I just realized we would need logic to call Sync when a "chunk" has been written. I've added this to multiSSTWriter.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 15 of 15 files at r5, 12 of 12 files at r6, 2 of 3 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/store_snapshot.go, line 157 at r4 (raw file):

This is passed into Pebble's sstable.Writer and it's called when the sstable is closed.

Where?

Also, I worry about the performance of calling limitBulkIOWrite on every single key write now. Do you think it would make sense to move that to where we call Sync instead?


pkg/storage/bulk/sst_writer.go, line 189 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

You might as well just use the same scratch buffer instead of introducing a new one:

fw.scratch = engine.EncodeKeyToBuf(fw.scratch[:0], start)
startScratch := fw.scratch
fw.scratch = engine.EncodeKeyToBuf(fw.scratch, end)
endScratch := fw.scratch[len(startScratch):]
return fw.fw.DeleteRange(startScratch, endScratch)

This isn't quite what we want. Since we're passing fw.scratch[len(startScratch):] to EncodeKeyToBuf, we're going to throw away part of the buffer if it needs to be resized. Instead, we want to append to the entire thing and then slice it up later.


pkg/storage/engine/sst/writer.go, line 29 at r3 (raw file):

Previously, dt (David Taylor) wrote…

they're reused bulk's tests as well. I get that _test files aren't linked in non-test, but how I wish go would let other test binaries link them. I think they could get Testing added somewhere in their names though.

Well at a minimum we should also move these to the bottom of this file. I'd lean towards moving them into their own writer_testing.go file.

Copy link
Contributor Author

@jeffrey-xiao jeffrey-xiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/store_snapshot.go, line 157 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is passed into Pebble's sstable.Writer and it's called when the sstable is closed.

Where?

Also, I worry about the performance of calling limitBulkIOWrite on every single key write now. Do you think it would make sense to move that to where we call Sync instead?

The sync is called here: https://github.com/petermattis/pebble/blob/53f7531cb726299eb33b5849ae6935501f357442/sstable/writer.go#L649-L652

Makes sense to me. SSTSnapshotFile would have to keep track of the bytes written before a Sync in that case. Does that sound right?


pkg/storage/bulk/sst_writer.go, line 189 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This isn't quite what we want. Since we're passing fw.scratch[len(startScratch):] to EncodeKeyToBuf, we're going to throw away part of the buffer if it needs to be resized. Instead, we want to append to the entire thing and then slice it up later.

Hmm, I'm not sure I fully understand. If we pass fw.scratch to engine.EncodeKeyToBuf when encoding the end key, it's overwriting the buffer instead of append to it. Am I missing something?


pkg/storage/engine/sst/writer.go, line 29 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Well at a minimum we should also move these to the bottom of this file. I'd lean towards moving them into their own writer_testing.go file.

Ack. Will move them to writer_testing.go when I make the other changes.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about letting this sit @jeffrey-xiao. We're ramping up for the next release and it's all hands on deck to stabilize things until then.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/store_snapshot.go, line 157 at r4 (raw file):

The sync is called here: https://github.com/petermattis/pebble/blob/53f7531cb726299eb33b5849ae6935501f357442/sstable/writer.go#L649-L652

Ok, so it's the pebble sst writer itself that calls sync. That's worth a comment right above if err := fw.fw.Close(); err != nil {.

SSTSnapshotFile would have to keep track of the bytes written before a Sync in that case. Does that sound right?

You mean multiSSTWriter or sstFile? I think it would be most natural to switch the state around to be bytesSinceLastSync and then to maintain this on sstFile. You would increment it in sstFile.Write and clear it in sstFile.Sync. This state is a property of a specific file, not themultiSSTWriter, so it's strange that it lives there right now.

Does that make sense?


pkg/storage/bulk/sst_writer.go, line 189 at r2 (raw file):

Previously, jeffrey-xiao (Jeffrey Xiao) wrote…

Hmm, I'm not sure I fully understand. If we pass fw.scratch to engine.EncodeKeyToBuf when encoding the end key, it's overwriting the buffer instead of append to it. Am I missing something?

Oh, I see. It's idiomatic for this style of function to append instead of overwrite/re-allocate for exactly this reason. Now we don't know whether the resulting slice is shorter or longer than the original one, so it's not clear whether we want to replace fw.scratch or not. It looks like this was already causing confusion because the existing caller to this was passing fw.scratch[:0], which isn't necessary because the length is ignored and only the capacity is consulted.

If you're feeling inspired, I think the right thing to do here would be to have EncodeKeyToBuf append the key to the provided buffer instead of doing what it's doing now.


pkg/storage/engine/sst/writer.go, line 29 at r3 (raw file):

Previously, jeffrey-xiao (Jeffrey Xiao) wrote…

Ack. Will move them to writer_testing.go when I make the other changes.

👍

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/store_snapshot.go, line 157 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The sync is called here: https://github.com/petermattis/pebble/blob/53f7531cb726299eb33b5849ae6935501f357442/sstable/writer.go#L649-L652

Ok, so it's the pebble sst writer itself that calls sync. That's worth a comment right above if err := fw.fw.Close(); err != nil {.

SSTSnapshotFile would have to keep track of the bytes written before a Sync in that case. Does that sound right?

You mean multiSSTWriter or sstFile? I think it would be most natural to switch the state around to be bytesSinceLastSync and then to maintain this on sstFile. You would increment it in sstFile.Write and clear it in sstFile.Sync. This state is a property of a specific file, not themultiSSTWriter, so it's strange that it lives there right now.

Does that make sense?

Also take a look at vfs.NewSyncingFile which syncs periodically as a file is being written. Writing 64 MB of data and then calling Sync() can cause large latency spikes for other concurrent IO operations. We configure RocksDB and Pebble to sync every 512KB.

@jeffrey-xiao jeffrey-xiao force-pushed the sst-writer branch 3 times, most recently from 54c4f70 to 5a80d7b Compare September 23, 2019 20:28
This commit moves SSTWriter and sstIterator to a new package engine.sst.

Release note: None
This change should be beneficial across the board since we are avoiding
the cost of a Cgo call per Put. We are also reducing the memory cost
since we are eliminating the in-memory buffer when creating the SST
chunks incrementally.

Release note: None
craig bot pushed a commit that referenced this pull request Nov 11, 2019
42252: storage: Move SSTWriter to engine, use it instead of Rocks-based writer r=itsbilal a=itsbilal

Currently, we use `RocksDBSstFileWriter` for writing SSTs when receiving cockroach snapshots. This incurs some extra performance overhead with cgo crossings that we can avoid by using the `sstable.Writer` in pebble.

This PR moves a wrapper for the Pebble-based writer that currently resides in `bulk` to `engine`, then adds methods to it so it could be used as a drop-in replacement (almost) for the RocksDB writer. In the second commit, all non-test uses of `RocksDBSstFileWriter` are moved to `SSTWriter`.

Parts taken from #39802 , and fixes #41657.

Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label May 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
X-noremind Bots won't notify about PRs with X-noremind
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: use pebble sst builder to write ssts for snapshots
6 participants