-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use bulk.SSTWriter to write SSTs on the receiver side #39802
Conversation
4adb77a
to
f96efa2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I communicated a couple high-level suggestions offline (making a new sst
package instead of adding more to the engine
package and a couple naming nits), but general approach LGTM (for the bulk
parts -- I didn't look at the core parts)
8841696
to
e3e048e
Compare
011c511
to
8a5396e
Compare
Rebased and implemented @dt's suggestions. RFAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Thanks @jeffrey-xiao. It's worth mentioning that another benefit of this change is that you avoid an extra memcpy during the SST writing process because there's no longer an intermediate buffer.
Reviewed 1 of 3 files at r1, 2 of 2 files at r2, 12 of 12 files at r3, 2 of 3 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jeffrey-xiao and @nvanbenschoten)
pkg/storage/store_snapshot.go, line 133 at r4 (raw file):
} type sstFile struct {
Explain what the purpose of this type is. Something about it being an adapter to allow SSTSnapshotStorageFile
to implement
pkg/storage/store_snapshot.go, line 157 at r4 (raw file):
} func (ssf *sstFile) Sync() error {
I'm probably missing it, but when is this called? Also, how are we ensuring that the file is synced before closing?
pkg/storage/store_snapshot.go, line 273 at r4 (raw file):
defer func() { // Closing the same SST multiple times is idempotent. msstw.Close()
defer msstw.Close()
pkg/storage/bulk/sst_writer.go, line 157 at r2 (raw file):
} // ApplyBatchRepr implements the Writer interface.
Here and below: engine.Writer interface
, because we're not in the engine
package.
pkg/storage/bulk/sst_writer.go, line 189 at r2 (raw file):
} fw.DataSize += int64(len(start.Key)) + int64(len(end.Key)) fw.scratch1 = engine.EncodeKeyToBuf(fw.scratch1[:0], start)
You might as well just use the same scratch buffer instead of introducing a new one:
fw.scratch = engine.EncodeKeyToBuf(fw.scratch[:0], start)
startScratch := fw.scratch
fw.scratch = engine.EncodeKeyToBuf(fw.scratch, end)
endScratch := fw.scratch[len(startScratch):]
return fw.fw.DeleteRange(startScratch, endScratch)
pkg/storage/engine/sst/writer.go, line 29 at r3 (raw file):
) // MakeIntTableKVs returns MVCC key values for a number of keys with given
Does this need to be part of the package's external interface? What about the next two functions? It looks like they came from a test file, so I don't see a reason why they need to be exported or even included in this file instead of writer_test.go
.
pkg/storage/engine/sst/writer.go, line 93 at r4 (raw file):
} type writeCloseSyncer interface {
I'm a little surprised govet
isn't complaining that this isn't exported even though it's part of the external interface of the package. Either way, you'll want to export it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jeffrey-xiao and @nvanbenschoten)
pkg/storage/engine/sst/writer.go, line 29 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Does this need to be part of the package's external interface? What about the next two functions? It looks like they came from a test file, so I don't see a reason why they need to be exported or even included in this file instead of
writer_test.go
.
they're reused bulk
's tests as well. I get that _test files aren't linked in non-test, but how I wish go would let other test binaries link them. I think they could get Testing
added somewhere in their names though.
pkg/storage/engine/sst/writer.go, line 93 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm a little surprised
govet
isn't complaining that this isn't exported even though it's part of the external interface of the package. Either way, you'll want to export it.
This us just re-stating an un-exported interface that is used in the Pebble public API. I think maybe govet lets you get away with it for interfaces since they're structurally instead of nominatively typed anyway?
8a5396e
to
fbf8b51
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/store_snapshot.go, line 157 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm probably missing it, but when is this called? Also, how are we ensuring that the file is synced before closing?
This is passed into Pebble's sstable.Writer
and it's called when the sstable is closed. I just realized we would need logic to call Sync
when a "chunk" has been written. I've added this to multiSSTWriter
.
fbf8b51
to
6738b26
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 15 of 15 files at r5, 12 of 12 files at r6, 2 of 3 files at r7.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/store_snapshot.go, line 157 at r4 (raw file):
This is passed into Pebble's sstable.Writer and it's called when the sstable is closed.
Where?
Also, I worry about the performance of calling limitBulkIOWrite
on every single key write now. Do you think it would make sense to move that to where we call Sync
instead?
pkg/storage/bulk/sst_writer.go, line 189 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
You might as well just use the same scratch buffer instead of introducing a new one:
fw.scratch = engine.EncodeKeyToBuf(fw.scratch[:0], start) startScratch := fw.scratch fw.scratch = engine.EncodeKeyToBuf(fw.scratch, end) endScratch := fw.scratch[len(startScratch):] return fw.fw.DeleteRange(startScratch, endScratch)
This isn't quite what we want. Since we're passing fw.scratch[len(startScratch):]
to EncodeKeyToBuf
, we're going to throw away part of the buffer if it needs to be resized. Instead, we want to append to the entire thing and then slice it up later.
pkg/storage/engine/sst/writer.go, line 29 at r3 (raw file):
Previously, dt (David Taylor) wrote…
they're reused
bulk
's tests as well. I get that _test files aren't linked in non-test, but how I wish go would let other test binaries link them. I think they could getTesting
added somewhere in their names though.
Well at a minimum we should also move these to the bottom of this file. I'd lean towards moving them into their own writer_testing.go
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/store_snapshot.go, line 157 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is passed into Pebble's sstable.Writer and it's called when the sstable is closed.
Where?
Also, I worry about the performance of calling
limitBulkIOWrite
on every single key write now. Do you think it would make sense to move that to where we callSync
instead?
The sync is called here: https://github.com/petermattis/pebble/blob/53f7531cb726299eb33b5849ae6935501f357442/sstable/writer.go#L649-L652
Makes sense to me. SSTSnapshotFile
would have to keep track of the bytes written before a Sync
in that case. Does that sound right?
pkg/storage/bulk/sst_writer.go, line 189 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This isn't quite what we want. Since we're passing
fw.scratch[len(startScratch):]
toEncodeKeyToBuf
, we're going to throw away part of the buffer if it needs to be resized. Instead, we want to append to the entire thing and then slice it up later.
Hmm, I'm not sure I fully understand. If we pass fw.scratch
to engine.EncodeKeyToBuf
when encoding the end key, it's overwriting the buffer instead of append to it. Am I missing something?
pkg/storage/engine/sst/writer.go, line 29 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Well at a minimum we should also move these to the bottom of this file. I'd lean towards moving them into their own
writer_testing.go
file.
Ack. Will move them to writer_testing.go
when I make the other changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about letting this sit @jeffrey-xiao. We're ramping up for the next release and it's all hands on deck to stabilize things until then.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/store_snapshot.go, line 157 at r4 (raw file):
The sync is called here: https://github.com/petermattis/pebble/blob/53f7531cb726299eb33b5849ae6935501f357442/sstable/writer.go#L649-L652
Ok, so it's the pebble sst writer itself that calls sync. That's worth a comment right above if err := fw.fw.Close(); err != nil {
.
SSTSnapshotFile would have to keep track of the bytes written before a Sync in that case. Does that sound right?
You mean multiSSTWriter
or sstFile
? I think it would be most natural to switch the state around to be bytesSinceLastSync
and then to maintain this on sstFile
. You would increment it in sstFile.Write
and clear it in sstFile.Sync
. This state is a property of a specific file, not themultiSSTWriter
, so it's strange that it lives there right now.
Does that make sense?
pkg/storage/bulk/sst_writer.go, line 189 at r2 (raw file):
Previously, jeffrey-xiao (Jeffrey Xiao) wrote…
Hmm, I'm not sure I fully understand. If we pass
fw.scratch
toengine.EncodeKeyToBuf
when encoding the end key, it's overwriting the buffer instead of append to it. Am I missing something?
Oh, I see. It's idiomatic for this style of function to append instead of overwrite/re-allocate for exactly this reason. Now we don't know whether the resulting slice is shorter or longer than the original one, so it's not clear whether we want to replace fw.scratch
or not. It looks like this was already causing confusion because the existing caller to this was passing fw.scratch[:0]
, which isn't necessary because the length is ignored and only the capacity is consulted.
If you're feeling inspired, I think the right thing to do here would be to have EncodeKeyToBuf
append the key to the provided buffer instead of doing what it's doing now.
pkg/storage/engine/sst/writer.go, line 29 at r3 (raw file):
Previously, jeffrey-xiao (Jeffrey Xiao) wrote…
Ack. Will move them to
writer_testing.go
when I make the other changes.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/store_snapshot.go, line 157 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The sync is called here: https://github.com/petermattis/pebble/blob/53f7531cb726299eb33b5849ae6935501f357442/sstable/writer.go#L649-L652
Ok, so it's the pebble sst writer itself that calls sync. That's worth a comment right above
if err := fw.fw.Close(); err != nil {
.SSTSnapshotFile would have to keep track of the bytes written before a Sync in that case. Does that sound right?
You mean
multiSSTWriter
orsstFile
? I think it would be most natural to switch the state around to bebytesSinceLastSync
and then to maintain this onsstFile
. You would increment it insstFile.Write
and clear it insstFile.Sync
. This state is a property of a specific file, not themultiSSTWriter
, so it's strange that it lives there right now.Does that make sense?
Also take a look at vfs.NewSyncingFile
which syncs periodically as a file is being written. Writing 64 MB of data and then calling Sync()
can cause large latency spikes for other concurrent IO operations. We configure RocksDB and Pebble to sync every 512KB.
2f963f0
to
fdf82b9
Compare
fdf82b9
to
0771573
Compare
Release note: None
54c4f70
to
5a80d7b
Compare
This commit moves SSTWriter and sstIterator to a new package engine.sst. Release note: None
This change should be beneficial across the board since we are avoiding the cost of a Cgo call per Put. We are also reducing the memory cost since we are eliminating the in-memory buffer when creating the SST chunks incrementally. Release note: None
5a80d7b
to
3e2401b
Compare
42252: storage: Move SSTWriter to engine, use it instead of Rocks-based writer r=itsbilal a=itsbilal Currently, we use `RocksDBSstFileWriter` for writing SSTs when receiving cockroach snapshots. This incurs some extra performance overhead with cgo crossings that we can avoid by using the `sstable.Writer` in pebble. This PR moves a wrapper for the Pebble-based writer that currently resides in `bulk` to `engine`, then adds methods to it so it could be used as a drop-in replacement (almost) for the RocksDB writer. In the second commit, all non-test uses of `RocksDBSstFileWriter` are moved to `SSTWriter`. Parts taken from #39802 , and fixes #41657. Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
The first commit implements the
engine.Writer
interface forbulk.SSTWriter
and also adapts it to accept anwriteCloseSyncer
. This change allows users to specify if they want to write directly to the disk or to an in-memory buffer.The second commit creates a new package in
engine
calledsst
that storessst.Iterator
andsst.Writer
.The third commit uses
bulk.SSTWriter
to write SSTs when receiving snapshots.Using the same recovery evaluation metric in #38932, I found that there was a nice win in terms of recovery time for an average range size of 128 MiB.
This change should be beneficial across the board since we are avoiding the cost of a Cgo call per
Put
. We are also reducing the memory cost since we are eliminating the in-memory buffer when creating the SST chunks incrementally.Depends on petermattis/pebble#240.
Fixes #39748.