-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: Move SSTWriter to engine, use it instead of Rocks-based writer #42252
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, though my comments below indicate that there will be work to do in a follow-on PR.
Reviewed 7 of 7 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajkr, @itsbilal, @petermattis, and @sumeerbhola)
pkg/ccl/importccl/load.go, line 365 at r2 (raw file):
} if err := base.WriteFile(ctx, filename, bytes.NewReader(sstContents)); err != nil {
We build the sstable in memory, only to immediately write it to disk. More evidence that there is further API cleanup needed here (not for this PR).
pkg/storage/engine/sst_writer.go, line 27 at r1 (raw file):
f *memFile // DataSize tracks the total key and value bytes added so far. DataSize uint64
What is this used for? sstable.Writer.EstimatedSize()
will give a more accurate estimation of the final sstable size.
pkg/storage/engine/sst_writer.go, line 42 at r1 (raw file):
MergerName: "nullptr", TablePropertyCollectors: PebbleTablePropertyCollectors, }
Better to use DefaultPebbleOptions().MakeWriterOptions(0)
here and then override TableFormat
to be LevelDB. That way, changes to the default options show up here.
pkg/storage/engine/sst_writer.go, line 85 at r1 (raw file):
// Truncate returns the in-memory buffer from the previous call to Truncate // (or the start), until now. func (fw *SSTWriter) Truncate() ([]byte, error) {
What is this about? Seems like an odd method to exist here.
Oh, it is used by multiSSTWriter
to stream the data out to disk. That is just convoluted. We should instead have SSTWriter
write directly to disk. Can you add a TODO that this method should be removed and SSTWriter
should take an optional path to write to disk.
pkg/storage/engine/sst_writer.go, line 173 at r1 (raw file):
} type memFile struct {
We should expose a writable version of vfs.NewMemFile
, though I suppose you don't get access to the data
.
pkg/storage/engine/sst_writer.go, line 195 at r1 (raw file):
} func (f *memFile) ReadAt(p []byte, off int64) (int, error) {
Do you need Read
and ReadAt
? I'm pretty sure sstable.Writer
doesn't need those methods.
53a1161
to
9c7b84c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just went down the rabbit hole and actually updated engine.SSTWriter to take file handles, in the third commit. The stuff in store_snapshot.go
became a lot simpler as a result. PTAL!
I could also do a fourth commit where I move over the last remaining RocksDBSstFileWriter uses to the pebble one (all in tests), except for the one test that compares the two.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajkr, @itsbilal, @petermattis, and @sumeerbhola)
pkg/ccl/importccl/load.go, line 365 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
We build the sstable in memory, only to immediately write it to disk. More evidence that there is further API cleanup needed here (not for this PR).
Done. TODO added. This is a particularly annoying instance, the interface that base belongs to doesn't let us get a file handle easily that we could just pass to SSTWriter.
pkg/storage/engine/sst_writer.go, line 27 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
What is this used for?
sstable.Writer.EstimatedSize()
will give a more accurate estimation of the final sstable size.
It's used to determine whether we wrote anything of use to the SSTable, and if we should call SSSS.NewFile()
There are times when EstimatedSize would return a non-zero number while this wouldn't, so it's probably worth keeping.
pkg/storage/engine/sst_writer.go, line 42 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Better to use
DefaultPebbleOptions().MakeWriterOptions(0)
here and then overrideTableFormat
to be LevelDB. That way, changes to the default options show up here.
Done.
pkg/storage/engine/sst_writer.go, line 85 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
What is this about? Seems like an odd method to exist here.
Oh, it is used by
multiSSTWriter
to stream the data out to disk. That is just convoluted. We should instead haveSSTWriter
write directly to disk. Can you add a TODO that this method should be removed andSSTWriter
should take an optional path to write to disk.
Removed.
pkg/storage/engine/sst_writer.go, line 195 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Do you need
Read
andReadAt
? I'm pretty suresstable.Writer
doesn't need those methods.
Done. Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the changes to bulk IO and KV code, @dt and @nvanbenschoten should give this a quick review to make sure we're not messing anything up.
Reviewed 12 of 12 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajkr, @itsbilal, @petermattis, and @sumeerbhola)
pkg/ccl/importccl/load.go, line 367 at r5 (raw file):
// TODO(itsbilal): Pass a file handle into SSTWriter instead of writing to a // MemFile first.
Might be able to snipe @dt or someone else from Bulk IO into doing this. ExternalStorage
needs to be able to provide a Writer
interface for uploading data. Shouldn't be difficult.
pkg/storage/replica_raftstorage.go, line 882 at r5 (raw file):
} if unreplicatedSST.DataSize > 0 { if err := inSnap.SSSS.WriteSST(ctx, unreplicatedSSTFile.Data()); err != nil {
This deserves a TODO as it maintains the same strange behavior as previously where we build the sstable in memory and then write it to a file. Better to just build the sstable directly on disk.
pkg/storage/engine/sst_writer.go, line 195 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Done. Removed.
Read()
is still present. Is it needed for something?
pkg/storage/engine/sst_writer_test.go, line 171 at r5 (raw file):
_ = makePebbleSST(b, kvs) } b.StopTimer()
How does the benchmark time for building sstables compare between Pebble and RocksDB?
9c7b84c
to
bf69283
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @petermattis, and @sumeerbhola)
pkg/storage/replica_raftstorage.go, line 882 at r5 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This deserves a TODO as it maintains the same strange behavior as previously where we build the sstable in memory and then write it to a file. Better to just build the sstable directly on disk.
Done.
pkg/storage/engine/sst_writer.go, line 195 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Read()
is still present. Is it needed for something?
Nope, don't know why I thought it was being used as part of writeCloseSyncer
. Removed.
pkg/storage/engine/sst_writer_test.go, line 171 at r5 (raw file):
Previously, petermattis (Peter Mattis) wrote…
How does the benchmark time for building sstables compare between Pebble and RocksDB?
Comparing just the SSTWriters themselves (which, it's important to note, is only part of the picture though a pretty important part), where old = RocksDBSstFileWriter:
name old time/op new time/op delta
WriteSSTable-12 3.95µs ± 3% 2.48µs ± 4% -37.37% (p=0.008 n=5+5)
name old speed new speed delta
WriteSSTable-12 301MB/s ± 3% 481MB/s ± 4% +59.71% (p=0.008 n=5+5)
Any benchmarks around streaming / applying snapshots would also be interesting.
bf69283
to
64946a2
Compare
This commit does two things: It moves the SSTWriter from the bulk package to the engine package, and makes any related import changes. The tests file for SSTWriter is also moved accordingly. In addition, it adds any methods necessary to SSTWriter to make it adhere to the engine.Writer interface - Put, ClearRange, etc. This is useful in the next commit. SSTWriter.Add() is removed as it is effectively replaced by Put(). Release note: None
Now that SSTFileWriter is in the engine package and implements Writer, it can be used as a drop-in replacement for the RocksDB variant. Fixes cockroachdb#41657 Release note: None
64946a2
to
832c889
Compare
Resolved merge conflicts, and added a 4th commit that updates all tests using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a quick scan of the bulk-IO parts and LGTM.
Reviewed 1 of 12 files at r6, 1 of 11 files at r8.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @dt, @itsbilal, @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/ccl/importccl/load.go, line 367 at r5 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Might be able to snipe @dt or someone else from Bulk IO into doing this.
ExternalStorage
needs to be able to provide aWriter
interface for uploading data. Shouldn't be difficult.
Hm, the Reader
argument to uploads mostly comes from the cloud APIs. Some of the current implementations also rely on being able to Seek
in the passed content to retry files on failure (in some cases we added this because of flaky cloud providers, others include it internally). Even if we could stick a buffer in the middle to turn a Writer into a Reader, if we switched to a more streaming model where we upload as we write, retries will be tricky (i.e. we might need to build another layer of reties, over the entire request, if we can no longer retry just uploading the result).
Happy to have my name added to the TODO, but I don't know if it'll happen right away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 12 files at r3, 12 of 12 files at r6, 6 of 6 files at r7, 11 of 11 files at r8, 5 of 5 files at r9.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @petermattis, and @sumeerbhola)
pkg/storage/replica_sst_snapshot_storage.go, line 109 at r8 (raw file):
// the provided SST when it is finished using it. If the provided SST is empty, // then no file will be created and nothing will be written. func (ssss *SSTSnapshotStorageScratch) WriteSST(ctx context.Context, data []byte) error {
If we're taking a context in NewFile
, why does this still also take a context?
pkg/storage/replica_sst_snapshot_storage.go, line 212 at r8 (raw file):
return nil } if sssf.chunkSize > 0 && len(sssf.buffer) > 0 {
Remove the sssf.chunkSize > 0
condition. We don't care about that here.
pkg/storage/store_snapshot.go, line 122 at r8 (raw file):
ssss *SSTSnapshotStorageScratch currSST engine.SSTWriter currSSTFile *SSTSnapshotStorageFile
Do you mind checking that we still need this field? It seems to be used once in multiSSTWriter.Close
, but I'd expect engine.SSTWriter.Close
to have already closed the file. Is that incorrect?
832c889
to
c33926a
Compare
Instead of always having a MemFile internally, this changes the SST Writer to take a fileCloseSyncer parameter that could potentially be a non-memfile. Also update multiSSTWriter to use a chunkSize-aware SSTSnapshotStorageFile instead of handling truncation inside it. Release note: None
Update any tests (other than those that explicitly test for equivalence with RocksDB-built SSTs) to use the pebble-based engine.SSTWriter instead of RocksDBSstFileWriter. Release note: None
c33926a
to
47396f3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajkr, @itsbilal, @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/storage/replica_sst_snapshot_storage.go, line 109 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we're taking a context in
NewFile
, why does this still also take a context?
Because we call NewFile
in this method. The context is stored on a per-file basis, which we could probably change if we want.
pkg/storage/replica_sst_snapshot_storage.go, line 212 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Remove the
sssf.chunkSize > 0
condition. We don't care about that here.
Done.
pkg/storage/store_snapshot.go, line 122 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do you mind checking that we still need this field? It seems to be used once in
multiSSTWriter.Close
, but I'd expectengine.SSTWriter.Close
to have already closed the file. Is that incorrect?
That's correct, it's not necessary. Thanks for discovering! Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 7 files at r10, 5 of 5 files at r11.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @petermattis, and @sumeerbhola)
pkg/storage/replica_sst_snapshot_storage.go, line 109 at r8 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Because we call
NewFile
in this method. The context is stored on a per-file basis, which we could probably change if we want.
Oh I see, I thought this was a method on SSTSnapshotStorageFile
, not on SSTSnapshotStorageScratch
.
Thanks for the review! bors r+ |
42252: storage: Move SSTWriter to engine, use it instead of Rocks-based writer r=itsbilal a=itsbilal Currently, we use `RocksDBSstFileWriter` for writing SSTs when receiving cockroach snapshots. This incurs some extra performance overhead with cgo crossings that we can avoid by using the `sstable.Writer` in pebble. This PR moves a wrapper for the Pebble-based writer that currently resides in `bulk` to `engine`, then adds methods to it so it could be used as a drop-in replacement (almost) for the RocksDB writer. In the second commit, all non-test uses of `RocksDBSstFileWriter` are moved to `SSTWriter`. Parts taken from #39802 , and fixes #41657. Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
Build succeeded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal, @petermattis, and @sumeerbhola)
pkg/storage/replica_sst_snapshot_storage.go, line 174 at r11 (raw file):
// SSTSnapshotStorageScratch. Writing empty contents is okay and is treated as // a noop. The file must have not been closed. func (sssf *SSTSnapshotStorageFile) Write(contents []byte) (int, error) {
Where is the int return value used?
pkg/storage/replica_sst_snapshot_storage.go, line 174 at r11 (raw file): Previously, ajkr (Andrew Kryczka) wrote…
It's used inside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal, @petermattis, and @sumeerbhola)
pkg/storage/replica_sst_snapshot_storage.go, line 174 at r11 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
It's used inside
sstable.Writer.writeRawBlock
in Pebble. Also it's part of the io.Writer interface so we're forced to supply it as a result.
Understood, thanks.
Currently, we use
RocksDBSstFileWriter
for writing SSTs when receiving cockroach snapshots. This incurs some extra performance overhead with cgo crossings that we can avoid by using thesstable.Writer
in pebble.This PR moves a wrapper for the Pebble-based writer that currently resides in
bulk
toengine
, then adds methods to it so it could be used as a drop-in replacement (almost) for the RocksDB writer. In the second commit, all non-test uses ofRocksDBSstFileWriter
are moved toSSTWriter
.Parts taken from #39802 , and fixes #41657.