backupccl: add prototype metadata.sst #76705
Conversation
Force-pushed from 567f382 to 54bf5d4
Been trying to think forward to the next version and about how we'll write this SST -- sorted -- for an unbounded number of files during a backup. For now, of course, we assume all file metadata fits in RAM to write the proto, so we can also sort it to write the SST, but if we wanted to change that later, this might be tricky? We could of course record files (in rows in some job state table, perhaps) and then read them sorted to make the SST all at once. Another idea that might motivate changing this now would be to pull the … I suppose we don't need to do that now: if we flushed the smaller files as we went, we could always read them back later in a file-merge phase of the backup to produce this SST as-is, with its contract being that it is the complete list. So maybe nothing we need to do now?
I think it makes sense to implement this now. Is the change to have the main manifest SST contain a list of SST filenames, with the file data stored in those SSTs? What about the other fields that may be large, maybe spans or stats?
I think spans and stats are OK since we can iterate over all the tables we're backing up and add their span, or their stats, to the SST builder, in order, flush, and move on to the next, without holding them all in memory at once. Files is the hard one, I think, because they're produced out of order but need to go into the SST in-order, thus the forced buffering stage to sort. |
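To make the buffering point concrete, here is a minimal Go sketch with invented names (kvWriter, writeSpans, writeFiles) rather than the PR's actual API: per-table data such as spans can be streamed to a sorted SST builder one entry at a time, while file entries arrive out of order and must be buffered and sorted before they can be emitted.

package sketch

import (
	"fmt"
	"sort"
)

// kvWriter stands in for an SST builder that requires keys in ascending order.
type kvWriter interface {
	Put(key, value []byte) error
}

// writeSpans streams one entry per table in key order, so only the current
// entry needs to be held in memory at a time.
func writeSpans(w kvWriter, tableIDs []uint64, spanFor func(uint64) []byte) error {
	sort.Slice(tableIDs, func(i, j int) bool { return tableIDs[i] < tableIDs[j] })
	for _, id := range tableIDs {
		// Zero-padding keeps lexicographic key order aligned with numeric order.
		key := []byte(fmt.Sprintf("span/%020d", id))
		if err := w.Put(key, spanFor(id)); err != nil {
			return err
		}
	}
	return nil
}

// writeFiles must buffer every file entry produced (out of order) during the
// backup, sort the keys, and only then emit them: the forced buffering stage
// discussed above.
func writeFiles(w kvWriter, files map[string][]byte) error {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		if err := w.Put([]byte("file/"+p), files[p]); err != nil {
			return err
		}
	}
	return nil
}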
pkg/ccl/backupccl/backup_metadata.go
Outdated
key := bi.Iter.UnsafeKey()
resWrapper.key.Key = resWrapper.key.Key[:0]
resWrapper.value = resWrapper.value[:0]
resWrapper.key.Key = append(resWrapper.key.Key, key.Key...)
I think we have a Clone() method on key these days?
Done.
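For reference, a tiny sketch of the suggested change, assuming the key type exposes a Clone() that deep-copies its byte slice (the types here are stand-ins, not the real storage package ones):

package sketch

// mvccKey is a stand-in for the iterator's key type; the suggestion assumes
// the real type has a Clone() that deep-copies the underlying byte slice.
type mvccKey struct {
	Key []byte
}

// Clone returns a copy whose bytes remain valid after the iterator advances
// past the unsafe key it was read from.
func (k mvccKey) Clone() mvccKey {
	return mvccKey{Key: append([]byte(nil), k.Key...)}
}

// copyUnsafeKey replaces the manual reset-and-append with a single Clone call.
func copyUnsafeKey(unsafe mvccKey, dst *mvccKey) {
	*dst = unsafe.Clone()
}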
iterError error
}

func makeBytesIter(
I'm probably missing something, but it seems like we already open the SST in makeMetadata to read the fixed set of details (née manifest), so we could just pass that one instead of re-opening it for each of these typed iterators?
And then perhaps the typed iterators, if they just seek and read from it directly, could potentially also avoid some cloning of the unsafe value just to pass it to unmarshal anyway, so maybe we wouldn't even need the intermediate bytesIter at all?
I was trying to avoid the case where someone might use multiple of these iterators in a nested manner, or instantiate another iterator without exhausting the current one, e.g. in something like this where spans is nested inside descriptors:
cockroach/pkg/ccl/backupccl/restore_planning.go
Line 1915 in 4f4f2ac
if index.Adding() && spans.ContainsKey(p.ExecCfg().Codec.IndexPrefix(uint32(table.ID), uint32(index.GetID()))) {
We can Seek() the iterator around to different key ranges if we want to read some things here, then jump to some there, and if someone knows they really want to consume a little this and a little that side-by-side at the same time, they could choose themselves to just open duplicate iterators to avoid the seeks, so I don't know if I'd prescribe that ahead of time by forcing creation of type-specific readers.
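A sketch of the two usage patterns being weighed, using an invented metaIter interface rather than the PR's actual iterator types: a single shared iterator that is re-seeked between key prefixes, versus the caller opening a second iterator when it genuinely needs to consume two ranges side by side.

package sketch

// metaIter is a stand-in for a seekable iterator over the metadata SST; the
// PR's real iterator types are assumed to behave roughly like this.
type metaIter interface {
	SeekGE(key []byte) bool
	Next() bool
	Value() []byte
	Close()
}

// readWithSeeks reads descriptors and then spans from one shared iterator by
// seeking between key prefixes; cheap when the reads happen one after another.
// (Prefix-boundary checks are omitted for brevity.)
func readWithSeeks(it metaIter) {
	for ok := it.SeekGE([]byte("desc/")); ok; ok = it.Next() {
		_ = it.Value() // consume a descriptor
	}
	for ok := it.SeekGE([]byte("span/")); ok; ok = it.Next() {
		_ = it.Value() // consume a span
	}
}

// readSideBySide is the caller-chosen alternative: when two ranges really are
// consumed interleaved, open a second iterator instead of bouncing one
// iterator back and forth with seeks.
func readSideBySide(open func() metaIter) {
	descs := open()
	defer descs.Close()
	spans := open()
	defer spans.Close()
	for ok := descs.SeekGE([]byte("desc/")); ok; ok = descs.Next() {
		// For each descriptor, look something up in the span range without
		// disturbing the descriptor iterator's position.
		if spans.SeekGE([]byte("span/")) {
			_ = spans.Value()
		}
	}
}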
// and false if there are no more elements or if an error was encountered. When
// Next returns false, the user should call the Err method to verify the
// existence of an error.
func (si *SpanIterator) Next(span *roachpb.Span) bool {
I don't have any particular objection to this write-to-value-at-pointer style, but I think pebble and pkg/storage tend to use a separate method to read the current value, e.g. for i.Next() { x := i.Val(); use(x) }. I don't see a particular difference though.
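For comparison, a minimal sketch of the two iterator styles mentioned, with stand-in types rather than the real roachpb/pebble ones:

package sketch

// span is a stand-in for roachpb.Span in this sketch.
type span struct{ Key, EndKey []byte }

// Style used in the PR: Next writes into a caller-provided value and reports
// whether one was produced; Err is checked once the loop ends.
type writeToPointerIter interface {
	Next(dst *span) bool
	Err() error
}

// Style closer to pebble / pkg/storage: Next only advances, and a separate
// accessor returns the current value.
type valueMethodIter interface {
	Next() bool
	Val() span
	Err() error
}

// Both styles read the same way; the difference is mostly where the current
// value lives.
func consumeA(it writeToPointerIter) ([]span, error) {
	var out []span
	var s span
	for it.Next(&s) {
		out = append(out, s)
	}
	return out, it.Err()
}

func consumeB(it valueMethodIter) ([]span, error) {
	var out []span
	for it.Next() {
		out = append(out, it.Val())
	}
	return out, it.Err()
}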
This adds writing of an additional file to the completion of BACKUP. This new file is an sstable that contains the same metadata currently stored in the BACKUP_MANIFEST file and statistics files, but organizes that data differently.

The current BACKUP_MANIFEST file contains a single binary-encoded protobuf message of type BackupManifest, which in turn has several fields, some of which are repeated to contain e.g. the TableDescriptor for every table backed up, or every revision to every table descriptor backed up. This can result in these manifests being quite large in some cases, which is potentially concerning because, as a single protobuf message, one has to read and unmarshal the entire struct into memory to read any field(s) of it. Organizing this metadata into an SSTable where repeated fields are instead stored as separate messages under separate keys should allow reading it incrementally: one can seek to a particular key or key prefix and then scan, acting on whatever data is found as it is read, without loading the entire file at once (when opened using the same seeking remote SST reader we use to read backup data SSTs).

This initial prototype adds only the writer -- RESTORE does not rely on, or even open, this new file at this time.

Release note: none.
This is just a debugging tool for inspecting a new backup metadata SST. Release note: none.
Release note: None
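To illustrate the layout described in the first commit message above, here is a hypothetical key scheme (the prefixes and encodings are invented for this sketch, not the PR's actual ones): each repeated manifest field becomes a range of individually keyed entries that a reader can seek to and scan incrementally.

package sketch

import "fmt"

// Hypothetical per-field key prefixes: each repeated manifest field becomes a
// contiguous range of keys that can be seeked to and scanned independently.
const (
	descPrefix = "desc/"  // one key per table descriptor (and per revision)
	filePrefix = "file/"  // one key per backed-up data file
	spanPrefix = "span/"  // one key per backed-up span
	statPrefix = "stats/" // one key per table-statistics record
)

// descKey places every revision of a descriptor adjacently, in order, so a
// reader can seek to descPrefix+tableID and scan just that table's history.
// Zero-padding keeps lexicographic order aligned with numeric order.
func descKey(tableID uint64, revision int) string {
	return fmt.Sprintf("%s%020d/%010d", descPrefix, tableID, revision)
}

func fileKey(path string) string {
	return filePrefix + path
}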
I changed the KVs in the main SST to store paths to additional SSTs that contain the actual file data.
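A hedged sketch of that indirection, with invented names (sstReader, the files-index key, and the opener callback): the value stored under a main-SST key is just a path, and the reader opens the referenced child SST on demand.

package sketch

import "fmt"

// sstReader is a minimal stand-in for a seekable reader over a single SST.
type sstReader interface {
	SeekGE(key []byte) bool
	Value() []byte
}

// fileSSTFromManifest looks up a hypothetical "files-index" key in the main
// metadata SST, treats its value as the path of a child SST holding the file
// entries, and opens that child SST via the supplied opener.
func fileSSTFromManifest(
	main sstReader, filesIndexKey []byte, open func(path string) (sstReader, error),
) (sstReader, error) {
	if !main.SeekGE(filesIndexKey) {
		return nil, fmt.Errorf("no file-SST entry under %q", filesIndexKey)
	}
	childPath := string(main.Value()) // the value is a path, not the file data itself
	return open(childPath)
}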
bors r+
Build succeeded: