
backupccl: add prototype metadata.sst #76705

Merged 3 commits into cockroachdb:master from manifest-sst on Mar 3, 2022
Conversation

rhu713
Contributor

@rhu713 rhu713 commented Feb 16, 2022

This adds writing of an additional file to the completion of BACKUP. This new file
is an sstable that contains the same metadata currently stored in the BACKUP_MANIFEST
file and statistics files, but organizes that data differently.

The current BACKUP_MANIFEST file contains a single binary-encoded protobuf message
of type BackupManifest, which in turn has several fields, some of which are repeated
to contain e.g. the TableDescriptor for every table backed up, or every revision to
every table descriptor backed up. This can result in these manifests being quite large
in some cases, which is potentially concerning because, as a single protobuf message,
one has to read and unmarshal the entire struct into memory to read any field(s) of it.

Organizing this metadata into an SSTable, where repeated fields are stored as separate
messages under separate keys, should instead allow reading it incrementally:
one can seek to a particular key or key prefix and then scan, acting on whatever data
is found as it is read, without loading the entire file at once (when opened using the
same seek-ing remote SST reader we use to read backup data ssts).
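
As a rough illustration of that layout (the key scheme, writer interface, and helper below
are hypothetical, not the actual encoding used here), each element of a repeated field gets
its own key under a shared prefix, so a reader can seek to the prefix and scan only what it
needs:

package backupmeta

import (
	"fmt"
	"sort"
)

// kvWriter stands in for a sorted key/value (SST) writer; hypothetical.
type kvWriter interface {
	Set(key, value []byte) error
}

// writeDescriptors writes one key per marshaled descriptor under a shared
// prefix (e.g. desc/00000000000000000042) instead of one giant proto. SST
// keys must be added in ascending order, so the IDs are sorted first.
func writeDescriptors(w kvWriter, descs map[uint64][]byte) error {
	ids := make([]uint64, 0, len(descs))
	for id := range descs {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	for _, id := range ids {
		// Zero-pad so lexicographic key order matches numeric ID order.
		key := fmt.Sprintf("desc/%020d", id)
		if err := w.Set([]byte(key), descs[id]); err != nil {
			return err
		}
	}
	return nil
}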

This initial prototype adds only the writer -- RESTORE does not rely on, or even open,
this new file at this time.

Release note: none.

@rhu713 rhu713 requested review from dt and a team February 16, 2022 21:03
@cockroach-teamcity
Member

This change is Reviewable

@rhu713 rhu713 requested a review from a team as a code owner February 17, 2022 20:26
@rhu713 rhu713 force-pushed the manifest-sst branch 2 times, most recently from 567f382 to 54bf5d4 on February 22, 2022 17:05
@dt
Member

dt commented Feb 22, 2022

Been trying to think forward to the next version and about how we'll write this SST -- sorted -- for an unbounded number of files during a backup. For now, of course, we assume all file metadata fits in RAM to write the proto, so we can also sort it to write the SST, but if we wanted to change that later, this might be tricky? We could of course record files (in rows in some job state table perhaps) and then read them back sorted to make the SST all at once. Another idea that might motivate changing this now would be to pull the files info out of "the" manifest and instead allow having many file-list SSTs, e.g. manifest/files.123.sst, manifest/files.124.sst, etc., which need to be combined to get the total set of all files? Initially we could just write one, with everything just like it'd be in the files keys of this SST, but later switch to flushing smaller, mem-bound file lists as we go?

I suppose we don't need to do that now: if we flushed the smaller files as we went later, we could always read them back in a file merge phase of the backup to produce this SST as-is, with its contract being that it is the complete list. So maybe nothing we need to do now?

@rhu713
Contributor Author

rhu713 commented Feb 22, 2022

I suppose we don't need to do that now: if we flushed the smaller files as we went later, we could always read them back in a file merge phase of the backup to produce this SST as-is

I think it makes sense to implement this now. Is the change to have the main manifest SST store a list of SST filenames, where those SSTs contain the file data? What about the other fields that may be large, maybe spans or stats?

@dt
Member

dt commented Feb 22, 2022

I think spans and stats are OK since we can iterate over all the tables we're backing up and add their span, or their stats, to the SST builder, in order, flush, and move on to the next, without holding them all in memory at once. Files is the hard one, I think, because they're produced out of order but need to go into the SST in-order, thus the forced buffering stage to sort.
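
A minimal sketch of that distinction (hypothetical types and writer, not the actual backup code): spans and stats can be streamed to the SST builder in order as the tables are walked, while file entries arrive out of order and have to be buffered and sorted before they can be added:

package backupmeta

import (
	"bytes"
	"sort"
)

// kvWriter stands in for a sorted key/value (SST) writer; hypothetical.
type kvWriter interface {
	Set(key, value []byte) error
}

// fileEntry is one buffered file record: key determines sort order, value is
// the marshaled file metadata.
type fileEntry struct {
	key, value []byte
}

// flushFiles sorts the out-of-order file entries and only then adds them to
// the SST builder, which requires keys in ascending order.
func flushFiles(w kvWriter, buffered []fileEntry) error {
	sort.Slice(buffered, func(i, j int) bool {
		return bytes.Compare(buffered[i].key, buffered[j].key) < 0
	})
	for _, f := range buffered {
		if err := w.Set(f.key, f.value); err != nil {
			return err
		}
	}
	return nil
}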

key := bi.Iter.UnsafeKey()
resWrapper.key.Key = resWrapper.key.Key[:0]
resWrapper.value = resWrapper.value[:0]
resWrapper.key.Key = append(resWrapper.key.Key, key.Key...)
Member

I think we have a Clone() method on key these days?

Contributor Author

Done.
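
For reference, a sketch of what the cloned read might look like after that change (assuming, per the comment above, that the key type returned by UnsafeKey() exposes a Clone() that copies its underlying bytes):

// UnsafeKey()'s buffer is only valid until the iterator advances, so copy it.
key := bi.Iter.UnsafeKey()
resWrapper.key = key.Clone()
resWrapper.value = resWrapper.value[:0]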

iterError error
}

func makeBytesIter(
Member

I'm probably missing something, but it seems like we already open the SST in makeMetadata to read the fixed set of details (née manifest), so we could just pass that one instead of re-opening it for each of these typed iterators?

And then perhaps the typed iterators, if they just seek and read from it directly, could potentially also avoid some cloning of the unsafe value just to pass it to unmarshal anyway, so maybe we wouldn't even need the intermediate bytesIter at all?

Contributor Author

I was trying to avoid the case where someone might use multiple of these iterators in a nested manner, or instantiate another iterator without exhausting the current one, e.g. in something like this, where spans is nested inside descriptors:

if index.Adding() && spans.ContainsKey(p.ExecCfg().Codec.IndexPrefix(uint32(table.ID), uint32(index.GetID()))) {

Member

We can Seek() the iterator around to different key ranges if we want to read some things here, then jump to some there, and if someone knows they really want to consume a little of this and a little of that side-by-side at the same time, they could choose to open duplicate iterators themselves to avoid the seeks, so I don't know if I'd prescribe that ahead of time by forcing the creation of type-specific readers.
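
A rough sketch of that shape (hypothetical iterator interface and prefix handling, not the actual backup_metadata code): a typed reader seeks the shared SST iterator to its key prefix and decodes values as it scans, and the same iterator can simply be re-seeked to another prefix afterwards:

package backupmeta

import "bytes"

// sstIter stands in for the shared, seekable SST iterator; hypothetical.
type sstIter interface {
	SeekGE(key []byte)
	Valid() bool
	Next()
	UnsafeKey() []byte
	UnsafeValue() []byte
}

// scanPrefix seeks to prefix and calls fn on every value stored under it,
// stopping at the first key outside the prefix. A caller who wants a
// different section afterwards seeks the same iterator again.
func scanPrefix(it sstIter, prefix []byte, fn func(value []byte) error) error {
	for it.SeekGE(prefix); it.Valid() && bytes.HasPrefix(it.UnsafeKey(), prefix); it.Next() {
		if err := fn(it.UnsafeValue()); err != nil {
			return err
		}
	}
	return nil
}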

pkg/ccl/backupccl/backup_metadata.go
// and false if there are no more elements or if an error was encountered. When
// Next returns false, the user should call the Err method to verify the
// existence of an error.
func (si *SpanIterator) Next(span *roachpb.Span) bool {
Member

I don't have any particular objection to this write-to-value-at-pointer style, but I think pebble and pkg/storage tend to use a separate-method-to-read-current-value style, e.g. for i.Next() { x := i.Val(); use(x) }. I don't see a particular difference though.
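
For comparison, minimal sketches of the two styles (hypothetical interfaces, not the exact signatures in this PR or in pebble):

package backupmeta

import "github.com/cockroachdb/cockroach/pkg/roachpb"

// Write-to-value-at-pointer style, as used by SpanIterator in this PR.
type pointerIter interface {
	Next(dst *roachpb.Span) bool
	Err() error
}

// Separate-accessor style, closer to pebble / pkg/storage iterators.
type accessorIter interface {
	Next() bool
	Val() roachpb.Span
	Err() error
}

func usePointerStyle(it pointerIter, use func(roachpb.Span)) error {
	var sp roachpb.Span
	for it.Next(&sp) {
		use(sp)
	}
	return it.Err()
}

func useAccessorStyle(it accessorIter, use func(roachpb.Span)) error {
	for it.Next() {
		use(it.Val())
	}
	return it.Err()
}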

pkg/ccl/backupccl/backup_metadata.go
dt and others added 3 commits February 24, 2022 17:09
This is just a debugging tool for inspecting a new backup metadata SST.

Release note: none.
@rhu713
Contributor Author

rhu713 commented Feb 24, 2022

I changed the KVs in the main SST to store paths to additional SSTs that contain the actual BackupManifest_Files.
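
Roughly, the resulting layout (key names and paths below are illustrative, not the exact scheme): the main metadata SST stores only pointers to side SSTs, and those side SSTs hold the actual BackupManifest_Files entries:

package backupmeta

import "fmt"

// kvWriter stands in for a sorted key/value (SST) writer; hypothetical.
type kvWriter interface {
	Set(key, value []byte) error
}

// writeFileSSTPointers records, in the main metadata SST, one key per side
// SST that holds BackupManifest_Files entries; a reader resolves these paths
// and opens only the file-list SSTs it needs.
func writeFileSSTPointers(w kvWriter, paths []string) error {
	for i, p := range paths {
		key := fmt.Sprintf("file-sst/%04d", i) // e.g. file-sst/0000 -> "metadata/files.0000.sst"
		if err := w.Set([]byte(key), []byte(p)); err != nil {
			return err
		}
	}
	return nil
}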

@rhu713
Contributor Author

rhu713 commented Mar 3, 2022

bors r+

@craig craig bot merged commit 960f2b4 into cockroachdb:master Mar 3, 2022
@craig
Contributor

craig bot commented Mar 3, 2022

Build succeeded.
