Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DeletionVectorDescriptor to represent DV metadata in DeltaLog #1528

Closed
wants to merge 3 commits into from

Conversation

vkorukanti
Copy link
Collaborator

Description

This PR is part of the feature: Support reading Delta tables with deletion vectors (more details at #1485)

It adds new class called DeletionVectorDescriptor which is used to represent the deletion vector for a given file (AddFile) in DeltaLog. The format of the metadata is described in the protocol.

How was this patch tested?

New test suite is added

This PR is part of the feature: Support reading Delta tables with deletion vectors (more details of supporting DV table reads at delta-io#1485)

This PR adds a new class called `DeletionVectorDescriptor` which is used to represent the deletion vector for a given file (`AddFile`) in `DeltaLog`. The format of the metadata is described in the [protocol](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#deletion-vector-descriptor-schema).

GitOrigin-RevId: 59979dea64d7c938d33b886eaebe0b30b2f8dd6c
Comment on lines 134 to 140
/**
* Produce a copy of this DV, with `pathOrInlineDv` replaced by a relative path based on `id`
* and `randomPrefix`.
*/
def copyWithNewRelativePath(id: UUID, randomPrefix: String): DeletionVectorDescriptor = {
this.copy(storageType = UUID_DV_MARKER, pathOrInlineDv = encodeUUID(id, randomPrefix))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a noop copy for inline DVs, since we shouldn't write them out to disk ever anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. Will update it.

* Optionally, prepend a `prefix` to the name.
*/
def assembleDeletionVectorPath(targetParentPath: Path, id: UUID, prefix: String = ""): Path = {
val core = DELETION_VECTOR_FILE_NAME_CORE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really worthwhile to assign a field to a (shorter) field? Why not just use the static field directly?

Copy link
Contributor

@larsk-db larsk-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vkorukanti vkorukanti deleted the dv4 branch October 2, 2023 05:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants