-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DeletionVectorDescriptor
to represent DV metadata in DeltaLog
#1528
Conversation
This PR is part of the feature: Support reading Delta tables with deletion vectors (more details of supporting DV table reads at delta-io#1485) This PR adds a new class called `DeletionVectorDescriptor` which is used to represent the deletion vector for a given file (`AddFile`) in `DeltaLog`. The format of the metadata is described in the [protocol](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#deletion-vector-descriptor-schema). GitOrigin-RevId: 59979dea64d7c938d33b886eaebe0b30b2f8dd6c
/** | ||
* Produce a copy of this DV, with `pathOrInlineDv` replaced by a relative path based on `id` | ||
* and `randomPrefix`. | ||
*/ | ||
def copyWithNewRelativePath(id: UUID, randomPrefix: String): DeletionVectorDescriptor = { | ||
this.copy(storageType = UUID_DV_MARKER, pathOrInlineDv = encodeUUID(id, randomPrefix)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a noop copy for inline DVs, since we shouldn't write them out to disk ever anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. Will update it.
* Optionally, prepend a `prefix` to the name. | ||
*/ | ||
def assembleDeletionVectorPath(targetParentPath: Path, id: UUID, prefix: String = ""): Path = { | ||
val core = DELETION_VECTOR_FILE_NAME_CORE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really worthwhile to assign a field to a (shorter) field? Why not just use the static field directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description
This PR is part of the feature: Support reading Delta tables with deletion vectors (more details at #1485)
It adds new class called
DeletionVectorDescriptor
which is used to represent the deletion vector for a given file (AddFile
) inDeltaLog
. The format of the metadata is described in the protocol.How was this patch tested?
New test suite is added