persist: Add FixedSizeBytesStats
#27856
IIRC we discussed using a part-level version number to manage migrations at the actual data level; wondering if that would make sense here.
Personally I kind of appreciate the `PackedBytesStats` idea... while in many ways it is similar to the atomic stats as you mention, there are important differences: it has a meaningful sort, and it can be computed off the columnar representation instead of the raw data. (One option would be to call it `FixedLengthBytesStats` or whatever, which stresses that while the proto types are similar we're dealing with a different sort of thing at the arrow level.)
For sure! I thought about adding a version number to stats, and it definitely should be possible, just a bit heavier weight. In the meantime I realized at the moment the only type of stats that we really need to evolve are the
Totally fair, and it is a cleaner separation between the older stats types and the newer types. I'll update this PR to add a |
682cf9f to 5849291
🙏
This PR adds a new variant of `BytesStats` called `FixedSizeBytesStats`. It's very similar to `AtomicBytesStats` in that neither can be trimmed, but it is explicitly for types that implement the new `FixedSizeCodec`.

Motivation

Related to #24830

As we evolve our encodings in Persist, what bytes are recorded will change. For example, today stats for `ScalarType::Time` are `AtomicBytesStats` of a `ProtoNaiveTime`, but with our new columnar encodings it will be a `PackedNaiveTime`. We need a way to distinguish between the two.

An alternative is adding a "kind" field to `AtomicBytesStats`. I originally implemented this, but we determined the better approach would be a whole separate type, which allows for a cleaner separation.

Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.
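
To illustrate the separation described in the motivation, here is a minimal sketch of how a dedicated variant lets readers tell the two encodings apart by type rather than by inspecting bytes. Everything below is a simplified stand-in and not the actual Persist code: the struct fields, the `pack_time` layout (big-endian seconds then nanoseconds), and the `fixed_size_stats` and `may_contain` helpers are assumptions made for the example. It also assumes, following the review comment about a "meaningful sort", that fixed-size encodings compare correctly as raw bytes while atomic stats do not.

```rust
/// Hypothetical stand-in: bounds over opaque bytes (e.g. an encoded proto)
/// whose byte order is not assumed to match value order.
#[derive(Debug, Clone, PartialEq)]
pub struct AtomicBytesStats {
    pub lower: Vec<u8>,
    pub upper: Vec<u8>,
}

/// Hypothetical stand-in: bounds over a fixed-size encoding whose
/// lexicographic byte order is assumed to match value order.
#[derive(Debug, Clone, PartialEq)]
pub struct FixedSizeBytesStats {
    pub lower: Vec<u8>,
    pub upper: Vec<u8>,
}

/// Simplified enum with only the two variants relevant to this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum BytesStats {
    Atomic(AtomicBytesStats),
    FixedSize(FixedSizeBytesStats),
}

/// Assumed packed layout for a time of day: seconds since midnight, then
/// nanoseconds, both big-endian so byte order agrees with chronological order.
pub fn pack_time(secs: u32, nanos: u32) -> [u8; 8] {
    let mut buf = [0u8; 8];
    buf[..4].copy_from_slice(&secs.to_be_bytes());
    buf[4..].copy_from_slice(&nanos.to_be_bytes());
    buf
}

/// Computes min/max bounds directly from the packed values.
/// Returns `None` for an empty input.
pub fn fixed_size_stats(values: &[[u8; 8]]) -> Option<BytesStats> {
    let lower = values.iter().min()?;
    let upper = values.iter().max()?;
    Some(BytesStats::FixedSize(FixedSizeBytesStats {
        lower: lower.to_vec(),
        upper: upper.to_vec(),
    }))
}

/// Returns `Some(answer)` when the bounds can be compared as raw bytes,
/// and `None` when the variant gives no such guarantee.
pub fn may_contain(stats: &BytesStats, probe: &[u8]) -> Option<bool> {
    match stats {
        // In this sketch, atomic stats are treated as unordered opaque bytes.
        BytesStats::Atomic(_) => None,
        BytesStats::FixedSize(s) => {
            Some(s.lower.as_slice() <= probe && probe <= s.upper.as_slice())
        }
    }
}
```

With a separate variant, a reader matching on `BytesStats` never has to guess which encoding produced the bytes, which is the cleaner separation that a "kind" field on `AtomicBytesStats` would only approximate.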