persist: Add `FixedSizeBytesStats` #27856

ParkMyCar · 2024-06-24T18:21:45Z

This PR adds a new variant of BytesStats called FixedSizeBytesStats. Like AtomicBytesStats, these stats can't be trimmed, but they're explicitly for types that implement the new FixedSizeCodec.
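
As a rough illustration of the idea (not the code in this PR), the sketch below assumes a hypothetical shape for `FixedSizeCodec` and a hypothetical `PackedSeconds` codec, then records untrimmed lower and upper bounds as encoded bytes. Every name and signature here is an assumption made for the example; the real trait and stats types live in the Materialize repository and may differ.

```rust
use std::convert::TryInto;

/// Hypothetical stand-in for the `FixedSizeCodec` trait mentioned above.
/// The method names and signatures are assumptions for this sketch.
pub trait FixedSizeCodec<T>: Sized {
    /// Number of bytes every encoded value occupies.
    const SIZE: usize;
    fn from_value(value: T) -> Self;
    fn as_bytes(&self) -> &[u8];
    fn from_bytes(bytes: &[u8]) -> Result<Self, String>;
    fn into_value(self) -> T;
}

/// Illustrative codec: seconds since midnight as 4 big-endian bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PackedSeconds([u8; 4]);

impl FixedSizeCodec<u32> for PackedSeconds {
    const SIZE: usize = 4;

    fn from_value(value: u32) -> Self {
        PackedSeconds(value.to_be_bytes())
    }

    fn as_bytes(&self) -> &[u8] {
        &self.0
    }

    fn from_bytes(bytes: &[u8]) -> Result<Self, String> {
        // Reject any slice that is not exactly SIZE bytes long.
        let buf: [u8; 4] = bytes
            .try_into()
            .map_err(|_| format!("expected {} bytes, got {}", Self::SIZE, bytes.len()))?;
        Ok(PackedSeconds(buf))
    }

    fn into_value(self) -> u32 {
        u32::from_be_bytes(self.0)
    }
}

/// Sketch of fixed-size stats: both bounds are stored whole, never trimmed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FixedSizeBytesStatsSketch {
    pub lower: Vec<u8>,
    pub upper: Vec<u8>,
}

/// Computes bounds over a column of values; returns None for an empty column.
pub fn stats_for(values: &[u32]) -> Option<FixedSizeBytesStatsSketch> {
    let min = values.iter().copied().min()?;
    let max = values.iter().copied().max()?;
    Some(FixedSizeBytesStatsSketch {
        lower: PackedSeconds::from_value(min).as_bytes().to_vec(),
        upper: PackedSeconds::from_value(max).as_bytes().to_vec(),
    })
}
```

Because every encoded value has the same length, a reader can reject malformed bounds by length alone, which is one reason a fixed-size type might be kept separate from arbitrary atomic bytes.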

Motivation

Related to #24830

As we evolve our encodings in Persist, what bytes are recorded will change. For example, today stats for ScalarType::Time are AtomicBytesStats of a ProtoNaiveTime, but with our new columnar encodings it will be a PackedNaiveTime. We need a way to distinguish between the two.

An alternative is adding a "kind" field to AtomicBytesStats. I originally implemented this, but we determined that a whole separate type is the better approach because it allows for a cleaner separation.
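
To make the trade-off concrete, here is a minimal sketch of the two shapes. The type names, fields, and the `may_contain` helper are all hypothetical and simplified; they are not the definitions in this PR. The helper also assumes, purely for illustration, that the fixed-size encoding can be range-checked bytewise.

```rust
/// Alternative 1 (not chosen): tag the existing stats with a "kind".
/// Every reader has to remember to check `kind` before interpreting the bytes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BytesKindSketch {
    Proto,
    Packed,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AtomicBytesStatsWithKindSketch {
    pub kind: BytesKindSketch,
    pub lower: Vec<u8>,
    pub upper: Vec<u8>,
}

/// Alternative 2 (the direction this PR takes): a separate variant,
/// so the encoding is part of the type and `match` forces a decision.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BytesStatsSketch {
    /// Bounds over the older encoding; treated as opaque here.
    Atomic { lower: Vec<u8>, upper: Vec<u8> },
    /// Bounds over a fixed-size encoding.
    FixedSize { lower: Vec<u8>, upper: Vec<u8> },
}

/// Conservative check: returns false only when the stats prove that
/// `probe` (already encoded in the fixed-size format) is out of range.
pub fn may_contain(stats: &BytesStatsSketch, probe: &[u8]) -> bool {
    match stats {
        // Different encoding: we cannot compare, so we must not filter.
        BytesStatsSketch::Atomic { .. } => true,
        BytesStatsSketch::FixedSize { lower, upper } => {
            lower.as_slice() <= probe && probe <= upper.as_slice()
        }
    }
}
```

With the separate variant, adding another encoding later means adding another arm, and the compiler flags every `match` that has not been updated.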

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
This PR includes the following user-facing behavior changes:

bkirwi

IIRC we discussed using a part-level version number to manage migrations at the actual data level; wondering if that would make sense here.

Personally I kind of appreciate the PackedBytesStats idea... while in many ways it is similar to the atomic stats as you mention, there are important differences: it has a meaningful sort, and it can be computed off the columnar representation instead of the raw data. (One option would be to call it FixedLengthBytesStats or whatever, which stresses that while the proto types are similar we're dealing with a different sort of thing at the arrow level.)
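
One way to read "meaningful sort" is that comparing the encoded bytes gives the same order as comparing the decoded values. The sketch below checks that property for two illustrative encodings of `u32`; the helper names are hypothetical, and nothing here is taken from the actual packed types.

```rust
/// Returns true if sorting by encoded bytes yields the same order
/// as sorting by the values themselves.
pub fn sorts_like_values(values: &[u32], encode: fn(u32) -> [u8; 4]) -> bool {
    let mut by_value = values.to_vec();
    by_value.sort();

    let mut by_bytes = values.to_vec();
    by_bytes.sort_by_key(|v| encode(*v));

    by_value == by_bytes
}

/// Big-endian: most significant byte first, so bytewise order matches numeric order.
pub fn encode_be(v: u32) -> [u8; 4] {
    v.to_be_bytes()
}

/// Little-endian: least significant byte first, so bytewise order can disagree.
pub fn encode_le(v: u32) -> [u8; 4] {
    v.to_le_bytes()
}
```

For the input `[1, 256, 2]`, the big-endian encoding sorts as 1, 2, 256, while the little-endian encoding sorts as 256, 1, 2. Min and max stats are only useful for range filtering when the encoding behaves like the first case.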

src/repr/src/row/encoding.rs

ParkMyCar · 2024-06-25T15:12:34Z

> IIRC we discussed using a part-level version number to manage migrations at the actual data level; wondering if that would make sense here.

For sure! I thought about adding a version number to stats, and it definitely should be possible, just a bit heavier weight. In the meantime I realized at the moment the only type of stats that we really need to evolve are the AtomicBytesStats and just adding a "kind" here was faster than adding a version.

> Personally I kind of appreciate the PackedBytesStats idea... while in many ways it is similar to the atomic stats as you mention, there are important differences: it has a meaningful sort, and it can be computed off the columnar representation instead of the raw data. (One option would be to call it FixedLengthBytesStats or whatever, which stresses that while the proto types are similar we're dealing with a different sort of thing at the arrow level.)

Totally fair, and it is a cleaner separation between the older stats types and the newer types. I'll update this PR to add a PackedBytesStats instead of adding a "kind" to AtomicBytesStats.

bkirwi

🙏

ParkMyCar requested a review from bkirwi June 24, 2024 18:21

ParkMyCar requested review from a team as code owners June 24, 2024 18:21

ParkMyCar mentioned this pull request Jun 24, 2024

persist: Stats for the new Columnar encoders #27857

Merged

5 tasks

bkirwi reviewed Jun 25, 2024

src/repr/src/row/encoding.rs (outdated)

start, add FixedSizeBytesStats

5849291

ParkMyCar force-pushed the persist/stats-add-kind-to-atomic-byte branch from 682cf9f to 5849291 Compare June 25, 2024 15:54

ParkMyCar changed the title ~~perist: Add "kind" to AtomicBytesStats~~ perist: Add FixedSizeBytesStats Jun 25, 2024

fix tests and lints

485c244

antiguru changed the title ~~perist: Add FixedSizeBytesStats~~ persist: Add FixedSizeBytesStats Jun 26, 2024

bkirwi approved these changes Jun 26, 2024

ParkMyCar merged commit 32430f0 into MaterializeInc:main Jun 26, 2024
76 checks passed

materialize-bot mentioned this pull request Jun 27, 2024

release: v0.106.0 required reviews #27930

Closed

7 tasks
