Parquet should encode RowSets more natively #5262

Closed
rcaudy opened this issue Mar 19, 2024 · 1 comment
Labels: 2023_unscheduled, core, feature request, parquet, query engine

Comments

rcaudy (Member) commented Mar 19, 2024

Right now, we encode and decode RowSets using RowSetCodec, delegating to ExternalizableRowSetUtils. We should consider whether we can replace this with an array-formatted column of longs (following the same strategy for representing individual row keys and row key ranges as we currently employ) and a new ColumnTypeInfo.SpecialType. This might produce comparable compression while allowing the column data to at least make some sense to external tools.
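
To make the array-of-longs idea concrete, here is a minimal sketch of one way such a column value could be laid out. Everything in it is an assumption for illustration: the class name `RowKeyArraySketch`, its methods, and the convention that a negative value marks the inclusive end of a range are hypothetical, and they are not taken from `RowSetCodec` or `ExternalizableRowSetUtils`, whose actual layout may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; not Deephaven's RowSetCodec or ExternalizableRowSetUtils. */
final class RowKeyArraySketch {
    private RowKeyArraySketch() {}

    /**
     * Encodes sorted, distinct, non-negative row keys into a flat long array.
     * Assumed convention: a lone key k is written as k; a run [first, last]
     * with last > first is written as first followed by -last.
     */
    static long[] encode(long[] sortedKeys) {
        List<Long> out = new ArrayList<>();
        int i = 0;
        while (i < sortedKeys.length) {
            long first = sortedKeys[i];
            long last = first;
            // Extend the run while the next key is exactly one past the current end.
            while (i + 1 < sortedKeys.length && sortedKeys[i + 1] == last + 1) {
                last = sortedKeys[++i];
            }
            out.add(first);
            if (last > first) {
                out.add(-last); // negative value marks the inclusive end of a range
            }
            i++;
        }
        return out.stream().mapToLong(Long::longValue).toArray();
    }

    /** Expands an encoded array back into the individual row keys. */
    static long[] decode(long[] encoded) {
        List<Long> out = new ArrayList<>();
        long previous = -1;
        for (long value : encoded) {
            if (value >= 0) {
                out.add(value);
                previous = value;
            } else {
                // Range end: emit every key after the range start, up to and including -value.
                long end = -value;
                for (long k = previous; k < end;) {
                    out.add(++k);
                }
            }
        }
        return out.stream().mapToLong(Long::longValue).toArray();
    }
}
```

Under this assumed convention, the keys 1, 2, 3, 7, 10, 11 become the array 1, -3, 7, 10, -11, which an external tool can read as an ordinary list of longs once it knows the sign rule. Materializing every key in `decode` is only for demonstration; a real reader would presumably feed ranges directly into a RowSet builder.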

Note that we should still preserve RowSetCodec for backwards-compatibility purposes.

rcaudy added the feature request, query engine, core, and parquet labels Mar 19, 2024
rcaudy added this to the 4. Unscheduled milestone Mar 19, 2024
pete-petey modified the milestones: 4. Unscheduled, 5. Backlog Aug 26, 2024
@malhotrashivam
Contributor
