Parquet: Support identifying hierarchical partitioning and schema from Spark/Dask metadata files #868

rcaudy · 2021-07-19T23:14:47Z

We should be able to filter partitions according to the values they contain when Thrift metadata files are present.
Part of #294, maybe.

rcaudy · 2021-08-04T18:23:43Z

I'm moving the requirements for this issue a bit. We've got good support for partitioning columns driven by metadata files, but we're lacking support for predicate pushdown to the row group level or below. See #968 for future work.
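
For illustration only, here is a minimal sketch of what filtering on hierarchical partition values can look like. It assumes the `key=value` directory layout that Spark and Dask commonly write, and that the relative file paths have already been read from the metadata file. The helper names `partition_values` and `filter_paths` are hypothetical and are not part of this project's API.

```python
from pathlib import PurePosixPath


def partition_values(relative_path):
    """Extract key=value partition directories from a relative file path.

    Hypothetical helper: assumes each partitioning level is a directory
    named "<column>=<value>", with the data file as the last component.
    """
    values = {}
    # Skip the final component, which is the data file itself.
    for part in PurePosixPath(relative_path).parts[:-1]:
        key, sep, value = part.partition("=")
        if sep and key:
            values[key] = value
    return values


def filter_paths(paths, **wanted):
    """Keep only the paths whose partition values match every requested key."""
    return [
        p
        for p in paths
        if all(partition_values(p).get(k) == v for k, v in wanted.items())
    ]


# Example paths, made up for this sketch.
paths = [
    "year=2021/month=07/part-0.parquet",
    "year=2021/month=08/part-0.parquet",
    "year=2020/month=07/part-0.parquet",
]
selected = filter_paths(paths, year="2021", month="07")
```

This only prunes whole files by partition value. It does not touch row group statistics, which is the predicate pushdown work deferred to #968.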

rcaudy added the feature request, core, and parquet labels Jul 19, 2021

rcaudy added this to the July 2021 milestone Jul 19, 2021

rcaudy self-assigned this Jul 19, 2021

rcaudy mentioned this issue Jul 22, 2021

EPIC: Parquet Support #294

Closed

rcaudy changed the title ~~Support range indices derived from parquet metadata files~~ Parquet: Support identifying hierarchical partitioning and schema from Spark/Dask metadata files Aug 4, 2021

rcaudy mentioned this issue Aug 6, 2021

Support multiple row groups in Parquet, clean up regioned column sources, and fix dictionary writing/symbol tables #954

Merged

rcaudy closed this as completed Aug 6, 2021
