Support writing Parquet encoding stats #9569
Force-pushed: b4478e9 to fd0c063
lgtm. is it testable?
lib/trino-parquet/src/main/java/io/trino/parquet/writer/PrimitiveColumnWriter.java
@@ -74,6 +78,8 @@

    // column meta data stats
    private final Set<Encoding> encodings;
    private final Map<org.apache.parquet.format.Encoding, Integer> dataPagesWithEncoding;
how small can the smallest page be? can we overflow here?
The field in PageEncodingStats that this gets used for is also an int, so it should be fine.
lib/trino-parquet/src/main/java/io/trino/parquet/writer/PrimitiveColumnWriter.java
@@ -178,6 +186,14 @@ private ColumnMetaData getColumnMetaData()
            totalCompressedSize,
            -1);
    columnMetaData.setStatistics(ParquetMetadataConverter.toParquetStatistics(columnStatistics));
    ImmutableList.Builder<PageEncodingStats> pageEncodingStats = ImmutableList.builder();
    dataPagesWithEncoding.entrySet().stream()
            .map(encodingAndCount -> new PageEncodingStats(PageType.DATA_PAGE_V2, encodingAndCount.getKey(), encodingAndCount.getValue()))
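A minimal, JDK-only sketch of the bookkeeping this hunk relies on: count data pages per encoding as they are written, then emit one stats entry per encoding. `EncodingStatsSketch`, its nested `Encoding` enum, and `PageEncodingStat` are hypothetical stand-ins for illustration, not the Parquet format classes or the writer's actual code.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class EncodingStatsSketch
{
    // Hypothetical stand-in for org.apache.parquet.format.Encoding
    public enum Encoding { PLAIN, RLE_DICTIONARY }

    // Hypothetical stand-in for PageEncodingStats (page type omitted for brevity)
    public static final class PageEncodingStat
    {
        public final Encoding encoding;
        public final int count;

        public PageEncodingStat(Encoding encoding, int count)
        {
            this.encoding = encoding;
            this.count = count;
        }
    }

    // Number of data pages written with each encoding
    private final Map<Encoding, Integer> dataPagesWithEncoding = new EnumMap<>(Encoding.class);

    // Call once for every data page that is written
    public void recordDataPage(Encoding encoding)
    {
        dataPagesWithEncoding.merge(encoding, 1, Integer::sum);
    }

    // One entry per encoding that was actually used
    public List<PageEncodingStat> toStats()
    {
        List<PageEncodingStat> stats = new ArrayList<>();
        dataPagesWithEncoding.forEach((encoding, count) -> stats.add(new PageEncodingStat(encoding, count)));
        return stats;
    }
}
```

The counts are plain `int` values here, matching the `Integer` map values in the diff.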
Force-pushed: fd0c063 to 313923b
For testing, I can add a test with something like: And assert that
better such test than nothing, imo
Force-pushed: 313923b to 6b958f7
Added
CI #9617
Encoding stats are used by the reader to check if the dictionary pages can be used for predicate pushdown.
Fixes #9554
TODO: Still need to add a test, but I manually verified that dictionary-based pushdown now happens where it did not before.
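To illustrate why the reader needs these stats, here is a sketch of the kind of check a reader can make before relying on the dictionary page for predicate pushdown. It assumes the rule that every data page in the column chunk must be dictionary-encoded, and that missing stats mean the dictionary cannot be trusted. All types and names (`DictionaryPushdownSketch`, `PageType`, `Encoding`, `PageEncodingStat`, `allDataPagesDictionaryEncoded`) are hypothetical stand-ins, not the Trino or Parquet API.

```java
import java.util.List;

public class DictionaryPushdownSketch
{
    // Hypothetical stand-ins for the Parquet format enums
    public enum PageType { DICTIONARY_PAGE, DATA_PAGE, DATA_PAGE_V2 }
    public enum Encoding { PLAIN, PLAIN_DICTIONARY, RLE_DICTIONARY }

    // Hypothetical stand-in for one encoding-stats entry of a column chunk
    public static final class PageEncodingStat
    {
        public final PageType pageType;
        public final Encoding encoding;
        public final int count;

        public PageEncodingStat(PageType pageType, Encoding encoding, int count)
        {
            this.pageType = pageType;
            this.encoding = encoding;
            this.count = count;
        }
    }

    // True only if the stats prove that every data page is dictionary-encoded
    public static boolean allDataPagesDictionaryEncoded(List<PageEncodingStat> stats)
    {
        if (stats == null || stats.isEmpty()) {
            // Assumption: without stats the reader cannot prove anything
            return false;
        }
        boolean sawDataPage = false;
        for (PageEncodingStat stat : stats) {
            if (stat.pageType == PageType.DICTIONARY_PAGE || stat.count == 0) {
                continue; // only data pages that exist matter here
            }
            sawDataPage = true;
            if (stat.encoding != Encoding.PLAIN_DICTIONARY && stat.encoding != Encoding.RLE_DICTIONARY) {
                return false; // a non-dictionary data page may hold values absent from the dictionary
            }
        }
        return sawDataPage;
    }
}
```

When the check returns false, a reader following this sketch would skip dictionary-based filtering and fall back to its other pruning paths.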