Fix reading parquet column with unused dictionary #15942

raunaqmorarka · 2023-02-02T11:55:47Z

Description

Fix reading parquet column with unused dictionary

A Parquet file produced by Impala was found to contain an empty dictionary
that is not used to encode the data pages of the column.
In such a case we cannot rely on ColumnChunkMetaData#hasDictionaryPage,
because it checks whether the data pages are also encoded using the dictionary.
This change removes the use of hasDictionaryPage to fix query failures with such files.
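
To make the failure mode concrete, here is a minimal, self-contained sketch. It is not the Trino or parquet-mr code: `UnusedDictionarySketch`, its enums, and its methods are hypothetical names that model only the behaviour described above. It assumes a column chunk whose first page is a dictionary page while its data pages use only non-dictionary encodings, and contrasts a metadata-style check (based on data page encodings) with a check of the first page actually present in the chunk.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model, not the actual Trino or parquet-mr classes.
final class UnusedDictionarySketch
{
    enum PageType { DICTIONARY_PAGE, DATA_PAGE }

    enum Encoding { PLAIN, RLE, PLAIN_DICTIONARY, RLE_DICTIONARY }

    private UnusedDictionarySketch() {}

    // Metadata-style check (assumed behaviour, per the description above):
    // true only when the data pages are dictionary-encoded.
    static boolean dataPagesUseDictionary(Set<Encoding> dataPageEncodings)
    {
        return dataPageEncodings.contains(Encoding.PLAIN_DICTIONARY)
                || dataPageEncodings.contains(Encoding.RLE_DICTIONARY);
    }

    // Alternative check: inspect the first page that is really in the chunk.
    static boolean startsWithDictionaryPage(List<PageType> pagesInChunk)
    {
        return !pagesInChunk.isEmpty()
                && pagesInChunk.get(0) == PageType.DICTIONARY_PAGE;
    }

    // Index at which a reader would start treating pages as data pages.
    static int firstDataPageIndex(boolean hasDictionary)
    {
        return hasDictionary ? 1 : 0;
    }
}
```

For a chunk laid out as `[DICTIONARY_PAGE, DATA_PAGE]` with data page encodings `{PLAIN, RLE}`, the metadata-style check reports no dictionary, so a reader relying on it would start at index 0 and treat the unused dictionary page as a data page. The page-based check reports a dictionary, and the reader starts at index 1.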

Additional context and related issues

E.g. error stacktrace

This is a regression from changes in PR #15374 in release 405.

The file was produced by Impala and contains an empty, unused dictionary in the op_umpn and op_tel columns.
pages result.txt
footer_result.txt

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Hive, Hudi, Iceberg, Delta
* Fix query failure when reading Parquet files written by Impala. ({issue}`15942`)


lib/trino-parquet/src/main/java/io/trino/parquet/DataPage.java

lukasz-stec

lgtm

raunaqmorarka added 3 commits February 2, 2023 16:59

Remove unnecessary ColumnChunkDescriptor class

b69c787

Convert DataPage into a sealed class

a68b2b4

cla-bot bot added the cla-signed label Feb 2, 2023

raunaqmorarka requested review from lukasz-stec, sopel39 and skrzypo987 February 2, 2023 11:56

raunaqmorarka added the bug Something isn't working label Feb 2, 2023

skrzypo987 approved these changes Feb 2, 2023


lib/trino-parquet/src/main/java/io/trino/parquet/DataPage.java

lukasz-stec approved these changes Feb 2, 2023


sopel39 approved these changes Feb 2, 2023


github-actions bot added the tests:hive label Feb 2, 2023

raunaqmorarka merged commit 51dc650 into trinodb:master Feb 2, 2023

raunaqmorarka deleted the fix-page-reader branch February 2, 2023 19:51

raunaqmorarka mentioned this pull request Feb 2, 2023

Release notes for 407 #15854

Closed

github-actions bot added this to the 407 milestone Feb 3, 2023

colebow mentioned this pull request Feb 10, 2023

Add Trino 407 release notes #15919

Merged

raunaqmorarka mentioned this pull request Mar 28, 2023

'Failed to read Parquet file' found while querying parquet files #16758

Closed

Yiutto mentioned this pull request Aug 7, 2023

io.trino.spi.TrinoException: Failed to read Parquet file aakashnand/trino-ranger-demo#10

Closed
