Support reading time millis columns in optimized parquet #18535

Merged
merged 2 commits into from
Aug 4, 2023


raunaqmorarka
Member

@raunaqmorarka raunaqmorarka commented Aug 4, 2023

Description

Per the Parquet spec (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#time), TIME(MILLIS) stored as INT32 is a valid Parquet encoding. Such files can be produced by parquet-cpp-arrow, and it should be possible to read them as time(6) from Iceberg.
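
For readers unfamiliar with the encoding, here is a minimal sketch of what an INT32 TIME(MILLIS) value means: a count of milliseconds since midnight. The class and method names below are hypothetical and are not taken from this PR; the picosecond scale is used only because the review comments in this PR mention converting decoded values to picos.

```java
import java.time.LocalTime;

// Illustrative sketch only: names here are hypothetical, not the reader code from this PR.
final class TimeMillisSketch {
    static final long PICOSECONDS_PER_MILLISECOND = 1_000_000_000L;
    static final long NANOSECONDS_PER_MILLISECOND = 1_000_000L;

    private TimeMillisSketch() {}

    // Widen the INT32 millis-of-day value to picoseconds of day (fits easily in a long).
    static long millisToPicos(int millisOfDay) {
        return millisOfDay * PICOSECONDS_PER_MILLISECOND;
    }

    // Interpret the same value as a wall-clock time for inspection.
    static LocalTime millisToLocalTime(int millisOfDay) {
        return LocalTime.ofNanoOfDay(millisOfDay * NANOSECONDS_PER_MILLISECOND);
    }

    public static void main(String[] args) {
        int millisOfDay = 3_723_004; // 1 h + 2 min + 3 s + 4 ms after midnight
        System.out.println(millisToLocalTime(millisOfDay));
        System.out.println(millisToPicos(millisOfDay));
    }
}
```

Since the source precision is only milliseconds, widening to time(6) is lossless; rounding only matters when the target precision is lower than the stored one.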

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Iceberg
* Support reading parquet files with time stored in millisecond precision. ({issue}`18535`)

Comment on lines +331 to +332
// decoded values are millis, round to lower precision and convert to picos
// modulo PICOSECONDS_PER_DAY is applied for the case when a value is rounded up to PICOSECONDS_PER_DAY
Member

Should we have similar rounding in getTimeMicrosDecoder for the case where the Trino type has a precision lower than 6?
(Or an assertion that the precision is 6, if only time(6) is supported.)

Member Author

Yes, I'm planning to tackle that separately by adding rounding for lower precisions. In practice we only support time(6) in Iceberg right now, so it's not a problem yet. From the Parquet reader's perspective, though, it is better to assume such a scenario might arise in the future and support it correctly.
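
To make the rounding discussed in this thread concrete, here is a rough sketch of rounding a millisecond value to a lower target precision and then applying the modulo mentioned in the code comment. It is an illustration under assumptions, not the decoder from this PR: the class, the `roundPicos` helper, and the round-half-up behaviour are all hypothetical choices.

```java
// Illustrative sketch only: helper names and rounding mode are assumptions.
final class TimeRoundingSketch {
    static final long PICOSECONDS_PER_MILLISECOND = 1_000_000_000L;
    static final long PICOSECONDS_PER_DAY = 86_400_000L * PICOSECONDS_PER_MILLISECOND;

    private TimeRoundingSketch() {}

    // Round picoseconds to the given number of fractional-second digits (0..12), half up.
    static long roundPicos(long picos, int precision) {
        long factor = 1;
        for (int i = precision; i < 12; i++) {
            factor *= 10;
        }
        return Math.floorDiv(picos + factor / 2, factor) * factor;
    }

    // Decode millis of day, round to the target precision, and wrap a value
    // that rounded up to a full day back to midnight.
    static long millisToPicosAtPrecision(int millisOfDay, int precision) {
        long picos = millisOfDay * PICOSECONDS_PER_MILLISECOND;
        return roundPicos(picos, precision) % PICOSECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        int lastMilliOfDay = 86_399_999; // 23:59:59.999
        System.out.println(millisToPicosAtPrecision(lastMilliOfDay, 3));
        System.out.println(millisToPicosAtPrecision(lastMilliOfDay, 0));
    }
}
```

The modulo only has an effect in the edge case the code comment describes: 23:59:59.999 rounded to whole seconds becomes exactly one day, which wraps to midnight. At precision 3 or higher, millisecond input passes through unchanged.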

@raunaqmorarka raunaqmorarka merged commit 6d26a29 into trinodb:master Aug 4, 2023
@raunaqmorarka raunaqmorarka deleted the pqr-time-int32 branch August 4, 2023 15:55
@github-actions github-actions bot added this to the 423 milestone Aug 4, 2023