Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix failure to read nullable time(6) columns #15606

Merged
merged 2 commits into from
Jul 17, 2023

Conversation

joejensen
Copy link
Contributor

@joejensen joejensen commented Jan 5, 2023

Description

Fixes #15603

Additional context and related issues

Fixes issue where ORC files containing nullable Iceberg TIME columns could throw an exception when being read. Also fixed an error in the ORC file validator where time columns did not validate correctly in order to add appropriate tests for the Time type.

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Iceberg
* Fix Iceberg query failure when reading ORC files with nullable `time` columns. ({issue}`15606`)

@cla-bot
Copy link

cla-bot bot commented Jan 5, 2023

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test in Iceberg connector as well?

@cla-bot
Copy link

cla-bot bot commented Jan 5, 2023

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@joejensen
Copy link
Contributor Author

Added relevant test in the Iceberg connector.

@joejensen
Copy link
Contributor Author

Rebased onto current master.

@ebyhr ebyhr requested a review from raunaqmorarka February 13, 2023 21:50
@github-actions
Copy link

github-actions bot commented Mar 7, 2023

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Mar 7, 2023
@bitsondatadev
Copy link
Member

@raunaqmorarka are you able to review this?

@joejensen
Copy link
Contributor Author

Rebasing to resolve merge conflict in TestIcebergOrcConnectorTest.

Anything I can do to help get this merged?

@bitsondatadev
Copy link
Member

@joejensen looking into it now.

@github-actions github-actions bot added the iceberg Iceberg connector label Mar 21, 2023
@joejensen joejensen force-pushed the ORC-Time-Fix-15603 branch from ec7b0a9 to 86902f0 Compare March 21, 2023 20:59
@joejensen joejensen force-pushed the ORC-Time-Fix-15603 branch from dd91469 to 27e6748 Compare March 24, 2023 19:28
@ebyhr
Copy link
Member

ebyhr commented Jun 28, 2023

@joejensen Could you rebase on master to resolve conflicts?

@ebyhr ebyhr force-pushed the ORC-Time-Fix-15603 branch from 27e6748 to 45122a1 Compare June 30, 2023 07:13
@ebyhr
Copy link
Member

ebyhr commented Jun 30, 2023

Rebased on master to resolve conflicts and addressed comments except for #15606 (comment)

@trinodb trinodb deleted a comment from cla-bot bot Jun 30, 2023
Copy link
Member

@raunaqmorarka raunaqmorarka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lgtm, but would be great to get another pair of eyes on it

@findepi
Copy link
Member

findepi commented Jul 13, 2023

Fixes #15603

@raunaqmorarka we should have connector-level test coverage for this.
Why TestIcebergMinioOrcConnectorTest.testDataMappingSmokeTest did not catch this for "time(6)" case?

@ebyhr ebyhr force-pushed the ORC-Time-Fix-15603 branch from d8d700d to f1341e3 Compare July 14, 2023 08:23
@raunaqmorarka
Copy link
Member

Fixes #15603

@raunaqmorarka we should have connector-level test coverage for this. Why TestIcebergMinioOrcConnectorTest.testDataMappingSmokeTest did not catch this for "time(6)" case?

That is because of a quirk in that test's data setup and the way orc reader works. There is only 1 null in the setup and that is in the first row. The orc reader reads in batches of 1, 2, 4 ... rows. So the first batch is all nulls, while the 2nd batch is all non-nulls. This bug requires a batch containing null as well as non-null values to cause failure.
I think improving that should be tackled separately from this PR, there is sufficient testing here to land this.

@ebyhr ebyhr merged commit 77dc3fd into trinodb:master Jul 17, 2023
@github-actions github-actions bot added this to the 423 milestone Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

Iceberg Connector does not properly support nullable TIME columns using ORC
5 participants