Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Iceberg to use the new ParquetFileWriter #4424

Conversation

djsstarburst
Copy link
Member

@djsstarburst djsstarburst commented Jul 11, 2020

This commit changes the Iceberg connector to use the new ParquetFileWriter, and fixes bugs in handling of decimal and timestamp columns for both Orc and Parquet.

This commit removes DomainConverter and the test TestDomainConverter, superseded by fixes and conversions in ExpressionConverter, MessageTypeConverter and new class TimestampValueWriter.

This commit adds new TestIcebergSmoke tests for decimal and timestamp, and fixes the existing tests. With this commit, all TestIcebergSmoke tests pass.

#1324

@cla-bot cla-bot bot added the cla-signed label Jul 11, 2020
@djsstarburst djsstarburst requested review from electrum and qqibrow July 11, 2020 04:44
@findepi
Copy link
Member

findepi commented Jul 11, 2020

The second renames some test databases in Hive and Iceberg tests to ensure the databases can't collide even if the tests are run in parallel.

Do you have an example collision in mind?

@djsstarburst
Copy link
Member Author

Do you have an example collision in mind?

With the code in my PR, I saw pretty frequent failures in TestHiveCreateExternalTable accessing table "test_create_external" in ci automated testing. I was unable to reproduce the problem on my laptop. The failures in ci automated testing haven't happened with the tables renamed, and the renaming seems entirely harmless to me.

It's not a very big statistical universe, and perhaps what I observed was just chance and the failures will resurface. @electrum also wondered how a naming conflict could have caused the test to fail.

@@ -54,22 +54,22 @@ public void testCreateExternalTableWithData()
File tableLocation = new File(tempDir, "data");

@Language("SQL") String createTableSql = format("" +
"CREATE TABLE test_create_external " +
"CREATE TABLE test_create_external_with_data " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename test databases to prevent name collisions

-> Rename test tables ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh, good point - - It was the tables that were renamed.

I just rebased this PR on tip master, since there were conflicts and removed the commit that renamed the tables. If the Hive test failures recur, I'll submit a separate PR with the renames.

@djsstarburst djsstarburst force-pushed the david.stryker/iceberg-new-parquet-writer-redux branch 2 times, most recently from 5a46032 to a3cfb02 Compare July 19, 2020 17:31
@djsstarburst djsstarburst force-pushed the david.stryker/iceberg-new-parquet-writer-redux branch from a3cfb02 to afe026c Compare July 20, 2020 22:41
@djsstarburst
Copy link
Member Author

djsstarburst commented Jul 20, 2020

Thanks very much for the detailed review, @electrum! I just force-pushed an update to the PR that acts on all your comments.

@djsstarburst djsstarburst force-pushed the david.stryker/iceberg-new-parquet-writer-redux branch from afe026c to 924874c Compare July 21, 2020 01:58
@djsstarburst
Copy link
Member Author

I force-pushed fixes for your remaining comments @electrum

@djsstarburst djsstarburst force-pushed the david.stryker/iceberg-new-parquet-writer-redux branch from 924874c to 9d8cd35 Compare July 21, 2020 04:06
@djsstarburst djsstarburst force-pushed the david.stryker/iceberg-new-parquet-writer-redux branch 2 times, most recently from 4349e93 to d247c41 Compare July 21, 2020 20:55
Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, otherwise looks good

This commit changes the Iceberg connector to use the new
ParquetFileWriter, and fixes bugs in handling of decimal and
timestamp columns for both Orc and Parquet.

This commit removes DomainConverter and the test
TestDomainConverter, superseded by fixes and conversions
in ExpressionConverter, MessageTypeConverter and
new class TimestampValueWriter.

This commit adds new TestIcebergSmoke tests for decimal
and timestamp, and fixes the existing tests.  With this
commit, all TestIcebergSmoke tests pass.
@djsstarburst djsstarburst force-pushed the david.stryker/iceberg-new-parquet-writer-redux branch from d247c41 to 81db7fe Compare July 21, 2020 21:48
@electrum electrum merged commit 6398acc into trinodb:master Jul 22, 2020
@djsstarburst djsstarburst deleted the david.stryker/iceberg-new-parquet-writer-redux branch July 22, 2020 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants