
Support changing column types in Iceberg #15651

Merged 5 commits into trinodb:master from ebi/iceberg-alter-column-set-data-type on Jan 18, 2023

Conversation

@ebyhr (Member) commented Jan 10, 2023

Description

Add support for changing column types in the Iceberg connector via ALTER TABLE ... ALTER COLUMN ... SET DATA TYPE.
Relates to #15515

Release notes

[x] Release notes are required, with the following suggested text:

# Iceberg
* Add support for changing column types. ({issue}`15515`)
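
A minimal sketch of the new statement in action, written in the style of the tests in this PR (table and column names are hypothetical):

try (TestTable table = new TestTable(getQueryRunner()::execute, "test_set_column_type_", "AS SELECT CAST(123 AS integer) col")) {
    // The statement added by this PR: widen col from integer to bigint
    assertUpdate("ALTER TABLE " + table.getName() + " ALTER COLUMN col SET DATA TYPE bigint");
    // Existing data is read back with the widened type
    assertThat(query("SELECT * FROM " + table.getName())).matches("VALUES bigint '123'");
}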

@cla-bot cla-bot bot added the cla-signed label Jan 10, 2023
@ebyhr ebyhr self-assigned this Jan 10, 2023
@ebyhr ebyhr force-pushed the ebi/iceberg-alter-column-set-data-type branch from 9979ddf to 3095a69 on January 10, 2023 10:15
@ebyhr ebyhr force-pushed the ebi/iceberg-alter-column-set-data-type branch 3 times, most recently from 7436d3a to 0925f61 on January 12, 2023 03:16
@ebyhr (Member, Author) commented Jan 12, 2023

CI hit #15367

@ebyhr ebyhr force-pushed the ebi/iceberg-alter-column-set-data-type branch 2 times, most recently from 560fd7d to 3f8a2fc on January 16, 2023 05:42
@ebyhr (Member, Author) commented Jan 16, 2023

Rebased on upstream to resolve conflicts and added some test cases.

assertThat(query("SELECT * FROM " + table.getName()))
.matches("VALUES bigint '123'");
assertThat((String) computeScalar("SHOW CREATE TABLE " + table.getName()))
.contains("partitioning = ARRAY['col']");
@findinpath (Contributor) commented Jan 16, 2023

Can we check that the partition predicate is fully pushed down (isFullyPushedDown() assertion) on a partition query?
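
For reference, a minimal sketch of such an assertion (the predicate and values are hypothetical):

// Verify that a filter on the retyped partition column is fully pushed down
// into the connector, leaving no residual filter in the plan
assertThat(query("SELECT * FROM " + table.getName() + " WHERE col = bigint '123'"))
        .isFullyPushedDown();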

@ebyhr ebyhr force-pushed the ebi/iceberg-alter-column-set-data-type branch from 3f8a2fc to e24d829 on January 17, 2023 03:35
@ebyhr (Member, Author) commented Jan 17, 2023

@findepi Could you review this PR when you have time?

@ebyhr (Member, Author) commented Jan 17, 2023

CI hit #13288. The check-commits-dispatcher failure will be fixed by #15739.

@@ -2295,7 +2295,7 @@ public void testSetColumnTypeWithComment()
 {
     skipTestUnless(hasBehavior(SUPPORTS_SET_COLUMN_TYPE) && hasBehavior(SUPPORTS_CREATE_TABLE_WITH_COLUMN_COMMENT));

-    try (TestTable table = new TestTable(getQueryRunner()::execute, "test_set_column_type_comment_", "(col int COMMENT 'test')")) {
+    try (TestTable table = new TestTable(getQueryRunner()::execute, "test_set_column_type_comment_", "(col int COMMENT 'test comment')")) {
Member

Was it dead code until now?

Member Author

Yes, it was dead code, because the PostgreSQL connector doesn't support creating a table with column comments.

@@ -2269,6 +2271,7 @@ private List<SetColumnTypeSetup> setColumnTypeSetupData()
.add(new SetColumnTypeSetup("char(100)", "'shorten-char'", "char(50)", "cast('shorten-char' as char(50))"))
.add(new SetColumnTypeSetup("char(100)", "'char-to-varchar'", "varchar", "'char-to-varchar'"))
.add(new SetColumnTypeSetup("varchar", "'varchar-to-char'", "char(100)", "cast('varchar-to-char' as char(100))"))
.add(new SetColumnTypeSetup("row(x int)", "row(1)", "row(y int)", "cast(row(1) as row(y int))"))
Member

I am not sure this should be the expected behavior.
I would rather map row fields by name: the x field is gone, the y field is new, so it should have a null value.

Member Author

Changed to row(x int) -> row(x bigint) instead.
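
Presumably the setup entry now reads along these lines (a sketch following the pattern above, not copied from the final diff):

.add(new SetColumnTypeSetup("row(x int)", "row(1)", "row(x bigint)", "cast(row(1) as row(x bigint))"))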

Table icebergTable = catalog.loadTable(session, table.getSchemaTableName());
try {
icebergTable.updateSchema()
.updateColumn(column.getName(), toIcebergType(type).asPrimitiveType())
Member

What about allowing array(integer) to become array(bigint)
or row(...) to gain a new field?

Member Author

We can't do that without changes in Iceberg, if my understanding is correct. Changing the types of nested fields inside row columns (e.g. row(x integer) -> row(x bigint)) is possible without changing Iceberg.
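
For context, a sketch of what the nested-field case looks like with Iceberg's UpdateSchema API, which addresses nested fields by dotted path (column and field names hypothetical):

// Widen the nested field x of row column col from int to bigint;
// nested fields are addressed as "parent.child"
icebergTable.updateSchema()
        .updateColumn("col.x", Types.LongType.get())
        .commit();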

Comment on lines 6274 to 6277
if ((setup.sourceColumnType().equals("bigint") && setup.newColumnType().equals("integer")) ||
(setup.sourceColumnType().equals("decimal(5,3)") && setup.newColumnType().equals("decimal(5,2)")) ||
(setup.sourceColumnType().equals("varchar") && setup.newColumnType().equals("char(100)")) ||
(setup.sourceColumnType().equals("row(x int)") && setup.newColumnType().equals("row(y int)"))) {
Member

That's subjective, but I find the following somewhat easier to read:

switch ("%s -> %s".formatted(setup.sourceColumnType(), setup.newColumnType())) {
    case "bigint -> integer":
    case "decimal(5,3) -> decimal(5,2)":
    case "varchar -> char(100)":
    case "row(x int) -> row(y int)":
        // Iceberg allows updating column types if the update is safe. Safe updates are:
        // - int to bigint
        // - float to double
        // - decimal(P,S) to decimal(P2,S) when P2 > P (scale cannot change)
        // https://iceberg.apache.org/docs/latest/spark-ddl/#alter-table--alter-column
        return Optional.of(setup.asUnsupported());

    case "varchar(100) -> varchar(50)":
        // Iceberg connector ignores the varchar length
        return Optional.empty();
}

Member

Of course, this will become even nicer when we can use record patterns in a switch (https://openjdk.org/jeps/405).

};

return Stream.of(StorageFormat.values())
.flatMap(storageFormat -> Arrays.stream(setColumnTypeData).map(data -> new Object[] {storageFormat, data[0], data[1], data[2], data[3]}))
Member

Would DataProviders.cartesianProduct be applicable?

return cartesianProduct(
        Stream.of(StorageFormat.values())
                .collect(toDataProvider()),
        new Object[][] {
                {"integer", "2147483647", "bigint", 2147483647L},
                {"real", "10.3", "double", 10.3},
                {"real", "'NaN'", "double", Double.NaN},
                {"decimal(5,3)", "'12.345'", "decimal(10,3)", BigDecimal.valueOf(12.345)}
        });

@ebyhr ebyhr force-pushed the ebi/iceberg-alter-column-set-data-type branch from b2e432a to 2682f3a on January 18, 2023 07:16
@ebyhr (Member, Author) commented Jan 18, 2023

Rebased on upstream to fix check-commits-dispatcher failures.

@ebyhr ebyhr force-pushed the ebi/iceberg-alter-column-set-data-type branch from 2682f3a to 45a3ac6 on January 18, 2023 08:41
@findepi (Member) commented Jan 18, 2023

See failures.

@ebyhr (Member, Author) commented Jan 18, 2023

CI hit #13779 & #15313. Both failures are unrelated to Iceberg.

@ebyhr ebyhr merged commit 6a3d3f8 into trinodb:master Jan 18, 2023
@ebyhr ebyhr deleted the ebi/iceberg-alter-column-set-data-type branch January 18, 2023 22:30
@ebyhr ebyhr mentioned this pull request Jan 18, 2023
@colebow colebow added this to the 406 milestone Jan 18, 2023