Support creating tables with column comment in Delta Lake #12455

Merged
ebyhr merged 1 commit into master from ebi/delta-column-comment on Jun 7, 2022

Member

@ebyhr ebyhr commented May 18, 2022

Description

Support creating tables with column comment in Delta Lake

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Delta Lake
* Add support for column comments during table creation. ({issue}`12455`)

@cla-bot cla-bot bot added the cla-signed label May 18, 2022
@findepi
Member

findepi commented May 18, 2022

in Delta Lake & Kudu

Please split this into two PRs.

@ebyhr ebyhr changed the title from "Support creating tables with column comment in Delta Lake & Kudu" to "Support creating tables with column comment in Delta Lake" May 18, 2022
@ebyhr ebyhr force-pushed the ebi/delta-column-comment branch 2 times, most recently from 12c206c to 495369a Compare May 19, 2022 01:20
@ebyhr
Member Author

ebyhr commented May 19, 2022

CI hit #12471

@ebyhr ebyhr marked this pull request as ready for review May 19, 2022 02:54
@ebyhr ebyhr force-pushed the ebi/delta-column-comment branch from 495369a to 7847baa Compare May 24, 2022 00:54
@ebyhr
Member Author

ebyhr commented May 24, 2022

Verified Spark compatibility locally.

trino> CREATE TABLE delta.default.test_comment (c1 int comment 'foo') WITH (location = 's3://presto-ci-test/test_comment');
CREATE TABLE

spark-sql> DESC default.test_comment;
c1	int	foo

@ebyhr ebyhr requested review from findinpath, findepi and hashhar May 24, 2022 01:07
private final Optional<String> comment;

@Deprecated
public DeltaLakeColumnHandle(
Contributor

Can you please remove the usage of this constructor from the current codebase of trino-delta-lake module? I still found 2 usages of it.

Member Author

@ebyhr ebyhr May 25, 2022

Actually, there are many more than 2 usages. That is why I marked it @Deprecated instead of migrating them. I can migrate them all if that is not a burden for reviewers.

Member

DLCH is used as key in maps, comment is part of its equality.
We unfortunately need to update all usages appropriately, otherwise we cannot reason about the correctness of the codebase.

I would very much prefer not adding comment here, to the table handle.
It bit us a few times in Iceberg, so I would prefer to see if we can transport the comment to ColumnMetadata some other way.
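
To make the map-key concern concrete, here is a minimal sketch. `ColumnKey` and `lookup` are hypothetical stand-ins written for illustration, not the real DeltaLakeColumnHandle or any Trino API. It assumes only what the comment above states: the handle is used as a map key, and the comment takes part in equality.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

final class HandleEqualityDemo
{
    // Hypothetical stand-in for a column handle; not the real DeltaLakeColumnHandle
    static final class ColumnKey
    {
        private final String name;
        private final Optional<String> comment;

        ColumnKey(String name, Optional<String> comment)
        {
            this.name = name;
            this.comment = comment;
        }

        @Override
        public boolean equals(Object other)
        {
            if (!(other instanceof ColumnKey)) {
                return false;
            }
            ColumnKey that = (ColumnKey) other;
            // The comment participates in equality, which is the situation under review
            return name.equals(that.name) && comment.equals(that.comment);
        }

        @Override
        public int hashCode()
        {
            return Objects.hash(name, comment);
        }
    }

    // Returns the value stored under the key, or empty when no equal key exists
    static Optional<String> lookup(Map<ColumnKey, String> values, ColumnKey key)
    {
        return Optional.ofNullable(values.get(key));
    }

    public static void main(String[] args)
    {
        Map<ColumnKey, String> statistics = new HashMap<>();
        statistics.put(new ColumnKey("c1", Optional.of("foo")), "stats for c1");

        // A call site that builds the handle without the comment produces a different key
        ColumnKey legacyKey = new ColumnKey("c1", Optional.empty());
        System.out.println(lookup(statistics, legacyKey).isPresent()); // prints false
    }
}
```

If even one call site keeps building handles without the comment, lookups like this can silently miss, which is why every usage would need to be migrated if the field stayed on the handle.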

@ebyhr ebyhr force-pushed the ebi/delta-column-comment branch from 7847baa to 18df43c Compare May 25, 2022 01:54
@ebyhr ebyhr force-pushed the ebi/delta-column-comment branch from 18df43c to 0072698 Compare May 26, 2022 05:08
@findepi
Member

findepi commented May 26, 2022

@ebyhr please squash and ping me for a review.

@ebyhr ebyhr force-pushed the ebi/delta-column-comment branch from ee31f4e to 417d9cd Compare May 27, 2022 05:43
@ebyhr
Member Author

ebyhr commented May 27, 2022

@findepi Squashed commits.

private static Optional<String> getComment(JsonNode node)
{
return Optional.ofNullable(node.get("metadata"))
Member

Why is node.get("metadata") nullable?
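
For context, Jackson's JsonNode.get(String) returns null when the field is absent, which is what the Optional.ofNullable call guards against. The sketch below shows the same defensive pattern with plain java.util maps so that it runs without Jackson. The nested-map model of a schema field and the "comment" key under "metadata" are assumptions for illustration. Whether "metadata" can really be missing from a valid Delta schema is the open question in this thread, and the sketch does not settle it.

```java
import java.util.Map;
import java.util.Optional;

final class CommentLookupDemo
{
    // Hypothetical model: only the object-valued entries of a schema field,
    // for example {"metadata": {"comment": "foo"}}
    static Optional<String> getComment(Map<String, Map<String, String>> field)
    {
        // ofNullable covers a missing "metadata" entry;
        // Optional.map yields empty when "comment" is missing
        return Optional.ofNullable(field.get("metadata"))
                .map(metadata -> metadata.get("comment"));
    }

    public static void main(String[] args)
    {
        System.out.println(getComment(Map.of("metadata", Map.of("comment", "foo"))));
        System.out.println(getComment(Map.of("metadata", Map.of())));
        System.out.println(getComment(Map.of()));
    }
}
```

If the field is guaranteed to be present, the ofNullable wrapper could be dropped in favor of a direct lookup that fails loudly on a malformed schema.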

@@ -1910,6 +1911,24 @@ public void testCreateTableWithTableComment()
assertUpdate("DROP TABLE " + tableName);
}

@Test
Member

"Deny creating tables with column comment if unsupported" looks solid & only slightly related to the Delta.
Let's move this to a separate PR.

@ebyhr ebyhr force-pushed the ebi/delta-column-comment branch 2 times, most recently from 7d63473 to 183d16e Compare May 31, 2022 05:26
@ebyhr
Member Author

ebyhr commented May 31, 2022

@findepi Updated not to touch DeltaLakeColumnHandle.

@@ -385,7 +386,7 @@ public ConnectorTableMetadata getTableMetadata(ConnectorSession session, Connect
DeltaLakeTableHandle tableHandle = (DeltaLakeTableHandle) table;
String location = metastore.getTableLocation(tableHandle.getSchemaTableName(), session);
List<ColumnMetadata> columns = getColumns(tableHandle.getMetadataEntry()).stream()
.map(DeltaLakeMetadata::getColumnMetadata)
.map(column -> getColumnMetadata(column, getColumnComments(tableHandle.getMetadataEntry())))
Member

getColumnComments(tableHandle.getMetadataEntry()) is performed once for every column, should be only once
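
A rough sketch of the difference, using hypothetical stand-ins (`parseComments`, `describe`) rather than the real DeltaLakeMetadata methods. The counter exists only to make the number of parse calls observable.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

final class HoistCommentsDemo
{
    // Counts how often the (hypothetical) schema parse runs
    static final AtomicInteger parseCount = new AtomicInteger();

    // Hypothetical stand-in for extracting comments from the metadata entry
    static Map<String, String> parseComments()
    {
        parseCount.incrementAndGet();
        return Map.of("c1", "foo");
    }

    static String describe(String column, Map<String, String> comments)
    {
        return column + ":" + comments.getOrDefault(column, "");
    }

    // Shape flagged in review: the parse happens inside the lambda, once per column
    static List<String> perColumn(List<String> columns)
    {
        return columns.stream()
                .map(column -> describe(column, parseComments()))
                .collect(Collectors.toList());
    }

    // Suggested shape: parse once, then reuse the map for every column
    static List<String> hoisted(List<String> columns)
    {
        Map<String, String> comments = parseComments();
        return columns.stream()
                .map(column -> describe(column, comments))
                .collect(Collectors.toList());
    }
}
```

Both methods return the same list. For three columns, the first parses three times and the second parses once.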

@@ -494,7 +496,7 @@ public Stream<TableColumnsMetadata> streamTableColumns(ConnectorSession session,
// intentionally skip case when table snapshot is present but it lacks metadata portion
return metastore.getMetadata(metastore.getSnapshot(table, session), session).stream().map(metadata -> {
List<ColumnMetadata> columnMetadata = getColumns(metadata).stream()
.map(DeltaLakeMetadata::getColumnMetadata)
.map(column -> getColumnMetadata(column, getColumnComments(metadata)))
Member

getColumnComments(tableHandle.getMetadataEntry()) is performed once for every column, should be only once

.build();
}

public static Map<String, Optional<String>> getColumnComments(MetadataEntry metadataEntry)
Member

Is there a difference between missing entry and entry being Optional.empty()?

maybe we just have a map of non-null comments, passed around as Map<String,String> ?

we would lose ability to validate map contains entries for the columns, but you don't do this anyway (by means of columnComments.getOrDefault...)
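
One way to read this suggestion, as a sketch under assumptions: `nonNullComments` and `commentFor` are hypothetical helper names, not methods from the PR. The idea is to keep only the columns that have a comment and to turn a missing entry into Optional.empty() at the point of use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class CommentMapDemo
{
    // Drops entries without a comment, so "missing" and "empty" collapse into one case
    static Map<String, String> nonNullComments(Map<String, Optional<String>> comments)
    {
        Map<String, String> result = new LinkedHashMap<>();
        comments.forEach((name, comment) -> comment.ifPresent(value -> result.put(name, value)));
        return result;
    }

    // A lookup on the simplified map; absent columns yield Optional.empty()
    static Optional<String> commentFor(String column, Map<String, String> comments)
    {
        return Optional.ofNullable(comments.get(column));
    }
}
```

With this shape, a column without a comment and a column absent from the map look the same to callers, which matches how a getOrDefault lookup already treats them.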

.orElseThrow(() -> new IllegalStateException("Serialized schema not found in transaction log for " + metadataEntry.getName()));
}

private static Map<String, Optional<String>> getColumnComment(String json)
Member

getColumnComment -> getColumnComments (plural)

}
}

private static Map.Entry<String, Optional<String>> columnComment(JsonNode node)
Member

inline

@ebyhr ebyhr force-pushed the ebi/delta-column-comment branch 2 times, most recently from a11f6d9 to dcf4f0d Compare June 1, 2022 03:05
@ebyhr
Member Author

ebyhr commented Jun 1, 2022

@findepi Addressed comments.

@@ -204,4 +204,32 @@ private static String getTableCommentOnDelta(String schemaName, String tableName
.map(row -> row.get(1))
.findFirst().orElseThrow();
}

@Test(groups = {DELTA_LAKE_DATABRICKS, PROFILE_SPECIFIC_TESTS})
public void testCreateTableWithColumnComment()
Member

Do we have a test that goes the other way, creating the comment in Delta and reading from Trino?

@@ -963,6 +969,7 @@ public void addColumn(ConnectorSession session, ConnectorTableHandle tableHandle
handle.getMetadataEntry().getId(),
columnsBuilder.build(),
partitionColumns,
ImmutableMap.of(),
Member

Can you add a test that creates a table with comments and then adds a column? Make sure this doesn't wipe the existing comments

@@ -887,6 +892,7 @@ public Optional<ConnectorOutputMetadata> finishCreateTable(
randomUUID().toString(),
handle.getInputColumns(),
handle.getPartitionedBy(),
ImmutableMap.of(),
Member

Should this match what we do in createTable?

Member Author

@ebyhr ebyhr Jun 2, 2022

I don't think it's needed, since column comments are unsupported at the CTAS syntax level.

Member

Yep, you're totally right. Thanks

@ebyhr ebyhr force-pushed the ebi/delta-column-comment branch from dcf4f0d to 04d0d3f Compare June 2, 2022 01:32
Member

@alexjo2144 alexjo2144 left a comment

Do you want to support alter table add column with a comment here, or create a separate PR?

@ebyhr
Member Author

ebyhr commented Jun 2, 2022

I'd like to handle ADD COLUMN with a comment in a separate PR.

@ebyhr ebyhr requested a review from findepi June 3, 2022 00:44
@ebyhr ebyhr force-pushed the ebi/delta-column-comment branch from 04d0d3f to 5442cfa Compare June 7, 2022 01:23
@ebyhr
Member Author

ebyhr commented Jun 7, 2022

Rebased on upstream to resolve conflicts.

@ebyhr ebyhr merged commit fff9862 into master Jun 7, 2022
@ebyhr ebyhr deleted the ebi/delta-column-comment branch June 7, 2022 03:34
@ebyhr ebyhr mentioned this pull request Jun 7, 2022
@github-actions github-actions bot added this to the 385 milestone Jun 7, 2022