
Support adding columns in Delta Lake (ALTER TABLE ADD COLUMN) #12371

Merged 1 commit into trinodb:master on May 24, 2022

Conversation

Member

@ebyhr ebyhr commented May 13, 2022

Description

Support adding columns in Delta Lake

Documentation

(x) No documentation is needed.

Release notes

(x) Release notes entries required with the following suggested text:

# Delta Lake
* Add support for adding columns. ({issue}`12371`)

@cla-bot cla-bot bot added the cla-signed label May 13, 2022
@ebyhr ebyhr marked this pull request as ready for review May 13, 2022 08:04
@findepi findepi requested review from homar and alexjo2144 May 13, 2022 08:09
@@ -265,6 +265,35 @@ public void testCharVarcharComparison()
.hasStackTraceContaining("Unsupported type: char(3)");
}

@Test
public void testAddColumnToPartitionedTable()
Member

Could you please add tests that check:

  • whether Spark can read a table with added columns
  • whether OPTIMIZE/VACUUM works after adding columns?

Member Author

@ebyhr ebyhr May 13, 2022

Added a test for OPTIMIZE & VACUUM.

whether Spark can read a table with added columns

Should I wait for #11565?

Member

Verifying, even locally, that this works would be great imho.
Regarding the test - it depends on how long you would have to wait.
@findinpath what is the ETA for #11565?

Contributor

#11565 is in good shape now. I expect the PR will be ready to merge in the next few days.

Member

@ebyhr so my advice would be to wait for it and write a proper test.

Member

@ebyhr could you check whether disabling the cache actually helps with this?

Member

would running REFRESH TABLE in Spark help?

Member Author

@homar @findepi Disabling the cache didn't help. REFRESH TABLE logs the same error "ERROR DeltaLog: Change in the table id detected..." and subsequent SELECT statements don't show the error.

JFYI: The error comes from
https://github.com/delta-io/delta/blob/728bf902542077ce1c2e97ca67a53c53bb460c64/core/src/main/scala/org/apache/spark/sql/delta/SnapshotManagement.scala#L574-L575

Member Author

@ebyhr ebyhr May 24, 2022

It seems Spark reuses the same metaData.id when adding a new column.

trino> ALTER TABLE delta.default.test ADD COLUMN c4 int;

s3://presto-ci-test/test/_delta_log/00000000000000000004.json

{"commitInfo":{"version":4,"timestamp":1653357796162,"userId":"yuya.ebihara","userName":"yuya.ebihara","operation":"ALTER TABLE","operationParameters":{"queryId":"20220524_020316_00004_imnpb"},"clusterId":"trino-dev-ffffffff-ffff-ffff-ffff-ffffffffffff","readVersion":0,"isolationLevel":"WriteSerializable","blindAppend":true}}
{"protocol":{"minReaderVersion":1,"minWriterVersion":2}}
{"metaData":{"id":"24b68017-cb79-4f0b-8ee7-e72a496bbaf4","format":{"provider":"parquet","options":{}},"schemaString":"{\"fields\":[{\"metadata\":{},\"name\":\"c1\",\"nullable\":true,\"type\":\"integer\"},{\"metadata\":{},\"name\":\"c2\",\"nullable\":true,\"type\":\"integer\"},{\"metadata\":{},\"name\":\"c3\",\"nullable\":true,\"type\":\"integer\"},{\"metadata\":{},\"name\":\"c4\",\"nullable\":true,\"type\":\"integer\"}],\"type\":\"struct\"}","partitionColumns":[],"configuration":{},"createdTime":1653357796162}}
spark-sql> ALTER TABLE default.test ADD COLUMN (x5 int);

s3://presto-ci-test/test/_delta_log/00000000000000000005.json

{"commitInfo":{"timestamp":1653357938234,"operation":"ADD COLUMNS","operationParameters":{"columns":"[{\"column\":{\"name\":\"x5\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}}}]"},"readVersion":4,"isBlindAppend":true,"operationMetrics":{}}}
{"metaData":{"id":"24b68017-cb79-4f0b-8ee7-e72a496bbaf4","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"c1\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c2\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c3\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c4\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"x5\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1653357796162}}

Confirmed that setting MetadataEntry.id to DeltaLakeTableHandle.getMetadataEntry().getId() suppresses the Spark error message.
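The finding above can be illustrated with a minimal, self-contained sketch. The record and method names below are hypothetical stand-ins, not the actual Trino or Delta classes: the point is only that the ADD COLUMN commit must carry over the existing table id rather than mint a fresh UUID, since Spark compares metaData.id across log versions and reports "Change in the table id detected" when it differs.

```java
import java.util.UUID;

public class TableIdReuse {
    // Simplified stand-in for a Delta metaData action; illustrative only.
    record MetadataEntry(String id, String schemaString) {}

    // Hypothetical sketch of the fix: reuse the current entry's id in the new
    // metadata entry instead of generating a new one with UUID.randomUUID().
    static MetadataEntry addColumn(MetadataEntry current, String newSchemaString) {
        return new MetadataEntry(current.id(), newSchemaString);
    }

    public static void main(String[] args) {
        MetadataEntry v4 = new MetadataEntry(UUID.randomUUID().toString(), "c1,c2,c3,c4");
        MetadataEntry v5 = addColumn(v4, "c1,c2,c3,c4,x5");
        System.out.println(v4.id().equals(v5.id())); // prints "true"
    }
}
```

This matches the two log excerpts above, where version 4 (Trino) and version 5 (Spark) both carry id 24b68017-cb79-4f0b-8ee7-e72a496bbaf4.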

Member

@alexjo2144 alexjo2144 left a comment

One important fix, but the rest looks good to me.

@findepi findepi requested a review from homar May 16, 2022 13:26

try {
FileSystem fileSystem = hdfsEnvironment.getFileSystem(new HdfsContext(session), new Path(handle.getLocation()));
long commitVersion = getMandatoryCurrentVersion(fileSystem, new Path(handle.getLocation())) + 1;
Member

Can two concurrent transactions both add a new column with the same name?

Member Author

As far as I confirmed, the first entry is silently overwritten by the other transaction. I mean, only one JSON log file is generated after those two transactions.

Member

The transaction log synchronizer should prevent a new JSON file from overwriting another.

Member Author

@ebyhr ebyhr May 19, 2022

Ah, I was testing with a local-disk-based query runner. Verified that the other transaction fails on S3.

Member

The transaction log synchronizer should prevent a new JSON file from overwriting another.

Yes.
But we're using getMandatoryCurrentVersion, which checks, well, the current version, without checking whether some other thread added another column with the same name.

Am I missing something?

Member

You're totally right. Instead of calling getMandatoryCurrentVersion, we should be committing to tableHandle.getReadVersion() + 1.

Member

@alexjo2144 yeah, thanks for confirming.

Member Author

Thanks for your help. Changed to long commitVersion = handle.getReadVersion() + 1;
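Why committing to readVersion + 1 resolves the race can be shown with a small simulation. This is a hedged sketch, not the Trino implementation: putIfAbsent on a map stands in for the transaction log synchronizer's "fail if the log file already exists" write, and the class and method names are made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

public class CommitConflictSketch {
    // Stand-in for the _delta_log directory: version number -> log file contents.
    static final ConcurrentHashMap<Long, String> txLog = new ConcurrentHashMap<>();

    // Each writer targets readVersion + 1, so two writers that both read
    // version 0 aim at the same log file: the second write conflicts instead of
    // silently landing at a later version, as it could when the commit version
    // was recomputed from the current state of the log.
    static boolean commit(long readVersion, String entry) {
        long commitVersion = readVersion + 1;
        return txLog.putIfAbsent(commitVersion, entry) == null;
    }

    public static void main(String[] args) {
        boolean first = commit(0, "ADD COLUMN c4 integer");
        boolean second = commit(0, "ADD COLUMN c4 integer"); // same read version -> conflict
        System.out.println(first + " " + second); // prints "true false"
    }
}
```

Under this scheme a concurrent writer must re-read the table (observing the freshly added column) before it can retry with a higher read version.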

@findepi findepi changed the title Support adding columns in Delta Lake Support adding columns in Delta Lake (ALTER TABLE ADD COLUMN) May 19, 2022
@ebyhr ebyhr force-pushed the ebi/delta-add-column branch from 128fbfb to bc515f0 Compare May 24, 2022 00:36
@ebyhr ebyhr force-pushed the ebi/delta-add-column branch 2 times, most recently from 8e6a944 to 7cf4f6b Compare May 24, 2022 03:34
@ebyhr
Member Author

ebyhr commented May 24, 2022

@findepi @homar @findinpath @alexjo2144 Addressed comments.

@ebyhr ebyhr force-pushed the ebi/delta-add-column branch from 7cf4f6b to a4fa86f Compare May 24, 2022 08:59
@ebyhr ebyhr merged commit 0da6e01 into trinodb:master May 24, 2022
@ebyhr ebyhr deleted the ebi/delta-add-column branch May 24, 2022 10:16
@ebyhr ebyhr mentioned this pull request May 24, 2022
@github-actions github-actions bot added this to the 382 milestone May 24, 2022
@mosabua
Member

mosabua commented May 24, 2022

I guess rename and drop column are coming later. We should update the docs for this, @colebow. Currently the docs say that ALTER TABLE overall is supported, which seems not to be actually true. Wdyt @ebyhr?

@ebyhr
Member Author

ebyhr commented May 25, 2022

@mosabua I agree with updating the docs.

@colebow
Member

colebow commented May 25, 2022

I guess rename and drop column are coming later. We should update the docs for this, @colebow. Currently the docs say that ALTER TABLE overall is supported, which seems not to be actually true. Wdyt @ebyhr?

From looking into the code, it looks like this isn't an issue? I didn't spend a ton of time looking and don't entirely know how everything works, but from what I can tell, every statement listed in the docs is at least being tested as part of the Delta Lake connector.

@mosabua
Member

mosabua commented May 26, 2022

Maybe that means "ADD COLUMN" was the last one missing.

@Dearkano

@mosabua I think RENAME COLUMN is not supported yet... at least in 370, when I rename a column, the query from Spark succeeds, but Trino returns NULL for that new column.

@mosabua
Member

mosabua commented May 27, 2022

We need to know where we are with the current 381 release, and @ebyhr can confirm in the code for us.

@ebyhr
Member Author

ebyhr commented May 27, 2022

Trino returns NULL on that new column

@Dearkano It looks like a bug in the Delta Lake connector. Could you file an issue at https://github.com/trinodb/trino/issues/new?
