Support SHALLOW CLONE of Iceberg Tables #1522

jackierwzhang · 2022-12-13T21:16:42Z

Overview

As a follow-up to the SHALLOW CLONE support for Delta Lake, it would be great if we could enable SHALLOW CLONE on an Iceberg table as well. This will be a CLONVERT (CLONE + CONVERT) operation: in a single transaction, we create a Delta catalog table whose files point to the original Iceberg table.

Motivation

It allows users to quickly experiment with Delta Lake without modifying the original Iceberg table's data.
It simplifies the user flow by combining a Delta catalog table creation with an Iceberg conversion.

Further details

Similar to SHALLOW CLONE, it will work as follows:

Clone an Iceberg catalog table (after the setup here)

CREATE TABLE [IF NOT EXISTS] delta SHALLOW CLONE iceberg.db.table [TBLPROPERTIES clause] [LOCATION path]

Clone a path-based Iceberg table

CREATE TABLE [IF NOT EXISTS] delta SHALLOW CLONE iceberg.`/path/to/iceberg/table` [TBLPROPERTIES clause] [LOCATION path]
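
To make the two forms above concrete, here is a minimal sketch that assembles the statement text from its parts. `ClonvertSql`, `CatalogTable`, `PathTable`, and `shallowClone` are hypothetical names used only for illustration and are not part of this patch or of Delta Lake. The sketch assumes property values and paths contain no single quotes, and that backticks inside a path are escaped by doubling, as is usual for Spark SQL identifiers.

```scala
// Hypothetical helper (not part of Delta Lake): builds the CLONVERT statement text
// for the two source forms described in this PR.
object ClonvertSql {

  /** The Iceberg source: either a catalog identifier or a path-based table. */
  sealed trait IcebergSource { def sql: String }

  /** Catalog form, e.g. Seq("iceberg", "db", "table") renders as iceberg.db.table */
  final case class CatalogTable(parts: Seq[String]) extends IcebergSource {
    def sql: String = parts.mkString(".")
  }

  /** Path form, rendered as iceberg.`<path>`; backticks in the path are doubled. */
  final case class PathTable(path: String) extends IcebergSource {
    def sql: String = "iceberg.`" + path.replace("`", "``") + "`"
  }

  /** Assembles CREATE TABLE ... SHALLOW CLONE ... with the optional clauses in the
    * same order as the syntax above: TBLPROPERTIES first, then LOCATION.
    * Assumes keys, values, and the location contain no single quotes. */
  def shallowClone(
      target: String,
      source: IcebergSource,
      ifNotExists: Boolean = false,
      tblProperties: Seq[(String, String)] = Seq.empty,
      location: Option[String] = None): String = {
    val create = if (ifNotExists) "CREATE TABLE IF NOT EXISTS" else "CREATE TABLE"
    val props: Option[String] =
      if (tblProperties.isEmpty) None
      else Some(
        tblProperties
          .map { case (k, v) => s"'$k' = '$v'" }
          .mkString("TBLPROPERTIES (", ", ", ")"))
    val loc: Option[String] = location.map(p => s"LOCATION '$p'")
    (Seq(create, target, "SHALLOW CLONE", source.sql) ++ props.toSeq ++ loc.toSeq)
      .mkString(" ")
  }
}
```

For example, `ClonvertSql.shallowClone("delta", ClonvertSql.CatalogTable(Seq("iceberg", "db", "table")))` produces the catalog form, and passing `ClonvertSql.PathTable("/path/to/iceberg/table")` produces the path-based form. In a session with the Iceberg setup in place, the resulting string would presumably be passed to `spark.sql(...)`.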

How was this patch tested?

New unit tests.

Does this PR introduce any user-facing changes?

No.

vkorukanti · 2022-12-14T17:34:12Z

core/src/main/scala/io/delta/sql/parser/DeltaSqlParser.scala

+    // The source relation can be an Iceberg table in form of `catalog.db.table` so we visit
+    // a multipart identifier instead of TableIdentifier (which does not support 3L namespace)
+    // in Spark 3.3.
+    val sourceRelation = new UnresolvedRelation(visitMultipartIdentifier(ctx.source))


Is this something that can be fixed once Delta upgrades to Spark 3.4? If yes, add a comment.

vkorukanti · 2022-12-14T17:46:25Z

core/src/main/scala/org/apache/spark/sql/delta/DeltaAnalysis.scala

+        // the existing files and the newly added files.
+        val cloneSourceTable = sourceTbl match {
+          case source: CloneIcebergSource =>
+            // Reuse the existing schema so that the physical name of columns are consistent between


Sorry, this is not clear. Why do we have to use the schema of the existing table when replacing it? Isn't shallow clone just referencing the existing data files in the Iceberg table? What does "existing files" mean here?

Yes, but when you REPLACE an existing Delta table, and because Iceberg uses column mapping, we have to make sure the column mapping metadata matches during conversion.

jackierwzhang added 3 commits December 13, 2022 13:15

commit

1f01b8f

bring back test

e1fdbfb

fix

eace65c

vkorukanti approved these changes Dec 14, 2022

comments

681098f

allisonport-db closed this in ff805d0 Dec 16, 2022

allisonport-db mentioned this pull request May 24, 2023

[Feature Request] Support SHALLOW CLONVERT on Iceberg Tables #1516

Closed

3 tasks
