Support SHALLOW CLONE of Iceberg Tables #1522
// The source relation can be an Iceberg table in form of `catalog.db.table` so we visit
// a multipart identifier instead of TableIdentifier (which does not support 3L namespace)
// in Spark 3.3.
val sourceRelation = new UnresolvedRelation(visitMultipartIdentifier(ctx.source))
Is this something that can be fixed once Delta upgrades to Spark 3.4? If yes, add a comment.
will add
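For readers following this thread, here is a minimal, self-contained sketch of why a two-level identifier cannot carry a name like `catalog.db.table`. `TwoPartIdentifier`, `parseMultipart`, and `toTwoPart` are hypothetical stand-ins written for illustration; they are not Spark's `TableIdentifier` or the parser's `visitMultipartIdentifier`, and the naive split ignores backtick quoting.

```scala
// Hypothetical stand-in for an identifier that only models database + table.
final case class TwoPartIdentifier(table: String, database: Option[String])

object IdentifierSketch {
  // Split a dotted name into its parts. Assumption: no backtick-quoted parts.
  def parseMultipart(name: String): Seq[String] = name.split('.').toSeq

  // Only one- or two-part names fit the two-level shape; anything longer
  // (for example catalog.db.table) has no place to put the catalog.
  def toTwoPart(parts: Seq[String]): Option[TwoPartIdentifier] = parts match {
    case Seq(table)     => Some(TwoPartIdentifier(table, None))
    case Seq(db, table) => Some(TwoPartIdentifier(table, Some(db)))
    case _              => None
  }
}
```

Keeping the name as a plain sequence of parts, as the diff does with the multipart identifier, sidesteps the problem because no level has to be dropped.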
// the existing files and the newly added files.
val cloneSourceTable = sourceTbl match {
case source: CloneIcebergSource =>
// Reuse the existing schema so that the physical name of columns are consistent between
Sorry, this is not clear. Why do we have to use the schema of the existing table when replacing it? Isn't shallow clone just referencing the existing data files in the Iceberg table? What does "existing files" mean here?
Yeah, but if you run REPLACE on an existing Delta table, then since Iceberg uses column mapping, we have to make sure the column mapping metadata matches during conversion.
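To make the concern in this reply concrete, here is a rough sketch of a check that the column mapping metadata of two schemas agrees. This is not the code in this PR: `MappedField` and `ColumnMappingCheck.mismatches` are hypothetical names, and the model assumes column mapping metadata boils down to a physical name and a field id per logical column.

```scala
// Hypothetical minimal model of a column under column mapping.
final case class MappedField(logicalName: String, physicalName: String, fieldId: Int)

object ColumnMappingCheck {
  // Return the logical names of columns that exist in both schemas but whose
  // physical name or field id differs. Columns present on one side only are ignored.
  def mismatches(target: Seq[MappedField], source: Seq[MappedField]): Seq[String] = {
    val targetByName = target.map(f => f.logicalName -> f).toMap
    source.flatMap { s =>
      targetByName.get(s.logicalName).collect {
        case t if t.physicalName != s.physicalName || t.fieldId != s.fieldId =>
          s.logicalName
      }
    }
  }
}
```

A non-empty result would mean readers resolve the same logical column to different physical columns depending on which schema they consult, which is the inconsistency the reply says has to be avoided.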
Overview
As a followup to the SHALLOW CLONE support for Delta Lake, it would be great if we could enable SHALLOW CLONE on an Iceberg table as well. This will be a CLONVERT (CLONE + CONVERT) operation, in which we will create a Delta catalog table with files pointing to the original Iceberg table in one transaction.
Motivation
Further details
Similar to SHALLOW CLONE, it will work as follows:
How was this patch tested?
New unit tests.
Does this PR introduce any user-facing changes?
No.