Support converting Iceberg for CONVERT TO DELTA command in Delta Lake #1463
Conversation
This section should also discuss the new Maven artifact your PR is proposing, and how to include it in a Spark session along with Delta Lake. Also, have you tested this on a real cluster (e.g. EMR)? And we should add an integration test for this; this PR is already large enough, so I think we can add that in a future PR.
Yes, I have tested this on a real cluster.
Yes, so basically this will allow the Scala build to cross-compile against different Iceberg versions.
Sure, but IMO the unit test already runs in a complete Spark environment with a standard Iceberg setup.
build.sbt
```scala
lazy val deltaIcebergCompat = (project in file("delta-iceberg-compat"))
  .dependsOn(core % "compile->compile;test->test;provided->provided")
  .settings(
    name := "delta-iceberg-compat",
```
@zsxwing do you think this is a good name?
compat or compatibility?
and should we have "spark" in the name?
Yes, but this doesn't test the packaged JAR. We've caught many bugs (e.g. shading, dependency, and Scala version issues) with our integration tests before.
+1 on adding an integration test.
Also, docs on which JARs need to be copied (end-to-end steps).
Otherwise LGTM.
core/src/main/scala/org/apache/spark/sql/delta/DeltaErrors.scala
core/src/main/scala/org/apache/spark/sql/delta/commands/ConvertToDeltaCommand.scala
Are there any limitations on which Iceberg types/features are supported and which are not?
There are a few remaining things we don't yet support:
lgtm
Force-pushed from c204c3a to 4f96c10
Force-pushed from c70f1ae to b14bfa0
Description
Add support for in-place conversion of Iceberg tables to Delta using the CONVERT TO DELTA command in Apache Spark. Specifically, this PR supports converting a Parquet-based Iceberg table at a path/directory to the Delta Lake format.
Here's an example flow:
1. An Iceberg table exists at the path `s3://bucket/catalog/db/table`.
2. After conversion, the same location can be registered as a Delta table: `CREATE TABLE delta_table USING delta LOCATION 's3://bucket/catalog/db/table'`
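The conversion step itself can be sketched roughly as follows. This is an illustrative invocation, not copied from the PR: the `iceberg.` path prefix and the exact statement shape are assumptions based on how CONVERT TO DELTA is typically invoked for a path-based table.

```sql
-- Hypothetical sketch: convert the Parquet-based Iceberg table at the
-- given path to Delta in place. The `iceberg.` prefix (assumed syntax)
-- tells CONVERT TO DELTA to read the Iceberg metadata at that path
-- rather than treating the directory as plain Parquet.
CONVERT TO DELTA iceberg.`s3://bucket/catalog/db/table`;

-- The converted directory can then be registered as a Delta table:
CREATE TABLE delta_table USING delta LOCATION 's3://bucket/catalog/db/table';
```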
See more detail in this ticket: #1462.
How was this patch tested?
New unit tests.
We have tested Iceberg versions from 0.13.1 to 1.0.0.
Does this PR introduce any user-facing changes?
It introduces an iceberg-delta-compat module that contains all the Iceberg + Spark dependencies. Please include this module during Spark startup so that the CONVERT TO DELTA command works.
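One way to include the module at startup might look like the sketch below. The artifact coordinates, version placeholders, and Scala suffix are illustrative only (the PR does not state the published coordinates); the two `--conf` lines are the standard Delta Lake session settings.

```shell
# Sketch: launch spark-sql with Delta plus the new compat module.
# "io.delta:iceberg-delta-compat_2.12:<version>" is a hypothetical
# coordinate -- substitute whatever artifact this PR actually publishes.
spark-sql \
  --packages io.delta:delta-core_2.12:<version>,io.delta:iceberg-delta-compat_2.12:<version> \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
```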