Support auto-compaction for Delta tables on [databricks] #7889
Still bug: Compaction is not routed through GpuParquetFileFormat.
Signed-off-by: MithunR <mythrocks@gmail.com>
38cca11 to 9923a5b
Signed-off-by: MithunR <mythrocks@gmail.com>
...rk321db/src/main/scala/com/databricks/sql/transaction/tahoe/rapids/GpuOptimizeExecutor.scala
Also, minor test refactor.
This test is not valid, now that auto-compaction works.
...rk321db/src/main/scala/com/databricks/sql/transaction/tahoe/rapids/GpuDoAutoCompaction.scala
...rk321db/src/main/scala/com/databricks/sql/transaction/tahoe/rapids/GpuOptimizeExecutor.scala
...db/src/main/scala/com/databricks/sql/transaction/tahoe/rapids/GpuOptimisticTransaction.scala
...rk330db/src/main/scala/com/databricks/sql/transaction/tahoe/rapids/GpuDoAutoCompaction.scala
...db/src/main/scala/com/databricks/sql/transaction/tahoe/rapids/GpuOptimisticTransaction.scala
...rk321db/src/main/scala/com/databricks/sql/transaction/tahoe/rapids/GpuDoAutoCompaction.scala
...rk330db/src/main/scala/com/databricks/sql/transaction/tahoe/rapids/GpuOptimizeExecutor.scala
docs/additional-functionality/delta-lake-support.md should be updated as part of this PR.
1. Attempted using databricks-specific post-commit hooks for auto compaction. Failed.
2. Changed support class names to use "Gpu" prefix.
3. Added/fixed code comments.
The documentation in docs/additional-functionality/delta-lake-support.md still has not been updated.
...rk321db/src/main/scala/com/databricks/sql/transaction/tahoe/rapids/GpuDoAutoCompaction.scala
...rk330db/src/main/scala/com/databricks/sql/transaction/tahoe/rapids/GpuDoAutoCompaction.scala
integration_tests/src/main/python/delta_lake_auto_compact_test.py
Right. I'm still working on that. Unexpected issue on 10.4 that I'm trying to resolve first.
I've updated this now.
(Fixed the discrepancy in documentation. Resolved conflicts.)
Transient failures in the Markdown links check:
I've merged this. Thank you for the reviews and guidance, @jlowe and @andygrove!
This change adds support for auto-compaction of Delta tables on Databricks (11.3 and 10.4), via the `spark-rapids` plugin.
This change is based on the work in the Delta IO project, specifically in PR#1156.
There are some deviations in the behaviour of auto-compaction on partitioned tables in this implementation, as compared to Databricks (11.3 at least). Specifically, Databricks does not appear to apply auto-compaction uniformly across all partitions. (It might have to do with the `spark.databricks.delta.autoCompact.target` setting. This is still being investigated.)
Note that auto-compaction is also only supported when writing to delta-directories. Auto compaction does not trigger via SQL writes. (The `spark-plugin` support for Delta Tables does not extend to replacing the `org.apache.spark.sql.execution.datasources.v2.AppendDataExecV1` exec.)
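To make the distinction between the two write paths concrete, here is a minimal PySpark-style sketch. It is not taken from this PR. The helper names, the table and view names, and the use of `spark.databricks.delta.autoCompact.enabled` as the session switch are assumptions for illustration; the description above only names `spark.databricks.delta.autoCompact.target`. The helpers take an existing `SparkSession` and `DataFrame`, so nothing here starts a cluster.

```python
# Session setting assumed to switch auto-compaction on; the PR description
# itself only mentions spark.databricks.delta.autoCompact.target.
AUTO_COMPACT_ENABLED = "spark.databricks.delta.autoCompact.enabled"


def enable_auto_compaction(spark):
    """Turn on auto-compaction for the given Spark session (assumed setting)."""
    spark.conf.set(AUTO_COMPACT_ENABLED, "true")


def append_to_delta_directory(df, path):
    """Path-based DataFrame write to a Delta directory.

    This is the kind of write the description says auto-compaction supports.
    """
    df.write.format("delta").mode("append").save(path)


def append_via_sql(spark, table_name, source_view):
    """SQL write to a Delta table.

    Per the description, this kind of write does not trigger auto-compaction
    with the plugin. Both names are hypothetical placeholders.
    """
    spark.sql(f"INSERT INTO {table_name} SELECT * FROM {source_view}")
```

With a real session, one would call `enable_auto_compaction(spark)` once and then `append_to_delta_directory(df, "/some/delta/dir")`. Whether a given SQL statement actually goes through `AppendDataExecV1` depends on the catalog and table setup, so the SQL helper is only an example of the write style the description excludes.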