Support auto-compaction for Delta tables on [databricks] #7889

Merged
merged 18 commits into from
Mar 27, 2023

Conversation

mythrocks
Collaborator

@mythrocks mythrocks commented Mar 14, 2023

This change adds support for auto-compaction of Delta tables on Databricks (11.3 and 10.4), via the spark-rapids plugin.

This change is based on the work in the Delta IO project, specifically in PR#1156.

The behaviour of auto-compaction on partitioned tables in this implementation deviates somewhat from that of Databricks (11.3 at least). Specifically, Databricks does not appear to apply auto-compaction uniformly across all partitions. (This might be related to the spark.databricks.delta.autoCompact.target setting. This is still being investigated.)

Note also that auto-compaction is supported only when writing to Delta directories. For example:

df.write.format("delta").mode("append").save("/path/to/delta/table")

Auto-compaction is not triggered by SQL writes. (The spark-rapids plugin's support for Delta tables does not extend to replacing the org.apache.spark.sql.execution.datasources.v2.AppendDataExecV1 exec.)
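As a rough illustration of the supported write path, here is a minimal PySpark-style sketch. The config key `spark.databricks.delta.autoCompact.enabled` is an assumption (this description only names the `...autoCompact.target` setting), and the helper names `enable_auto_compaction` and `append_to_delta_path` are hypothetical, introduced only for this example. It assumes an existing `spark` session with the spark-rapids plugin already loaded.

```python
# Assumed session-level switch for auto-compaction; this key is not named in
# the PR description, so verify it against your Databricks runtime.
AUTO_COMPACT_KEY = "spark.databricks.delta.autoCompact.enabled"


def enable_auto_compaction(spark):
    """Turn on auto-compaction for the current session (assumed config key)."""
    spark.conf.set(AUTO_COMPACT_KEY, "true")


def append_to_delta_path(df, path):
    """Append through the DataFrame writer to a Delta directory.

    This is the write path the description says is supported; it is the
    same call chain as the one-line example above.
    """
    df.write.format("delta").mode("append").save(path)
```

With a live session this would be called as `enable_auto_compaction(spark)` followed by `append_to_delta_path(df, "/path/to/delta/table")`. Per the description above, the equivalent write issued through SQL would not trigger auto-compaction in this implementation.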

@mythrocks mythrocks self-assigned this Mar 14, 2023
@mythrocks mythrocks changed the title from "[WIP] Support auto-compaction for Delta tables on Databricks" to "[WIP] Support auto-compaction for Delta tables on [databricks]" Mar 14, 2023
Signed-off-by: MithunR <mythrocks@gmail.com>
Signed-off-by: MithunR <mythrocks@gmail.com>
@mythrocks mythrocks changed the title from "[WIP] Support auto-compaction for Delta tables on [databricks]" to "Support auto-compaction for Delta tables on [databricks]" Mar 15, 2023
@mythrocks mythrocks added the "feature request" (New feature or request) and "P0" (Must have for release) labels Mar 15, 2023
@mythrocks
Collaborator Author

Build

@mythrocks
Collaborator Author

Build

This test is not valid, now that auto-compaction works.
@mythrocks
Collaborator Author

Build

Member

@jlowe jlowe left a comment

docs/additional-functionality/delta-lake-support.md should be updated as part of this PR.

1. Attempted using databricks-specific post-commit hooks for auto compaction. Failed.
2. Changed support class names to use "Gpu" prefix.
3. Added/fixed code comments.
Member

@jlowe jlowe left a comment

The documentation in docs/additional-functionality/delta-lake-support.md still has not been updated.

@mythrocks
Collaborator Author

> The documentation in docs/additional-functionality/delta-lake-support.md still has not been updated.

Right. I'm still working on that. There's an unexpected issue on 10.4 that I'm trying to resolve first.

@mythrocks
Collaborator Author

> The documentation in docs/additional-functionality/delta-lake-support.md still has not been updated.

I've updated this now.

@mythrocks
Collaborator Author

Build

@mythrocks mythrocks requested a review from jlowe March 23, 2023 21:40
@mythrocks
Collaborator Author

(Fixed the discrepancy in documentation. Resolved conflicts.)

@mythrocks
Collaborator Author

Transient failures in the Markdown links check:

[✖] https://github.com/openucx/ucx/releases/download/v1.14.0/ucx-v1.14.0-centos8-mofed5-cuda11.tar.bz2 → Status: 404

@mythrocks
Collaborator Author

Build

@mythrocks
Collaborator Author

Build

@mythrocks mythrocks merged commit 0c27dc9 into NVIDIA:branch-23.04 Mar 27, 2023
@mythrocks
Collaborator Author

I've merged this. Thank you for the reviews and guidance, @jlowe and @andygrove!

Labels
feature request New feature or request P0 Must have for release

3 participants