Expose cleanup_metadata or create_checkpoint_from_table_uri_and_cleanup to the Python API #1768

Closed
alisheykhi opened this issue Oct 24, 2023 · 3 comments · Fixed by #1826
Labels
enhancement New feature or request

Comments

@alisheykhi

Description

When calling table.create_checkpoint() from Python, there is currently no way to invoke cleanup_metadata(). As a result, old log entries keep accumulating. In Spark, the cleanup runs as a step at the end of the checkpoint method (as evident in the Spark implementation), but it is absent from the Python API.

On the Rust side (as seen in the Rust implementation), there are both a cleanup_metadata and a create_checkpoint_from_table_uri_and_cleanup method, but neither is called at the end of the create_checkpoint process.

There is another pull request that addresses creating a checkpoint both before and after the vacuum operation. However, because cleanup_metadata is not called during checkpoint creation, the metadata directory is never cleaned up.

Use Case
Cleaning up the Metadata Directory After Checkpoint Creation or Optimization

Related Issue(s)
#1728

@alisheykhi alisheykhi added the enhancement New feature or request label Oct 24, 2023
@mattslack-db

I agree that the cleanup_metadata method should be made available in Python, as it can reduce file listing time as the number of files in _delta_log grows. However, I would definitely not add it to the create_checkpoint function. You generally want to keep the metadata so that you can roll back to a previous version.
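
To make the trade-off concrete, here is a rough, standard-library-only sketch of how a cleanup step might pick log files to delete. It is not the delta-rs or Spark implementation. The function name `expired_commit_files` and its parameters are hypothetical, and it assumes commit files in _delta_log are named as a zero-padded 20-digit version followed by `.json`. A real cleanup would take the retention window from the table's configuration (the `delta.logRetentionDuration` property) and would also deal with old checkpoint files, which this sketch ignores.

```python
import re
import time
from pathlib import Path

# Assumption: commit files look like 00000000000000000010.json
# (zero-padded 20-digit version). Checkpoint files and _last_checkpoint
# do not match this pattern and are therefore left alone.
_COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def expired_commit_files(log_dir, checkpoint_version, retention_seconds, now=None):
    """Hypothetical helper: list commit files that a cleanup could delete.

    A file qualifies only if its version is below the checkpoint version
    AND its modification time is older than the retention window.
    Nothing is deleted here; the caller decides what to do with the list.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_seconds
    expired = []
    for path in sorted(Path(log_dir).iterdir()):
        match = _COMMIT_RE.match(path.name)
        if match is None:
            continue  # not a commit file
        version = int(match.group(1))
        if version < checkpoint_version and path.stat().st_mtime < cutoff:
            expired.append(path)
    return expired
```

Every file returned by such a helper is a version you can no longer roll back to once it is deleted, which is why keeping cleanup as an explicit, separate call (rather than part of create_checkpoint) seems safer.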

@r3stl355
Contributor

r3stl355 commented Nov 8, 2023

take

@alisheykhi
Author

Thanks @r3stl355. For more details and discussion on the root cause of the issue, please refer to this discussion thread.
