Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cleanup_metadata doesn't remove .checkpoint.parquet files #1420

Closed
cmackenzie1 opened this issue Jun 1, 2023 · 2 comments · Fixed by #1421
Closed

cleanup_metadata doesn't remove .checkpoint.parquet files #1420

cmackenzie1 opened this issue Jun 1, 2023 · 2 comments · Fixed by #1421
Labels
bug Something isn't working

Comments

@cmackenzie1
Copy link
Contributor

Environment

Delta-rs version: v0.12.0

Binding: rust

Environment:

  • Cloud provider:
  • OS:
  • Other:

Bug

What happened:

Running the following snippet does not remove .checkpoint.parquet files.

let n = cleanup_expired_logs_for(
          table.version() + 1,
          table.object_store().as_ref(),
          Utc::now().timestamp_millis(),
      )
      .await
      .unwrap();

What you expected to happen:

.checkpoint.parquet files older than log_retention_timestamp should be removed.

How to reproduce it:

  1. Create table with checkpoint(s)
  2. Run cleanup_expired_logs_for setting log_retention_timestamp to Utc::now()
  3. Observe only .json files are removed

More details:

https://regex101.com/r/OmoZMz/1

@cmackenzie1 cmackenzie1 added the bug Something isn't working label Jun 1, 2023
@cmackenzie1
Copy link
Contributor Author

cmackenzie1 commented Jun 1, 2023

I think it is just missing the . at the end of the expression so it should be r#"_delta_log/(\d{20})\.(json|checkpoint).*$"#

Regex::new(r#"_delta_log/(\d{20})\.(json|checkpoint)*$"#).unwrap();

diff --git a/rust/src/checkpoints.rs b/rust/src/checkpoints.rs
index 98a4e79..41385bb 100644
--- a/rust/src/checkpoints.rs
+++ b/rust/src/checkpoints.rs
@@ -205,7 +205,7 @@ pub async fn cleanup_expired_logs_for(
 ) -> Result<i32, DeltaTableError> {
     lazy_static! {
         static ref DELTA_LOG_REGEX: Regex =
-            Regex::new(r#"_delta_log/(\d{20})\.(json|checkpoint)*$"#).unwrap();
+            Regex::new(r#"_delta_log/(\d{20})\.(json|checkpoint).*$"#).unwrap();
     }
 
     let mut deleted_log_num = 0;

@cmackenzie1
Copy link
Contributor Author

cmackenzie1 commented Jun 1, 2023

Also, should this be advancing by 1? Wouldn't that also remove the latest checkpoint ?

table.version() + 1,

Edit: Removing the latest checkpoint seems to be the case on the table I just tested with.

roeap pushed a commit to roeap/delta-rs that referenced this issue Jun 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant