regression : delta.logRetentionDuration don't seems to be respected #2447

Closed
djouallah opened this issue Apr 23, 2024 · 10 comments
Labels: bug (Something isn't working)


@djouallah

djouallah commented Apr 23, 2024

Environment

Delta-rs version: 17.1

Binding:

Environment: Python


Bug

What happened:

from deltalake import DeltaTable, write_deltalake

# df, delta_path and storage_options are defined elsewhere
write_deltalake(delta_path, df, configuration={"delta.logRetentionDuration": "interval 1 day"}, mode="append", storage_options=storage_options)
dt = DeltaTable(delta_path, storage_options=storage_options)
dt.vacuum(retention_hours=0, dry_run=False, enforce_retention_duration=False)
dt.create_checkpoint()
dt.cleanup_metadata()

This doesn't seem to be working?

@djouallah djouallah added the bug label Apr 23, 2024
@echai58

echai58 commented Apr 26, 2024

I think the configuration key should be delta.logRetentionDuration.

@djouallah
Author

Same issue.

@ion-elgreco
Collaborator

> I think the configuration key should be delta.logRetentionDuration.

Correct

@ion-elgreco ion-elgreco closed this as not planned Apr 26, 2024
@djouallah
Author

@ion-elgreco delta.logRetentionDuration does not work either?

@djouallah djouallah changed the title from "logRetentionDuration don't seems to be respected" to "regression : delta.logRetentionDuration don't seems to be respected" Apr 26, 2024
@ion-elgreco
Collaborator

@djouallah it does work. You provided interval 1 day, so you can't expect the logs to be deleted immediately :P. Change it to interval 1 seconds and you can see they get removed:

import polars as pl
from deltalake import DeltaTable, write_deltalake

df = pl.DataFrame({"foo": [1]})
delta_path = "test_Table"

write_deltalake(
    delta_path,
    df.to_arrow(),
    configuration={"delta.logRetentionDuration": "interval 1 seconds"},
    mode="overwrite",
)
dt = DeltaTable(delta_path)
dt.vacuum(retention_hours=0, dry_run=False, enforce_retention_duration=False)
dt.create_checkpoint()
dt.cleanup_metadata()
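
The point about the interval can be illustrated with a small standalone sketch. It is not how delta-rs implements the cleanup: the helper names `parse_interval` and `is_expired`, the set of supported units, and the strict "older than" comparison are all assumptions made for illustration. It only shows why a log entry written an hour ago survives with "interval 1 day" but becomes eligible for removal with "interval 1 seconds".

```python
import re
from datetime import datetime, timedelta, timezone

# Units handled by this sketch (an assumption, not the full interval grammar).
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400, "week": 604800}


def parse_interval(value: str) -> timedelta:
    """Parse strings such as 'interval 1 day' or 'interval 1 seconds'."""
    match = re.fullmatch(r"interval\s+(\d+)\s+([a-z]+?)s?", value.strip().lower())
    if match is None or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unsupported interval: {value!r}")
    return timedelta(seconds=int(match.group(1)) * _UNIT_SECONDS[match.group(2)])


def is_expired(commit_time: datetime, retention: str, now: datetime) -> bool:
    """Hypothetical rule: a log entry is removable once it is older than the retention."""
    return now - commit_time > parse_interval(retention)


now = datetime(2024, 4, 26, 12, 0, tzinfo=timezone.utc)
one_hour_old = now - timedelta(hours=1)

# Still inside a one-day retention window, so nothing would be cleaned up yet.
print(is_expired(one_hour_old, "interval 1 day", now))
# Far outside a one-second retention window, so it would be eligible.
print(is_expired(one_hour_old, "interval 1 seconds", now))
```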

@djouallah
Author

No luck. I am writing to GCP, FWIW.

@ion-elgreco
Collaborator

@djouallah please share the table configuration in the delta log

@djouallah
Author

Metadata(id: 62adbf63-1e61-479e-8187-8fd7ef308b5c, name: None, description: None, partition_columns: [], created_time: 1713594980946, configuration: {})

@ion-elgreco
Collaborator

> Metadata(id: 62adbf63-1e61-479e-8187-8fd7ef308b5c, name: None, description: None, partition_columns: [], created_time: 1713594980946, configuration: {})

Yeah, you didn't pass a configuration when the table was created, so it's using the default of 30 days.
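
A quick way to check which retention setting a table ends up with is to look at the configuration stored in its metadata, as in the `Metadata(...)` output quoted above. The following is a minimal sketch: the helper names are hypothetical, and the hard-coded default string is an assumption based on the 30-day default mentioned in this thread rather than a value read from the library.

```python
# Assumed default, taken from the "30 days" mentioned in this thread.
DEFAULT_LOG_RETENTION = "interval 30 days"


def effective_log_retention(configuration: dict) -> str:
    """Return the configured log retention, or the assumed default if the key is absent."""
    return configuration.get("delta.logRetentionDuration", DEFAULT_LOG_RETENTION)


def table_log_retention(table_uri: str, storage_options=None) -> str:
    """Read the retention setting from an existing table's metadata."""
    # Imported lazily so the pure-Python helper above works without deltalake installed.
    from deltalake import DeltaTable

    dt = DeltaTable(table_uri, storage_options=storage_options)
    return effective_log_retention(dt.metadata().configuration)


# An empty configuration, like the one reported above, falls back to the default.
print(effective_log_retention({}))
# A table created with the property set reports that value instead.
print(effective_log_retention({"delta.logRetentionDuration": "interval 1 seconds"}))
```

If `table_log_retention(...)` reports the default for a table you expected to have a shorter retention, the property was most likely not part of the table's metadata when it was created.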

@djouallah
Author

Ah, I see: it has to be set the first time the table is created. Adding the option later using append or overwrite does not work. Thanks!!!
