Describe logRetentionDuration in PROTOCOL #888

Closed
wjones127 opened this issue Jan 10, 2022 · 3 comments
Labels: acknowledged (This issue has been read and acknowledged by Delta admins), question (Questions on how to use Delta Lake), waiting for merge

Comments

@wjones127
Contributor

If I understand correctly, Delta log files older than logRetentionDuration may be deleted once they are no longer needed. This is documented at docs.delta.io but is not mentioned in PROTOCOL.md.

Which reader version does this behavior go back to? Version 1?

@scottsand-db scottsand-db added the acknowledged This issue has been read and acknowledged by Delta admins label Jan 12, 2022
@scottsand-db scottsand-db added the question Questions on how to use Delta Lake label Jan 26, 2022
@scottsand-db
Collaborator

Hi @wjones127, thanks for your question.

Do you think this configuration needs to be documented in PROTOCOL.md? Wouldn't that then mean that all of our configurations should be documented in PROTOCOL.md too?

You can check out the reader version requirements section of PROTOCOL.md, which documents the features each new reader version introduces. Since logRetentionDuration isn't mentioned there for version 2, you can tell that this behaviour dates back to version 1.

@wjones127
Contributor Author

@scottsand-db

> Do you think this configuration needs to be documented in PROTOCOL.md? Wouldn't that then mean that all of our configurations should be documented in PROTOCOL.md too?

You're right that we don't need to document this setting in particular. But I think the behavior of deleting old log files should be documented. When I read the protocol, I thought reading checkpoints was an optional optimization. But this behavior means reading checkpoints is required for a reader, and that readers shouldn't consider Delta tables that are missing old log files corrupt or invalid.
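To illustrate the reader-side consequence described above, here is a minimal sketch of how a reader might plan a snapshot from a `_delta_log` directory listing: start from the latest checkpoint and require only the JSON commits after it. The function name `replay_plan` is hypothetical, and the sketch assumes 20-digit zero-padded file names (`<version>.json`, `<version>.checkpoint.parquet`) with single-part checkpoints only; multi-part checkpoints and the `_last_checkpoint` file are ignored for brevity. It is not taken from any Delta implementation.

```python
import re

# Assumed naming: 20-digit zero-padded version, single-part checkpoints only.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint\.parquet$")


def replay_plan(log_listing):
    """Return (checkpoint version or None, commit versions to replay on top)."""
    commits, checkpoints = set(), set()
    for name in log_listing:
        if m := COMMIT_RE.match(name):
            commits.add(int(m.group(1)))
        elif m := CHECKPOINT_RE.match(name):
            checkpoints.add(int(m.group(1)))
    if not commits and not checkpoints:
        raise ValueError("no Delta log files found")

    checkpoint = max(checkpoints) if checkpoints else None
    # Commits at or before the checkpoint are already folded into it, so a
    # gap there (e.g. after log cleanup) is not treated as corruption.
    first = 0 if checkpoint is None else checkpoint + 1
    latest = max(commits | checkpoints)
    needed = list(range(first, latest + 1))
    missing = [v for v in needed if v not in commits]
    if missing:
        # In this sketch, a gap *after* the checkpoint is an error, because
        # the table state cannot be rebuilt without those commits.
        raise ValueError(f"log is missing commits {missing}")
    return checkpoint, needed


# Versions 0-9 have been cleaned up; the checkpoint at 10 covers them.
listing = [f"{10:020d}.checkpoint.parquet"] + [
    f"{v:020d}.json" for v in (10, 11, 12)
]
print(replay_plan(listing))  # (10, [11, 12])
```

The point of the design is that missing early JSON files are expected once a checkpoint exists, so a reader that treats reading checkpoints as optional would wrongly reject such a table.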

@scottsand-db
Collaborator

Would you like to submit a short PR adding this description?

jbguerraz pushed a commit to jbguerraz/delta that referenced this issue Jul 6, 2022
Existing writers may delete old JSON log entries if there are newer checkpoints.

Fixes delta-io#888.

Closes delta-io#913

Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
GitOrigin-RevId: 79cce715d78edb9aca33f2f8db7861e15634e812
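The rule in the commit message ("writers may delete old JSON log entries if there are newer checkpoints") can be sketched as a small eligibility check. This is a simplified, hypothetical rule for illustration, not Delta's actual cleanup implementation: the `LogFile` record and `deletable_commits` function are invented names, and `retention` stands in for the `logRetentionDuration` setting. A JSON commit is treated as deletable when it is older than the retention window and a checkpoint exists at the same or a later version.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogFile:
    # Hypothetical record for one file in _delta_log.
    version: int
    is_checkpoint: bool
    modified: datetime


def deletable_commits(files, retention, now):
    """Versions of JSON commit files that a simplified cleanup could remove.

    Assumed rule: the commit is older than the retention window AND a
    checkpoint exists at the same or a later version, so a reader can
    start from that checkpoint instead of replaying the commit.
    """
    cutoff = now - retention
    checkpoint_versions = [f.version for f in files if f.is_checkpoint]
    if not checkpoint_versions:
        return []  # without a checkpoint every commit is still needed
    latest_checkpoint = max(checkpoint_versions)
    return sorted(
        f.version
        for f in files
        if not f.is_checkpoint
        and f.modified < cutoff
        and f.version <= latest_checkpoint
    )


now = datetime(2022, 1, 10)
files = [
    LogFile(0, False, now - timedelta(days=40)),
    LogFile(1, False, now - timedelta(days=35)),
    LogFile(1, True, now - timedelta(days=35)),  # checkpoint at version 1
    LogFile(2, False, now - timedelta(days=5)),
]
print(deletable_commits(files, timedelta(days=30), now))  # [0, 1]
```

Version 2 survives because it is newer than the cutoff, and nothing is deleted at all when no checkpoint exists.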
3 participants