DELTA_FILE_PATTERN regex is incorrectly matching tmp commit files #2201

Closed
echai58 opened this issue Feb 21, 2024 · 0 comments · Fixed by #2213
Labels
bug Something isn't working


echai58 commented Feb 21, 2024

Environment

Delta-rs version: 0.15.3

Binding: python


Bug

What happened:

The DELTA_FILE_PATTERN regex at https://github.com/delta-io/delta-rs/blob/main/crates/core/src/kernel/snapshot/log_segment.rs#L33 is not anchored to match the entire string, so a tmp commit file can match it if its name contains (some numbers).json.tmp.

This causes history() to list a tmp commit, which is incorrect. This happened when I was doing concurrent merges and it errored out (see: #2084 (comment)).

get_add_actions seems to be robust to this regex, but I'm not sure what checks it has to exclude tmp commit files.

What you expected to happen:

history() and anything else that relies on is_commit_file should not list tmp commit files.
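
For illustration, here is a minimal Python sketch of the expected behavior (the real check lives in Rust). It assumes commit files are named as digits followed by `.json` and nothing else. `is_commit_file_strict` is a hypothetical name used only for this example, not a delta-rs function, and the pattern shown is an assumption about what a whole-name match could look like.

```python
import re

# Assumption: a commit file name is only digits followed by ".json".
STRICT_COMMIT_PATTERN = re.compile(r"\d+\.json")


def is_commit_file_strict(filename: str) -> bool:
    """Return True only if the whole file name is <digits>.json."""
    # fullmatch requires the pattern to cover the entire string,
    # so a trailing ".tmp" or a leading "_commit_<uuid>" is rejected.
    return STRICT_COMMIT_PATTERN.fullmatch(filename) is not None


names = [
    "00000000000000000010.json",
    "_commit_2132c4fe-4077-476c-b8f5-e77fea04f170.json.tmp",
]
commit_files = [n for n in names if is_commit_file_strict(n)]
```

With a whole-name match like this, only the first name would be treated as a commit, so the tmp file would not show up in anything built on top of the check.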

How to reproduce it:
If you run concurrent merges, or anything else that leads to tmp commit files, you can sometimes see files like _delta_log/_commit_2132c4fe-4077-476c-b8f5-e77fea04f170.json.tmp (when the randomly generated id ends in a digit). Such a file then gets listed in a history call.
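
The match can be reproduced outside delta-rs with a small Python sketch. It assumes the pattern at the linked line is an unanchored `\d+\.json` (not verified here), and it uses `re.search` to stand in for an unanchored "matches anywhere in the string" check. `is_commit_file_loose` is a hypothetical helper for this example only.

```python
import re

# Assumption: the linked pattern is an unanchored `\d+\.json`.
LOOSE_PATTERN = re.compile(r"\d+\.json")


def is_commit_file_loose(filename: str) -> bool:
    """Return True if <digits>.json appears anywhere in the name."""
    # search succeeds on any substring match, so text before or
    # after the matched part is ignored.
    return LOOSE_PATTERN.search(filename) is not None


# The tmp file from this report: the id ends in "170", so the
# substring "170.json" satisfies the unanchored pattern.
tmp_ends_in_digit = "_commit_2132c4fe-4077-476c-b8f5-e77fea04f170.json.tmp"

# Same shape, but the id ends in a letter, so no digits sit
# directly before ".json" and the pattern finds nothing.
tmp_ends_in_letter = "_commit_2132c4fe-4077-476c-b8f5-e77fea04f17a.json.tmp"

matched = LOOSE_PATTERN.search(tmp_ends_in_digit)
```

Under that assumption, this would explain why the problem only shows up sometimes: it depends on whether the random id happens to end in a digit.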
