Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add out_of_range flag to load_cdf #3040

Merged
merged 4 commits into from
Dec 7, 2024
Merged

Conversation

pblocz
Copy link
Contributor

@pblocz pblocz commented Dec 1, 2024

Description

This PR changes load_cdf so:

By default

  • load_cdf with starting_version out of range returns an InvalidVersion
  • load_cdf with a starting_timestamp out of range returns a ChangeDataTimestampGreaterThanCommit

And you can use .with_allow_out_of_range() in rust or allow_out_of_range=True in python to make it so that both cases return an empty dataframe.

This aligns it closer with how reading the change feed works in spark and makes the behaviour consistent.

This in contrast to the current behavior:

  • load_cdf with starting_version out of range returns an InvalidVersion
  • load_cdf with a starting_timestamp out of range returns an empty dataframe

Related Issue(s)

@github-actions github-actions bot added binding/python Issues for the Python package binding/rust Issues for the Rust crate labels Dec 1, 2024
Copy link

github-actions bot commented Dec 1, 2024

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

Copy link

codecov bot commented Dec 1, 2024

Codecov Report

Attention: Patch coverage is 81.60000% with 23 lines in your changes missing coverage. Please review.

Project coverage is 72.72%. Comparing base (ba671c9) to head (b96b93b).
Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
crates/core/src/operations/load_cdf.rs 84.29% 3 Missing and 16 partials ⚠️
python/src/lib.rs 0.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3040      +/-   ##
==========================================
+ Coverage   72.68%   72.72%   +0.03%     
==========================================
  Files         129      129              
  Lines       41462    41578     +116     
  Branches    41462    41578     +116     
==========================================
+ Hits        30137    30237     +100     
  Misses       9376     9376              
- Partials     1949     1965      +16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@ion-elgreco ion-elgreco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just couple comments

python/tests/test_cdf.py Outdated Show resolved Hide resolved
python/tests/test_cdf.py Outdated Show resolved Hide resolved
@ion-elgreco ion-elgreco changed the title add out_of_range flag to load_cdf feat: add out_of_range flag to load_cdf Dec 1, 2024
@rtyler rtyler added this to the v0.23 milestone Dec 1, 2024
Copy link
Collaborator

@hntd187 hntd187 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now, just rerunning CI right now local concurrency test is notoriously flakey.

@pblocz
Copy link
Contributor Author

pblocz commented Dec 1, 2024

Awesome, seems that all the tests passed now

@hntd187
Copy link
Collaborator

hntd187 commented Dec 2, 2024

@pblocz if you could sign off on your commits we can get this merged soon

Signed-off-by: Pablo Cabeza <josepablocg@gmail.com>
Signed-off-by: Pablo Cabeza <josepablocg@gmail.com>
… clarity

Signed-off-by: Pablo Cabeza <josepablocg@gmail.com>
Signed-off-by: Pablo Cabeza <josepablocg@gmail.com>
@pblocz
Copy link
Contributor Author

pblocz commented Dec 2, 2024

@hntd187 Done. I don't know why, but thought the sign off was something else 😅

@ion-elgreco ion-elgreco added this pull request to the merge queue Dec 7, 2024
Merged via the queue into delta-io:main with commit 2501f2f Dec 7, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

load_cdf return an empty dataframe when a version is out of range
4 participants