feat(python): add DeltaTable.is_deltatable static method (#2662) #2715

Merged: 2 commits into delta-io:main from feat-is-deltatable, Jul 31, 2024

Conversation

omkar-foss
Contributor

@omkar-foss omkar-foss commented Jul 29, 2024

closes #2662.

Description

This adds a static method `is_deltatable(path, opts)` to the
`DeltaTable` Python class. It returns `True` if a Delta table can be
located at the specified `path` and `False` otherwise.

It does so by reusing the Rust internal `is_delta_table_location()`
via the `DeltaTableBuilder`.

Additionally, this adds documentation with usage examples for the
`DeltaTable.is_deltatable()` method.

@github-actions github-actions bot added binding/python Issues for the Python package documentation Improvements or additions to documentation labels Jul 29, 2024
@ion-elgreco
Collaborator

ion-elgreco commented Jul 29, 2024

@omkar-foss thanks for the PR, I do believe we should do this a bit differently. Right now we are loading the state of the table, which could potentially be very large. We could instead simply check whether _delta_log exists and, if so, whether it is non-empty.
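The lighter check suggested in this comment could look roughly like the sketch below for a local filesystem path. It is only an illustration and not the code merged in this PR: the helper name `has_nonempty_delta_log` is hypothetical, and an object store such as S3 would need a listing call instead of `pathlib`.

```python
from pathlib import Path


def has_nonempty_delta_log(table_path: str) -> bool:
    """Return True if table_path contains a non-empty _delta_log directory.

    Hypothetical helper for local paths only; it never reads or replays
    the transaction log, so its cost does not depend on table size.
    """
    log_dir = Path(table_path) / "_delta_log"
    if not log_dir.is_dir():
        return False
    # any() stops at the first directory entry, so the log directory
    # is not listed in full.
    return any(log_dir.iterdir())
```

A directory without `_delta_log`, or with an empty `_delta_log`, is reported as not being a table. Note that this treats any file inside `_delta_log` as sufficient evidence, which is a simplification.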

Or we do some other kind of lazy loading where we simply read the first log entry
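The "read the first log entry" alternative could be sketched as below, again for local paths only. The function name `read_first_log_action` is hypothetical. The sketch assumes that commit 0 is still present as a 20-digit zero-padded JSON file with one action per line; that may not hold for tables whose old log files have been cleaned up after checkpointing.

```python
import json
from pathlib import Path
from typing import Optional


def read_first_log_action(table_path: str) -> Optional[dict]:
    """Return the first action of commit 0, or None if missing or unreadable.

    Hypothetical helper: reads a single line of a single file instead of
    loading the whole table state.
    """
    # Commit files are named with a 20-digit zero-padded version number.
    first_commit = Path(table_path) / "_delta_log" / f"{0:020d}.json"
    try:
        with first_commit.open("r", encoding="utf-8") as fh:
            first_line = fh.readline()
        action = json.loads(first_line)
    except (OSError, ValueError):
        # Missing file, unreadable path, empty file, or invalid JSON.
        return None
    return action if isinstance(action, dict) else None
```

A caller could treat a non-`None` result as evidence that a table exists at the path, at the cost of one small file read.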

@omkar-foss
Contributor Author

omkar-foss commented Jul 29, 2024

@ion-elgreco Yes, I completely agree. I wasn't aware that the DeltaTable() constructor eagerly loads the entire table state (thanks for letting me know!). If that's the case, then checking for _delta_log and its emptiness will indeed be a lighter, constant-time operation than checking the table state, which might grow large and become expensive.

As a matter of fact, IsDeltaTable in Delta Lake OSS (the Spark version) also checks for _delta_log and its emptiness (see here), so it should be perfectly appropriate to apply the same logic here. It will also give Delta Lake users consistent behavior.

I'll update the PR accordingly, thanks for your quick review 😁👍🏽

@ion-elgreco
Collaborator

@omkar-foss there might be some internal code on the rust side that does something like this already, I am not certain, but worth it to check

@rtyler rtyler marked this pull request as draft July 30, 2024 10:36
@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Jul 30, 2024
@omkar-foss
Contributor Author

omkar-foss commented Jul 30, 2024

Hey @ion-elgreco, I've updated the PR as discussed.

I've attempted to reuse the Rust side internal is_delta_table_location() (code here) which checks for the _delta_log and returns a boolean result, and it's working well.

Kindly check out the PR when you get some time, and let me know if you have any suggestions for improvement or if any changes are required. Thanks!

@omkar-foss omkar-foss marked this pull request as ready for review July 30, 2024 20:00
Review comments (outdated, resolved): crates/core/src/table/builder.rs, python/src/lib.rs
@ion-elgreco
Collaborator

@omkar-foss Nice work! Thanks for the contribution :)

@ion-elgreco ion-elgreco added this pull request to the merge queue Jul 31, 2024
Merged via the queue into delta-io:main with commit 6f6769d Jul 31, 2024
21 checks passed
@omkar-foss omkar-foss deleted the feat-is-deltatable branch July 31, 2024 20:37
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate documentation Improvements or additions to documentation
Development

Successfully merging this pull request may close these issues.

Way to check if Delta table exists at specified path
2 participants