Are there plans to support delta reader/writer? #2858
Comments
I think it will be very unlikely. As far as I can see, Delta Lake does not use the Arrow format but requires Spark. You can use a Python package to read the data into a pyarrow table, which you can then convert to a polars dataframe. There seem to be 2 Python packages that do this (which seem to have the same name):
https://databricks.com/blog/2020/12/22/natively-query-your-delta-lake-with-scala-java-and-python.html
import polars as pl
# Import Delta Table
from deltalake import DeltaTable
# Read the Delta Table using the Rust API
dt = DeltaTable("../rust/tests/data/simple_table")
# Create a Polars Dataframe by initially converting the Delta Lake
# table into a PyArrow table
df = pl.DataFrame(dt.to_pyarrow_table())
https://pypi.org/project/delta-lake-reader/
import polars as pl
from deltalake import DeltaTable
# native file path. Can be relative or absolute
table_path = "somepath/mytable"
# Create a Polars Dataframe by initially converting the Delta Lake
# table into a PyArrow table.
df = pl.DataFrame(DeltaTable(table_path).to_table())
There were plans to do so. But
cc @houqp
There is ongoing work to migrate delta-rs to arrow2 and parquet2, see: delta-io/delta-rs#465. The current branch is mostly complete except for map and list type support. We also need to update to the latest arrow2/parquet2 version :D Once the port is completed, plugging it into polars should be pretty trivial.
I'm also looking forward to the Delta Lake support!
@ritchie46 - a new version of delta-rs was recently released with parquet2 support, see here. Thanks for adding this @houqp! Will you be able to add delta-rs support now?
Also want to say this would be great. I bet your implementation will be close to as fast as the Photon compute engine Databricks charges way too much for.
Is anyone willing to take on this work? There are a lot of delta-rs developers who are willing to help with code reviews and any issues you might come across. Feel free to ping me directly or here if you're interested.
Integrate the delta-rs library for interacting with the Delta reader. pola-rs#2858
I didn't realize that my playpen pushes would link back here 😂 @MrPowers I am not really sure what my next steps are. Let me know if you are still available for some guidance on how to implement this feature.
I'm working on this feature and will raise a PR soon.
@chitralverma - that's awesome. Let me know if you need any help. We can jump on a call with the core delta-rs devs anytime. Really excited about this feature. I'll blog / promote it as soon as it is live 🚀
This comment was marked as outdated.
Update: The https://pola-rs.github.io/polars/py-polars/html/reference/io.html#delta-lake
Hi!
I was working on it, but the plan is to put it on the Python side. For now, though, it's blocked by delta-io/delta-rs#1024; delta-rs doesn't support
Any update on this? Waiting for this feature.
The lazy/eager reader is already in place. For the writer, #7574 is now open.
I'll close this in favor of #7574, as it's more specific. Read/scan functionality has been implemented; write functionality is being worked on.
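For readers landing here later, a minimal sketch of what the read/scan functionality mentioned above looks like from Python. It assumes a polars version that ships `pl.read_delta` and `pl.scan_delta`, and that the `deltalake` package is installed alongside it. The wrapper names `read_delta_eager` and `scan_delta_lazy` are hypothetical, introduced only for illustration, and the table path in the usage note is the placeholder from earlier in the thread.

```python
def read_delta_eager(table_path, columns=None):
    """Load a Delta table fully into memory as a polars DataFrame."""
    # Deferred import: needs polars plus the deltalake package installed.
    import polars as pl

    # columns=None reads every column; pass a list of names to read a subset.
    return pl.read_delta(table_path, columns=columns)


def scan_delta_lazy(table_path):
    """Build a polars LazyFrame over a Delta table without loading it yet."""
    import polars as pl

    # Nothing is materialised until .collect() is called on the result.
    return pl.scan_delta(table_path)
```

Usage would be along the lines of `read_delta_eager("somepath/mytable")` for the eager path, or `scan_delta_lazy("somepath/mytable").collect()` for the lazy one. The lazy variant is probably the better default when only part of the table is needed, since filters and column selections can be added before anything is materialised.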
@stinodego do you know if there is any plan to support streaming?
Yes, it's planned but won't be there anytime soon:
Is any integration with Delta Lake on the horizon, by any chance?
https://delta.io/
There is a native Delta Lake implementation in Rust:
https://github.com/delta-io/delta-rs/tree/main/rust