support delta lake reader #221
Comments
The current py-polars implementation: deltalake.read_table() -> delta-rs has first-class support from Python. A potential r-polars pathway via the Rust API could be:
- Read with delta-rs (I'm not sure if this could work out of the box with any cloud storage URI): https://docs.rs/deltalake/0.11.0/deltalake/delta/fn.open_table.html
- Make a record-batch-reader with delta-rs: https://docs.rs/deltalake/0.11.0/deltalake/table_state/struct.DeltaTableState.html#method.add_actions_table
- Import from a record batch reader to r-polars via arrow2-rs...
Hi @wjones127, can I ask: do you think it is realistic to make a minimal Delta Lake reader for r-polars via the delta-rs Rust API and arrow2? Or is there some filesystem magic from Python which is also needed?
I don't think filesystems are a blocker there; you can use the object stores that come with delta-rs. But, especially if you are using arrow2, there's no ready-to-use scan function in delta-rs that you could plug into, so there's quite a bit of code you would have to read. Currently in the Python package, delta-rs provides the file list and their statistics, and then the Python package provides the actual file scanners through PyArrow. Eventually, we'll have the scanner available in delta-rs and then it will be a lot easier to implement the R package, but that will take time.
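For anyone unfamiliar with what "the file list and their statistics" means here, the following is a rough sketch of the idea, not how delta-rs actually implements it. It assumes a local table whose `_delta_log` directory still holds every JSON commit, ignores Parquet checkpoints and all action fields other than `path` and `stats`, and the helper names `active_files` and `demo_table` are made up for illustration. It replays the `add`/`remove` actions in version order to find the Parquet files a scanner would then have to read.

```python
import json
import tempfile
from pathlib import Path


def active_files(table_root):
    """Replay JSON commits in _delta_log and return {data file path: stats or None}.

    Simplified sketch: checkpoints are ignored, so the full JSON history
    must still be present for the result to be correct.
    """
    log_dir = Path(table_root) / "_delta_log"
    files = {}
    # Commit files are named by zero-padded version number, so a lexical
    # sort is also a version sort. Skip anything that is not a plain commit.
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())
    for commit in commits:
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # one action per line
            if "add" in action:
                add = action["add"]
                stats = add.get("stats")  # stats is itself a JSON-encoded string
                files[add["path"]] = json.loads(stats) if stats else None
            elif "remove" in action:
                files.pop(action["remove"]["path"], None)
    return files


def demo_table():
    """Build a tiny fake table log in a temp dir and list its live files."""
    with tempfile.TemporaryDirectory() as root:
        log = Path(root) / "_delta_log"
        log.mkdir()
        v0 = [
            {"add": {"path": "part-0.parquet", "stats": json.dumps({"numRecords": 3})}},
            {"add": {"path": "part-1.parquet"}},
        ]
        v1 = [
            {"remove": {"path": "part-0.parquet"}},
            {"add": {"path": "part-2.parquet", "stats": json.dumps({"numRecords": 5})}},
        ]
        for version, actions in enumerate([v0, v1]):
            body = "\n".join(json.dumps(a) for a in actions) + "\n"
            (log / f"{version:020d}.json").write_text(body)
        return active_files(root)


if __name__ == "__main__":
    print(demo_table())
```

The returned paths are relative to the table root. Turning them into a lazy scan with predicate pushdown based on the statistics is the part that, per the comment above, delta-rs does not yet offer out of the box.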
@sorhawell @wjones127 @Ploppz and I might have some capacity to investigate/contribute, but we will need some pointers/guidance. Would it be helpful to connect over Discord?
@dseynaev sure :) What Discord channel do you prefer? It could be the r-polars subchannel of the polars Discord.
One stepping stone would be an interface for r-arrow datasets; r-polars must then make a scanner adaptor to that. It will take a week or two for me to write, I think, but it is very parallel to the py-polars/py-arrow interface. Then there would be two good reasons to go ahead with #165
Waiting for pola-rs/polars#17244
Polars seems to support it, but it's implemented on the Python side: https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.read_delta.html
The underlying Delta Lake interface library is written in Rust, though: https://docs.rs/deltalake/latest/deltalake/