Remove pyarrow required dependency #64
@nmandery can you allow CI on this PR, or, preferably, just turn off CI approval in the settings? |
It should now work without manual approval. |
Ok great now the rust side compiles, and we just have to fix up the python side |
The dependency updates were:
🙃 |
@nmandery I'll need some feedback in terms of how open you are to breaking changes. In particular, you have separate APIs for Now that Polars supports the PyCapsule Interface, there's simple zero-copy conversion between the two, so you can pass any Arrow array-like or chunked array-like output into the In my opinion this means that we should return Arrow output and let the user decide which library they want to use it with. We should document how easy it is to use in this way with polars/pyarrow/pandas etc, but only provide a single API for ease of maintenance. |
Feel free to break whatever you like. The whole concept of subpackages for the different dataframe libraries was born to provide a better integration. I suppose the PyCapsule interface makes a lot of this unnecessary, while achieving a far better integration. Additionally, I feel these different subpackages are not too easy to maintain. The only thing I would like to keep is the polars extensions to the I also would prefer sticking with arrow output. +1 |
👍 I definitely agree there |
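For readers unfamiliar with the mechanism discussed above: the Arrow PyCapsule Interface lets a library recognise Arrow data by looking for the `__arrow_c_array__` (single array) or `__arrow_c_stream__` (chunked array or stream) methods, without importing the producing library. The following is only a minimal sketch of that duck-typing check; the helper name `arrow_capsule_kind` and the two stand-in classes are hypothetical and are not part of h3ronpy, pyarrow or polars.

```python
from typing import Any, Optional


def arrow_capsule_kind(obj: Any) -> Optional[str]:
    """Report which Arrow PyCapsule export method `obj` exposes, if any.

    Hypothetical helper for illustration only; only the dunder method
    names come from the Arrow PyCapsule Interface.
    """
    # A contiguous array (or record batch) exports itself as one array.
    if hasattr(obj, "__arrow_c_array__"):
        return "array"
    # Chunked arrays, tables and readers export a stream of arrays.
    if hasattr(obj, "__arrow_c_stream__"):
        return "stream"
    return None


class FakeArray:
    """Stand-in for an Arrow array-like object (hypothetical)."""

    def __arrow_c_array__(self, requested_schema=None):
        raise NotImplementedError("a real producer returns two PyCapsules")


class FakeChunkedArray:
    """Stand-in for an Arrow chunked array-like object (hypothetical)."""

    def __arrow_c_stream__(self, requested_schema=None):
        raise NotImplementedError("a real producer returns one PyCapsule")
```

A function that accepts "any Arrow array-like or chunked array-like" input can dispatch on a check like this and never needs pyarrow, polars or pandas as a hard dependency.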
CI failed on commit
which is a super annoying compiler bug (geoarrow/geoarrow-rs#716). Updating to the latest git of geoarrow-rs fixed compilation on my computer, even though I don't know exactly what changed to fix that bug. 🤷 |
@nmandery is it ok with you if the raster-specific code still has a pyarrow dependency for now? It would be nice to come back to that in a follow-up so that we can unblock this and minimize the size of the PR |
How do you want to handle these polars and pandas tests? Should we keep them and update them to use the current arrow-based APIs? |
Let's keep the pyarrow dependency in the raster part for now. It will be easier to port this later on once the main work of this PR is done. The pandas- and polars-specific tests can be removed. I only distributed the tests over the different dataframe libraries to have some test coverage for these APIs as well. I would like to keep the polars extension tests, though. |
At least locally the tests are passing now. Hopefully CI will be green as well |
Great. I will look through your changes |
Thanks for this - next I will try to get around to fixing the docs etc. so we can release this. I suppose you are waiting to integrate this into lonboard. |
Actually @zacharydez is working on a project that I think is using h3ronpy in AWS Lambda, and this would benefit them because pyarrow is too big to put in Lambda (along with other geospatial dependencies). I would like to integrate this into Lonboard in some way as well, but that's not urgent. |
Just putting up some scratch work for discussion re #55
Change list
CellArray, DirectedEdgeArray, etc (names can be changed) that represent arrow arrays of h3 arrays.
CellArray into arro3.core.Array or pyarrow.Array or polars.Series.