Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove pyarrow required dependency #64

Merged
merged 30 commits into from
Nov 15, 2024

Conversation

kylebarron
Copy link
Contributor

Just putting up some scratch work for discussion re #55

Change list

  • Export classes like CellArray, DirectedEdgeArray, etc (names can be changed) that represent arrow arrays of h3 arrays.
    • This means that in theory we don't need to require pyarrow or arro3 or polars, but rather the user can use whatever python Arrow package they want.
  • Implement the PyCapsule Interface directly on these arrays, so you can pass a CellArray into arro3.core.Array or pyarrow.Array or polars.Series.

@kylebarron
Copy link
Contributor Author

@nmandery can you allow CI on this PR, or, preferably, just turn off CI approval in the settings?
image

@nmandery
Copy link
Owner

nmandery commented Oct 9, 2024

It should now work without manual approval.

h3ronpy/Cargo.toml Outdated Show resolved Hide resolved
@kylebarron
Copy link
Contributor Author

Ok great now the rust side compiles, and we just have to fix up the python side

@kylebarron
Copy link
Contributor Author

The dependency updates were:

  • CI was failing because it was trying to install the wrong pip wheel. I figured better to do dependency updates than change how CI is laid out.
  • To restore abi3 wheels, I had to update to pyo3-arrow 0.5.1, which required numpy 0.22, which required pyo3 0.22. That then required ndarray 0.16 which required a bump of geo-interface and rasterh3

🙃

@kylebarron
Copy link
Contributor Author

@nmandery I'll need some feedback in terms of how open you are to breaking changes. In particular, you have separate APIs for h3ronpy.polars, h3ronpy.pandas, and h3ronpy.arrow. This is really complex and hard to work through.

Now that Polars supports the PyCapsule Interface, there's simple zero-copy conversion between the two, so you can pass any Arrow array-like or chunked array-like output into the polars.Series constructor:
image

In my opinion this means that we should return Arrow output and let the user decide which library they want to use it with. We should document how easy it is to use in this way with polars/pyarrow/pandas etc, but only provide a single API for ease of maintenance.

@nmandery
Copy link
Owner

Feels free to break whatever you like. The whole concept of subpackages for the different dataframe libraries was born to provide a better integration. I suppose the PyCapsule-interface makes a lot of this unnecessary, while achieving a far better integration. Additionally I feel these different subpackages are not too easy to maintain. The only thing I would like to keep is the polars extensions to the Expr and Series namespaces.

I also would prefer sticking with arrow output. +1

@kylebarron
Copy link
Contributor Author

The only thing I would like to keep is the polars extensions to the Expr and Series namespaces.

👍 I definitely agree there

@kylebarron
Copy link
Contributor Author

kylebarron commented Nov 6, 2024

CI failed on commit 5132e01 (#64) with:

error[E0275]: overflow evaluating the requirement `<impl GeometryTrait<T = f64> as geo_traits::geometry::GeometryTrait>::GeometryCollection<'_>: geo_traits::geometry_collection::GeometryCollectionTrait`
    |
    = help: consider increasing the recursion limit by adding a `#![recursion_limit = "256"]` attribute to your crate (`geoarrow`)
note: required for `GeometryCollectionIterator<'_, f64, <... as GeometryCollectionTrait>::ItemType<'_>, ...>` to implement `Iterator`
   --> /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/geoarrow-0.4.0-beta.1/src/geo_traits/iterator.rs:39:15
    |
37  |                   ItemType: 'a + $item_trait<T = T>,
    |                                              ----- unsatisfied trait bound introduced here
38  |                   G: $self_trait<T = T, ItemType<'a> = ItemType>,
39  |               > Iterator for $struct_name<'a, T, ItemType, G>
    |                 ^^^^^^^^     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
109 | / impl_iterator!(
110 | |     GeometryCollectionIterator,
111 | |     GeometryCollectionTrait,
112 | |     GeometryTrait,
113 | |     geometry_unchecked
114 | | );
    | |_- in this macro invocation
    = note: required for `GeometryCollectionIterator<'_, f64, <... as GeometryCollectionTrait>::ItemType<'_>, ...>` to implement `IntoIterator`
    = note: the full name for the type has been written to '/home/runner/work/h3ronpy/h3ronpy/target/x86_64-unknown-linux-gnu/release/deps/geoarrow-b79[580](https://github.com/nmandery/h3ronpy/actions/runs/11711779349/job/32621138289?pr=64#step:5:586)3b66daebd2.long-type-15103887714521775568.txt'
    = note: consider using `--verbose` to print the full type name to the console
    = note: the full name for the type has been written to '/home/runner/work/h3ronpy/h3ronpy/target/x86_64-unknown-linux-gnu/release/deps/geoarrow-b795803b66daebd2.long-type-15103887714521775568.txt'
    = note: consider using `--verbose` to print the full type name to the console
    = note: this error originates in the macro `impl_iterator` (in Nightly builds, run with -Z macro-backtrace for more info)
For more information about this error, try `rustc --explain E0275`.

which is a super annoying compiler bug (geoarrow/geoarrow-rs#716).

Updating to the latest git of geoarrow-rs fixed compilation on my computer, even though I don't know exactly what changed to fix that bug. 🤷

@kylebarron kylebarron marked this pull request as ready for review November 6, 2024 22:06
@kylebarron
Copy link
Contributor Author

kylebarron commented Nov 6, 2024

@nmandery is it ok with you if the raster -specific code still has a pyarrow dependency for now? It would be nice to come back to that in a follow up so that we can unblock this and minimize the size of the PR

@kylebarron kylebarron changed the title WIP: array classes Remove pyarrow required dependency Nov 6, 2024
@kylebarron
Copy link
Contributor Author

How do you want to handle these polars and pandas tests? Should we keep them and update them to use the current arrow-based APIs?

@nmandery
Copy link
Owner

nmandery commented Nov 7, 2024

Lets keep the pyarrow-dependency in the raster-part for now. It will be easier to port this later on once the main work of this PR is done.

The pandas and polars-specific tests can be removed. I only distributed the tests over the different dataframe libraries to have some test-coverage for these APIs as well. I would like to keep the polars extension tests, though.

@kylebarron
Copy link
Contributor Author

At least locally the tests are passing now. Hopefully CI will be green as well

@nmandery
Copy link
Owner

Great. I will look through your changes

h3ronpy/pyproject.toml Outdated Show resolved Hide resolved
h3ronpy/python/h3ronpy/polars/vector.py Outdated Show resolved Hide resolved
h3ronpy/tests/pandas/test_vector.py Show resolved Hide resolved
h3ronpy/python/h3ronpy/arrow/__init__.py Outdated Show resolved Hide resolved
h3ronpy/python/h3ronpy/arrow/__init__.py Show resolved Hide resolved
@nmandery nmandery merged commit 1876aa5 into nmandery:main Nov 15, 2024
8 checks passed
@nmandery
Copy link
Owner

Thanks for this - I will try next to get around to fix the docs etc so we can release this. I suppose you are waiting to integrate this into lonboard.

@kylebarron
Copy link
Contributor Author

Actually @zacharydez is working on a project that I think is using h3ronpy in AWS Lambda, and this would benefit them because pyarrow is too big to put in Lambda (along with other geospatial dependencies).

I would like to integrate this into Lonboard in some way as well, but that's not urgent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants