-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(geospatial): use geoarrow extension types when returning geometry columns as pyarrow #9549
feat(geospatial): use geoarrow extension types when returning geometry columns as pyarrow #9549
Conversation
ACTION NEEDED Ibis follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
This looks awesome! Along with #9143, with this PR you should be able to pass an |
ibis/formats/pyarrow.py
Outdated
# Warn for dropped CRS? Or geoarrow.types would need a lookup table | ||
# for srid -> PROJJSON | ||
crs = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is pyproj a dependency? Is it safe to assume that an srid refers to an EPSG code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it comes along, when we install pip install 'ibis-framework[duckdb, geospatial]'
.
Installing collected packages: pytz, tzdata, typing-extensions, toolz, sqlglot, six, pygments,
pyarrow-hotfix, parsy, packaging, numpy, mdurl, duckdb, click, certifi, bidict, attrs, atpublic,
shapely, python-dateutil, pyproj, pyarrow, markdown-it-py, cligj, click-plugins, rich, pandas,
fiona, ibis-framework, geopandas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Is it safe to assume that an srid refers to an EPSG code?
It depends where it comes from...a user typing it probably meant EPSG code, but otherwise it would be backend-specific (e.g., queryable using https://postgis.net/docs/using_postgis_dbmanagement.html#spatial_ref_sys_table ). It is usually an EPSG code in most practical use-cases (but could be a user-defined ID).
I'll workshop this a little and see what we can do!
@paleolimbot Let us know when you're ready for a review! |
3ba384d
to
7d79072
Compare
I think it's ready!
I think that would be better in the long run and would perhaps remove the pieces here where geo-specific details creep in. I think for this PR keeping the srid is fine and maybe less disruptive.
I've made this assumption for now. I think user-defined CRSes generally have very large srids and an authority/code like |
cbda02e
to
3749b22
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent!
@@ -64,6 +64,8 @@ db-dtypes = { version = ">=0.3,<2", optional = true } | |||
deltalake = { version = ">=0.9.0,<1", optional = true } | |||
duckdb = { version = ">=0.8.1,<2", optional = true } | |||
geopandas = { version = ">=0.6,<2", optional = true } | |||
geoarrow-types = { version = ">=0.2,<1", optional = true } | |||
pyproj = { version = ">=3.3.0,<4", optional = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be sure, we are making pyproj
an explicit dependency. I know it comes when installing geopandas, which is already a dependency. I guess explicit better than implicit in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's generally a bad idea to assume that a package will forever and always include a dependency that you require. If there's a top-level import for a package, it needs to be included as an explicit dependency.
Description of changes
Currently, GeoSpatial types are returned to pyarrow as
pa.binary()
, which drops the SRID and geometry/geography-ness of the column. Returning these columns marked as a GeoSpatial type allows other tools (e.g., lonboard for visualization) to pick this up.I haven't kicked the tires on this at all yet and I almost certainly haven't thought through the implications here!