✨ DatashaderRasterizer for burning vector shapes to xarray grids #35

Merged
merged 7 commits into from
Aug 14, 2022

Conversation

weiji14
Owner

@weiji14 weiji14 commented Aug 13, 2022

An iterable-style DataPipe for turning vector geometries into raster grids! Uses datashader to do the rasterization.

Preview at https://zen3geo--35.org.readthedocs.build/en/35/api.html#zen3geo.datapipes.DatashaderRasterizer

Part 2 out of 2 of superseding #32. Recall that 1st step (#34) was to define a canvas, and 2nd step (this PR) is to burn the vector (points/lines/polygons) onto that canvas via some aggregation function.

TODO:

  • Initial implementation of DatashaderRasterizerIterDataPipe
  • Add unit tests that test when input canvas and/or vector geometries don't have a proper .crs
  • Handle rasterizing lines and polygons
  • Document the default aggregation function for points/lines/polygons
  • Refactor unit tests to use common fixtures and geometries

Won't do:

  • Support crs override? No, because
    • Typically, reprojecting the vector geometry (higher resolution) to the raster canvas (lower resolution) is the proper way.
    • If users do want a different projection system, it's better to handle the raster reprojection logic at an earlier stage (i.e. before XarrayCanvasIterDataPipe or this DatashaderRasterizer)
    • Want to have less nested if-then statements in this already convoluted DatashaderRasterizer code. See 'Zen of Python'.

An iterable-style DataPipe for turning vector geometries into raster grids! Uses datashader to do the rasterization. Included a doctest for rasterizing geopandas.GeoDataFrame to xarray.DataArray. Added a new section in the API docs too. Also made a small change to XarrayCanvasIterDataPipe so that the datashader.Canvas being yielded has a crs attribute containing the original xarray object's coordinate reference system!
@weiji14 weiji14 added the feature New feature or request label Aug 13, 2022
@weiji14 weiji14 added this to the 0.3.0 milestone Aug 13, 2022
@weiji14 weiji14 self-assigned this Aug 13, 2022
Improved the error traceback and added a unit test for when GeometryCollection vector types (i.e. those with an assortment of point, line or polygon types) are passed in to DatashaderRasterizer. The limitation is really in spatialpandas, and hence datashader. Also fall back to setting the datashader.Canvas's CRS to None to prevent an AttributeError, though that might cause some issues when the vector has a CRS but the canvas doesn't.
Decided that coordinate reference systems are a must now, for both the datashader.Canvas and geopandas.GeoDataFrame inputs, because geospatial context matters. Added a unit test to ensure these checks work.
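A minimal sketch of what such a CRS check could look like, not the actual zen3geo code. The helper name `check_crs`, the error wording, and the `SimpleNamespace` stand-ins for datashader.Canvas and geopandas.GeoDataFrame are all assumptions made for illustration; the only behaviour taken from this PR is that a missing `.crs` on either input raises an AttributeError.

```python
from types import SimpleNamespace


def check_crs(canvas, vector):
    """Raise AttributeError if either input lacks a usable ``.crs``.

    Hypothetical helper; the real DatashaderRasterizer may check differently.
    """
    for name, obj in (("canvas", canvas), ("vector", vector)):
        # Treat both a missing attribute and an explicit None as "no CRS".
        if getattr(obj, "crs", None) is None:
            raise AttributeError(
                f"Missing crs information for the {name} input, "
                "please set a valid '.crs' attribute"
            )


# Stand-ins for datashader.Canvas and geopandas.GeoDataFrame (assumption:
# only the `.crs` attribute matters for this check).
canvas = SimpleNamespace(crs="EPSG:4326")
vector_without_crs = SimpleNamespace(crs=None)

try:
    check_crs(canvas, vector_without_crs)
except AttributeError as err:
    print(err)
```

Failing early like this keeps the geospatial context explicit, instead of silently rasterizing geometries onto a canvas in a mismatched projection.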
Enable rasterization of line and polygon inputs too! Pretty much just two more elif statements. However, because rasterizing lines and polygons using datashader results in boolean type xarray.DataArray outputs that can't be reprojected by rioxarray, had to cast them to uint8. Added parametrized unit tests that ensure the three vector input types work.
Improving the DatashaderRasterizer docstring so that people know what is happening. Mention that the default aggregation is 'count' for points, and 'any' for lines and polygons. Document AttributeError that is raised when either the canvas or vector input is missing a `.crs` attribute, and ValueError raised when vector geometry type is not supported. Also added an intersphinx link for shapely.
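The defaults described above can be summarised as a small lookup, sketched here in plain Python. The table and the `default_rasterizer` helper are hypothetical and only mirror the documented behaviour ('count' for points, 'any' for lines and polygons, ValueError otherwise); the real code dispatches on spatialpandas geometry types rather than on strings. `points`, `line` and `polygons` are the names of the corresponding datashader.Canvas methods.

```python
# Geometry type name -> (datashader.Canvas method name, default aggregation).
# Hypothetical table for illustration only.
DEFAULT_AGGREGATION = {
    "Point": ("points", "count"),
    "MultiPoint": ("points", "count"),
    "LineString": ("line", "any"),
    "MultiLineString": ("line", "any"),
    "Polygon": ("polygons", "any"),
    "MultiPolygon": ("polygons", "any"),
}


def default_rasterizer(geom_type):
    """Return the (canvas method, aggregation) pair for a geometry type name."""
    try:
        return DEFAULT_AGGREGATION[geom_type]
    except KeyError:
        # e.g. GeometryCollection, which mixes points, lines and polygons
        raise ValueError(f"Unsupported geometry type: {geom_type}") from None


print(default_rasterizer("Point"))    # points are counted per pixel
print(default_rasterizer("Polygon"))  # polygons only mark pixels as covered
```

Because 'any' yields a boolean grid, line and polygon outputs are the ones that need the uint8 cast mentioned in the previous commit.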
Tidy up the three test_datashader_rasterizer_* unit tests that were using unique datashader.Canvas objects with different widths/heights (because they were written somewhat independently). Using pytest fixtures to do so. Split the single missing_crs test into two to make it more unit-like. For the vector geometries, there has also been some swapping of GeoDataFrame vs GeoSeries for different tests. Might still be a bit hard to follow but will suffice for now.
@weiji14 weiji14 marked this pull request as ready for review August 14, 2022 17:12
Ensure that the output dataarray's coordinate reference system and affine transform are correct in the doctest (which is like a mini-integration test).
@weiji14 weiji14 merged commit 675ad67 into main Aug 14, 2022
@weiji14 weiji14 deleted the datashader/rasterize branch August 14, 2022 17:30
weiji14 added a commit that referenced this pull request Sep 7, 2022
Just a random collection of mostly documentation-related patches. Patches type-hints in #52, isort imports in #35, mention functional name of IterDataPipe in walkthroughs #8 and #20, and remove mention of returned tuple to patch #33.

* 🏷️ Add specific type hints for mask_datapipe in geopandas.py

Should be either an xarray.DataArray or xarray.Dataset.

* 🚨 Sort spatialpandas imports in datashader.py

Ran isort to sort spatialpandas.geometry imports alphabetically. Also intersphinx linked the `.crs` attribute to geopandas.GeoDataFrame.crs.

* 💬 Mention functional name of IterDataPipe in walkthroughs

So people don't get confused on why the class-form like `Collator` is mentioned but `.collate` was used instead.

* 📝 Remove mention of tuple being returned in test_pyogrio_reader

Forgot to edit the unit test's docstring. Patches #33.

* 🍻 It's GeoPackage and GeoDataFrame, not GeoTIFF and DataArray

Need to be more careful when copying and pasting stuff.
weiji14 added a commit that referenced this pull request May 30, 2023
Probably wanted to preserve all the columns when converting from geopandas.GeoDataFrame to spatialpandas.GeoDataFrame, but it doesn't work sometimes when the vector is wrapped by StreamWrapper. Decided to pass the vector.geometry GeoSeries as input instead (alternative was to do a view like vector.loc[:]). Partially reverts 6805418 in #35.

Wanted to add a unit test, but it was hard to get a minimal reproducible example. Only know that this helps with a complicated data pipeline reading vector GeoJSON data from an HTTP request.
weiji14 added a commit that referenced this pull request May 30, 2023
…104)

* 🥅 Catch specific ValueError on conversion to spatialpandas

On converting a vector geometry in a geopandas.GeoDataFrame (which could be wrapped in StreamWrapper) to a spatialpandas.GeoDataFrame, there could be several different types of `ValueError`s raised. This modifies the exception raising to target only the one specific ValueError caused by invalid geometry type. See logic at https://github.com/holoviz/spatialpandas/blame/v0.4.8/spatialpandas/geometry/base.py#L805-L849 for how the original ValueError is raised. Also clarified that MultiPoint, MultiLineString and MultiPolygon geometry types are supported.

* 🐛 Convert just the geometry column to spatialpandas.GeoDataFrame

Probably wanted to preserve all the columns when converting from geopandas.GeoDataFrame to spatialpandas.GeoDataFrame, but it doesn't work sometimes when the vector is wrapped by StreamWrapper. Decided to pass the vector.geometry GeoSeries as input instead (alternative was to do a view like vector.loc[:]). Partially reverts 6805418 in #35.

Wanted to add a unit test, but it was hard to get a minimal reproducible example. Only know that this helps with a complicated data pipeline reading vector GeoJSON data from an HTTP request.

* ✅ Add test for empty vector raising proper ValueError

Test to ensure that the ValueError raised when an invalid geopandas.GeoDataFrame is passed into DatashaderRasterizer is not about unsupported geometry type, but something else instead. Not exactly a perfect regression test for #104, but it does help with code coverage.
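The pattern behind these commits, translating only one specific ValueError and letting every other one through unchanged, can be sketched as below. This is an illustration under stated assumptions, not the zen3geo implementation: `UNSUPPORTED_MARKER` is a made-up message fragment (the real spatialpandas wording is not reproduced here), and `to_spatial` and the two fake converters are hypothetical names.

```python
# Hypothetical message fragment that identifies the "invalid geometry type"
# failure; the actual text raised by spatialpandas may differ.
UNSUPPORTED_MARKER = "unsupported geometry type"


def to_spatial(convert, vector):
    """Call ``convert(vector)``, rewording only the geometry-type ValueError."""
    try:
        return convert(vector)
    except ValueError as err:
        if UNSUPPORTED_MARKER in str(err):
            raise ValueError(
                "Unsupported geometry type(s) in vector input; only point, "
                "line and polygon types (including Multi* variants) work"
            ) from err
        raise  # any other ValueError (e.g. empty input) passes through as-is


def fake_convert_bad_type(vector):
    # Stand-in for a conversion that fails on a GeometryCollection.
    raise ValueError("unsupported geometry type: GeometryCollection")


def fake_convert_empty(vector):
    # Stand-in for a conversion that fails for an unrelated reason.
    raise ValueError("cannot convert an empty vector")


for converter in (fake_convert_bad_type, fake_convert_empty):
    try:
        to_spatial(converter, vector=None)
    except ValueError as err:
        print(err)
```

The second case is what the added test for an empty vector exercises: the error that surfaces should not claim the geometry type is unsupported.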