-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Polygon subsetting #285
Polygon subsetting #285
Conversation
Pull Request Test Coverage Report for Build 1354
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of the decorator switching shape to geometry. I think it obfuscate what's going on.
Add a small test.
Add a notebook example.
FYI, this is how ESMValTool does it: https://github.com/ESMValGroup/ESMValCore/blob/92ba9538b51eec92db578c3cb92eb343d9a93e24/esmvalcore/preprocessor/_area.py#L313 How does it compare to your approach ? |
@huard No comparison. Much of the work I do is to ensure that the data is well understoud by rioxarray. I'm operating under the assumption that that extension is optimized for cutting data. From my interactions with the maintainer of rioxarray, they seem keen to see contributors suggest changes. I can possibly contribute some of this handling to that project to help simplify the logic here. One gain I have in my approach is that it handles multi-dimensional data fairly well while the approach you linked, from what I can tell only handles 2-dimensional data. |
FYI Failed tests for subset.py seem to be due to the release of cftime>=1.0.4. released 10/21/2019. https://github.com/Unidata/cftime XARRAY devs seems to be on it as the build versus their master seem to pass. |
@Zeitsperre I'm simply wary of
I'm not against it, but it should be the result of an exercise of pros vs cons, especially if you feel we'll need to invest time in upstream development. On the minus side, I see for example that it breaks on Python3.5, which I think we're still running internally, right ? On the plus side, we could replace our subset_bbox by the rioxarray implementation, and I see other valuable features that we could exploit. I'd be surprised ESMValTool's subset only applied to 2D fields. What makes you think that ? |
@huard I'm with you on this. Fairly disappointed about the Python3.5 non-support. PyPI reported that it was 3.5-compatible so I figured the failures were due to my code. This may be a bust for the time being. Regrettably, I can't port this to miranda either as I'm trying to keep it Python3.5-compatible as well. It might be best to put a pin in this until a later time (or make a few functions exclusive to 3.6+, an idea I really don't like). I was under that impression ESMValTools doesn't support it based on the logic of some of the checks. I should give it another look. The other method out there is to leverage GeoPandas and Shapely to make a virtual grid that's used to perform basic map algebra. It's currently not clean but I can try that approach next. Too bad. |
xarray is dropping support for 3.5, so we'll have to drop it too. Basically, I'd like you to take stock of what our options are, and suggest a solution that sounds solid from a development and maintenance point of view. |
@huard As I mentioned briefly in person to @tlogan2000 and yourself, here's a game plan:
I should be able to start on this later today or tomorrow. |
…ygon_subsetting
@huard Most of it seems to be working. There's a persistent error on the Python3.8 build that makes me think that rasterio and xarray are not getting along. It seems to be an HDF5 file access error. Have you run into this in the past? |
Do we still need rasterio for this PR? |
What's the error, exactly? I haven't tested your new version of the polygon subsetting, but last week I kept having the following error when using it alongside Googling this error points towards an incompatibility with pandas (or possibly geopandas) and hdf5: pandas-dev/pandas#24839 and https://stackoverflow.com/questions/54210073/pd-read-hdf-throws-cannot-set-writable-flag-to-true-of-this-array |
Okay, I want to say it's 98% there. We just need to figure out how to install libspatialindex-dev on the Windows build and I want to say we can push this. @huard What say you? |
@huard Will merge this as soon as Travis CI clears. Windows build is still not working, but that's not due to the library, but the trickiness of its dependencies. Will push a new version with that and then we can bump the xarray version and all the dependencies. Work for you? |
Pull Request Checklist:
After merging to master and before closing the branch:
master
branchWhat kind of change does this PR introduce?
This adds the
subset.subset_shape()
function which allows polygon subsetting using a GeoJSON, Shapefile, single-layer GeoPackage, or any otherfiona
readable formats. The function can be called as follows:ds_subset = xclim.subset.subset_shape(ds, shape=geo_file)
Does this PR introduce a breaking change?
No*. Because of the need to project NetCDF files in order for them to be understoud by tools like rasterio, netCDFs that typically don't have CRS information need to be converted via
rioxarray
. This requires users to have loaded their NetCDF with the rioxarray library installed. Examples can be found in the docstring for the function.Other information:
rioxarray
opens up new opportunities for features, such as writing DataArrays directly to GeoTIFF format and other GIS functions. This demands more investigation.