✨ XbatcherSlicerIterDataPipe for slicing xarray.DataArray #22

Merged 4 commits on Jul 5, 2022

@weiji14 weiji14 commented Jul 4, 2022

An iterable-style DataPipe for creating chips from xarray.DataArray objects! The windowed slicing is done using xbatcher.

Note that xbatcher is only used here to create slices of data (chips/tiles). The actual creation of mini-batches (lumping together several slices/chips/tiles of images) will be handled by https://pytorch.org/data/0.4.0/generated/torchdata.datapipes.iter.Batcher.html
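
To make the split between slicing and batching concrete, here is a minimal pure-Python sketch. It does not use xbatcher or torchdata: `slice_chips` and `batch` are hypothetical stand-ins for the windowed slicing and for `Batcher`, and a nested list stands in for an xarray.DataArray.

```python
from itertools import islice, product


def slice_chips(image, chip_height, chip_width):
    """Yield non-overlapping chips from a 2D list-of-lists image.

    Hypothetical stand-in for the windowed slicing step; windows that
    do not fit fully inside the image are skipped.
    """
    n_rows, n_cols = len(image), len(image[0])
    row_starts = range(0, n_rows - chip_height + 1, chip_height)
    col_starts = range(0, n_cols - chip_width + 1, chip_width)
    for r, c in product(row_starts, col_starts):
        yield [row[c : c + chip_width] for row in image[r : r + chip_height]]


def batch(items, batch_size):
    """Group an iterable into lists of up to batch_size items.

    Hypothetical stand-in for the mini-batching step; the last,
    possibly smaller, batch is kept.
    """
    iterator = iter(items)
    while chunk := list(islice(iterator, batch_size)):
        yield chunk


# A 4x4 "image" with values 0..15
image = [[r * 4 + c for c in range(4)] for r in range(4)]

chips = list(slice_chips(image, chip_height=2, chip_width=2))  # slicing step
batches = list(batch(chips, batch_size=3))  # batching step
```

Keeping the two steps separate means the chip size and the batch size can be changed independently of each other.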

Preview at https://zen3geo--22.org.readthedocs.build/en/19/api.html#module-zen3geo.datapipes.xbatcher

Note that since xbatcher is an optional dependency, users need to run `pip install zen3geo[raster]` to install the extra 'raster' packages, which include xbatcher (and possibly more in the future).

TODO:

  • Add xbatcher as optional dependency
  • Initial implementation of XbatcherSlicerIterDataPipe
  • Update CI build matrix to include optional xbatcher dependency in full tests run
  • Add unit test for slicing xarray.Dataset objects
  • Think about refactoring RioXarrayReader and PyogrioReader to not return tuples of (filename, dataobj) (maybe do in separate PR, edit: at ♻️ Let RioXarrayReader return dataarray only instead of tuple #24)

Batch generation from xarray datasets!

Currently using commit xarray-contrib/xbatcher@d3e1c2f
An iterable-style DataPipe for creating chips from xarray.DataArray objects! Uses xbatcher to do the windowed slicing. Included a doctest and a unit test, added a new section in the API docs and an intersphinx mapping for xbatcher.
@weiji14 weiji14 added the feature New feature or request label Jul 4, 2022
@weiji14 weiji14 added this to the 0.2.0 milestone Jul 4, 2022
@weiji14 weiji14 self-assigned this Jul 4, 2022
New extras group that includes `xbatcher`. Documented in CONTRIBUTING.md and updated GitHub Actions CI so that the full unit test suite runs with `xbatcher` too.
Making sure that the climate science folks with their n-dimensional datasets can use XbatcherSlicer properly too!
@weiji14 weiji14 marked this pull request as ready for review July 5, 2022 00:44
Comment on lines +103 to +104
# def __len__(self) -> int:
# return len(self.source_datapipe)

@weiji14 weiji14 Jul 5, 2022

Note to future self: figure out whether there's a way to get the length of the datapipe lazily via https://github.com/pangeo-data/xbatcher/blob/d3e1c2f75dd0eea4e699b3398ba4d8bc1035a5e5/xbatcher/generators.py#L138-L139

Done in #75!
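
For illustration only, one way a chip count could be worked out without generating any chips is from the dimension sizes alone. The sketch below is not necessarily what #75 does. It assumes a single input array, that only the sliced dimensions contribute to the count, and that windows which do not fit fully inside a dimension are dropped. The names `count_windows` and `count_chips` are hypothetical.

```python
from math import prod


def count_windows(dim_size, window, overlap=0):
    """Number of full windows along one dimension (partial windows dropped)."""
    stride = window - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the window size")
    if dim_size < window:
        return 0
    return (dim_size - window) // stride + 1


def count_chips(dim_sizes, input_dims, input_overlap=None):
    """Total number of chips: the product of window counts per sliced dimension."""
    input_overlap = input_overlap or {}
    return prod(
        count_windows(dim_sizes[dim], window, input_overlap.get(dim, 0))
        for dim, window in input_dims.items()
    )


# A 10 (y) by 7 (x) array cut into 4x3 chips with no overlap
n_chips = count_chips({"y": 10, "x": 7}, input_dims={"y": 4, "x": 3})
```

A `__len__` for the whole datapipe would also need to sum this over every array coming from the source datapipe, which may not be possible without reading each array's shape.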

@weiji14 weiji14 merged commit ef3e763 into main Jul 5, 2022
@weiji14 weiji14 deleted the xbatcher branch July 5, 2022 00:51
weiji14 added a commit that referenced this pull request Aug 14, 2022
Forgot to add this if-statement in #22. So here's the patch!
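
A guard of this kind for an optional dependency might look roughly like the sketch below. It is a generic pattern rather than the code from the patch: `load_optional` and `HypotheticalSlicer` are made-up names, and the backend module name is a parameter only so that the behaviour can be exercised without xbatcher installed.

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


class HypotheticalSlicer:
    """Stand-in for a datapipe that depends on an optional package."""

    def __init__(self, backend_name="xbatcher"):
        backend = load_optional(backend_name)
        if backend is None:  # the kind of if-statement the patch refers to
            raise ModuleNotFoundError(
                f"Package `{backend_name}` is required to use this class. "
                "Assuming the 'raster' extra from this PR, it can be "
                "installed with `pip install zen3geo[raster]`."
            )
        self.backend = backend
```

Raising at construction time, rather than at import time, lets the rest of the package stay usable when the optional extra is not installed.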
weiji14 added a commit that referenced this pull request Aug 14, 2022
Ensure that a helpful ModuleNotFoundError is raised when attempting to use XbatcherSlicer without xbatcher being installed.

* 🩹 Raise ModuleNotFoundError when xbatcher not installed

Forgot to add this if-statement in #22. So here's the patch!

* ✏️ Fix type hints for xbatcher and datashader source_datapipe

Need to use the full name xarray.DataArray instead of xr.DataArray to have the intersphinx link work.