✨ StackSTACStackerIterDataPipe for stacking STAC items #61

Merged: 5 commits merged on Sep 19, 2022

weiji14
Owner

@weiji14 commented Sep 18, 2022

An iterable-style DataPipe to stack STAC items! Uses stackstac to do the reprojection and stacking along time.

Preview at https://zen3geo--61.org.readthedocs.build/en/61/api.html#zen3geo.datapipes.StackSTACStacker

Usage:

import pystac
import stackstac

from torchdata.datapipes.iter import IterableWrapper
from zen3geo.datapipes import StackSTACStacker

# Stack different bands in a STAC Item using DataPipe
item_url: str = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-1-grd/items/S1A_IW_GRDH_1SDV_20220914T093226_20220914T093252_044999_056053"
stac_item = pystac.Item.from_file(href=item_url)
dp = IterableWrapper(iterable=[stac_item])
dp_stackstac = dp.stack_stac_items(assets=["vh", "vv"], epsg=32652, resolution=10)

# Loop or iterate over the DataPipe stream
it = iter(dp_stackstac)
dataarray = next(it)
print(dataarray.sizes)
# Frozen({'time': 1, 'band': 2, 'y': 20686, 'x': 28043})

print(dataarray.coords)
# Coordinates:
#   * time                                   (time) datetime64[ns] 2022-09-14T0...
#     id                                     (time) <U62 'S1A_IW_GRDH_1SDV_2022...
#   * band                                   (band) <U2 'vh' 'vv'
#   * x                                      (x) float64 1.354e+05 ... 4.158e+05
#   * y                                      (y) float64 4.305e+06 ... 4.098e+06
# ...

print(dataarray.attrs["spec"])
# RasterSpec(epsg=32652, bounds=(135370, 4098080, 415800, 4304940), resolutions_xy=(10, 10))
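
For readers who want a feel for the pattern, here is a minimal plain-Python sketch of what an iterable-style stacker does: loop over the source stream and apply a stacking function with fixed keyword arguments to each element. This is not the zen3geo implementation. `StackerSketch`, `stack_func` and `fake_stack` are hypothetical names used only for illustration, and the real DataPipe yields xarray.DataArray objects from stackstac.stack() rather than dictionaries.

```python
from typing import Any, Callable, Dict, Iterable, Iterator


class StackerSketch:
    """Plain-Python stand-in for an iterable-style DataPipe (hypothetical, not the zen3geo class)."""

    def __init__(
        self, source: Iterable[Any], stack_func: Callable[..., Any], **kwargs: Any
    ) -> None:
        self.source = source  # upstream iterable of STAC items
        self.stack_func = stack_func  # function applied to each element
        self.kwargs: Dict[str, Any] = kwargs  # e.g. assets, epsg, resolution

    def __iter__(self) -> Iterator[Any]:
        # Lazily apply the stacking function to every element of the stream
        for stac_items in self.source:
            yield self.stack_func(stac_items, **self.kwargs)


def fake_stack(items: Any, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stacking function that only records what it was given."""
    return {"items": items, **kwargs}


pipe = StackerSketch(
    ["item-a", "item-b"],
    stack_func=fake_stack,
    assets=["vh", "vv"],
    epsg=32652,
    resolution=10,
)
results = list(pipe)
```

Swapping `fake_stack` for a real stacking function would keep the loop unchanged, which is the appeal of passing the keyword arguments through untouched.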

TODO:

  • Initial implementation with a doctest
  • Add unit test
  • Handle different values of epsg (TODO in a follow-up PR)

Consider handling different values of resolution/bounds/bounds_latlon in the future. The limitation is that the fill_value parameter of torchdata.datapipes.iter.ZipperLongest applies a single value to all input IterDataPipes, whereas each IterDataPipe here would need its own fill_value. Alternatively, have a parametrized IterDataPipe that holds a dictionary instead of a single variable?
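
To make the fill_value limitation concrete, here is a rough sketch that uses `itertools.zip_longest` from the standard library as a stand-in for ZipperLongest, since both pad exhausted streams with one shared fill value. The helper `zip_longest_with_defaults` is hypothetical (not part of torchdata or zen3geo), and the EPSG codes other than 32652 are made-up example values. It shows one possible way to keep a dictionary of per-parameter defaults, in the spirit of the dictionary idea above.

```python
from itertools import zip_longest
from typing import Any, Dict, Iterable, Iterator

# A single shared fill value pads every exhausted stream the same way
padded_once = list(zip_longest([32652, 32651], [10], fillvalue=None))

_MISSING = object()  # sentinel marking a stream that has run out


def zip_longest_with_defaults(
    streams: Dict[str, Iterable[Any]], defaults: Dict[str, Any]
) -> Iterator[Dict[str, Any]]:
    """Yield keyword-argument dicts, padding each exhausted stream with its own default."""
    names = list(streams)
    iterables = [streams[name] for name in names]
    for values in zip_longest(*iterables, fillvalue=_MISSING):
        yield {
            name: (defaults[name] if value is _MISSING else value)
            for name, value in zip(names, values)
        }


kwargs_stream = list(
    zip_longest_with_defaults(
        streams={"epsg": [32652, 32651, 32650], "resolution": [10]},
        defaults={"epsg": None, "resolution": 10},
    )
)
```

Each yielded dictionary could then be unpacked as keyword arguments for one stacking call, so the shorter `resolution` stream falls back to 10 instead of a value shared with `epsg`.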

Part of #48

References:

Turn a STAC catalog into a dask-based xarray!
An iterable-style DataPipe to stack STAC items! Uses stackstac to do the reprojection and stacking along time. Included a doctest that stacks the VH and VV channels of a Sentinel-1 image into an xarray.DataArray. Added a new section in the API docs and intersphinx mappings for dask and stackstac.
@weiji14 added the feature (New feature or request) label Sep 18, 2022
@weiji14 added this to the 0.5.0 milestone Sep 18, 2022
@weiji14 self-assigned this Sep 18, 2022
Ensure that STAC Item asset bands can be stacked together into an xarray.DataArray object. Also updated the ci-tests.yml workflow to include testing of datapipes under the `stac` extras.
Hint that the DataPipe yields xarray.DataArray objects from stackstac.stack().
@weiji14 marked this pull request as ready for review September 19, 2022 19:34
Just a small nitpick to make all the intersphinx URLs consistent.
@weiji14 merged commit 3860f9a into main Sep 19, 2022
@weiji14 deleted the stackstac/stacker branch September 19, 2022 20:01