✨ PySTACItemReaderIterDataPipe for reading STAC Items #46

Merged
weiji14 merged 6 commits into main from pystac/item_reader on Sep 9, 2022


weiji14 (Owner) commented Sep 1, 2022

An iterable-style DataPipe for STAC items! Uses pystac for reading the files or URLs.

Preview at https://zen3geo--46.org.readthedocs.build/en/46/api.html#zen3geo.datapipes.PySTACItemReader
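A rough, standard-library-only sketch of the iterable-style pattern: the pipe wraps an upstream iterable of paths and lazily yields one parsed item per path. Here `json.load` stands in for `pystac.Item.from_file` (which also accepts URLs and returns a `pystac.Item`), and the class name `StacItemReaderSketch` is hypothetical, not part of zen3geo or torchdata.

```python
import json
from typing import Any, Dict, Iterable, Iterator


class StacItemReaderSketch:
    """Hypothetical iterable-style reader: yields one parsed STAC Item per path."""

    def __init__(self, source: Iterable[str]) -> None:
        self.source = source  # upstream iterable of local file paths

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        # Items are read lazily, only when the consumer asks for the next one.
        for path in self.source:
            with open(path, encoding="utf-8") as f:
                # Stand-in for pystac.Item.from_file(path); returns a plain dict here.
                yield json.load(f)
```

As in the Usage example, a consumer would call `next(iter(StacItemReaderSketch([path])))` to pull the first item; nothing is read from disk until then.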

Usage

from torchdata.datapipes.iter import IterableWrapper
from zen3geo.datapipes import PySTACItemReader

# Read in STAC Item using DataPipe
item_url: str = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a/items/S2A_MSIL2A_20220115T032101_R118_T48NUG_20220115T170435"
dp = IterableWrapper(iterable=[item_url])
dp_pystac = dp.read_to_pystac_item()

# Loop or iterate over the DataPipe stream
it = iter(dp_pystac)
stac_item = next(it)

print(stac_item.bbox)
# [103.20205689, 0.81602476, 104.18934086, 1.8096362]

print(stac_item.properties)
# {'datetime': '2022-01-15T03:21:01.024000Z',
#  'platform': 'Sentinel-2A',
#  'proj:epsg': 32648,
#  'instruments': ['msi'],
#  's2:mgrs_tile': '48NUG',
#  'constellation': 'Sentinel 2',
#  's2:granule_id': 'S2A_OPER_MSI_L2A_TL_ESRI_20220115T170436_A034292_T48NUG_N03.00',
#  'eo:cloud_cover': 17.352597,
#  's2:datatake_id': 'GS2A_20220115T032101_034292_N03.00',
#  's2:product_uri': 'S2A_MSIL2A_20220115T032101_N0300_R118_T48NUG_20220115T170435.SAFE',
#  's2:datastrip_id': 'S2A_OPER_MSI_L2A_DS_ESRI_20220115T170436_S20220115T033502_N03.00',
#  's2:product_type': 'S2MSI2A',
#  'sat:orbit_state': 'descending',
#  ...

This dp_pystac DataPipe could then be chained to RioXarrayReader to load the data into an xarray.DataArray; see the walkthrough at https://zen3geo.readthedocs.io/en/v0.3.0/walkthrough.html#find-cloud-optimized-geotiffs.
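One way the chaining step could look, as a hedged sketch: each STAC Item is mapped to the href of one of its assets, and those hrefs are what the raster reader would then open. Items are plain dicts here; the helper `select_asset_href`, the asset key `"visual"` and the example href are assumptions for illustration only.

```python
from typing import Any, Dict, Iterable, Iterator


def select_asset_href(items: Iterable[Dict[str, Any]], asset_key: str) -> Iterator[str]:
    """Hypothetical chaining step: map each STAC Item (as a dict) to one asset href."""
    for item in items:
        # STAC Items list their data files under "assets", keyed by asset name.
        yield item["assets"][asset_key]["href"]


items = [
    {"id": "demo", "assets": {"visual": {"href": "https://example.com/demo_visual.tif"}}},
]
hrefs = list(select_asset_href(items, asset_key="visual"))
```

With torchdata, the same step would more likely be a `.map(...)` call on dp_pystac (for example, a function returning `item.assets["visual"].href` from each `pystac.Item`), with the resulting stream passed on to the raster reader.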

TODO:

  • Initial implementation with a doctest that checks the metadata within the pystac.item.Item object.
  • Add unit tests
  • etc

Part of #48.

References:

An iterable-style DataPipe for STAC items! Uses pystac for reading the files or URLs. Included a doctest that checks the metadata within the pystac.item.Item object. Added a new section in the API docs and an intersphinx mapping.
weiji14 added the feature (New feature or request) label Sep 1, 2022
weiji14 added this to the 0.4.0 milestone Sep 1, 2022
weiji14 self-assigned this Sep 1, 2022
Ensure that zen3geo works even when `pystac` is not installed and add `pystac` to the spatial section of the extras dependencies in pyproject.toml.
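A minimal sketch of one common way to keep a dependency optional: fall back to `None` at import time and only raise when the DataPipe is actually constructed. The helpers `optional_import` and `require`, the class `PySTACItemReaderSketch` and the error message are hypothetical; zen3geo's actual code may differ.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:  # also catches ModuleNotFoundError, its subclass
        return None


# Importing this module never fails just because pystac is missing.
pystac = optional_import("pystac")


def require(module: Optional[ModuleType], name: str) -> ModuleType:
    """Fail loudly, but only when the optional dependency is actually needed."""
    if module is None:
        raise ModuleNotFoundError(
            f"Package `{name}` is required for this DataPipe. "
            f"Install it with `pip install {name}`."
        )
    return module


class PySTACItemReaderSketch:
    """Hypothetical DataPipe that checks for pystac at construction time."""

    def __init__(self, source_datapipe: Any) -> None:
        require(pystac, "pystac")
        self.source_datapipe = source_datapipe
```

Deferring the error to construction time means the rest of the package still imports in environments without the spatial extras; only users of this particular DataPipe need pystac.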
Comment on lines 14 to 19
@functional_datapipe("read_from_pystac_item")
class PySTACItemReaderIterDataPipe(IterDataPipe):
"""
Takes files from local disk or URLs (as long as they can be read by pystac)
and yields :py:class:`pystac.item.Item` objects (functional name:
``read_from_pystac_item``).
weiji14 (Owner, Author) commented Sep 7, 2022:

Maybe read_to_pystac_item instead, since we're reading URLs/files to a pystac.Item? See also https://pytorch.org/data/0.4/tutorial.html#naming

Suggested change:
-@functional_datapipe("read_from_pystac_item")
+@functional_datapipe("read_to_pystac_item")
 class PySTACItemReaderIterDataPipe(IterDataPipe):
 """
 Takes files from local disk or URLs (as long as they can be read by pystac)
 and yields :py:class:`pystac.item.Item` objects (functional name:
-``read_from_pystac_item``).
+``read_to_pystac_item``).

weiji14 (Owner, Author) replied:
Done in d66f1f3
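To illustrate why the functional name matters: torchdata's `@functional_datapipe` decorator registers the class under a method name, so `dp.read_to_pystac_item()` builds a `PySTACItemReaderIterDataPipe` with `dp` as its source. The stand-in below re-creates that idea in plain Python; `Pipe`, `functional_pipe`, `JsonTextReader` and `read_to_dict` are hypothetical names, not torchdata's actual implementation.

```python
import json
from typing import Any, Callable, Iterable, Iterator


class Pipe:
    """Hypothetical stand-in for an iterable-style DataPipe base class."""

    def __init__(self, source: Iterable[Any]) -> None:
        self.source = source

    def __iter__(self) -> Iterator[Any]:
        return iter(self.source)


def functional_pipe(name: str) -> Callable[[type], type]:
    """Register a Pipe subclass as a method called `name` on every Pipe."""

    def register(cls: type) -> type:
        def method(self: Pipe, *args: Any, **kwargs: Any) -> Pipe:
            # dp.<name>(...) builds cls(dp, ...), making dp the upstream source.
            return cls(self, *args, **kwargs)

        setattr(Pipe, name, method)
        return cls

    return register


@functional_pipe("read_to_dict")  # named after the type it yields
class JsonTextReader(Pipe):
    """Hypothetical reader: parses JSON text into dict objects."""

    def __iter__(self) -> Iterator[dict]:
        for text in self.source:
            yield json.loads(text)


dp = Pipe(['{"id": "a"}', '{"id": "b"}'])
ids = [item["id"] for item in dp.read_to_dict()]
```

Because the method name is what users type when chaining, `read_to_<output>` tells them what comes out of the pipe, which is the point of the rename.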

New poetry 1.2.0 resolver and other optional dependencies like adlfs and contextily!
Decided that since the returned object is a `pystac.Item`, it should probably be `read_to_pystac_item`.
Ensure that a JSON STAC item can be read into a pystac.Item object that contains various spatiotemporal metadata.
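A hedged sketch of the kind of spatiotemporal checks such a test might make, reusing the bbox and datetime values from the Usage example. It works on the raw JSON dict with only the standard library; the helper `check_spatiotemporal` is hypothetical, and a pystac-based test would instead inspect `item.bbox` and `item.datetime` on the `pystac.Item`.

```python
from datetime import datetime
from typing import Any, Dict


def check_spatiotemporal(item: Dict[str, Any]) -> datetime:
    """Hypothetical check: validate bbox ordering and parse the acquisition time."""
    # Assumes a 2D bbox: [min lon, min lat, max lon, max lat].
    minx, miny, maxx, maxy = item["bbox"]
    if minx > maxx or miny > maxy:
        raise ValueError(f"bbox corners are out of order: {item['bbox']}")
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11,
    # so swap it for an explicit UTC offset first.
    stamp = item["properties"]["datetime"].replace("Z", "+00:00")
    return datetime.fromisoformat(stamp)


item = {
    "bbox": [103.20205689, 0.81602476, 104.18934086, 1.8096362],
    "properties": {"datetime": "2022-01-15T03:21:01.024000Z"},
}
acquired = check_spatiotemporal(item)
```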
weiji14 marked this pull request as ready for review September 9, 2022 18:22
Use pytest.importorskip to skip running the doctest when pystac cannot be imported.
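`pytest.importorskip("pystac")` returns the imported module, or skips the test when the import fails (in a doctest this can be written as, e.g., `>>> pystac = pytest.importorskip("pystac")`). For comparison, a standard-library-only version of the same guard might look like the sketch below; the test class and method names are hypothetical.

```python
import importlib.util
import unittest

# find_spec returns None when the package cannot be found, without importing it.
HAS_PYSTAC = importlib.util.find_spec("pystac") is not None


class TestPySTACItemReader(unittest.TestCase):
    @unittest.skipUnless(HAS_PYSTAC, "pystac is not installed")
    def test_pystac_has_item_class(self) -> None:
        import pystac  # only imported when the test actually runs

        self.assertTrue(hasattr(pystac, "Item"))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPySTACItemReader)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Either way, the suite reports a skip rather than an error in environments without pystac.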
weiji14 merged commit da0728a into main Sep 9, 2022
weiji14 deleted the pystac/item_reader branch September 9, 2022 18:52