✨ PySTACAPISearchIterDataPipe to query dynamic STAC Catalogs #59

Merged: 8 commits, Sep 17, 2022


weiji14 (Owner) commented on Sep 16, 2022

An iterable-style DataPipe to make STAC API queries to dynamic STAC Catalogs! Uses pystac-client to perform the STAC API Item Search on the /search endpoint.
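Under the hood, a STAC API Item Search is an HTTP request to the catalog's /search endpoint. As a rough sketch (not pystac-client's actual code), the query dict from the usage example below could be turned into a POST request like this. Note that the STAC API spec expects `datetime` as a single `start/end` interval string, so the two-element list is joined with `/`. The helper name `build_search_request` is hypothetical, and the request is only constructed, never sent.

```python
import json
import urllib.request
from typing import Any, Dict


def build_search_request(catalog_url: str, query: Dict[str, Any]) -> urllib.request.Request:
    """Hypothetical helper: turn a query dict into a STAC API POST /search request."""
    body = dict(query)  # shallow copy so the caller's dict is left untouched
    # STAC API Item Search takes datetime as one RFC 3339 interval string "start/end"
    if isinstance(body.get("datetime"), (list, tuple)):
        body["datetime"] = "/".join(body["datetime"])
    return urllib.request.Request(
        url=catalog_url.rstrip("/") + "/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


query = dict(
    bbox=[174.5, -41.37, 174.9, -41.19],
    datetime=["2012-02-20T00:00:00Z", "2022-12-22T00:00:00Z"],
    collections=["cop-dem-glo-30"],
)
request = build_search_request("https://planetarycomputer.microsoft.com/api/stac/v1", query)
print(request.get_method(), request.full_url)
# POST https://planetarycomputer.microsoft.com/api/stac/v1/search
print(json.loads(request.data)["datetime"])
# 2012-02-20T00:00:00Z/2022-12-22T00:00:00Z
```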

Preview at https://zen3geo--59.org.readthedocs.build/en/59/api.html#zen3geo.datapipes.PySTACAPISearch

Usage (requires planetary-computer>=0.4.7 and pystac-client>=0.5.0):

```python
import planetary_computer
from torchdata.datapipes.iter import IterableWrapper
# Importing from zen3geo.datapipes registers the .search_for_pystac_item functional form
from zen3geo.datapipes import PySTACAPISearch

# Perform STAC API query using DataPipe
query = dict(
    bbox=[174.5, -41.37, 174.9, -41.19],
    datetime=["2012-02-20T00:00:00Z", "2022-12-22T00:00:00Z"],
    collections=["cop-dem-glo-30"],
)
dp = IterableWrapper(iterable=[query])
dp_pystac_client = dp.search_for_pystac_item(
    catalog_url="https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
# Loop or iterate over the DataPipe stream
it = iter(dp_pystac_client)
stac_item_search = next(it)
stac_items = list(stac_item_search.items())

print(stac_items)
# [<Item id=Copernicus_DSM_COG_10_S42_00_E174_00_DEM>]

print(stac_items[0].properties)
# {'gsd': 30,
#  'datetime': '2021-04-22T00:00:00Z',
#  'platform': 'TanDEM-X',
#  'proj:epsg': 4326,
#  'proj:shape': [3600, 3600],
#  'proj:transform': [0.0002777777777777778,
#   0.0,
#   173.9998611111111,
#   0.0,
#   -0.0002777777777777778,
#   -40.99986111111111]}
```
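The `proj:transform` and `proj:shape` properties printed above are enough to work out the tile's footprint. A small sketch, assuming the STAC projection extension convention that `proj:transform` lists the affine coefficients `[a, b, c, d, e, f]` and `proj:shape` is `[rows, cols]`. The helpers `footprint` and `intersects` are hypothetical, not part of zen3geo or pystac. The sketch confirms that the returned 1°×1° tile covers the query bbox.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def footprint(transform: List[float], shape: List[int]) -> Box:
    """Return the bounding box of a north-up raster from STAC projection metadata.

    Assumes proj:transform holds [a, b, c, d, e, f] with
    x = a*col + b*row + c and y = d*col + e*row + f, and proj:shape is [rows, cols].
    """
    a, b, c, d, e, f = transform[:6]
    if b != 0 or d != 0:
        raise ValueError("rotated rasters are not handled in this sketch")
    rows, cols = shape
    xs = (c, c + a * cols)  # left edge, right edge
    ys = (f, f + e * rows)  # top edge, bottom edge (e is negative for north-up)
    return min(xs), min(ys), max(xs), max(ys)


def intersects(box1: Sequence[float], box2: Sequence[float]) -> bool:
    """True if two (minx, miny, maxx, maxy) boxes overlap."""
    return box1[0] <= box2[2] and box2[0] <= box1[2] and box1[1] <= box2[3] and box2[1] <= box1[3]


transform = [0.0002777777777777778, 0.0, 173.9998611111111, 0.0, -0.0002777777777777778, -40.99986111111111]
tile = footprint(transform, [3600, 3600])
print(tuple(round(v, 4) for v in tile))
# (173.9999, -41.9999, 174.9999, -40.9999)
print(intersects(tile, (174.5, -41.37, 174.9, -41.19)))
# True
```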

TODO:

Part of #48.

References:

pystac-client: Python client for STAC Catalogs and APIs!
An iterable-style DataPipe to make STAC API queries! Uses pystac-client to perform the STAC API Item Search. Included a doctest that does a STAC API search which returns an iterable list of STAC item objects. Added a new section in the API docs and an intersphinx mapping.
weiji14 added the feature (New feature or request) label on Sep 16, 2022
weiji14 added this to the 0.5.0 milestone on Sep 16, 2022
weiji14 self-assigned this on Sep 16, 2022
Ensure that a spatiotemporal query can be made to a STAC API /search/ endpoint to produce an instance of a pystac_client.ItemSearch object containing pointers to multiple pystac.Item objects. Also removed a for-loop that was accidentally added after a yield statement.
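The yielded search object is lazy: it holds the query and fetches results page by page only when iterated. A simplified sketch of that idea, assuming the STAC API convention that each response page is a GeoJSON FeatureCollection with an optional `rel="next"` link. `iter_items` and the in-memory pages are hypothetical, and this is not pystac-client's implementation (which also handles the method and body that a POST-based `next` link can carry).

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = Dict[str, Any]


def iter_items(first_page_url: str, fetch_page: Callable[[str], Page]) -> Iterator[Dict[str, Any]]:
    """Lazily yield STAC Item dicts, following rel="next" links page by page."""
    url: Optional[str] = first_page_url
    while url is not None:
        page = fetch_page(url)  # one HTTP request per page in a real client
        yield from page.get("features", [])
        next_links = [link for link in page.get("links", []) if link.get("rel") == "next"]
        url = next_links[0]["href"] if next_links else None


# Two fake in-memory pages standing in for real /search responses (hypothetical data)
PAGES: Dict[str, Page] = {
    "page1": {"features": [{"id": "item-a"}, {"id": "item-b"}], "links": [{"rel": "next", "href": "page2"}]},
    "page2": {"features": [{"id": "item-c"}], "links": []},
}
requested: List[str] = []


def fake_fetch(url: str) -> Page:
    requested.append(url)  # record which pages were actually fetched
    return PAGES[url]


items = iter_items("page1", fake_fetch)
print(requested)  # [] (nothing fetched until iteration starts)
print([item["id"] for item in items])  # ['item-a', 'item-b', 'item-c']
print(requested)  # ['page1', 'page2']
```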
Decided that it is cleaner to pass in all the STAC API query parameters via the source_datapipe, rather than through both the source_datapipe and kwargs. The kwargs can then be reserved for keyword arguments to pystac_client.Client.open. Documented some example parameters for pystac_client.Client.search and pystac_client.Client.open. Updated the unit test and doctest, adding a teaser showing how to sign planetary_computer assets in place too!
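The resulting split (query parameters flow through the source DataPipe, while `**kwargs` are reserved for opening the catalog) can be illustrated with a small dependency-free sketch. It is not the merged zen3geo code: `STACSearchSketch` and `FakeClient` are hypothetical stand-ins, with `FakeClient.open` playing the role of `pystac_client.Client.open` so that the example runs without network access. The `x-api-key` header is likewise made up.

```python
from typing import Any, Callable, Dict, Iterable, Iterator


class STACSearchSketch:
    """Hypothetical sketch: each source element is a dict of search parameters
    (bbox, datetime, collections, ...), while **kwargs only go to open_client."""

    def __init__(
        self,
        source: Iterable[Dict[str, Any]],
        catalog_url: str,
        open_client: Callable[..., Any],
        **kwargs: Any,
    ) -> None:
        self.source = source
        self.catalog_url = catalog_url
        self.open_client = open_client  # stands in for pystac_client.Client.open
        self.kwargs = kwargs  # e.g. headers or a modifier, never search parameters

    def __iter__(self) -> Iterator[Any]:
        client = self.open_client(self.catalog_url, **self.kwargs)
        for query in self.source:
            yield client.search(**query)  # one search object per query dict


class FakeClient:
    """Stand-in for a STAC client; records how it was opened and searched."""

    def __init__(self, url: str, **open_kwargs: Any) -> None:
        self.url = url
        self.open_kwargs = open_kwargs

    @classmethod
    def open(cls, url: str, **open_kwargs: Any) -> "FakeClient":
        return cls(url, **open_kwargs)

    def search(self, **params: Any) -> Dict[str, Any]:
        return {"url": self.url, "params": params, "opened_with": self.open_kwargs}


queries = [{"collections": ["cop-dem-glo-30"], "bbox": [174.5, -41.37, 174.9, -41.19]}]
dp = STACSearchSketch(queries, "https://example.com/stac", FakeClient.open, headers={"x-api-key": "fake"})
for result in dp:
    print(result["params"]["collections"], result["opened_with"])
# ['cop-dem-glo-30'] {'headers': {'x-api-key': 'fake'}}
```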
```python
>>> dp = IterableWrapper(iterable=[query])
>>> dp_pystac_client = dp.search_for_pystac_item(
...     catalog_url="https://planetarycomputer.microsoft.com/api/stac/v1",
...     # modifier=planetary_computer.sign_inplace,
```
weiji14 (Owner, Author), Sep 16, 2022:


Commented out this modifier parameter (see also stac-utils/pystac-client#259) because:

  1. It's a (cool) new feature added in stac-utils/pystac-client#286 ("Add a post-init modifier keyword"), released in pystac-client 0.5.0, but we don't want to pin to a particular pystac-client version.
  2. It requires the planetary-computer library to be installed, specifically version 0.4.7+, which includes microsoft/planetary-computer-sdk-for-python#42 ("Feature/sign inplace").

So leaving this as an easter egg 🥚 for now, but will include it in the next walkthrough tutorial.
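For context, a post-init `modifier` is a callable that the client runs on each object it returns, mutating it in place (which is how `planetary_computer.sign_inplace` signs asset URLs). A rough sketch of the pattern on plain dicts; `add_token_inplace` and `apply_modifier` are hypothetical and the token is fake, so this does not reproduce the real signing logic.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

Item = Dict[str, Any]


def add_token_inplace(item: Item) -> None:
    """Hypothetical modifier: append a fake token to every asset href, in place."""
    for asset in item.get("assets", {}).values():
        asset["href"] = asset["href"] + "?token=FAKE"


def apply_modifier(items: Iterable[Item], modifier: Optional[Callable[[Item], None]] = None) -> Iterator[Item]:
    """Yield items after letting an optional in-place modifier touch each one."""
    for item in items:
        if modifier is not None:
            modifier(item)  # mutates the item; any return value is ignored
        yield item


raw = [{"id": "tile-1", "assets": {"data": {"href": "https://example.blob.core.windows.net/dem/tile-1.tif"}}}]
signed = list(apply_modifier(raw, modifier=add_token_inplace))
print(signed[0]["assets"]["data"]["href"])
# https://example.blob.core.windows.net/dem/tile-1.tif?token=FAKE
```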

Ensure that parameters such as authentication keys can be passed to pystac_client.Client.open. Using a fake API key for Radiant MLHub, which only gets us as far as retrieving a title and description.
Fix Continuous Integration failure: jsonschema had to be installed, but it is an optional dependency.
Move pystac and pystac-client to a dedicated `stac` extras section for spatiotemporal dependencies! Also updated the index page with instructions on installing these STAC dependencies.
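With pystac and pystac-client moved into an optional `stac` extra (installable with `pip install "zen3geo[stac]"`), code that uses them should fail with a clear message when they are missing. A minimal sketch of such a guard; the helper name `require_optional` and the message wording are assumptions, not zen3geo's actual code.

```python
import importlib.util


def require_optional(module_name: str, extra: str = "stac") -> None:
    """Raise a helpful error if an optional dependency is not installed.

    Hypothetical helper; ``extra`` names the pip extras group that provides it.
    Pass top-level module names only (find_spec on "a.b" imports package "a").
    """
    if importlib.util.find_spec(module_name) is None:
        raise ModuleNotFoundError(
            f"Package `{module_name}` is required here. "
            f'Install it with `pip install "zen3geo[{extra}]"`.'
        )


# Output depends on what is installed in the current environment
for name in ("json", "pystac_client"):
    try:
        require_optional(name)
        print(f"{name}: available")
    except ModuleNotFoundError as err:
        print(f"{name}: {err}")
```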
weiji14 marked this pull request as ready for review on September 17, 2022 14:39
weiji14 changed the title from "✨ PySTACAPISearchIterDataPipe for searching STAC Catalogs" to "✨ PySTACAPISearchIterDataPipe to query dynamic STAC Catalogs" on Sep 17, 2022
weiji14 merged commit c816914 into main on Sep 17, 2022
weiji14 deleted the pystac_client/item_search branch on September 17, 2022 14:49